Potential use

The C-ORAL-ROM corpora are relevant to future research in several fields, including:

- Speech technologies: Spoken corpora can play an important role in the validation of speech technologies. Natural speech providing a broad sample of variations of spoken language is an indispensable resource in order to (a) evaluate speech recognition systems and text-to-speech synthesisers, (b) help the development of prosodic and phonological models for speech technologies, and (c) build multilingual speech recognisers based on the same sets of phonetic models.

- Lexicography and dialectology: Spoken corpora are a source for dictionary building, in particular for dictionaries of usage, idioms or jargon, since they allow lexicographers to find and study new words and changes in meaning. Corpora of spontaneous speech also make it possible to quantify a whole language variety in a representative way, above all endangered varieties for which no written records are available.

- Conversation analysis and grammar: Truly conversational and informal corpora can contribute to the understanding of how language is used in real-life situations and, therefore, to the study of the grammar of conversation (which is completely different from a text grammar): the constitution of conversation, the role of gestures, the modalities and specific forms of conversation, linguistic rules, typology, etc.

- Speech therapy and logopedics: Spoken corpora can serve as a tool for the analysis and rehabilitation of language pathologies. In addition, this kind of resource provides a reference against which the speech of linguistically impaired people can be evaluated and compared.

- Comparative linguistics: Multilingual spoken corpora provide real data for the study of comparative issues concerning, for instance, morpho-phonology: the description of processes such as diphthongization, nasalization, palatalization and others in the main Romance languages.

- Research on the interface between linguistic levels: If a corpus is annotated at multiple linguistic levels, it is easier to ask and answer questions about how a given structure at one level maps to a structure at another level. C-ORAL-ROM corpora will provide annotation for phonetic, prosodic, syntactic, semantic and discourse information, as well as sociolinguistic variables (age, sex, education, dialect, profession).
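
As an illustration of the cross-level questions mentioned in the last point, the following minimal Python sketch maps units at one annotation level onto the units at another level that overlap them in time. It is only a sketch under stated assumptions: the `Span` record, the level names, the labels and the timings are all invented for the example and do not reflect the actual C-ORAL-ROM annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One time-aligned annotation unit (hypothetical structure)."""
    level: str    # annotation level, e.g. "prosodic" or "syntactic" (assumed names)
    label: str    # unit label at that level
    start: float  # start time in seconds
    end: float    # end time in seconds


def overlapping(target, spans, level):
    """Return the spans at `level` that overlap `target` in time."""
    return [
        s for s in spans
        if s.level == level and s.start < target.end and s.end > target.start
    ]


def map_levels(spans, source_level, target_level):
    """For each unit at `source_level`, list the overlapping `target_level` labels."""
    return {
        s.label: [t.label for t in overlapping(s, spans, target_level)]
        for s in spans
        if s.level == source_level
    }


# Toy annotations, invented purely for illustration.
spans = [
    Span("prosodic", "tone_unit_1", 0.0, 1.2),
    Span("prosodic", "tone_unit_2", 1.2, 2.5),
    Span("syntactic", "NP", 0.0, 0.7),
    Span("syntactic", "VP", 0.7, 2.5),
]

mapping = map_levels(spans, "prosodic", "syntactic")
print(mapping)
```

In this toy data the first prosodic unit spans both syntactic constituents while the second falls entirely within the verb phrase, which is the kind of mismatch between levels that a multi-level annotation makes easy to detect. A real corpus would need a loader for its own file format in place of the hand-written list.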