Project full title: Integrated reference corpora for spoken Romance languages. Multimedia edition; tools of analysis; standard linguistic measures for validation in HLT.

Project no.: IST-2000-26228

Start and Duration: January 2001, 36 months


Informal, spontaneous spoken language is under-represented in current linguistic resources. In particular, audio data, which is essential to validation procedures in human language technologies (HLT), is almost unavailable.

The C-ORAL-ROM project aims to provide the linguistic and HLT communities with a comparable set of spoken language corpora for the main Romance languages: French, Italian, Portuguese and Spanish.

The main purpose is to enable the validation of human language technologies based on spoken Romance languages. To that end, the project aims to establish a set of corpora of spontaneous speech in which textual information and audio are linked and stored on DVD.

The resulting multilingual corpus will be tagged for prosodic parsing and integrated with tools for acoustic and textual analysis.

C-ORAL-ROM plans to exploit the resulting corpus on DVD by ensuring that it conforms to the EU standard validation procedures established by EAGLES and MATE.

The multilingual corpus resulting from C-ORAL-ROM will be made available on DVD. The outcome of the project will include: