EU-funded projects
The list below presents all resources produced by ELDA, as well as resources in whose production ELDA closely participated within EU-funded projects (sorted by project).
- CHIL (Computers In the Human Interaction Loop)
- Video Annotations : The video recordings consist of seminars that were held and recorded at ISL, Universität Karlsruhe (Germany) from October to December 2003. The annotations cover 3 hours of video for the 2005 evaluation campaign and 6 hours for the 2006 evaluation campaign.
- Audio transcriptions : 30 hours of audio recordings from the same seminars as above (the same number of hours for the 2005 and 2006 evaluation campaigns).
- Approximately 60 questions for the QA (Question-Answering) task, as well as summaries of 25 seminars for the summarization task
- INTERA (Integrated European language data Repository Area)
- Bilingual parallel corpora : The texts belong to the domains of education, health, law, tourism and environment. The parallel texts have been aligned as well as annotated at the structural (sentence level) and linguistic levels (PoS tagging and lemmatisation). These are :
- Bulgarian - English parallel corpora
1 million words per language
- Greek - English parallel corpora
2 million words per language
- Serbian - English parallel corpora
1 million words per language
- Slovenian - English parallel corpora
2 million words per language
- Terminological Lexicons : The terms were automatically extracted from the English components of the bilingual parallel corpora mentioned above, with an automatic identification of candidate translations in the target languages. These lexicons are :
Greek lexicon with 4,163 terms, Bulgarian lexicon with 825 terms, Serbian lexicon with 1,883 terms, Slovene lexicon with 2,052 terms, English lexicon with 2,052 terms. The terms are distributed across several domains : Law, Law-Politics, Politics, Education, Environment, Health, Tourism, Finance.
- LILA (Speech databases for ASR in the Asian Pacific area)
- LILA India Hindi as a first language (ongoing)
The LILA Hindi as a first language database will involve 2000 native Hindi speakers from northern India. The speech material is SALA and SpeechDat compliant and is recorded only over the mobile network. Each speaker reads a set of 59 utterances in 5 different environments : office/home, street, public place, moving vehicle and car kit.
- LILA Korea Korean (ongoing)
The LILA Korean database will include 1000 native Korean speakers from South Korea. The speech material is SALA and SpeechDat compliant and is recorded only over the mobile network. Each speaker reads a set of 59 utterances in 5 different environments : office/home, street, public place, moving vehicle and car kit.