ELRA
  • Catalogue of Language Resources

    The language resources available in this catalogue are divided into four categories: "Speech and Related Resources", "Written Resources", "Terminological Resources", and "Multimodal/Multimedia Resources".

    1/ Spoken LRs

    a - Telephone recordings
    The databases catalogued in this section were produced from speaker recordings made over the telephone network (fixed or mobile) or through a microphone. They include speech resources recorded in various environments and covering a large number of European and non-European languages, e.g. the databases produced in the framework of the SpeechDat project.

    b - Desktop/Microphone recordings
    The databases catalogued in this section were produced from speaker recordings made through a microphone, e.g. the databases produced in the framework of the BABEL project.

    c - Broadcast Resources
    The databases catalogued in this section were produced from speaker recordings taken from radio, television or internet broadcasts, such as the Italian Broadcast News Corpus.

    d - Speech Related Resources
    This section contains pronunciation and phonetic lexicons, such as the BDLEX, PHONOLEX and MHATLEX databases.

    2/ Written LRs

    a - Corpora
    This section contains monolingual and multilingual corpora, parallel or not, which may also be annotated. Examples include the corpora developed in the framework of the MULTEXT project, the Multilingual and Parallel Corpora (MLCC), French scientific corpora, and newspaper corpora in Arabic.

    b - Monolingual lexicons
    The section dedicated to monolingual lexicons contains various types of dictionaries, e.g. a dictionary of French verbs, the Japanese word dictionary, some PAROLE lexicons in many languages, etc.

    c - Multilingual lexicons
    Here you can find either bilingual or multilingual dictionaries and lexicons, such as the EuroWordNet databases.

    3/ Terminological LRs

    Monolingual, bilingual and multilingual terminological databases are available. They cover a large number of specialised domains, e.g. automobile engineering, insurance, linguistics, finance, etc., in a wide variety of languages.

    4/ Multimodal/Multimedia LRs

    The resources in this section have been produced using different modalities, including speech. An example of such resources is the database produced in the framework of the M2VTS project.

    New Resources
  • L0085 : euLEX (Lexical Database for Basque)
    euLEX is a general lexicon which
    contains 115,000 entries, divided into
    94,000 dictionary entries or lemmas,
    12,000 allomorphs, 7,500 verb forms and
    about 1,200 dependent morphemes. All
    entries include linguistic information
    such as morphology and usage. The
    lexicon is in XML.

  • S0242 : SALA II US English database
    The SALA II US English database
    comprises 4,090 US English speakers
    (2,017 males, 2,073 females, including
    some speakers with Hispanic accents)
    recorded over the United States mobile
    telephone network.

  • M0043 : Russian => English MT optimized lexicon in OLIF XML
    This lexicon is provided as structured
    XML in OLIF (Open Lexicon Interchange
    Format). It comprises 99,211
    entries in its source language (Russian)
    and 134,828 entries in its target
    language (English). The source entries
    are distributed as follows: 64,487
    nouns, 11,470 adjectives, 19,724 verbs,
    1,762 adverbs, and 1,768 closed-class
    elements (interjections, special
    prefixes, suffixes, etc.). Nouns contain
    gender and number information and verbs
    provide details on aspect and
    reflexivity. The entries contain
    semantic information in terms of domain
    specification or style information
    (e.g., colloquial, regional use, etc.).
    Moreover, definitions are available for
    59,775 entries, as well as collocational
    information for 39,148 entries.

  • M0045 : Cebuano => English Bilingual Lexicon
    This lexicon is provided as structured
    XML in OLIF (Open Lexicon Interchange
    Format). It comprises 1,988
    entries in Cebuano and 1,990 in English.
    The source entries are distributed as
    follows: 1,052 nouns, 462 adjectives,
    405 verbs and 69 closed-class entries.
    The entries contain semantic information
    in terms of domain specification or
    style information (e.g., colloquial,
    regional use, etc.). Collocational
    information is also available for 500
    entries.

  • M0044 : English => Swahili Bilingual Lexicon
    This lexicon is provided as structured
    XML in OLIF (Open Lexicon Interchange
    Format). It comprises 58,247
    entries in English and 58,300 in
    Swahili. The source entries are
    distributed as follows: 36,046 nouns,
    3,013 adjectives, 18,308 verbs and 880
    closed-class entries. The entries
    contain semantic information in terms of
    domain specification or style
    information (e.g., colloquial, regional
    use, etc.). Collocational information is
    also available for 17,570 entries.

  • (last update: July 2008)

    Copyright © 2006 ELRA