The term Language Resource refers to a set of speech or language data and descriptions in machine-readable form. Such resources are used to build, improve or evaluate natural language and speech algorithms or systems. They also serve as core resources for the software localisation and language services industries, for language studies, electronic publishing, international transactions, subject-area specialists and end users.
Examples of Language Resources are written and spoken corpora, computational lexica, terminology databases and speech collections. Basic software tools are also important for the acquisition, preparation, collection, management, customisation and use of these and other resources.
Language Resources are used in two main ways: systems development and systems evaluation.
Development of systems: spoken or written language processing systems are built on corpora. For example, the performance of systems for text retrieval and filtering, or of machine-assisted translation tools, depends mainly on the amount of linguistic data available to train the system. Corpora also make it possible to build derived language resources, e.g. specialised lexicons compiled from a collection of technical texts.
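As a rough illustration of deriving a specialised lexicon from a collection of technical texts, the following minimal Python sketch keeps words that recur in the texts but do not appear in a general vocabulary. The function name `build_lexicon`, the sample sentences, the `general_vocab` set and the `min_count` threshold are all hypothetical choices made for this example; a real lexicon-building workflow would normally involve far larger corpora and linguistic processing.

```python
import re
from collections import Counter


def build_lexicon(texts, general_vocab, min_count=2):
    """Return words that occur at least `min_count` times across `texts`
    and are not part of `general_vocab` (a hypothetical general word list)."""
    counts = Counter()
    for text in texts:
        # Naive tokenisation: lowercase alphabetic runs only (an assumption).
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        word
        for word, count in counts.items()
        if count >= min_count and word not in general_vocab
    )


# Toy "technical corpus" and general vocabulary, invented for illustration.
technical_texts = [
    "The phoneme lattice is rescored by the decoder.",
    "Each decoder pass prunes the lattice before rescoring.",
]
general_vocab = {"the", "is", "by", "each", "pass", "before"}

lexicon = build_lexicon(technical_texts, general_vocab)
print(lexicon)
```

With these toy inputs, only the words that recur across the texts and fall outside the general vocabulary are kept as candidate domain terms.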
Evaluation of systems: language resources, e.g. large corpora, are used to evaluate and compare systems that have already been developed, for tasks such as information filtering, spelling and grammar checking, and text retrieval. Efficient and useful evaluations, based on appropriate and sufficiently large corpora, are especially important for measuring the changes and progress that have been made, and for disseminating and increasing the value of research results.
More information about the evaluation activity and ELRA’s involvement in this area is available in the HLT Evaluation section.
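The idea of comparing systems against a common corpus can be sketched in a few lines of Python. This is only an illustrative example: the reference labels, the two system outputs and the choice of plain accuracy as the metric are assumptions made here, not a description of any particular evaluation campaign.

```python
def accuracy(predictions, references):
    """Fraction of items where the system output matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical reference annotations from an evaluation corpus
# (here: language labels for four documents).
references = ["en", "fr", "de", "fr"]

# Hypothetical outputs of two systems run on the same corpus.
system_outputs = {
    "system_a": ["en", "fr", "fr", "fr"],
    "system_b": ["en", "de", "de", "en"],
}

# Score every system against the same references so results are comparable.
scores = {name: accuracy(out, references) for name, out in system_outputs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because both systems are scored on identical reference data, the resulting figures can be compared directly, which is the point of a shared evaluation resource.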
Benefits for a Company
Sectors such as telecommunications, information and communication, international and multilingual business, public interest, and education and training work systematically with human languages, for purposes such as translation, automated translation, terminology, text recognition and extraction.
Companies and institutions that use language-equipped products gain numerous benefits:
- They can extend their market segments, reaching a wider audience, and making their products more accessible.
- They can increase cost-effectiveness.
- They can give customers a better understanding of concepts.
- They can improve accuracy rates.
- They can improve internal and external communication.
- They can improve services and information.
- They can reduce reaction time.
As early as 1995, the need to capitalise on the investments made in producing and packaging Language Resources, and to ensure their reusability, was recognised; this led to the creation of the association.
ELRA has a twofold mission: to promote Language Resources for the Human Language Technologies field and to boost the evaluation of language engineering technologies.
More than 1100 Language Resources are available in the ELRA catalogue. The quality and variety of those resources, available in a large number of languages and modalities, allow users (R&D labs or companies, development companies) to train and/or evaluate a broad range of LT systems.
Promoting Language Resources also entails organising the Language Resources and Evaluation Conference (LREC), held every other year since 1998.
See also the details of ELRA Services.