The EVALDA project is financed by the French Ministry of Research in the context of its Technolangue programme. The aim of the project is to establish a permanent evaluation infrastructure for the language engineering sector in France and for the French language.
The aim of such a project is to assemble reusable components, i.e. organisation, logistics, language resources, evaluation protocols, methodologies and metrics, as well as the major actors in the field (scientific advisory boards, panels of experts, partners, etc.). This makes it possible to capitalise on the results of previous experiments, and also to encourage collaborative research and the setting up of new and improved evaluation campaigns. It is imperative that the evaluations envisaged in this project are reproducible by third parties, using the resources assembled over the course of the project. This enables a true comparison of system performance and a benchmarking of the state of the art in language engineering. All evaluation resources are to be made available at the end of the project in the form of an evaluation package.
A second aim of the project is the setting up of evaluation campaigns covering several linguistic technologies, in both the written and spoken media. Industrial and academic partners are to take part. The campaigns are largely based on black-box evaluation protocols and quantitative methods, drawing and expanding upon previous evaluation campaigns such as ARC-AUPELF, GRACE and TREC.
Each evaluation campaign is largely independent; however, a certain amount of synergy between the campaigns is envisaged. This may involve the sharing of know-how, resources or even personnel.
The linguistic technologies to evaluate were chosen on the basis of those that appear to be the most important in the field. The following were chosen:
ARCADEII: Evaluation of bilingual text and vocabulary alignment systems. Following the success of ARCADEI, this follow-up campaign aims to evaluate alignments involving more distant or ‘exotic’ languages, e.g. Greek, Russian, Japanese and Chinese.
CESART: Evaluation of terminology extraction tools, including tools for extracting ontologies and semantic relations. Evaluation is to take place against a predetermined list of terms/relations.
CESTA: Evaluation of machine translation systems. French is to be the pivot language; however, several languages from and into French are envisaged (English, Spanish, German, Arabic), according to the capabilities of the participants’ systems.
EASY: An evaluation campaign designed to test syntactic parsers. A by-product of the campaign is the creation of a syntactically parsed reference corpus composed of several genres of text (newspapers, literary texts, electronic texts, etc.).
EQUER: Evaluation of question answering systems. Three reference corpora are envisaged: a large general corpus (newspapers, general texts), a web corpus and a corpus made up of medical texts.
ESTER: Evaluation of automatic broadcast news transcription systems. This campaign includes the evaluation of segmentation tasks and of named entity identification.
EVASY: Evaluation of speech synthesis systems. This campaign is to feature a novel method for the evaluation of prosody in synthesised speech.
MEDIA: Evaluation of man-machine dialogue systems. In this case, the task envisaged is hotel room reservation (including some local tourist information).
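
As an illustration of what a black-box, quantitative evaluation against a predetermined reference list (as described for CESART) might look like, here is a minimal sketch in Python. The metrics used here (precision, recall and F-measure over exact, case-insensitive matches) are an assumption made for the example, as are the function name `score_terms` and the sample term lists; the project description does not specify which metrics or matching rules the campaign actually uses.

```python
def score_terms(extracted, reference):
    """Score a system's extracted terms against a reference term list.

    Assumption: a term counts as correct only if it matches a reference
    term exactly, ignoring case and surrounding whitespace.
    """
    ext = {t.strip().lower() for t in extracted}
    ref = {t.strip().lower() for t in reference}
    hits = ext & ref  # terms found by the system that are in the reference

    precision = len(hits) / len(ext) if ext else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    if hits:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical data: the output of one system and a small reference list.
system_output = ["machine translation", "Parser", "corpus", "the"]
reference_list = ["machine translation", "parser", "named entity",
                  "corpus", "prosody"]

precision, recall, f_measure = score_terms(system_output, reference_list)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Because the scoring function sees only a system's output and the shared reference list, any third party holding the same list can rerun it, which is the kind of reproducibility the evaluation package is meant to support.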