Summary of the paper

Title A Multilingual Named Entities Corpus for Arabic, English and French
Authors Djamel Mostefa, Mariama Laïb, Stéphane Chaudiron, Khalid Choukri and Gaël de Chalendar
Abstract This paper presents the semi-automatic annotation with Named Entities (NE) of a multilingual corpus. The languages are Arabic, English and French. The text corpus is made of comparable newswires from the Agence France Presse covering the period 2004-2006. Our method for producing the corpus is iterative. First, the automatic tagging is produced by a state-of-the-art named entity tagger. Then the annotations are checked manually and corrected if necessary. The AFP corpus and annotation scheme are described. The paper also presents the statistics of the corpus and compares the annotation results for the three languages. The final corpus is made of 30,000 tagged documents for the three languages, including 10,000 documents per language. The corpus is publicly available through ELRA's catalog of language resources.
Topics Ontologies and knowledge representation, Monolingual and multilingual LRs, Taggers and Parsers
Full paper A Multilingual Named Entities Corpus for Arabic, English and French
Bibtex @InProceedings{MOSTEFA09.77,
  author = {Djamel Mostefa and Mariama Laïb and Stéphane Chaudiron and Khalid Choukri and Gaël de Chalendar},
  title = {A Multilingual Named Entities Corpus for Arabic, English and French},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
  }
