Summary of the paper

Title Lexicon-Driven Approach to the Recognition of Arabic Named Entities
Authors Jack Halpern
Abstract Several factors make Arabic personal names and other named entities difficult to process, posing special challenges to developers of NLP applications in the areas of named entity recognition (NER), machine translation (MT), morphological analysis (MA) and information retrieval (IR). These include the complexity of the Arabic orthography, the high level of orthographic and morphological ambiguity, the multitude of highly irregular romanization systems, and the vast number of romanized variants that are difficult to detect and disambiguate. This paper focuses on the orthographic variation of Arabic personal names, with special emphasis on the ambiguity that arises when names are transcribed into the Roman script (romanization). It describes the techniques used to compile the Database of Arabic Names (DAN), a large-scale lexical resource containing millions of Arabic names and their variants in both romanized and fully vocalized Arabic, and argues that linguistic knowledge in the form of a rule-driven lexicon can raise the accuracy of statistical methods in recognizing Arabic names.
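To make the point about romanized variants concrete, here is a minimal Python sketch, assuming a tiny hypothetical lexicon (`LEXICON`) and a handful of crude, hypothetical folding rules (`fold`, `VOWEL_RULES`). It is not the method used to compile DAN, which the abstract does not detail. It only shows how a lexicon indexed by a normalized key can map spellings such as Mohammed, Mouhamad and Muhammad to one fully vocalized Arabic entry.

```python
import re
import unicodedata
from typing import Optional, Tuple

# Hypothetical toy lexicon: one canonical romanization -> fully vocalized Arabic.
# Illustrative entries only; a resource like DAN is vastly larger.
LEXICON = {
    "Muhammad": "مُحَمَّد",
    "Ahmad": "أَحْمَد",
    "Husayn": "حُسَيْن",
}

# Hypothetical, deliberately crude folding rules for common romanization
# differences, applied in this order. These are NOT the rules used to build DAN.
VOWEL_RULES = [("ou", "u"), ("ei", "ay"), ("ai", "ay"), ("o", "u"), ("e", "a")]
AYN_HAMZA_MARKS = set("'’ʿʾ`")


def fold(name: str) -> str:
    """Reduce a romanized name to a coarse matching key."""
    # Split accented letters (e.g. Ḥ -> H + U+0323) and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    s = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    # Drop apostrophe-like symbols that romanizations use for ayn and hamza.
    s = "".join(ch for ch in s if ch not in AYN_HAMZA_MARKS)
    # Strip a leading definite article written as al-, el- or ul-.
    s = re.sub(r"^(?:al|el|ul)[- ]", "", s)
    for old, new in VOWEL_RULES:
        s = s.replace(old, new)
    # Collapse doubled letters, since gemination is written inconsistently.
    return re.sub(r"(.)\1+", r"\1", s)


# Index the lexicon by folded key so that many variants reach one entry.
INDEX = {fold(roman): (roman, arabic) for roman, arabic in LEXICON.items()}


def lookup(variant: str) -> Optional[Tuple[str, str]]:
    """Return (canonical romanization, Arabic form), or None if not in the lexicon."""
    return INDEX.get(fold(variant))


if __name__ == "__main__":
    for v in ["Mohammed", "Mouhamad", "Ahmed", "Hussein", "Ḥusayn", "el-Hussain", "Omar"]:
        print(v, "->", lookup(v))
```

Running it maps the first six spellings onto the three lexicon entries and returns None for Omar, which the toy lexicon lacks. The design choice cuts both ways: each extra folding rule catches more variants but also risks merging distinct names, the kind of ambiguity the abstract highlights.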
Topics Exploitation of LRs in different types of applications (information extraction, information retrieval, speech dictation, translation, summarisation, web services, semantic web, etc.),
Multilingual document retrieval,
Multilingual information retrieval
Bibtex
@InProceedings{HALPERN09.7,
  author = {Jack Halpern},
  title = {Lexicon-Driven Approach to the Recognition of Arabic Named Entities},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
}

Powered by ELDA © 2009 The MEDAR Consortium