Summary of the paper

Title Classification of Arabic Information Extraction Methods
Authors Abd El Salam Alhajjar, Mohammad Hajjar and Khaldoun Zreik
Abstract The performance of information retrieval in Arabic is very problematic because of the language's specific morphological and structural variations. To extract information from an Arabic document, the methods involved must answer the following question: "How can we find the root of the word we are searching for?" To find a word in an Arabic dictionary, one must first extract the root of the word and then look up this root in the dictionary, because the vocabulary of Arabic is essentially built by derivation from roots. Roots are words composed of three to five consonant letters. Several methods have been proposed to address these problems. The aim of this paper is to propose a preliminary classification of Arabic information extraction methods. These methods can be classified into two main categories. The first, called "Stemmer", includes the following subcategories: stemmers based on affixes, stemmers based on translation, and stemmers based on patterns and affixes. The second, called "N-gram", groups the subcategories N-gram based on Dice's similarity coefficient and N-gram based on the "Manhattan distance" dissimilarity coefficient. Some methods, however, implement both the "Stemmer" and the "N-gram" approaches. This work helps in deciding which Arabic information extraction method is the most appropriate.
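The abstract names the two families of methods without detailing them. The Python sketch below illustrates one simple instance of each, under loudly stated assumptions. The affix lists are a small hypothetical sample, not the lists used by any of the surveyed stemmers. The n-grams are character bigrams. The Dice coefficient is computed on sets of distinct n-grams, and the Manhattan distance is computed on n-gram frequency counts. The function names `affix_stem`, `dice` and `manhattan` are illustrative and do not come from the paper.

```python
from collections import Counter

# Hypothetical, deliberately small affix lists (illustration only);
# real affix-based stemmers use longer, carefully ordered lists.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة", "ه")


def affix_stem(word: str, min_len: int = 3) -> str:
    """Affix-based stemming: strip at most one known prefix and one known
    suffix, never leaving fewer than min_len letters (roots have 3 to 5)."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[: -len(s)]
            break
    return word


def char_ngrams(word: str, n: int = 2) -> list[str]:
    """All contiguous character n-grams of a word (empty if word is shorter than n)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice similarity on distinct n-grams: 2|A & B| / (|A| + |B|)."""
    A, B = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not A and not B:
        return 1.0
    return 2 * len(A & B) / (len(A) + len(B))


def manhattan(a: str, b: str, n: int = 2) -> int:
    """Manhattan (L1) dissimilarity between n-gram frequency profiles."""
    ca, cb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    return sum(abs(ca[g] - cb[g]) for g in ca.keys() | cb.keys())


if __name__ == "__main__":
    print(affix_stem("والكتاب"))       # prefix "وال" stripped
    print(dice("مكتبة", "كتاب"))       # one shared bigram "كت"
    print(manhattan("مكتبة", "كتاب"))  # five unmatched bigrams
```

The abstract also mentions methods that combine both approaches. The simplest combination in this sketch stems first and then compares: `dice(affix_stem("والكتاب"), affix_stem("كتابات"))` reduces both words to the same stem and so yields 1.0. Note that an affix-only stemmer like this one removes affixes but does not recover the root itself. Pattern-based stemmers, the third stemmer subcategory, address that.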
Topics Exploitation of LRs in different types of applications (information extraction, information retrieval, speech dictation, translation, summarisation, web services, semantic web, etc.),
Multilingual document retrieval,
Extraction and acquisition of knowledge (e.g. terms, lexical information, language modelling) from LRs
Bibtex @InProceedings{ALHAJJAR09.47,
  author = {Abd El Salam Alhajjar and Mohammad Hajjar and Khaldoun Zreik},
  title = {Classification of Arabic Information Extraction Methods},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
  }

Powered by ELDA © 2009 The MEDAR Consortium