Summary of the paper

Title A Framework for the Rapid Development of List Based Domain Specific Arabic Stemmers
Authors Samhaa R. El-Beltagy and Ahmed Rafea
Abstract Increased interest in text mining has been accompanied by increased interest in developing the more accurate stemmers that various text mining applications require. The goal of this work is to present an approach to stemming that falls somewhere between aggressive and light stemming. The presented work not only addresses the removal of suffixes and prefixes, it also introduces a set of rules for removing infixes without the use of a morphological analyzer. The basic premise of this work is that in any reasonably sized corpus, a word and its stem are both likely to appear. By capitalizing on this observation, the work aims to present a method for rapidly building stem lists from a small set of documents, and for making use of the local context of a document when carrying out stemming. The evaluation of the work shows that it significantly improves stemming accuracy. It also shows that by improving stemming accuracy, a task such as automatic annotation can also be significantly improved.
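The premise stated in the abstract, that a word and its stem tend to co-occur in a corpus, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the affix lists, the function names `candidate_stems` and `build_stem_list`, and the shortest-match rule are all assumptions made for illustration, and the sketch omits the infix rules and the use of local document context that the abstract mentions.

```python
# Illustrative affix lists only (assumption); the paper's actual rules are not reproduced here.
PREFIXES = ["وال", "بال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة"]


def candidate_stems(word, min_len=3):
    """Yield forms obtained by stripping at most one prefix and at most one suffix."""
    starts = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for s in starts:
        forms = [s] + [s[:-len(x)] for x in SUFFIXES if s.endswith(x)]
        for f in forms:
            # Skip the unchanged word and candidates that are too short.
            if f != word and len(f) >= min_len:
                yield f


def build_stem_list(vocabulary):
    """Map each word to its shortest stripped form that also occurs in the vocabulary.

    A word with no attested stripped form is mapped to itself.
    """
    vocab = set(vocabulary)
    stems = {}
    for word in vocab:
        found = [c for c in candidate_stems(word) if c in vocab]
        stems[word] = min(found, key=len) if found else word
    return stems


stems = build_stem_list(["الكتاب", "كتاب", "والكتاب", "مدرسة"])
```

In this toy vocabulary, both prefixed forms of "كتاب" map to it because it is attested in the vocabulary, while "مدرسة" is left unchanged because its stripped form does not occur there.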
Topics Methods, tools and procedures for acquisition, creation, management, access, distribution and use of Arabic LRs,
Terminology, term extraction, domain-specific dictionaries,
Open architectures for LRs and tools
Full paper A Framework for the Rapid Development of List Based Domain Specific Arabic Stemmers
Bibtex @InProceedings{ELBELTAGY09.13,
  author = {Samhaa R. El-Beltagy and Ahmed Rafea},
  title = {A Framework for the Rapid Development of List Based Domain Specific Arabic Stemmers},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
  }
