Summary of the paper

Title Arabic Part-Of-Speech Tagging using Transformation-Based Learning
Authors Shabib AlGahtani, William Black and John McNaught
Abstract Corpus-based methods have been widely used to tackle NLP tasks since the advent of annotated corpora, with notable success. Inevitably, shifting from classical rule-based methods to corpus-based ones has a major drawback: most corpus-based methods produce mathematical models that are hard to interpret and modify, and they demand more processing power and memory. Fortunately, transformation-based learning (TBL) is a corpus-based method that combines the strengths of both worlds, overcoming obscurity and complexity without relinquishing state-of-the-art accuracy. This paper examines the application of TBL to the task of tagging Modern Standard Arabic text. To guess the tags of unknown words, an n-gram technique selects the best tag, using the preceding context, from a list of candidates output by a morphological analyzer. The developed tagger achieved an accuracy of 98.6% when evaluated on the training set and 96.9% on the test set. Furthermore, the same unknown-word module was slightly modified and successfully applied to the task of word tokenization, with an accuracy of 99.6%.
Topics Taggers and Parsers
Full paper Arabic Part-Of-Speech Tagging using Transformation-Based Learning
Bibtex @InProceedings{ALGAHTANI09.43,
  author = {Shabib AlGahtani and William Black and John McNaught},
  title = {Arabic Part-Of-Speech Tagging using Transformation-Based Learning},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
}
