Summary of the paper

Title Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank
Authors Mohamed Maamouri, Ann Bies and Seth Kulick
Abstract The LDC Arabic Treebank team has significantly revised and enhanced its annotation guidelines and annotation procedures over the last two years, with the goal of reducing inconsistency in annotation in the Treebank. We have now completed automatic and significant manual revisions to 738,845 tokens/words in total, bringing them into line as far as possible with the new annotation guidelines and greatly improving the annotation consistency. We created a methodology for large-scale correction of Treebank annotation during the course of this revision process, balancing the need for consistency with tight time constraints for correcting and updating a large amount of data annotated according to previous guidelines. The combination and interleaving of automatic and manual corrections were crucial to the success of the overall revision. We also demonstrate the success of the revision by reporting on an improvement in parsing results.
Topics Evaluation, validation, quality assurance of Arabic LRs; Monolingual and multilingual LRs; Guidelines, standards, specifications, models and best practices for Arabic LRs
Full paper Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank
Bibtex @InProceedings{MAAMOURI09.68,
  author = {Mohamed Maamouri and Ann Bies and Seth Kulick},
  title = {Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
  }
