Proceedings of the Second International Conference on Arabic Language Resources and Tools

Summary of the paper

Title	Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank
Authors	Mohamed Maamouri, Ann Bies and Seth Kulick
Abstract	The LDC Arabic Treebank team has significantly revised and enhanced its annotation guidelines and annotation procedures over the last two years, with the goal of reducing inconsistency in annotation in the Treebank. We have now completed automatic and significant manual revisions to 738,845 tokens/words in total, bringing them into line as far as possible with the new annotation guidelines and greatly improving the annotation consistency. We created a methodology for large-scale correction of Treebank annotation during the course of this revision process, balancing the need for consistency with tight time constraints for correcting and updating a large amount of data annotated according to previous guidelines. The combination and interleaving of automatic and manual corrections were crucial to the success of the overall revision. We also demonstrate the success of the revision by reporting on an improvement in parsing results.
Topics	Evaluation, validation, quality assurance of Arabic LRs, Monolingual and multilingual LRs, Guidelines, standards, specifications, models and best practices for Arabic LRs
Bibtex	@InProceedings{MAAMOURI09.68, author = {Mohamed Maamouri and Ann Bies and Seth Kulick}, title = {Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank}, booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools}, year = {2009}, month = {April}, date = {22-23}, address = {Cairo, Egypt}, editor = {Khalid Choukri and Bente Maegaard}, publisher = {The MEDAR Consortium}, isbn = {2-9517408-5-9}, language = {english} }