W0024 : PAROLE Portuguese Corpus

The PAROLE Portuguese Corpus contains approximately 3 million running words of European Portuguese, distributed by medium as follows:

  • Newspaper: about 65%, from 3 titles covering the period 1996-1997;
  • Book: about 20%, from 12 titles published by 3 publishing houses;
  • Periodical: about 5%, from 7 weekly issues of 1 title (1996);
  • Miscellaneous: about 10%, from several files spread across 8 titles.

The corpus was classified and encoded according to the PAROLE common core encoding standard. The corpus files are in SGML format.

A subcorpus of the PAROLE Portuguese Corpus is also available. It approximately reproduces the medium distribution of the full corpus (Newspaper: about 65%, Book: about 20%, Periodical: about 5%, Miscellaneous: about 10%).

The subcorpus contains about 250,000 words, morpho-syntactically tagged according to the PAROLE common tagset and morpho-syntactic annotation standards. Disambiguation was checked manually.

Copyright © 1996-2001 ELRA/ELDA