;;
;; TC-STAR Hansard Text corpus v08jan2007
;;

The archive contains debates from the British Parliament from Nov 1999 to May 2006.
The corpus contains 48 M words.

The files are split into 8 directories, one per year.

The file nomenclature is YYYYMMDD_NN.xml where:
  YYYY  4 digits representing the year
  MM    2 digits for the month
  DD    2 digits for the day
  NN    2 digits for the number of the document

Example:
  20041201_11.xml contains the 11th document from the debate of the 1st of December 2004.

The XML structure is:
Questions/Comments:
  Djamel Mostefa (mostefa@elda.org)
  Olivier Hamon (hamon@elda.org)
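
The YYYYMMDD_NN.xml nomenclature described above can be decoded mechanically. The following is a minimal Python sketch, not part of the corpus distribution; the function name parse_corpus_filename and the pattern variable are hypothetical, chosen only for illustration. It assumes every file name follows the nomenclature exactly and rejects anything else.

```python
import re
from datetime import date

# Pattern for the YYYYMMDD_NN.xml nomenclature (hypothetical helper, not shipped with the corpus).
_NAME_RE = re.compile(r"([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})\.xml")


def parse_corpus_filename(name):
    """Return (debate_date, document_number) for a name such as 20041201_11.xml."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a YYYYMMDD_NN.xml name: {name!r}")
    year, month, day, number = (int(group) for group in match.groups())
    # date() itself raises ValueError for impossible dates such as month 13.
    return date(year, month, day), number
```

For the example given above, parse_corpus_filename("20041201_11.xml") returns the date 1 December 2004 together with the document number 11.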