TC-Star Evaluation Information (WP4)


SLT Evaluation - Run #2


This section summarises the complete SLT Evaluation Plan document, which can be found here.

TC-STAR Evaluation Run #2 for SLT will take place from November 18, 2005 to March 15, 2006. The complete schedule can be seen here; the important dates for SLT are:

  • November 18, 2005: ELDA disseminates Spanish Parliament development data set through this web site
  • November 25, 2005: ELDA disseminates EPPS English & Spanish development data set through this web site
  • February 12, 2006: ASR teams prepare word graphs for the SLT teams
  • February 14, 2006: End of SLT development phase
  • February 15, 2006: Beginning of SLT run - ELDA sends source files to participants
  • March 01, 2006: End of SLT run - translations are sent back to ELDA
  • March 05, 2006: End of automatic scoring phase by ELDA - initial results
  • March 06, 2006: Beginning of adjudication phase
  • March 15, 2006: End of adjudication phase - results are definitive

Before the evaluation run proper, participants have access to training and development data composed of parallel texts and transcriptions (see the Training Data and Development Data sections below).

SLT evaluation is run in three translation directions: English to Spanish, Spanish to English, and Mandarin Chinese to English.

For each translation direction, three kinds of text data are used as input:

  1. The first is the output of the automatic speech recognition systems. The text is in true case and punctuation marks are provided. The data are automatically segmented at syntactic or semantic breaks.
  2. The second is the verbatim transcriptions. These are manual transcriptions produced by ELDA. They include spontaneous speech phenomena such as hesitations, corrections, false starts, etc. The annotations are produced for English, Spanish and Mandarin. As with the ASR output, the text is provided in true case and with punctuation marks.
  3. The last is the text input. Final Text Editions (FTE) provided by the European Parliament are used for the EPPS task, and the clean transcriptions are used for the VOA task. These text transcriptions differ slightly from the verbatim ones: some sentences are rewritten, and the text does not include transcriptions of spontaneous speech phenomena.

An example of the three kinds of inputs is shown below:

Text:       I am starting to know what Frank Sinatra must have felt like,
Verbatim:   I'm I'm I'm starting to know what Frank Sinatra must have felt like
ASR output: and i'm times and starting to know what frank sinatra must have felt like


English-to-Spanish and Spanish-to-English translations are run on recording transcriptions from the European Parliament Plenary Sessions (EPPS), while Chinese-to-English translation is run on recording transcriptions from Voice of America (VoA).

Submission guidelines for participants are available in the Test Data section below.


SLT Participants

Direction             | Input           | Participants
----------------------|-----------------|---------------------------------
Zh-->En (VoA)         | Single-best ASR | IBM?, IRST, RWTH, UKA
                      | Verbatim        | IBM?, IRST, RWTH, UKA
Es-->En (EPPS + PARL) | Text            | IBM, IRST, RWTH, UKA, UPC
                      | Single-best ASR | IBM, IRST, LIMSI, RWTH, UKA, UPC
                      | Verbatim        | IBM, IRST, LIMSI, RWTH, UKA, UPC
En-->Es (EPPS)        | Text            | IBM, IRST, RWTH, UKA, UPC
                      | Single-best ASR | IBM, IRST, RWTH, UKA, UPC
                      | Verbatim        | IBM, IRST, RWTH, UKA, UPC

There is no text condition for Mandarin.

IBM: International Business Machines, Germany
IRST: Il Centro per la Ricerca Scientifica e Tecnologica, Italy
LIMSI: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
RWTH: Rheinisch-Westfälische Technische Hochschule, Germany
UKA: Universität Karlsruhe, Germany
UPC: Universitat Politècnica de Catalunya, Spain

External Participants

Direction             | Input           | Participants
----------------------|-----------------|----------------------------
Zh-->En (VoA)         | Single-best ASR | ICT, NLPR
                      | Verbatim        | ICT, NRC, NLPR
Es-->En (EPPS + PARL) | Text            | SYSTRAN, UED, UPV, UW
                      | Single-best ASR | UED
                      | Verbatim        | UED, UW
En-->Es (EPPS)        | Text            | DFKI, SYSTRAN, UED, UPV, UW
                      | Single-best ASR | DFKI, UED
                      | Verbatim        | DFKI, UED, UW

DFKI: Deutsches Forschungszentrum für Künstliche Intelligenz, Germany
ICT: Institute of Computing Technology, China
NLPR: National Laboratory of Pattern Recognition, China
NRC: National Research Council, Canada
SYSTRAN: System Language Translation Technologies, France
UED: University of Edinburgh, United Kingdom
UPV: Universitat Politècnica de València, Spain
UW: University of Washington, United States

Johns Hopkins University and Google have not yet confirmed their participation.


SLT Resources

Training data are the same as for Run #1.

Training

Direction | Description | Reference | Amount | IPR owner | IPR distrib. | Comment
----------|-------------|-----------|--------|-----------|--------------|--------
Zh->En | FBIS Multilanguage Texts | LDC2003E14 | | LDC | research | LDC membership 03 required
       | UN Chinese English Parallel Text Version 2 | LDC2004E12 | | LDC | research | LDC membership 04 required
       | Hong Kong Parallel Text | LDC2004T08 | | LDC | research | LDC membership 04 required
       | English Translation of Chinese Treebank | LDC2002E17 | | LDC | research | LDC membership 02 required
       | Xinhua Chinese-English Parallel News Text Version 1.0 beta 2 | LDC2002E18 | | LDC | research | LDC membership 02 required
       | Chinese English Translation Lexicon version 3.0 | LDC2002L27 | | LDC | research | LDC membership 02 required
       | Chinese-English Name Entity Lists version 1.0 beta | LDC2003E01 | | LDC | research | LDC membership 03 required
       | Chinese English News Magazine Parallel Text | LDC2005E47 | | LDC | research | LDC membership 05 required
       | Multiple-Translation Chinese (MTC) Corpus | LDC2002T01 | | LDC | research | LDC membership 02 required
       | Multiple Translation Chinese (MTC) Part 2 | LDC2003T17 | | LDC | research | LDC membership 03 required
       | Multiple Translation Chinese (MTC) Part 3 | LDC2004T07 | | LDC | research | LDC membership 04 required
       | Chinese News Translation Text Part 1 | LDC2005T06 | | LDC | research | LDC membership 05 required
       | Chinese Treebank 5.0 | LDC2005T01 | | LDC | research | LDC membership 05 required
       | Chinese Treebank English Parallel Corpus | LDC2003E07 | | LDC | research | LDC membership 03 required
Es->En | EPPS Spanish verbatim transcriptions, May 2004 - Jan 2005 | | 100h transcribed | UPC | ELRA | Transcribed by UPC
       | EPPS Spanish final text edition, May 2004 - Jan 2005 | | | EC | ELRA |
       | EPPS Spanish final text edition, April 1996 - Jan 2005 | | | EC | RWTH | Provided to TC-STAR by RWTH
En->Es | EPPS English verbatim transcriptions, May 2004 - Jan 2005 | | 100h transcribed | RWTH | ELRA | Transcribed by RWTH
       | EPPS English final text edition, May 2004 - Jan 2005 | | | EC | ELRA |
       | EPPS English final text edition, April 1996 - Jan 2005 | | | EC | RWTH | Provided to TC-STAR by RWTH

For both directions, the English and Spanish parallel texts are aligned; the verbatim transcriptions are also aligned with the FTE by RWTH.


SLT Download area

Training Data

English and Spanish

You can use any of the training resources listed in the table above, in addition to the EPPS training sets. To obtain the latter on DVD, please contact Christian Gollan at RWTH.

Chinese

You can use any of the training resources listed in the table above, except the TDT3 audio files and transcriptions for December 1998 (the development and test sets will be built from these files).


Development Data

The verbatim transcriptions of EPPS are shared with the ASR evaluation. The difference is that only two files are used for SLT (instead of three for ASR), as only 25,000 words are needed.

You can also find development data of the 2005 SLT evaluation on the SLT Run #1 page.

Direction        | Input              | Files / Notes
-----------------|--------------------|----------------------------------------------
Es-->En (cortes) | Final Text Edition | new version of the data updated on 28 August 2005
                 | Verbatim           | new version of the data updated on 28 August 2006
                 | ASR                | new version of the data updated on 15 February 2006
En-->Es (EPPS)   | Final Text Edition | new version of the data updated on 25 January 2005
                 | Verbatim           | new version of the data updated on 25 January 2005
                 | ASR                | English ASR output from the ROVER system (CTM format, same system as the TEST evaluation, with punctuation marks); individual system output of the data (new version updated on 15 February 2006)
Es-->En (EPPS)   | Final Text Edition | new version of the data updated on 1 December 2005
                 | Verbatim           | new version of the data updated on 19 January 2005
                 | ASR                | Spanish ASR output from the ROVER system (CTM format, same system as the TEST evaluation, with punctuation marks); individual system output of the data (new version updated on 15 February 2006)
Zh-->En (VoA)    | Verbatim           | new version of the data updated on 23 January 2006
                 | ASR                | new version of the data updated on 28 February 2006; Chinese ASR output from the ROVER system (CTM format, same system as the TEST evaluation, with punctuation marks); new version of the data updated on 23 January 2006

The translation guidelines for the translation agencies are available here (MS Word document).

Validation report from SPEX for ENES development files (FTE & Verbatim)

Validation report from SPEX for ESEN development files (FTE & Verbatim)

Word graphs/lattices are regularly posted on WP2's own web page.

The single-best ASR results for dev06 (English and Spanish) can be found on the LIMSI web page.


Scoring Tools

(updated on January 10)

The scoring tools proposed by ELDA are Perl scripts. You can download them as a zip archive containing:

  • BLEU/IBM v1.04b
  • BLEU/NIST v11b
  • mWER: multiple reference word error rate (a computation sketch is given after the file list below)
  • mPER: multiple reference position-independent word error rate
  • mCER: multiple reference character error rate
  • WNM: Weighted N-gram Model

The package also includes sample files to check proper installation:

  • procedure_sample_file.txt: example evaluation procedures using the following files
  • TC-STAR_SAMPLE_REF.TXT: reference file (in Spanish)
  • TC-STAR_SAMPLE_SRC.TXT: source file (in English)
  • TC-STAR_SAMPLE_TGT.TXT: target file (in Spanish)
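
To make the error-rate metrics concrete, here is a minimal Python sketch of mWER under the following assumptions: one hypothesis string and a list of reference strings per segment, case-sensitive whitespace tokenisation, and normalisation by the length of the closest reference (normalisation conventions vary). It is an illustration only, not the ELDA Perl implementation; mPER would be analogous but ignore word order.

```python
# Minimal mWER sketch (illustrative; not the ELDA Perl tool).

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i                         # prev holds the diagonal cell
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (h != r))  # substitution (0 if equal)
    return d[-1]

def mwer(hypotheses, references):
    """hypotheses: list of strings; references: parallel list of lists of strings."""
    total_errors, total_words = 0, 0
    for hyp, refs in zip(hypotheses, references):
        h = hyp.split()
        # distance to the closest reference, together with that reference's length
        errors, length = min((edit_distance(h, r.split()), len(r.split()))
                             for r in refs)
        total_errors += errors
        total_words += length
    return total_errors / total_words

print(mwer(["i am starting to know"],
           [["I'm starting to know", "I am starting to know"]]))  # 0.2
```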

For the ASR task the alignment tool from RWTH is available here.


Test Data

Submissions must be sent by email to hamon@elda.org before Wednesday, March 1st.

Submission guidelines:

Submitted files should use the NIST MT format for TSTSET:

<TSTSET SetID="..." SrcLang="..." TrgLang="...">
<DOC DocID="..." SysID="...">
<SEG id="1">
TRANSLATED TEXT
</SEG>
...
</DOC>
...
</TSTSET>

Output and source file formats are the same, with the following exceptions (a conversion sketch follows the list):

  • "SRCSET" is replaced by "TSTSET", at the beginning and the end of the file
  • the "TrgLang" attribute is added in the "TSTSET" tag of the output file (representing target language - Spanish or English)
  • the "SysID" attribute is added, corresponding to the identifier of the organisation (please see below for the assignment of the identifier)
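
As an illustration of these rules, here is a minimal Python sketch that wraps translated segments into a TSTSET file. The file name, SetID, language values and document identifiers are hypothetical; in practice the DocID values and segment numbering must be copied from the source file.

```python
# Sketch of writing a TSTSET submission file (hypothetical names and IDs).

def write_tstset(path, set_id, src_lang, trg_lang, sys_id, docs):
    """docs: list of (doc_id, [translated segments]) in source-file order."""
    with open(path, "w", encoding="utf-8") as out:       # files must be utf-8
        out.write(f'<TSTSET SetID="{set_id}" SrcLang="{src_lang}" '
                  f'TrgLang="{trg_lang}">\n')            # TrgLang added w.r.t. SRCSET
        for doc_id, segments in docs:
            out.write(f'<DOC DocID="{doc_id}" SysID="{sys_id}">\n')  # DocID unchanged
            for i, seg in enumerate(segments, 1):        # keep the source numbering
                one_line = " ".join(seg.split())         # one line per segment, no blanks
                out.write(f'<SEG id="{i}">\n{one_line}\n</SEG>\n')
            out.write('</DOC>\n')
        out.write('</TSTSET>\n')

# Hypothetical usage:
write_tstset("TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_ORG-PRIMARY-system1.TXT",
             "tc-star-run2-epps-test-enes-fte",          # illustrative SetID
             "English", "Spanish", "ORG-PRIMARY-system1",
             [("doc01", ["PRIMERA FRASE TRADUCIDA", "SEGUNDA FRASE TRADUCIDA"])])
```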

Recommendations:

  • Please pay attention to tags, letter case and carriage returns
  • Do not change the DocID attribute or the segment numbering
  • Files must be UTF-8 encoded
  • Each segment must be written on one line; please avoid blank lines

Submit one test set per file, i.e. one file for English verbatim transcripts, one file for English ASR output, one file for English FTE, etc.

The SysID attribute must identify the organisation, the condition, and the system. For instance, if the organisation ORG submits one primary condition and two secondary conditions for the English verbatim transcripts (one with the same system as for the primary condition and another with a different system or system version), then three files will be sent for this test set, with the following SysID values (a naming sketch is given below):

  • ORG-PRIMARY-system1
  • ORG-SECONDARY-system1
  • ORG-SECONDARY-system2

In the same manner, output files must identify the organisation, with the same constraints. For instance, if the organisation ORG translates the file "TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_SRC.TXT", the translated file should be renamed "TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_ORG-PRIMARY-system1.TXT".

(The "system1" suffix can be omitted if there is only one system per condition.)
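
A small helper can make the naming scheme explicit; the organisation name and file names below are hypothetical.

```python
# Hypothetical helpers for the SysID and submission file naming scheme.

def sys_id(org, condition, system="system1"):
    """Build a SysID such as ORG-PRIMARY-system1."""
    assert condition in ("PRIMARY", "SECONDARY")
    return f"{org}-{condition}-{system}"

def submission_name(source_file, org, condition, system="system1"):
    """Replace the _SRC suffix of a source file name with the SysID."""
    return source_file.replace("_SRC.TXT",
                               f"_{sys_id(org, condition, system)}.TXT")

print(submission_name("TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_SRC.TXT",
                      "ORG", "PRIMARY"))
# TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_ORG-PRIMARY-system1.TXT
```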

For the ASR task, the SetID attribute should be:

  • "tctar-run2-epps-test-enes-asr" for the English-to-Spanish direction
  • "tctar-run2-epps-test-esen-asr" for the Spanish-to-English direction
  • "tctar-run2-epps-test-zhen-asr" for the Chinese-to-English direction

System descriptions:

For each experiment, a one-page system description must be provided, describing the data used, the approaches (algorithms), the configuration, the processing time, etc. The document should also contain references. The file should be named "<SysID>.txt".

Submission:

Submissions must be sent by email to the following address: hamon@elda.org

with the subject: "[TC-STAR] Submission <SysID>"

and with the archived files attached.
The deadline is Wednesday, 1 March, 23:59 CET (5:59 pm in Pittsburgh and Yorktown).
A return receipt will be sent within 24 hours.

Source files

Direction             | Input              | Files
----------------------|--------------------|------------------------------------------
En-->Es (EPPS)        | Final Text Edition | English source set (NIST MT format)
                      | Verbatim           | English source set (NIST MT format)
                      | ASR                | English ASR output (CTM file format)
Es-->En (EPPS + PARL) | Final Text Edition | Spanish source set (NIST MT format)
                      | Verbatim           | Spanish source set (NIST MT format)
                      | ASR                | Spanish ASR output (CTM file format)
Zh-->En (VoA)         | Verbatim           | Chinese source set (NIST MT format, GB2312 encoded)
                      | ASR                | Chinese ASR output (CTM file format)
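
The ASR source files above are distributed as CTM files. The sketch below assumes the conventional NIST CTM layout (one token per line: waveform file, channel, begin time, duration, word, optional confidence); the exact columns of the TC-STAR files may differ, so treat it as illustrative.

```python
# Sketch of reading a CTM file into word sequences (assumes NIST CTM layout).

from collections import defaultdict

def read_ctm(path):
    """Return {waveform id: list of recognised words in time order}."""
    words = defaultdict(list)
    with open(path, encoding="utf-8") as ctm:
        for line in ctm:
            if not line.strip() or line.startswith(";;"):     # skip blanks/comments
                continue
            wav, _channel, begin, _duration, word = line.split()[:5]
            words[wav].append((float(begin), word))           # confidence ignored
    return {wav: [w for _, w in sorted(seq)] for wav, seq in words.items()}
```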

Reference files

Direction             | Input              | Files
----------------------|--------------------|------------------------------------------
En-->Es (EPPS)        | Final Text Edition | Spanish reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
                      | Verbatim           | Spanish reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
                      | ASR                | Spanish reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
Es-->En (EPPS + PARL) | Final Text Edition | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations) - updated on 31 March 2006
                      | Verbatim           | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations) - updated on 28 August 2006
                      | ASR                | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations) - updated on 28 August 2006
Es-->En (EPPS only)   | Final Text Edition | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
                      | Verbatim           | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
                      | ASR                | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
Es-->En (PARL only)   | Final Text Edition | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
                      | Verbatim           | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations) - updated on 28 August 2006
                      | ASR                | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations) - updated on 28 August 2006
Zh-->En (VoA)         | Verbatim           | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations)
                      | ASR                | English reference translations set (NIST MT format, utf-8 encoded, 2 reference translations) - updated on 6 December 2006


Results

Preliminary results are available: