TC-Star Evaluation Information (WP4)


SLT Evaluation - Run #3

 

Update History

2007-03-21 10:33 Validation of Chinese translation updated, Chinese reference files updated, Chinese scores updated
2007-02-23 09:59 Validation of translations updated
2007-02-19 20:35 Reference files available
2007-02-19 20:35 Results updated
2007-02-09 20:40 Preliminary results available
2007-02-02 12:30 English ASR test file updated
2007-02-01 13:27 Spanish ASR test file updated
2007-01-31 10:37 Chinese Verbatim test file updated
2007-01-31 10:00 Test data available

This section only summarises the more complete SLT Evaluation Plan document, which can be found here.

TC-STAR Evaluation Run #3 for SLT will take place from January 31, 2007 to February 07, 2007. The development data is already available (see the development data section). The complete schedule can be seen here; the important dates for SLT are:

  • September 06, 2006: ELDA distributes the development data set through this web site
  • January 28, 2007: ASR teams prepare word graphs for the SLT teams
  • January 31, 2007: End of SLT development phase
  • January 31, 2007: Beginning of SLT run - ELDA sends source files to participants
  • February 07, 2007: End of SLT run - translations are sent back to ELDA
  • February 09, 2007: End of automatic scoring phase by ELDA - initial results
  • February 09, 2007: Beginning of adjudication phase
  • February 09, 2007: Beginning of human evaluation phase
  • February 23, 2007: End of adjudication phase - automatic results are definitive
  • March 16, 2007: End of human evaluation phase, results are sent to participants

Before the evaluation run proper, participants have access to training data and development data composed of parallel texts and transcriptions (see the training data and development data sections below).

SLT evaluation will be run in three translation directions: English to Spanish, Spanish to English, and Mandarin Chinese to English.

For each translation direction, three kinds of text data are used as input:

  1. The first is the output of the automatic speech recognition (ASR) systems. The text is in true case and punctuation marks are provided. The data are automatically segmented at syntactic or semantic breaks.
  2. The second is the verbatim transcriptions. These are manual transcriptions produced by ELDA and include spontaneous speech phenomena such as hesitations, corrections, false starts, etc. The transcriptions are produced for English, Spanish and Mandarin. As for the ASR output, the text is provided in true case and with punctuation marks.
  3. The last is the text input. Final Text Editions (FTE) provided by the European Parliament are used for the EPPS task, and clean transcriptions are used for the VOA task. These texts differ slightly from the verbatim transcriptions: some sentences are rewritten, and spontaneous speech phenomena are not transcribed.

An example of the three kinds of inputs is shown below:

Text
I am starting to know what Frank Sinatra must have felt like,
Verbatim
I'm I'm I'm starting to know what Frank Sinatra must have felt like
ASR output
and i'm times and starting to know what frank sinatra must have felt like

English to Spanish and Spanish to English are run on transcriptions of recordings from the European Parliament Plenary Sessions (EPPS), while Chinese to English is run on transcriptions of Voice of America recordings.

For the Spanish to English direction, test data from the European Parliament (EPPS) and from the Spanish Parliament (Cortes) is used. However, participants may not use document ID tags to distinguish between the two sources.

We propose the following tracks, which determine the training data allowed:
  • EPPS-Only Track:
    Only the EPPS bilingual data available on the RWTH web page is allowed.
    This comprises the data from April 1996 to September 2004, December 2004
    to May 2005, and December 2005 to May 2006. No additional bilingual data
    is allowed. Monolingual tools (e.g. POS taggers) and publicly available
    monolingual data can be used.
  • Public Data Track:
    Any publicly available data can be used, with the exception of the
    data generated after May 2006 (cutoff date for test data). Some
    additional corpora have already been made available on the RWTH
    TC-Star web page. These are the EU Bulletin Corpus, the JRC-Acquis
    Corpus and the UN Corpus. Participants are, however, not restricted
    to this additional data.
For the Chinese-to-English translation direction, a setup similar to that of the NIST 2006 MT Evaluation is proposed. There are two conditions:
  • Verbatim Condition
  • ASR Condition
    Again, single-best and lattice recognizer output is provided, but
    the two are not evaluated separately.
Two tracks are proposed:
  • Public Track:
    All public data available from LDC is allowed (except the corpus
    the test data is extracted from, LDC2002T01). Monolingual tools
    are allowed.
  • Open Track:
    All data is allowed (except the corpus the test data is
    extracted from, LDC2002T01).

Participants are encouraged to participate in the Public Track.

Submission guidelines are available for participants.


SLT Participants

 

Direction             | Input           | Participants
Zh-->En (VoA)         | Single-best ASR | IRST, RWTH, UKA
Zh-->En (VoA)         | Verbatim        | IRST, RWTH, UKA
Es-->En (EPPS + PARL) | Text            | IBM, IRST, RWTH, UKA, UPC
Es-->En (EPPS + PARL) | Single-best ASR | IBM, IRST, LIMSI, RWTH, UKA, UPC
Es-->En (EPPS + PARL) | Verbatim        | IBM, IRST, LIMSI, RWTH, UKA, UPC
En-->Es (EPPS)        | Text            | IBM, IRST, RWTH, UKA, UPC
En-->Es (EPPS)        | Single-best ASR | IBM, IRST, LIMSI, RWTH, UKA, UPC
En-->Es (EPPS)        | Verbatim        | IBM, IRST, LIMSI, RWTH, UKA, UPC

There is no text condition for Mandarin.

IBM: International Business Machines, Germany
IRST: Il Centro per la Ricerca Scientifica e Tecnologica, Italy
LIMSI: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
RWTH: Rheinisch-Westfaelische Technische Hochschule, Germany
UKA: Universitaet Karlsruhe, Germany
UPC: Universitat Politècnica de Catalunya, Spain

External Participants

Direction             | Input           | Participants
Zh-->En (VoA)         | Single-best ASR | ICT, NICT, UDS, XMU
Zh-->En (VoA)         | Verbatim        | ICT, NICT, UDS, XMU
Es-->En (EPPS + PARL) | Text            | JHU, NICT, Translendium, UDS
Es-->En (EPPS + PARL) | Single-best ASR | JHU, NICT, UDS
Es-->En (EPPS + PARL) | Verbatim        | JHU, NICT, UDS
En-->Es (EPPS)        | Text            | UDS
En-->Es (EPPS)        | Single-best ASR | UDS
En-->Es (EPPS)        | Verbatim        | UDS
ICT: Institute of Computing Technology - Chinese Academy of Sciences, China
JHU: The Johns Hopkins University, United States
XMU: Institute of Artificial Intelligence - Xiamen University, China
Translendium: Translendium SL, Spain
UDS: Universität des Saarlandes, Germany
NICT - ATR: National Institute of Information and Communications Technology - Advanced Telecommunications Research Institute International, Japan


SLT Resources

Training data are the same as for run #1 and run #2.

Training

Direction | Description | Reference | IPR-owner | IPR-distrib | Comment
Zh->En | FBIS Multilanguage Texts | LDC2003E14 | LDC | research | LDC membership 03 required
Zh->En | UN Chinese English Parallel Text Version 2 | LDC2004E12 | LDC | research | LDC membership 04 required
Zh->En | Hong Kong Parallel Text | LDC2004T08 | LDC | research | LDC membership 04 required
Zh->En | English Translation of Chinese Treebank | LDC2002E17 | LDC | research | LDC membership 02 required
Zh->En | Xinhua Chinese-English Parallel News Text Version 1.0 beta 2 | LDC2002E18 | LDC | research | LDC membership 02 required
Zh->En | Chinese English Translation Lexicon version 3.0 | LDC2002L27 | LDC | research | LDC membership 02 required
Zh->En | Chinese-English Name Entity Lists version 1.0 beta | LDC2003E01 | LDC | research | LDC membership 03 required
Zh->En | Chinese English News Magazine Parallel Text | LDC2005E47 | LDC | research | LDC membership 05 required
Zh->En | Multiple-Translation Chinese (MTC) Corpus | LDC2002T01 | LDC | research | LDC membership 02 required
Zh->En | Multiple Translation Chinese (MTC) Part 2 | LDC2003T17 | LDC | research | LDC membership 03 required
Zh->En | Multiple Translation Chinese (MTC) Part 3 | LDC2004T07 | LDC | research | LDC membership 04 required
Zh->En | Chinese News Translation Text Part 1 | LDC2005T06 | LDC | research | LDC membership 05 required
Zh->En | Chinese Treebank 5.0 | LDC2005T01 | LDC | research | LDC membership 05 required
Zh->En | Chinese Treebank English Parallel Corpus | LDC2003E07 | LDC | research | LDC membership 03 required
Es->En | EPPS Spanish verbatim transcriptions, May 2004 - Jan 2005 | | UPC | ELRA | Transcribed by UPC
Es->En | EPPS Spanish final text edition, April 1996 - Sept 2004 | | EC | RWTH | Provided to TC-STAR by RWTH
Es->En | EPPS Spanish final text edition, Dec 2004 - May 2005 | | EC | ELRA | See note 1
Es->En | EPPS Spanish final text edition, Dec 2005 - May 2006 | | EC | ELRA | See note 1
En->Es | EPPS English verbatim transcriptions, May 2004 - Jan 2005 | | RWTH | ELRA | Transcribed by RWTH
En->Es | EPPS English final text edition, April 1996 - Sept 2004 | | EC | RWTH | Provided to TC-STAR by RWTH
En->Es | EPPS English final text edition, Dec 2004 - May 2005 | | EC | ELRA | See note 1
En->Es | EPPS English final text edition, Dec 2005 - May 2006 | | EC | ELRA | See note 1
En<->Es | EU Bulletin Corpus | | | ELRA | See note 2
En<->Es | JRC-Acquis Multilingual Parallel Corpus | | | research | See note 3
En<->Es | UN Parallel Corpus | | | LDC | See note 4

Note 1: English and Spanish parallel texts are aligned. Verbatim transcriptions are also aligned with the FTE by RWTH.

Note 2: "The Bulletin of the European Union provides an insight into the activities of the European Commission and the other Community institutions." It is published on a monthly basis, and parallel versions in Spanish and English are available up to 2004. The corpus is available in a raw version, with the HTML documents as downloaded from the pages of the European Union, or as a sentence-aligned version. Sentence alignment was provided by RWTH.

Note 3: "Before joining the European Union (EU), the new Member States (NMS) needed to translate and approve the existing EU legislation, consisting of selected texts written between the 1950s and 2005. This body of legislative text, which consists of approximately eight thousand documents and which covers a variety of domains, is called the Acquis Communautaire (AC)." The original version of the corpus (in several languages) can be downloaded directly from the link above, along with a tool for paragraph alignment. RWTH performed, and provides, an additional sentence-level alignment of the corpus.

Note 4: "The text files published in this corpus were provided to the LDC by the United Nations in New York, for use by the research community in developing machine translation technology. This material has been drawn from the UN's electronic text archives covering the period between 1988 and (portions of) 1993." We are not allowed to distribute this data, so a set of alignment tools (provided by RWTH) has been made available for each partner to carry out the sentence alignment.


SLT Download area

Training Data

English and Spanish

You can use any of the training resources listed in the table above, in addition to the EPPS training sets. To obtain the latter on DVD, please contact Christian Gollan at RWTH.

Chinese

You can use any of the training resources listed in the table above, except the TDT3 audio files and transcriptions for the month of December 1998 (the development and test sets are built from these files).


Development Data

The verbatim transcriptions of EPPS are shared with the ASR evaluation. The difference is that only two files are used for SLT (instead of three for ASR), as only 25,000 words are needed.

You can also find development data of the 2005 SLT evaluation on the SLT Run #1 page.

Direction Files
Es-->En (Cortes+EPPS)
  • Complete archive (zip) contains:
    • DEV05 VERBATIM Spanish source set (NIST MT format)
    • DEV05 VERBATIM English reference translations (NIST MT format)
    • TEST05 VERBATIM Spanish source set (NIST MT format)
    • TEST05 VERBATIM English reference translations (NIST MT format)
    • TEST05 FTE Spanish source set (NIST MT format)
    • TEST05 FTE English reference translations (NIST MT format)
    • TEST05 ASR Spanish ASR output from ROVER system (NIST MT format)
    • TEST05 ASR English reference translations (NIST MT format)
    • DEV06 VERBATIM Spanish source set (NIST MT format)
    • DEV06 VERBATIM English reference translations (NIST MT format)
    • DEV06 FTE Spanish source set (NIST MT format)
    • DEV06 FTE English reference translations (NIST MT format)
    • DEV06 ASR Spanish ASR output from ROVER system (CTM format)
    • DEV06 ASR English reference translations (NIST MT format)
    • TEST06 VERBATIM Spanish source set (NIST MT format)
    • TEST06 VERBATIM English reference translations (NIST MT format)
    • TEST06 FTE Spanish source set (NIST MT format)
    • TEST06 FTE English reference translations (NIST MT format)
    • TEST06 ASR Spanish ASR output from ROVER system (CTM format)
    • TEST06 ASR English reference translations (NIST MT format)
En-->Es (EPPS)
  • Complete archive (zip) contains:
    • DEV05 VERBATIM English source set (NIST MT format)
    • DEV05 VERBATIM Spanish reference translations (NIST MT format)
    • TEST05 VERBATIM English source set (NIST MT format)
    • TEST05 VERBATIM Spanish reference translations (NIST MT format)
    • TEST05 FTE English source set (NIST MT format)
    • TEST05 FTE Spanish reference translations (NIST MT format)
    • TEST05 ASR English ASR output from ROVER system (NIST MT format)
    • TEST05 ASR Spanish reference translations (NIST MT format)
    • DEV06 VERBATIM English source set (NIST MT format)
    • DEV06 VERBATIM Spanish reference translations (NIST MT format)
    • DEV06 FTE English source set (NIST MT format)
    • DEV06 FTE Spanish reference translations (NIST MT format)
    • DEV06 ASR English ASR output from ROVER system (CTM format)
    • DEV06 ASR Spanish reference translations (NIST MT format)
    • TEST06 VERBATIM English source set (NIST MT format)
    • TEST06 VERBATIM Spanish reference translations (NIST MT format)
    • TEST06 FTE English source set (NIST MT format)
    • TEST06 FTE Spanish reference translations (NIST MT format)
    • TEST06 ASR English ASR output from ROVER system (CTM format)
    • TEST06 ASR Spanish reference translations (NIST MT format)
Zh-->En (VoA)
  • Complete archive (zip) contains:
    • DEV05 VERBATIM Chinese source set (NIST MT format, GB2312 encoded)
    • DEV05 VERBATIM English reference translations (NIST MT format, UTF-8 encoded)
    • DEV05 FTE Chinese source set (NIST MT format, GB2312 encoded)
    • DEV05 FTE English reference translations (NIST MT format, UTF-8 encoded)
    • TEST05 VERBATIM Chinese source set (NIST MT format, GB2312 encoded)
    • TEST05 VERBATIM English reference translations (NIST MT format, UTF-8 encoded)
    • TEST05 ASR Chinese ASR output from LIMSI/UKA system (NIST MT format, GB2312 encoded)
    • TEST05 ASR English reference translations (NIST MT format, UTF-8 encoded)
    • DEV06 ASR Chinese ASR output from LIMSI/UKA system (CTM format)
    • DEV06 ASR English reference translations (NIST MT format, UTF-8 encoded)
    • DEV06 VERBATIM Chinese source set (NIST MT format, GB2312 encoded)
    • DEV06 VERBATIM English reference translations (NIST MT format, UTF-8 encoded)
    • TEST06 ASR Chinese ASR output from LIMSI/UKA system (CTM format)
    • TEST06 ASR English reference translations (NIST MT format, UTF-8 encoded)
    • TEST06 VERBATIM Chinese source set (NIST MT format, GB2312 encoded)
    • TEST06 VERBATIM English reference translations (NIST MT format, UTF-8 encoded)

The translation guidelines for the translation agencies are available here (MS Word document).

Word graphs/lattices are regularly posted on WP2's own web page.


Scoring Tools

The scoring tools provided by ELDA are Perl scripts. You can download them in this archive (zip), which contains:

  • BLEU/IBM v1.04b
  • BLEU/NIST v11b
  • mWER: multiple reference word error rate
  • mPER: multiple reference position-independent word error rate
  • mCER: multiple reference character error rate
  • WNM: Weighted N-gram Model

The package also includes sample files to check that the installation works:

  • procedure_sample_file.txt: example evaluation runs using the following files
  • TC-STAR_SAMPLE_REF.TXT: reference file (in Spanish)
  • TC-STAR_SAMPLE_SRC.TXT: source file (in English)
  • TC-STAR_SAMPLE_TGT.TXT: target file (in Spanish)
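
As a concrete illustration of how mWER and mPER are defined, the following Python sketch scores a single segment against multiple references. It is not the ELDA Perl implementation: it assumes whitespace-tokenized, pre-normalized text, and it uses one common convention of normalizing by the length of the closest reference.

# Illustrative sketch of mWER and mPER for one segment -- not the ELDA
# scoring scripts. Assumes whitespace tokenization and pre-normalized text.
from collections import Counter

def edit_distance(hyp, ref):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (h != r)))   # substitution
        prev = cur
    return prev[-1]

def mwer(hyp, refs):
    """mWER: edit distance to the closest reference, here normalized
    by that reference's length (one common convention)."""
    return min(edit_distance(hyp, ref) / len(ref) for ref in refs)

def mper(hyp, refs):
    """mPER: like mWER but position-independent, comparing bags of words."""
    def per(hyp, ref):
        matches = sum((Counter(hyp) & Counter(ref)).values())
        return (max(len(hyp), len(ref)) - matches) / len(ref)
    return min(per(hyp, ref) for ref in refs)

hyp = "i am starting to know what sinatra must have felt like".split()
refs = ["i am starting to know what frank sinatra must have felt like".split()]
print("mWER = %.3f, mPER = %.3f" % (mwer(hyp, refs), mper(hyp, refs)))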

For the ASR task, the alignment tool from RWTH is available here.


Test Data

Source files

Direction             | Input              | Files
En-->Es (EPPS)        | Final Text Edition | English source set (NIST MT format)
En-->Es (EPPS)        | Verbatim           | English source set (NIST MT format)
En-->Es (EPPS)        | ASR                | English ASR output (CTM file format) (updated 2007/02/02 - 12:30)
Es-->En (EPPS + PARL) | Final Text Edition | Spanish source set (NIST MT format)
Es-->En (EPPS + PARL) | Verbatim           | Spanish source set (NIST MT format)
Es-->En (EPPS + PARL) | ASR                | Spanish ASR output (CTM file format) (updated 2007/02/01 - 13:27)
Zh-->En (VoA)         | Verbatim           | Chinese source set (NIST MT format, GB2312 encoded) (updated 2007/01/31 - 10:37)
Zh-->En (VoA)         | ASR                | Chinese ASR output (CTM file format)
Zh-->En (VoA)         | ASR                | Chinese ASR output automatically segmented by RWTH (NIST file format)
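
As an informal illustration of how the CTM files can be read, the sketch below groups a CTM word stream into per-recording word sequences. It assumes the standard NIST CTM column layout (waveform id, channel, start time, duration, word, optional confidence) and UTF-8 input; the Chinese files may require a different encoding (e.g. GB2312).

# Illustrative CTM reader -- assumes the standard NIST CTM columns:
# <waveform-id> <channel> <start-time> <duration> <word> [<confidence>],
# with ";;" marking comment lines.
from collections import defaultdict

def read_ctm(path, encoding="utf-8"):
    tokens = defaultdict(list)   # (waveform-id, channel) -> [(start, word), ...]
    with open(path, encoding=encoding) as ctm:
        for line in ctm:
            if not line.strip() or line.startswith(";;"):
                continue
            wid, channel, start, dur, word = line.split()[:5]
            tokens[(wid, channel)].append((float(start), word))
    # return the words of each recording in temporal order
    return {key: [w for _, w in sorted(toks)] for key, toks in tokens.items()}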

Submissions must be sent by email to hamon@elda.org before Wednesday, February 07, 2007, 23:59 CET.

Submission guidelines:

Submitted files should use the NIST MT format for TSTSET:

<TSTSET SetID="..." SrcLang="..." TrgLang="...">
<DOC DocID="..." SysID="...">
<SEG id="1">
TRANSLATED TEXT
</SEG>
...
</DOC>
...
</TSTSET>

The output file format is the same as the source file format, with the following exceptions:

  • "SRCSET" is replaced by "TSTSET", at the beginning and the end of the file
  • the "TrgLang" attribute is added in the "TSTSET" tag of the output file (representing target language - Spanish or English)
  • the "SysID" attribute is added, corresponding to the identifier of the organisation (please see below for the assignment of the identifier)

Recommendations:

  • Please pay attention to tags, upper case and carriage returns
  • Do not change the DocID attributes or the segment numbering
  • Files must be UTF-8 encoded
  • Each segment must be written on one line; please avoid blank lines.
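
The following small Python sketch checks two of these recommendations automatically; it is a hypothetical pre-submission check, not an official validation tool.

# Hypothetical pre-submission checks for the recommendations above --
# an informal sketch, not an official validation tool.
import re, sys

def check_submission(path):
    # the file must be UTF-8 encoded; decode() raises UnicodeDecodeError otherwise
    text = open(path, "rb").read().decode("utf-8")
    # each segment's text must fit on a single line
    segs = re.findall(r'<SEG id="\d+">(.*?)</SEG>', text, flags=re.S)
    multi = [s for s in segs if "\n" in s.strip()]
    if multi:
        sys.exit("%d segment(s) span more than one line" % len(multi))
    print("%d segments look well formed" % len(segs))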

Submit one test set per file, i.e. one file for the English verbatim transcripts, one file for the English ASR output, one file for the English FTE, and so on.

The SysID attribute must identify the organisation, the condition and the system. For instance, if the organisation ORG submits one primary condition and two secondary conditions for the English verbatim transcripts (one with the same system as for the primary condition and another with a different system or system version), then three files will be sent for this SetID, with the following SysIDs:

  • ORG-PRIMARY-system1
  • ORG-SECONDARY-system1
  • ORG-SECONDARY-system2

In the same manner, output file names must identify the organisation, with the same constraints. For instance, if the organisation ORG translates the file "TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_SRC.TXT", the translated file should be renamed "TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_ORG-PRIMARY-system1.TXT".

("system1" can be omitted if there is only one system per condition.)
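
As a small illustration of this naming scheme (the helper functions and the organisation identifier "ORG" are hypothetical):

# Tiny illustration of the SysID and output file naming scheme; "ORG" and
# the helper functions are hypothetical, not part of any official tooling.
def sys_id(org, condition, system="system1"):
    return "%s-%s-%s" % (org, condition.upper(), system)

def output_name(src_file, sid):
    # the trailing "SRC" field of the source file name becomes the SysID
    return src_file.replace("_SRC.TXT", "_%s.TXT" % sid)

print(output_name("TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_SRC.TXT",
                  sys_id("ORG", "primary")))
# -> TC-STAR_RUN2_TEST06_EPPS_FTE_ENES_ORG-PRIMARY-system1.TXT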

For the ASR task, the SetID attribute should be:

  • "tctar-run2-epps-test-enes-asr" for English to Spanish direction
  • "tctar-run2-epps-test-esen-asr" for Spanish to English direction
  • "tctar-run2-epps-test-zhen-asr" for Chinese to Engliish direction

System descriptions:

For each experiment, a one-page system description must be provided, describing the data used, the approaches (algorithms), the configuration, the processing time, etc. The document should also contain references. The file should be named "<SysID>.txt".

Submission:

Submissions must be sent by email to the following address: hamon@elda.org

with the subject "[TC-STAR] Submission <SysID>"

and with the archived files attached.
The deadline is Wednesday, February 07, 23:59 CET (5:59 pm in Pittsburgh and Yorktown).
A return receipt will be sent within 24 hours.

Reference files

Direction             | Input              | Files                                                                                                          | Validation (Ref1 / Ref2)
En-->Es (EPPS)        | Final Text Edition | Spanish reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
En-->Es (EPPS)        | Verbatim / ASR     | Spanish reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Es-->En (EPPS + PARL) | Final Text Edition | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Es-->En (EPPS + PARL) | Verbatim / ASR     | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Es-->En (EPPS)        | Final Text Edition | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Es-->En (EPPS)        | Verbatim / ASR     | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Es-->En (PARL)        | Final Text Edition | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Es-->En (PARL)        | Verbatim / ASR     | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations)                   | OK / OK
Zh-->En (VoA)         | Verbatim / ASR     | English reference translations set (NIST MT format, UTF-8 encoded, 2 reference translations) (updated 2007/03/21) | OK / OK


Results

Preliminary results are available: