Between 100 and 200 human-computer
dialogues will be collected for each language. The ITC-Irst speech recognition
technology will be adapted in order to handle mixed initiative interactions
in several languages. After the recordings, the corpora will be made available
with texts and waveform files. Log files of the interactions, containing
recognised texts and recognition grammars, will also be provided. Recordings
will be made during the first six months of 2002 by an automatic service
reachable through a toll-free telephone number.
For each language a set of tasks
will be defined according to a given semantic domain. For Italian,
the chosen domain is accessing tourism information. For this domain,
a preliminary collection has just been performed. Each caller was
given two or three tasks consisting of asking for information about
hotel availability, available services, sports grounds, localities
in general, etc. So far, about 200 dialogues have been collected.