
1 - AURORA Project Database 2.0 (AURORA/CD0002)

The Aurora project is releasing a revised version of the Noisy TI Digits database, following on from the work carried out within ETSI. This CD set replaces the previous set (version 1.0 consisted of 2 CDs, whereas version 2.0 consists of 4 CDs).

This database is intended for evaluating front-end feature extraction algorithms in background noise, but it may also be used more widely by speech researchers to evaluate and compare the performance of noise-robust speech recognition algorithms.

Compared with version 1.0, the changes are as follows:

-  The files have been restored to the energy level of the original speech in the TI Digits database.
-  One of the noise types added to the speech (babble) has been changed.
-  There is an additional test set in which the noises are mismatched to those used in the training set.
-  There is a convolutional distortion test set.
-  There is a clean training set.
-  The CD-ROM will be used for the next round of ETSI Aurora standards evaluation.

Two original copies of the contract (available in PostScript or RTF format) must be sent to ELDA. To be valid, these contracts must be initialled and signed. The user should attach to the contract proof that they have obtained the right to use the TI Digits database from LDC (ref. LDC93S10). This may be a signed licence agreement or proof of membership payment for 1993.

Price for this database: EUR 250

2 - AURORA Project Database - Subset of SpeechDat-Car Finnish database (AURORA/CD0003-01)

This database is a subset of the Finnish SpeechDat-Car database, which was collected as part of the EU-funded SpeechDat-Car project. It contains isolated and connected Finnish digits spoken inside a car under the following driving conditions:

-  0 km/h with the engine running
-  40-60 km/h with the windows closed
-  40-60 km/h with the windows open
-  100-120 km/h with no music in the background
-  100-120 km/h with music in the background

The database also contains the software needed to run simulations with Entropic's HTK, which has been adopted as the "standard" HMM recogniser for the Aurora standard evaluation.

Two original copies of the contract (available in PostScript or RTF format) must be sent to ELDA.

Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000

3 - AURORA Project Database - Subset of SpeechDat-Car Spanish database (AURORA/CD0003-02)

The Aurora project was originally set up to establish a worldwide standard for the feature extraction software that forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two ETSI work items are:

-  ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
-  ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm

This database is a subset of the Spanish SpeechDat-Car database, which was collected as part of the EU-funded SpeechDat-Car project. It contains isolated and connected Spanish digits spoken inside a car under the following noise and driving conditions:

-  Quiet environment: stopped, motor running
-  Low noise: town traffic plus low speed, rough road
-  High noise: high speed, good road

Two original copies of the contract (available in PostScript or RTF format) must be sent to ELDA.

Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000

4 - AURORA Project Database - Subset of SpeechDat-Car German database (AURORA/CD0003-03)

This database is a subset of the German SpeechDat-Car database, which was collected as part of the EU-funded SpeechDat-Car project. It contains isolated and connected German digits spoken inside a car under the following noise and driving conditions:

-  High speed good road
-  Low speed rough road
-  Stopped with motor running
-  Town traffic

Two original copies of the contract (available in PostScript or RTF format) must be sent to ELDA.

Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000

5 - AURORA Project Database - Subset of SpeechDat-Car Danish database (AURORA/CD0003-04)

This database is a subset of the Danish SpeechDat-Car database, which was collected as part of the EU-funded SpeechDat-Car project. It contains isolated and connected Danish digits spoken inside a car under the following noise and driving conditions:

-  High speed good road
-  Low speed rough road
-  Stopped with motor running
-  Town traffic

Two original copies of the contract (available in PostScript or RTF format) must be sent to ELDA.

Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000

6 - AURORA Project Database - Subset of SpeechDat-Car Italian database (AURORA/CD0003-05)

This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the EU-funded SpeechDat-Car project. It contains isolated and connected Italian digits spoken inside a car under the following noise and driving conditions:

-  High speed good road
-  Low speed rough road
-  Stopped with motor running
-  Town traffic

Two original copies of the contract (available in PostScript or RTF format) must be sent to ELDA.

Price (research use only): EUR 1000

7 - Aurora 4a

The Aurora project is now releasing a number of list files for training and testing on the Wall Street Journal (WSJ0) data at two sampling rates: 8 kHz and 16 kHz. The Aurora 4a database is based on WSJ0, with noise artificially added over a range of signal-to-noise ratios. It contains both clean and multi-condition training sets and 14 evaluation sets with different noise types and microphones.

Price for research use by academic organisations: EUR 250
Price for research use by commercial organisations: EUR 1000
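The Aurora 4a description mentions noise added artificially "over a range of signal-to-noise ratios". As a generic illustration of what adding noise at a target SNR means, the sketch below scales a noise signal so that 10·log10(P_speech / P_noise) equals the requested value and then adds it to the speech. This is an assumption-laden sketch, not the procedure used to build the Aurora databases: the function names, the synthetic tone and white noise, and the SNR values are all hypothetical, and the real tools may measure levels differently or apply filtering first.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, target_snr_db):
    """Hypothetical helper: scale `noise` so the mixture has SNR `target_snr_db`.

    Assumes equal-length float sequences and noise that is not all zeros.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    # Noise power required for the target SNR: P_speech / 10^(SNR_dB / 10)
    target_p_noise = p_speech / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` relative to the clean `speech`, in dB."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # illustrative 8 kHz sampling rate, one second of audio
    # Synthetic stand-ins: a 440 Hz tone for "speech", white Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for target in (20, 15, 10, 5, 0, -5):  # illustrative SNR range
        noisy = add_noise_at_snr(speech, noise, target)
        print(f"target {target:>3} dB -> measured {measured_snr_db(speech, noisy):.2f} dB")
```

Because the gain is computed from whole-signal mean power, the measured SNR matches the target up to floating-point error; real corpora often measure speech level only over active speech, which this sketch does not attempt.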

8 - Aurora 4b

An additional database has been released. It contains noisy versions of the Nov'92 WSJ0 development set.

Price (research use only): EUR 250



