The UAM corpus of spoken Spanish
(under development since 1991)
  1. 1,100,000 words in electronic form (about 110 hours of recordings)
  2. Speaking; spontaneity; adequacy; representative nature; authenticity; standard
  3. Texts are stored as ASCII in SGML files following the TEI (Text Encoding Initiative) format.
 
 
Textual typology                     Number of words   Percentage
Administrative and political                  61,200         5.6%
Scientific                                    36,600         3.3%
Conversation or familiar                     269,500        24.5%
Educational                                   58,300         5.3%
Humanistic                                    61,200         5.6%
Instructions (megaphone)                       6,600         0.6%
Juridical                                     35,200         3.2%
Ludic (games with prizes, etc.)               61,200         5.6%
Journalistic:
  Debates                                     93,500         8.5%
  Sport                                       58,300         5.3%
  Documentaries                               28,600         2.6%
  Interviews                                 171,200        15.6%
Information services                          72,600         6.6%
Advertising                                   30,800         2.8%
Religious                                     12,100         1.1%
Technical                                     43,100         3.9%
APPROXIMATE TOTAL                          1,100,000         100%
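
As a quick consistency check on the table above, the percentages can be recomputed from the word counts. The sketch below is only illustrative: the dictionary keys are shortened labels chosen here, the names WORD_COUNTS, shares and RECORDED_HOURS are hypothetical, and the 110-hour figure is the approximate value quoted in the description, so the words-per-hour rate is a rough estimate.

```python
# Word counts per text type, copied from the table (labels shortened here).
WORD_COUNTS = {
    "Administrative and political": 61_200,
    "Scientific": 36_600,
    "Conversation": 269_500,
    "Educational": 58_300,
    "Humanistic": 61_200,
    "Instructions": 6_600,
    "Juridical": 35_200,
    "Ludic": 61_200,
    "Journalistic: debates": 93_500,
    "Journalistic: sport": 58_300,
    "Journalistic: documentaries": 28_600,
    "Journalistic: interviews": 171_200,
    "Information services": 72_600,
    "Advertising": 30_800,
    "Religious": 12_100,
    "Technical": 43_100,
}

# Approximate recording time quoted in the corpus description (assumption:
# treated here as exactly 110 hours for the sake of a rough rate).
RECORDED_HOURS = 110


def shares(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


TOTAL = sum(WORD_COUNTS.values())
SHARES = shares(WORD_COUNTS)
WORDS_PER_HOUR = TOTAL / RECORDED_HOURS

if __name__ == "__main__":
    for name, pct in SHARES.items():
        print(f"{name:<32}{WORD_COUNTS[name]:>10,}{pct:>8.1f}%")
    print(f"{'Total':<32}{TOTAL:>10,}")
    print(f"Roughly {WORDS_PER_HOUR:,.0f} words per recorded hour")
```

Because each share is rounded to one decimal place separately, the rounded percentages need not add up to exactly 100, which is consistent with the total row being labelled approximate.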