The UAM corpus of spoken Spanish (under development since 1991)
- 1,100,000 words in electronic form (about 110 hours of recording)
- 1. Speaking 2. Spontaneity 3. Adequacy 4. Representative nature 5. Authenticity 6. Standard
- Texts are stored as ASCII in SGML files following the TEI (Text Encoding Initiative) format.
| Text type | Number of words | Percentage |
|---|---|---|
| Administrative and political | 61,200 | 5.6% |
| Scientific | 36,600 | 3.3% |
| Conversational or familiar | 269,500 | 24.5% |
| Educational | 58,300 | 5.3% |
| Humanistic | 61,200 | 5.6% |
| Instructions (public address) | 6,600 | 0.6% |
| Legal | 35,200 | 3.2% |
| Ludic (games with prizes, etc.) | 61,200 | 5.6% |
| Journalistic: debates | 93,500 | 8.5% |
| Journalistic: sport | 58,300 | 5.3% |
| Journalistic: documentaries | 28,600 | 2.6% |
| Journalistic: interviews | 171,200 | 15.6% |
| Information services | 72,600 | 6.6% |
| Advertising | 30,800 | 2.8% |
| Religious | 12,100 | 1.1% |
| Technical | 43,100 | 3.9% |
| APPROXIMATE TOTAL | 1,100,000 | 100% |
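
The percentages in the table follow from the word counts, which can be checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the table, while the names `WORD_COUNTS`, `percentage_shares`, `TOTAL_WORDS` and `SHARES` are illustrative and not part of the corpus documentation.

```python
# Word counts per text type, copied from the table (labels lightly shortened).
WORD_COUNTS = {
    "Administrative and political": 61_200,
    "Scientific": 36_600,
    "Conversational or familiar": 269_500,
    "Educational": 58_300,
    "Humanistic": 61_200,
    "Instructions": 6_600,
    "Legal": 35_200,
    "Ludic": 61_200,
    "Journalistic: debates": 93_500,
    "Journalistic: sport": 58_300,
    "Journalistic: documentaries": 28_600,
    "Journalistic: interviews": 171_200,
    "Information services": 72_600,
    "Advertising": 30_800,
    "Religious": 12_100,
    "Technical": 43_100,
}


def percentage_shares(counts):
    """Return each category's share of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


TOTAL_WORDS = sum(WORD_COUNTS.values())
SHARES = percentage_shares(WORD_COUNTS)

if __name__ == "__main__":
    print(f"Total words: {TOTAL_WORDS:,}")
    for name, share in SHARES.items():
        print(f"{name:<30} {WORD_COUNTS[name]:>9,} {share:>5.1f}%")
```

Because each share is rounded independently, the rounded percentages need not add up to exactly 100, which is consistent with the total being labelled approximate.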