The UAM corpus of spoken Spanish (under development since 1991)
- 1,100,000 words, with the recordings held on electronic media (about 110 hours of recording)
- 1. Speaking; 2. Spontaneity; 3. Adequacy; 4. Representative nature; 5. Authenticity; 6. Standard
- Texts are recorded in ASCII code in SGML files, following the TEI (Text Encoding Initiative) format.
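Word counts like those in the table below could be derived from marked-up transcripts. The following is a minimal Python sketch of that idea. It assumes TEI-style `<u who="...">` utterance elements, which is the TEI element for spoken utterances. The actual tag set of the UAM files may differ, and the sample text and speaker codes are hypothetical. SGML allows end tags to be omitted, so the sketch splits on start tags with a regular expression rather than using an XML parser.

```python
import re
from collections import Counter

# Hypothetical TEI-style SGML sample; speaker codes and text are invented.
# Note that the first utterance has no </u> end tag, which SGML permits.
SAMPLE = """
<u who="A">bueno pues <pause> vamos a empezar
<u who="B">vale <pause> de acuerdo</u>
"""

# Start tag of an utterance, capturing the speaker code in the who attribute.
UTTERANCE_START = re.compile(r'<u\s+who="([^"]+)"\s*>')
# Any remaining tag (<pause>, </u>, ...), removed before counting words.
ANY_TAG = re.compile(r"<[^>]*>")


def words_per_speaker(sgml: str) -> Counter:
    """Count whitespace-separated words per speaker.

    Each <u who=...> start tag is treated as the beginning of an utterance
    that runs until the next start tag or the end of the text.
    """
    counts = Counter()
    # With one capturing group, re.split returns
    # [text_before_first_tag, who1, text1, who2, text2, ...].
    parts = UTTERANCE_START.split(sgml)
    for who, text in zip(parts[1::2], parts[2::2]):
        plain = ANY_TAG.sub(" ", text)
        counts[who] += len(plain.split())
    return counts


print(words_per_speaker(SAMPLE))
```

In this sketch, tokens are whitespace-separated strings. A real count for the corpus would also need the project's own conventions for hesitations, overlaps and unintelligible passages.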

| Textual typology | Number of words | Percentage |
|---|---:|---:|
| Administrative and political | 61,200 | 5.6% |
| Scientific | 36,600 | 3.3% |
| Conversational or familiar | 269,500 | 24.5% |
| Educational | 58,300 | 5.3% |
| Humanistic | 61,200 | 5.6% |
| Instructions (megaphone) | 6,600 | 0.6% |
| Juridical | 35,200 | 3.2% |
| Ludic (games with a prize, etc.) | 61,200 | 5.6% |
| Journalistic: debates | 93,500 | 8.5% |
| Journalistic: sport | 58,300 | 5.3% |
| Journalistic: documentaries | 28,600 | 2.6% |
| Journalistic: interviews | 171,200 | 15.6% |
| Information services | 72,600 | 6.6% |
| Advertising | 30,800 | 2.8% |
| Religious | 12,100 | 1.1% |
| Technical | 43,100 | 3.9% |
| Approximate total | 1,100,000 | 100% |
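Each percentage in the table is the category's word count divided by the total, rounded to one decimal place. The following minimal Python check recomputes those shares from the table's figures. The dictionary layout and short category keys are only for illustration.

```python
# Word counts per textual typology, copied from the table.
counts = {
    "Administrative and political": 61_200,
    "Scientific": 36_600,
    "Conversation or familiar": 269_500,
    "Educational": 58_300,
    "Humanistic": 61_200,
    "Instructions (megaphone)": 6_600,
    "Juridical": 35_200,
    "Ludic": 61_200,
    "Journalistic: debates": 93_500,
    "Journalistic: sport": 58_300,
    "Journalistic: documentaries": 28_600,
    "Journalistic: interviews": 171_200,
    "Information services": 72_600,
    "Advertising": 30_800,
    "Religious": 12_100,
    "Technical": 43_100,
}

# Sum of all categories; for these figures it is exactly 1,100,000.
total = sum(counts.values())

# Share of each category in percent, rounded to one decimal as in the table.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:32} {counts[name]:>9,} {pct:5.1f}%")
print(f"{'Total':32} {total:>9,}")
```

The recomputed shares match the table. Because each one is rounded separately, they add up to 100.1% rather than exactly 100%.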