Sampling
The sampling distribution should be as follows:
INFORMAL 150,000 words
Sample types: long samples of 4,500 words (L); short samples of 1,500 words (S); collections of very short dialogues (M)*
Context              Features                         Words
Private / Family     -public, -partially scripted     113,000
Public               +public, -partially scripted      37,000
                     -public, +partially scripted

Private / Family context:
    Monologues                  33,000
    Dialogues/Conversations     80,000 **
Public context:
    Monologues                   6,000
    Dialogues/Conversations     31,000
* Up to 7,500 words in collections of very short dialogues in public context (where possible).
** At least 23,000 words in conversations with more than two participants.
10 long texts and 64 short samples (or more, depending on whether any collections of very short dialogues are included in the corpus), distributed as proportionally as possible across the four fields.
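The informal quotas above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming that every long and short sample has exactly its nominal size (real samples will vary) and that the very short dialogue collections count toward the 150,000-word total; the names `quotas`, `words_covered` and `short_samples_needed` are illustrative and not part of the plan.

```python
# Word quotas copied from the informal sampling plan.
INFORMAL_TOTAL = 150_000
LONG_WORDS = 4_500            # nominal size of a long sample (L)
SHORT_WORDS = 1_500           # nominal size of a short sample (S)
MAX_COLLECTION_WORDS = 7_500  # upper bound for very short dialogue collections (M)

# The four fields: (context, interaction type) -> word quota.
quotas = {
    ("private", "monologue"): 33_000,
    ("private", "dialogue"): 80_000,
    ("public", "monologue"): 6_000,
    ("public", "dialogue"): 31_000,
}


def context_total(context):
    """Sum the quotas of all fields that belong to one context."""
    return sum(words for (ctx, _), words in quotas.items() if ctx == context)


def words_covered(n_long, n_short, collection_words=0):
    """Words obtained from a given mix of samples (assumes nominal sizes)."""
    return n_long * LONG_WORDS + n_short * SHORT_WORDS + collection_words


def short_samples_needed(n_long, collection_words=0):
    """Short samples required to reach the informal total (ceiling division)."""
    remaining = INFORMAL_TOTAL - n_long * LONG_WORDS - collection_words
    return -(-remaining // SHORT_WORDS)


if __name__ == "__main__":
    print("private:", context_total("private"))
    print("public:", context_total("public"))
    print("10 L + 64 S + full M:", words_covered(10, 64, MAX_COLLECTION_WORDS))
    print("S needed with full M:", short_samples_needed(10, MAX_COLLECTION_WORDS))
    print("S needed without M:", short_samples_needed(10))
```

Under these assumptions the four fields add up to the two context totals and to 150,000 words overall. The count of short samples needed to reach the total changes with the size of the dialogue collections, which may be what the "or more" in the plan allows for.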
FORMAL 150,000 words
Formal in natural context (+public, +scripted or partially scripted): 65,000 (2 or 3 samples per genre, averaging 3,000 words each)
Media (+public, +scripted or partially scripted): 60,000 (2 or 3 samples per genre, averaging 3,000 words each)
Telephone: 25,000 (text length not defined in the decisions; suggestion: upper limit of 1,500 words, no lower limit)
Text types by category:

Formal in natural context:
    political speech
    political debate
    preaching
    teaching
    professional explanation
    conference
    business
    law (through media)

Media:
    news (small sample)
    weather forecast (small sample)
    interviews
    reportage
    scientific press
    sport
    talk shows: political debate
    talk shows: thematic discussions
    talk shows: culture
    talk shows: science

Telephone:
    private conversation
    phone calls to services
    man-machine interaction
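The formal quotas can be checked in the same way. The sketch below is only an illustration: the number of text types per category is counted from the lists above, with each talk show variety counted separately, which is an assumption about how the plan groups them. The names `formal` and `implied_range` are hypothetical. Telephone is excluded from the range check because its text length is not defined in the decisions.

```python
AVERAGE_SAMPLE_WORDS = 3_000  # average sample size stated in the plan
SAMPLES_PER_TYPE = (2, 3)     # "2 or 3" samples per text type

# Word quotas from the plan; text-type counts are an assumption
# (counted from the lists, talk show varieties counted one by one).
formal = {
    "formal in natural context": {"words": 65_000, "text_types": 8},
    "media": {"words": 60_000, "text_types": 10},
    "telephone": {"words": 25_000, "text_types": 3},
}


def implied_range(text_types):
    """Word range implied by 2 to 3 average-sized samples per text type."""
    low, high = SAMPLES_PER_TYPE
    return (
        low * text_types * AVERAGE_SAMPLE_WORDS,
        high * text_types * AVERAGE_SAMPLE_WORDS,
    )


formal_total = sum(category["words"] for category in formal.values())

if __name__ == "__main__":
    print("formal total:", formal_total)
    for name in ("formal in natural context", "media"):
        low, high = implied_range(formal[name]["text_types"])
        quota = formal[name]["words"]
        print(f"{name}: quota {quota}, implied range {low} to {high}")
```

With these counts both quotas fall inside the implied range, and the media quota sits at its lower bound, which fits the "small sample" notes on news and weather forecast.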