A corpus is a searchable database of language samples for linguistic research. A corpus may be based on written or spoken language. Some corpora are tagged or annotated by part of speech; other corpora are plain text.
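To illustrate the difference between plain-text and tagged corpora, here is a minimal Python sketch. The sample sentence, the `word_TAG` annotation style, the tag names, and the two helper functions are all assumptions made for illustration; real corpora use their own tagsets and search interfaces.

```python
# Made-up sample sentence, not drawn from any real corpus.
plain_text = "The dog barks and the dogs run"

# The same sentence with a hypothetical word_TAG part-of-speech annotation.
tagged_text = "The_DET dog_NOUN barks_VERB and_CONJ the_DET dogs_NOUN run_VERB"


def search_plain(text, word):
    """Return every token in a plain-text sample that matches `word`, ignoring case."""
    return [tok for tok in text.split() if tok.lower() == word.lower()]


def search_tagged(text, tag):
    """Return every word in a tagged sample that carries the given tag."""
    pairs = [tok.rsplit("_", 1) for tok in text.split()]
    return [word for word, word_tag in pairs if word_tag == tag]


the_hits = search_plain(plain_text, "the")
noun_hits = search_tagged(tagged_text, "NOUN")
print(the_hits)   # ['The', 'the']
print(noun_hits)  # ['dog', 'dogs']
```

A plain-text corpus supports searches by word form only, while a tagged corpus also supports searches by grammatical category, such as all nouns.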
Some of the best-known corpora of American and British English, including COCA, COHA, and the British National Corpus. From Mark Davies of Brigham Young University.
Database of transcribed audio recordings of conversations with children. Samples are in English and in 25 other languages. Transcriptions and media can be downloaded for use with CLAN software. Data are transcribed in CHAT format.
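As a rough idea of what a CHAT transcript looks like, the Python sketch below reads a small made-up fragment. In CHAT, header lines begin with `@`, speaker lines begin with `*` followed by a speaker code, and dependent (annotation) lines begin with `%`. The sample text and the `utterances` helper are hypothetical, and the sketch ignores continuation lines and most other conventions of the full format; CLAN is the intended tool for real analysis.

```python
# Made-up transcript fragment that follows the basic CHAT line conventions.
sample = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twhat is that ?\n"
    "*CHI:\tdoggie .\n"
    "%com:\tchild points at the dog\n"
    "@End\n"
)


def utterances(chat_text):
    """Return (speaker, utterance) pairs from the speaker lines of a CHAT text."""
    result = []
    for line in chat_text.splitlines():
        # Speaker lines start with '*'; '@' headers and '%' tiers are skipped.
        if line.startswith("*") and ":" in line:
            speaker, _, words = line[1:].partition(":")
            result.append((speaker, words.strip()))
    return result


pairs = utterances(sample)
for speaker, words in pairs:
    print(speaker, "said:", words)
```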