CLARET is a national research training programme in Corpus Linguistics running in the UK. It is open to doctoral students whose work involves the investigation of large amounts of electronically stored language data. The programme aims to provide participants with:
- a thorough understanding of the conceptualisation of the field;
- knowledge and practical experience of different corpus linguistic research methodologies;
- an increased awareness of the types of insights into language theory and use that can be gained from corpus data.
The programme consists of three one-day workshops given by leading researchers in the field. The following areas are covered:
Conceptualisation of the field
- The history of corpus linguistics: how different approaches have arisen
- Operationalising research constructs: what can and cannot be found from a corpus
Methodology
- Corpus design, construction, collection, preparation, encoding
- Types of corpora: spoken, written, multi-modal, parallel, or diachronic
- Modes of corpus annotation, e.g. syntactic, semantic, discoursal, generic
- Corpus tools and software; practical "bring your own data" sessions
- Computational approaches
- Statistical approaches
Insights into linguistic theory and language use from corpus data
- Languages other than English
- Translation
- Language variation
- Historical language
- Lexicography
- Text characteristics: e.g. keywords and stylistics
- Genre/register: e.g. media, academic discourse, conversation
- Phraseology: e.g. patterns, multi-word expressions, formulaic language
- Learner corpora and language teaching
Workshop topics 2010 (dates to be confirmed):
- Lancaster: Corpus Compilation and Annotation
- Birmingham: Applications of Corpus Linguistics
- Nottingham: Spoken Corpus Analysis
Each CLARET workshop is limited to 30 participants. Applications for the programme will open in October 2009.
Workshop convenors:
CLARET is a collaboration between the Universities of Birmingham, Lancaster, Liverpool, Nottingham, and Reading. The 2007-2008 programme was funded by award 07/01/N under the Collaborative Research Training Scheme of the Arts and Humanities Research Council.