Using Corpora in English Language Teaching

Hima Rawal

English language teachers throughout the world are always in search of a theory or method of language teaching that will resolve all the teaching problems they face. However, no such method has ever existed, because teaching situations vary, resources are often unavailable, and no single method is relevant and applicable in all contexts. Experts in the ELT field therefore try to come up with tools that can enhance language teaching to some extent. Corpus-based language teaching is one such convenient approach: it gives teachers a ready-made means of teaching authentic, contextualized language use. In this post, I present a brief introduction to some of the most prevalent corpora in the field of ELT.

A corpus is a collection of natural language data from several different fields from which we can draw materials for teaching, conducting research and so on. It is “a large, principled collection of naturally occurring texts (written or spoken) stored electronically” (Reppen, 2010, p. 2). Naturally occurring text means language from “actual language situations, such as friends chatting, meeting, letters, classroom assignments, and books, rather than from surveys, questionnaires, or just made-up language” (p. 2). A corpus offers both qualitative and quantitative data to draw from.

One of the most widely used corpora is COCA (Corpus of Contemporary American English). It is a searchable online corpus consisting of 450+ million words of American English, arranged by different fields and registers. We can search for words from different disciplines, compare words, and find collocations. Searches can be organized by time frame, frequency, relevance, alphabetical order and so on.

Let’s look at some examples of how we could use COCA. Once we enter the site, we can see four display options: list, chart, KWIC (Key Word in Context), and compare.

If we choose the list option and type the word we are looking for (e.g. proficiency), the corpus will show us all the contexts in which the word has been found. The contexts are drawn from five different sections: spoken, magazine, fiction, newspaper, and academic language. Since the corpus will show thousands of examples of the word in all the contexts in which it appears, we can limit our search by selecting a specific time frame or a specific area, for example, how the word has been used in the academic field between 2005 and 2009. We can also find the words with which it collocates most often by looking at the words that most frequently precede and/or follow it.
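The idea of finding the words that most often precede or follow a search word can be sketched in a few lines of Python. This is only a minimal illustration on a made-up sample text, not how COCA itself works; the function names `tokenize` and `neighbours` and the `span` parameter are assumptions introduced for this example, and the sketch ignores sentence boundaries.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def neighbours(tokens, target, span=1):
    """Count the words found within `span` positions before and after `target`."""
    before, after = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        before.update(tokens[max(0, i - span):i])
        after.update(tokens[i + 1:i + 1 + span])
    return before, after

# Made-up sample text, used only for illustration.
sample = (
    "Language proficiency develops slowly. The test measures language "
    "proficiency in speaking. Oral proficiency tests are common, and "
    "proficiency in writing is assessed separately."
)

before, after = neighbours(tokenize(sample), "proficiency")
print(before.most_common(2))  # words that most often precede the target
print(after.most_common(2))   # words that most often follow the target
```

With a larger `span`, the same function counts collocates in a wider window on each side of the search word.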

If we choose the ‘compare’ option and type two words that we want to compare (e.g. proficiency and achievement), the corpus will show both words in the different contexts in which they appear, from which we can draw conclusions. Likewise, if we search for a word (e.g. validity) through KWIC, we can see the contexts in which it appears (e.g. construct validity, discriminant validity, face validity, predictive validity, convergent validity, concurrent validity, diagnostic validity, consequential validity and so on). These combinations will appear within and/or across sentences.
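A KWIC display simply lines up every occurrence of the search word with a fixed amount of text on either side. The following is a rough sketch of that layout on a made-up sample, assuming a hypothetical helper called `kwic` with a `width` parameter; online corpora and concordancing programs offer far more (sorting, tagging, filtering).

```python
import re

def kwic(text, keyword, width=30):
    """Return one aligned line per occurrence of `keyword`, with context on both sides."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword forms a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Made-up sample text, used only for illustration.
sample = (
    "The study examined construct validity first. Face validity was judged "
    "by teachers, and predictive validity was checked a year later."
)

for line in kwic(sample, "validity"):
    print(line)
```

Reading down the keyword column makes recurring combinations such as "construct validity" or "face validity" easy to spot.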

Including corpus data in textbooks is a relatively new concept; however, we are already familiar with it in the form of corpus-based ESL and EFL dictionaries like the Cambridge Dictionary of American English, the Longman Dictionary of Contemporary English, the Cambridge Academic Content Dictionary, etc. Examples of corpus-based textbooks are Basic Vocabulary in Use by McCarthy and O’Dell (2010) and Touchstone by McCarthy, McCarten and Sandiford (2004). Basically, corpora provide ready resources for teachers. They are natural and authentic. They can be used for language learning, teaching and testing purposes. They can also be used for research purposes. Language textbook writers can draw on corpus data when preparing teaching materials for their textbooks.

The word lists from corpora can serve different functions: finding words by frequency; distinguishing content words from function words; finding related word forms (abandoned, abandonment); examining the role of prefixes and suffixes; finding the collocations of words (Reppen, 2010, p. 8) and so on. Some words can have different grammatical roles, and corpora provide information about those roles, including the parts of speech and grammatical categories of the words. Through KWIC (key word in context), we can also see the context in which a particular word is used.
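Several of these word-list functions can be illustrated with a short Python sketch. It builds frequency lists from a made-up sample, separates function words from content words, and picks out related word forms. The `FUNCTION_WORDS` set is a tiny hand-picked list assumed for this example only, and `frequency_lists` is a hypothetical name; real tools rely on much fuller lists or on part-of-speech tagging.

```python
import re
from collections import Counter

# A tiny, hand-picked set of function words, assumed for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "by"}

def frequency_lists(text):
    """Return separate frequency counts for function words and content words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    function = Counter({w: c for w, c in counts.items() if w in FUNCTION_WORDS})
    content = Counter({w: c for w, c in counts.items() if w not in FUNCTION_WORDS})
    return function, content

# Made-up sample text, used only for illustration.
sample = (
    "The village was abandoned in the war. The abandonment of the village "
    "was recorded by a teacher."
)

function, content = frequency_lists(sample)
print(function.most_common(3))  # most frequent function words
print(content.most_common(3))   # most frequent content words

# Related word forms: content words that share the stem "abandon".
related = sorted(w for w in content if w.startswith("abandon"))
print(related)
```

Matching on a shared stem is a crude way to group word forms; it is shown here only to make the idea of related forms concrete.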

One widely used application of a corpus is teaching academic vocabulary to learners of English as a second or foreign language. Learners in a particular field need to be familiar with the highly frequent academic words in that field. Teachers can use corpus-derived resources such as Coxhead’s (2000) Academic Word List (AWL). It is a list of 570 word families compiled from a corpus of about 3.5 million words of academic text from different genres. Since the collection comes from different academic disciplines, we can look through its different subcorpora. By doing so, one can find the most frequent academic words used in a particular genre and teach them to learners to raise their level of comprehension and production in that genre. For example, one of the most frequent academic words found in the list is ‘analyse’, and this word appears along with all the related words such as “analysed, analyser, analysers, analyses, analyzing, analysis, analyst, analysts, analytic, analytical, analytically, analyzed.”
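To see how a word-family list might be applied to a text, here is a minimal Python sketch that counts how many tokens in a made-up sentence belong to the ‘analyse’ family. The family members are copied from the forms quoted above; `WORD_FAMILIES`, `FORM_TO_HEAD` and `family_counts` are hypothetical names for this illustration, and a full list would contain all 570 families.

```python
import re
from collections import Counter

# One word family, using the members quoted above; a full list has many more families.
WORD_FAMILIES = {
    "analyse": {
        "analyse", "analysed", "analyser", "analysers", "analyses",
        "analyzing", "analysis", "analyst", "analysts", "analytic",
        "analytical", "analytically", "analyzed",
    },
}

# Reverse lookup: each word form points to the headword of its family.
FORM_TO_HEAD = {
    form: head for head, forms in WORD_FAMILIES.items() for form in forms
}

def family_counts(text):
    """Count tokens per word family and return the counts with the total token count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter(FORM_TO_HEAD[t] for t in tokens if t in FORM_TO_HEAD)
    return hits, len(tokens)

# Made-up sample sentence, used only for illustration.
sample = "The analyst analysed the data, and the analysis was analytical."

hits, total = family_counts(sample)
print(hits["analyse"], "of", total, "tokens belong to the 'analyse' family")
```

A teacher could run a reading passage through a check like this to estimate how much listed academic vocabulary it contains before using it in class.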

However, the problem with the AWL is that it provides only the list of frequent words in an academic field, not the contexts in which they appear. Moreover, learning a language also involves learning formulaic expressions to a great extent. On the basis of corpus research, Martinez and Schmitt (2012) have produced a PHRASal Expressions List (PHRASE List), which consists of the 505 most frequently used phrasal expressions functioning as formulaic language. If teachers select and teach expressions from this list, it can help English language learners comprehend naturally occurring conversations and texts.

Another very useful corpus site is Michigan Corpus Linguistics, which links users to different corpora. Two of the valuable corpus sites it links to are MICASE (Michigan Corpus of Academic Spoken English) and MICUSP (Michigan Corpus of Upper Level Student Papers). MICASE is a free and searchable corpus site which is very helpful for teaching and carrying out research on academic spoken language. MICUSP is a site where we can find papers from different disciplines. We can search the papers by genre, by type of writing (e.g. argumentative, creative writing, critique/evaluation, proposal, report, research paper, response paper) or even by part of the paper (e.g. abstract, introduction, literature review, methodology, conclusion, citation, etc.). Along with these two sites, Michigan Corpus Linguistics also includes a corpus of conference presentations.

Similarly, the Time Corpus is another useful site; it is the online corpus of Time Magazine and helps us see how language changes over time. There are also three very useful and user-friendly corpus-based concordancing programs: AntConc, MonoConc, and Wordsmith. These programs help us produce word frequency lists, concordances, key words and so on. AntConc and Wordsmith are free programs, while MonoConc is an affordable one.

Most of these corpus websites are free to use. Teachers can use them for selecting and teaching academic words frequently found in authentic use in both written and spoken modes, and for using the contexts to help learners work out how the English language is really used. Corpus sites like the Michigan collection also enhance teacher professional development by providing teachers with conference presentation samples and valuable guidelines for developing different forms of writing. The data in a corpus can be utilized in devising research tools as well. Therefore, I suggest that English language teachers, textbook writers and researchers try some of these corpus sites, play with them and invest some time to see what small changes they can bring about.


Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Martinez, R. and Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33(3), 299-320.
Reppen, R. (2010). Using corpora in the language classroom. Cambridge: CUP.


Hima Rawal is currently a Fulbright Scholar doing her master’s in TESOL at Michigan State University, Michigan, USA. She is a lecturer at the Department of English Education, Central Campus, T.U. She is a life member of NELTA and editor of the Journal of NELTA.

