Last week I presented a paper called ‘Introducing the Cambridge Corpus of Dutch English: Methodological insights and first results’ at the 17th Conference of the International Association for World Englishes at Monash University, Melbourne (Australia). Here’s the abstract:

Corpora are increasingly being built and used to examine varieties of WEs, from different L1 varieties to Outer Circle varieties like Indian and Singaporean English. Fewer focus on Expanding Circle Englishes, and those that do usually take an error-based SLA perspective. The Dutch component of the International Corpus of Learner English (Granger, 2002), for example, includes only undergraduate essays, by definition precluding the English used daily by countless Dutch professionals and academics. Thus no corpus yet allows for insight into the wide-ranging, educated use of English in the Netherlands from a WEs perspective.

The Corpus of Dutch English that is currently being built fills this empirical gap. With 200 texts and text extracts of 2000 words each from different academic and business genres (i.e. 400,000 words in total), in size and structure it is modelled loosely on the written component of the regional ICE corpora. This presentation explores the implications of this design for the positioning of the corpus (as ICE currently only targets ENL and ESL varieties) and the issues surrounding description of varieties traditionally seen as belonging to the Expanding Circle.

The presentation also discusses the results of preliminary lexical analyses, particularly in terms of semantic modification (narrowing, widening, grammatical shift, etc.) and loan translation. The latter includes numerous examples of false friends, or what Hülmbauer (2007) refers to as ‘true friends’, where the L1 form suggests an English word which traditionally has a different meaning, e.g. the Dutch paragraaf becomes ‘paragraph’, with the new meaning ‘section’.

The corpus will eventually be made accessible and searchable along parameters like age, sex, region, occupation and education. Given its comparability with ICE and other corpora, it will be of use to WEs researchers as well as ELT practitioners.

Ready-made holes in our socks

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

Last week I went to a talk by Richard Dawkins, evolutionary biologist and God of all atheists. During question time, a philosophy student asked what Dawkins makes of the ideas of Descartes – what can we really know as true, without any doubt? – and Bertrand Russell – how do we know that we weren’t put on the Earth five minutes ago, with fossils in place, pre-stored memories and ready-made holes in our socks? Dawkins bit back his usual retort (that we also can’t disprove the existence of the flying spaghetti monster), and said he simply prefers to believe in science. For example, if you build a plane using technology based on scientific principles, it does actually get you across the Atlantic. “To quote a t-shirt,” he said, “Science – it works, bitches”.

The negative utility correlation

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

People are getting smarter, and this is irritating. You used to be able to make smug comments like “You know, I read somewhere that [insert wildly inaccurate fact here: Geert Wilders was raised in a madrasa, Nederland is an Old Saxon word meaning ‘poor of weather but stellar with cheese’, etc.]”. Now, people want to know your source, and God forbid it be CNN, Wikipedia or your hairdresser. Equally irritating, many fields of study these days insist stubbornly on hard data. Quantitative evidence. Statistical truths. This is unfortunate, because it means countless social sciences students like me have to sit through insufferable lectures on Quantitative Methods or – what I’m ignoring at this very moment as I write – Mathematical Linguistics, which will no doubt take me 27800,500,000 hours to understand and turn out to be negatively correlated with usefulness.

Building the corpus proper

Now that data collection is complete, it’s time to get down to the fun stuff. But first, just to clear something up: people often ask me what a ‘corpus’ actually is. Good question. I like the definition provided by Gilquin & Gries in their 2009 article. To summarise their description, a corpus is:

– machine readable
– representative, meaning that it contains data for each part of the variety/register/genre it is supposed to represent
– balanced, meaning that the sizes of its constituent parts are proportional to the parts of the variety/register/genre the corpus is supposed to represent (this being … er … a ‘theoretical ideal’ given the absence of reliable data on the proportional makeup of genres etc. in any given variety/language).

So: a corpus is a collection of texts that is stored and can be analysed electronically, and designed so as to promote generalisability of the findings to a wider population.

Prima. So now that we have collected the texts (in a carefully balanced, representative way, naturally), the next step is to convert them into the appropriate electronic form. For reasons I won’t go into here, that form is currently XML. To do this, we’re using a software development platform called Eclipse. You first need to create a DTD (document type definition) file, which is a sort of template that lays down the rules for how the data will be structured. From this file you then generate your (hundreds of!) XML files. In each file, you can then insert not just the text itself, but also all the metadata relating to the text: that is, the information that everyone who contributed a text provided in the questionnaire. In the image below (excuse the quality; no time to fix it at present), you can see that this data is simply entered in the right-hand column.

In the source code, if you like this sort of thing, it looks like this:

These metadata fields are important because they will allow us, later, to search the corpus according to different properties; for example, you might want to look only at texts written by women, or only women with a high education level, or only acrobats aged 70+ with red hair from the south of the country who only eat cheese on Tuesdays, etc.
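For the curious, here is a rough idea of how such a metadata search could work in practice. This is only a minimal sketch in Python, not the actual corpus tooling: the element names (text, metadata, sex, age, region, education), the file names and the sample values are all made up for illustration, and the real DTD may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus files. The element names and values below are
# assumptions for illustration only, not the real corpus schema.
SAMPLE_FILES = {
    "text001.xml": """<text id="text001">
  <metadata>
    <sex>female</sex>
    <age>34</age>
    <region>south</region>
    <education>university</education>
  </metadata>
  <body>First sample text.</body>
</text>""",
    "text002.xml": """<text id="text002">
  <metadata>
    <sex>male</sex>
    <age>71</age>
    <region>north</region>
    <education>university</education>
  </metadata>
  <body>Second sample text.</body>
</text>""",
    "text003.xml": """<text id="text003">
  <metadata>
    <sex>female</sex>
    <age>52</age>
    <region>south</region>
    <education>vocational</education>
  </metadata>
  <body>Third sample text.</body>
</text>""",
}


def read_metadata(xml_string):
    """Return the metadata fields of one corpus file as a dictionary."""
    root = ET.fromstring(xml_string)
    metadata = root.find("metadata")
    return {field.tag: (field.text or "").strip() for field in metadata}


def select_texts(files, **criteria):
    """Return the names of files whose metadata match every criterion."""
    selected = []
    for name, xml_string in files.items():
        metadata = read_metadata(xml_string)
        if all(metadata.get(key) == value for key, value in criteria.items()):
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    # Only texts written by women
    print(select_texts(SAMPLE_FILES, sex="female"))
    # Only women with a university education
    print(select_texts(SAMPLE_FILES, sex="female", education="university"))
```

Each extra keyword argument narrows the selection further, which is the same logic as stacking up properties in a search (minus the acrobats).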

So this, among a million other things, is what I’m up to at the moment. Along with textual markup, but more on that later …

Copulating beetles

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

Last month, Maastricht’s own Professor Herman Kingma was granted an Ig Nobel Prize for his work on why discus throwers get dizzy. The Igs – an alternative to the real Nobel Prizes – are awarded annually at Harvard University to celebrate unusual (if not downright wacky) research. Further Igs were awarded for research on why Australian beetles try to copulate with discarded beer bottles, and how needing to urinate influences our decision-making processes. The mayor of Vilnius, Lithuania, also received recognition for demonstrating that the problem of illegally parked luxury cars can be solved by running them over with an armoured tank. As for me: I can barely add up my expenses at the end of each month, so there’ll be no Nobel Prize in Mathematics for me. But here’s hoping there’s an Ig in Dunglish research.

Mr Wilders’s heartland

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

The Economist recently published an article called ‘Return to Maastricht: Twenty years on, the euro’s birthplace has become suspicious of Europe’. First, it highlights Dutch Euroscepticism: “The Netherlands has moved from a cosy pro-EU consensus to a sceptical, even antagonistic stance”. And second, it describes Dutch politics as “dysfunctional”, with a liberal-led coalition government that is “propped up” by Geert Wilders’s right-wing PVV party. The article goes on to say that, naturally, all this is slightly awkward for Maastricht. Closer to Brussels and Cologne than to Amsterdam, Maastricht “thinks of itself as a most European city”. The university in particular has a major European Studies curriculum, teaches primarily in English, and half its students are non-Dutch. “And yet”, the article points out, “the town, and its province of Limburg, are Mr Wilders’s heartland.” Touché.

I will not be silent

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

I am a slogan t-shirt geek. I like them nerdy, like one belonging to my friend, a chemist, which lists in alphabetical order the elements of a human body (65% oxygen, fyi). I like them political, as in “That’s ok, I wasn’t using my civil liberties anyway”. I’m a particular fan of the feminist kind, with a picture of [insert your choice of Germaine Greer, Simone de Beauvoir or Naomi Wolf] saying “I will not be silent”. But my favourite is my own t-shirt with a picture of a downcast monkey and the slogan “Intelligent design makes my monkey sad”. Though it did get me into a heated escapade with a busload of god-fearing American tourists at the supermarket, from which I only narrowly escaped …