Good news

The idea was to complete data collection by the end of September, and happily, we’re on track! I’d like to give precise figures here, but I’m still hoping to get a few more texts this week to fill a couple of final gaps and to replace a few less-than-ideal texts (i.e. those written by people who’ve spent longer than 10 years outside NL, etc.). So in approximate figures only, below is a description of the present data collection. These categories, I think I’ve mentioned before, are based on the structure established by the International Corpus of English (ICE) project. Using this structure means that it will be possible to compare my ‘Dutch English’ findings with different national varieties of English (e.g. Australian vs Jamaican vs Indian, etc.). So without further ado, we have:

  • 60,000 words of correspondence, divided into social and business categories. This mostly concerns emails, but for the first time in an ICE-based project it also includes Facebook messages.
  • 20,000 words of ‘apprentice academic writing’ (i.e. by graduate students under untimed circumstances; master’s theses mostly!).
  • 80,000 words of academic writing, divided into four categories of around 20,000 words each: humanities, social sciences, natural sciences and technology. These are mostly extracts from PhD dissertations, journal articles, book chapters and monographs.
  • 80,000 words of popular writing, divided into the same four categories as the academic writing. This is a bit of a catch-all category, but it mostly includes magazine articles, blog posts, webpages, etc.
  • 40,000 words of reportage, i.e. press news and feature stories. This was a difficult category to collect, because when Dutch journalists do write in English these texts are usually edited by a native speaker, such as when the NRC had its English section. To avoid this I mainly targeted journalists from smaller publications and foreign correspondents, who may be more likely to write in English.
  • 20,000 words of persuasive writing, i.e. press editorials. Again, a difficult category to collect because there were few to be found. So this section also includes other forms of persuasive writing, like advertorials, and – interestingly, because it hasn’t been done before! – journalists’ blogs. (I get excited about this because when ICE was originally conceived, in the early 1990s, blogs didn’t exist. So how the design needs to be changed to reflect social and technological developments is worthy of reflection. I include journalists’ blogs as a form of editorial given that they often comment on current affairs, but in a way that includes more opinion than e.g. a straight press news report.)
  • 40,000 words of instructional writing, which is divided into administrative/regulatory texts (e.g. contracts, course guides) and skills/hobbies (which out of necessity mostly came from tech blogs).
  • 40,000 words of creative writing, especially short fiction and – again, excitingly – fan fiction (another new element vis-à-vis other ICE-based projects, also stemming from tech developments!).

This comes to a total of about 380,000 words. To reach the full target of 400,000 words, I intend to use 20,000 words from the International Corpus of Learner English (ICLE). ICLE is the sister project to ICE, and researchers in Nijmegen have already carried it out for Dutch learners. The texts in ICLE are deliberately conceived as being written by ‘learners’, whereas the contributors to my project may in many cases be better described as ‘users’. So including one category of ICLE texts, which are undergraduate student essays, should allow for some nice comparisons with e.g. my ‘apprentice academic’ and ‘academic’ categories above.
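For anyone who wants to check the arithmetic, here is a minimal Python sketch of the tally. The figures are the approximate ones from the list above; the variable names are my own, purely for illustration, and the final counts may shift once the last gaps are filled.

```python
# Approximate word counts per category, as listed above.
# These are rounded figures, not the final corpus counts.
collected = {
    "correspondence": 60_000,
    "apprentice academic writing": 20_000,
    "academic writing": 80_000,
    "popular writing": 80_000,
    "reportage": 40_000,
    "persuasive writing": 20_000,
    "instructional writing": 40_000,
    "creative writing": 40_000,
}

ICLE_WORDS = 20_000  # undergraduate essays to be taken from ICLE
TARGET = 400_000     # full corpus target

subtotal = sum(collected.values())
total = subtotal + ICLE_WORDS

print(f"Collected so far: {subtotal:,} words")
print(f"With ICLE essays: {total:,} words (target {TARGET:,})")
```

With the rounded figures, the eight categories add up to 380,000 words, and the ICLE essays bring the total to the 400,000-word target.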

The basic idea in ICE is that the 400,000 word total is reached by way of 200 texts of around 2000 words each. But of course, monographs are far longer than 2000 words; an extract is thus taken from longer texts like these. Conversely, few of us write e.g. social emails as long as 2000 words. For this reason, a number of texts comprise several subtexts, which effectively means that the true number of contributors is closer to 300. Soon – once I make some pretty tables based on the numbers – I will try to post here some more specific stats, e.g. percentage of men/women, demographic spread (i.e. town/province of birth and town/province in which the contributors were raised), age groupings etc.

In short, I’m excited that things are taking shape! Over the next few months I’ll be converting all these texts into XML and doing all sorts of fiddly housekeeping exercises in preparation for the fun stuff – the actual analyses. Stay tuned!


Introducing the Corpus of Dutch English

Published in English Today, Vol. 27, No. 3: pp. 10-14, Printed in the United Kingdom © 2011 Cambridge University Press. doi: http://dx.doi.org/10.1017/S0266078411000319

What it is, and where it does – and doesn’t – belong

Data collection is now underway for a new corpus of ‘Dutch English’ within the broad scope of World Englishes. This news is often met with suspicion from ELT practitioners, SLA researchers and the average person on the street, Dutch and English L1s alike. How could a Dunglish-style interlanguage arising from ‘imperfect learning’ be cast as legitimate regional variation? Yet this has been a fruitful field for many decades across Asia and Africa, and researchers in Europe are starting to follow suit (see e.g. Erling, 2004; Erling & Bartlett, 2006 for the case of Germany). With English being used for intra-national purposes on the continent all the more frequently, especially in higher education, it is not hard to find examples of regionally flavoured English being more appropriate than any native ‘norm’.

Class is key

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

I’ve talked about my celebrity academic crushes before. Here’s how not to leave a lasting, intelligent impression. Ben Goldacre writes a column in the Guardian called Bad Science (and published a book with the same name). Each week, he targets topical examples of science gone bad (e.g. all things homeopathy, or those ‘TV doctors’ who get their qualifications from a cereal box). Recently I met the man himself after a comedy evening by scholars aiming to popularise science. I dribbled something like “Omg, could I have a photo with you?” We posed awkwardly as my friend fumbled around with his camera phone, then, when he asked politely “Are you Australian?”, I blushed and giggled and scurried away (a ‘yes’ would have sufficed).

Scholastic-celebrity crushes

Mini-column ‘Alison in Wonderland’, published in the Observant, Maastricht

Don’t give me Brad Pitt or Matt Damon (well, ok, maybe Matt Damon). All my celebrity crushes are on big-name academics. A friend and I once saw Richard Dawkins on the street and started squealing like schoolgirls. Another friend is now heading to CERN in Geneva for a year’s worth of PhD research, and I am green with envy. Why? Professor Brian Cox, that’s why – the world’s first rock-star physicist. The keyboard player from the 1990s pop group D:Ream, Cox is now a particle physicist at CERN and presenter of the BBC television programme Wonders of the Solar System. If you’re into science nerds with swoopy hair and shiny teeth getting excited about the temperature on Saturn, this is your show.