CORPUS LINGUISTICS-2008

Arshavskaya E.

Automatic Profiling of Learner Corpora

This study compares the written language of non-native speakers (NNS) of English with the academic writing of native speakers (NS). Previous research (Granger & Rayson 1998) has shown that EFL learners with a French L1 background overuse items of the spoken register and underuse academic vocabulary in their academic writing samples (International Corpus of Learner English (ICLE) database). The present study was designed to find out whether ESL and EFL students of various L1 backgrounds also lack knowledge of formal academic vocabulary and instead opt for informal doublets (e.g., in spite of vs. despite, till vs. until).

In the first part of the study, MELD (Montclair Electronic Language Database) served as the NNS corpus and BAWE (the British Academic Written English corpus) as the NS corpus. Both were tagged with TreeTagger (Schmid 1994) using the Penn Treebank tagset (Marcus et al. 1993). Part-of-speech (POS) profiles (frequency lists) of BAWE and MELD were then compared. The differences in POS use between the NS and NNS corpora were statistically significant (one-sample t-test). The comparison showed that upper-level ESL students predominantly use items of spoken language and rarely make use of academic vocabulary.

In the second part of the study, POS profiles of more advanced learners (BAWE ESL corpus) were compared with those of the same NS corpus (BAWE). In this case, the POS categories had a similar distribution across the two corpora, and this result was statistically significant (χ² test).

Thus, the first part of the study reconfirms the speech-like nature of the writing of upper-level ESL students (Granger & Rayson 1998) of different linguistic backgrounds. Since the learners whose writing samples were analyzed come from a number of different backgrounds, this finding (i.e., the speech-like nature of L2 learners' writing) cannot be attributed to L1 transfer. Upper-level ESL/EFL learners of various L1s lack knowledge of, and familiarity with, academic vocabulary. However, with longer exposure to the L2, the writing skills of L2 learners may approach those of native speakers, as the second part of this study suggests.
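
The POS-profile comparison described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, whose details the abstract does not give. It assumes, purely for illustration, that each corpus is available as whitespace-separated word_TAG tokens (TreeTagger's native output is laid out differently), and the function names pos_profile and chi_square are hypothetical. The sketch builds a tag frequency list for each corpus and computes a Pearson chi-square statistic over the resulting 2 x k table of tag counts.

```python
from collections import Counter


def pos_profile(tagged_text):
    """Build a POS frequency list from 'word_TAG' tokens (assumed input format)."""
    tags = (token.rsplit("_", 1)[1] for token in tagged_text.split() if "_" in token)
    return Counter(tags)


def chi_square(profile_a, profile_b):
    """Return (statistic, degrees of freedom) for a 2 x k table of tag counts."""
    total_a = sum(profile_a.values())
    total_b = sum(profile_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both profiles must contain at least one tag")
    grand_total = total_a + total_b
    tags = sorted(set(profile_a) | set(profile_b))
    statistic = 0.0
    for tag in tags:
        column_total = profile_a[tag] + profile_b[tag]
        for observed, row_total in ((profile_a[tag], total_a), (profile_b[tag], total_b)):
            # Expected count if tag use were independent of corpus
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(tags) - 1


# Toy sentences, invented for illustration only (not taken from MELD or BAWE)
nns_profile = pos_profile("I_PRP think_VBP it_PRP is_VBZ good_JJ")
ns_profile = pos_profile("The_DT analysis_NN indicates_VBZ significant_JJ variation_NN")
statistic, dof = chi_square(nns_profile, ns_profile)
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom. With real corpora, rare tags are usually merged or dropped first, because very small expected counts make the test unreliable.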

  1. Granger, S. and Rayson, P. (1998). Automatic profiling of learner texts. In S. Granger (Ed.), Learner English on Computer. New York: Longman, pp. 119-131.
    EFL (English as a foreign language) refers to learning and teaching English in countries (e.g., Japan, Russia) where English is not a major language of commerce and education and where students do not usually hear it outside their classrooms. ESL (English as a second language) refers to learning and teaching English in countries (e.g., the US, the UK, India) where English is a major language of education and commerce and where students often hear it outside their classrooms. See Brown, D. (2001). Teaching by Principles: An Interactive Approach to Language Pedagogy. White Plains, NY: Longman, p. 3.
  2. Fitzpatrick, E. and Seegmiller, M. S. Montclair Electronic Language Database (MELD).
  3. Nesi, H., Gardner S., Thompson, P., and Wickens, P. The British Academic Written English (BAWE) corpus.
  4. Schmid, H. (September, 1994). Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of International Conference on New Methods in Language Processing. Manchester, UK, pp. 44-49.
  5. Marcus, M., Santorini, B., and Marcinkiewicz, M. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics (Special Issue on Using Large Corpora), 19(2), pp. 313-330.
