IATEFL Poland
Computer Special Interest Group

Teaching English with Technology
A Journal for Teachers of English
ISSN 1642-1027
Vol. 2, Issue 3 (May 2002)


WHAT CAN BE, BUT IS NOT (AND WHY),
IN LEARNERS' MRDS
by Wlodzimierz Sobkowiak
Adam Mickiewicz University

sobkow@amu.edu.pl

Abstract

Modern Machine-Readable Dictionaries (MRDs) offer users an unprecedented richness of content and form, and are gradually pushing traditional paper-based word books out of existence. Despite the breathtaking developments in hardware and software, however, popular MRDs, especially those made for learners of foreign languages, are still deficient in a number of respects. Two of these are dealt with in this paper: (a) the breadth and flexibility of user access to the riches of lexicographic content, and (b) the degree of (artificial) "intelligence" in user modelling and customisation. It is argued that the two deficiencies are not due to any inherent technological obstacles, but rather to the conservatism of dictionary makers and users (both learners and teachers). A few examples of functionalities "which could be, but are not" are provided in a hypothetical case study of an EFL student, Tom, and his MRD.

Introduction

Electronic dictionaries (computer dictionaries, machine-readable dictionaries – MRDs) are now commonplace in research, education, tourism and a number of other human pursuits around the globe. There are many reasons why they are gradually ousting traditional paper-based printed dictionaries from these and other spheres: they are fast and convenient to use; they are up to date; they are small in physical size (palmtops are portable) but large in coverage; they are lavishly multimedialized with sound, photos, animations and video; they are often equipped with a suite of lexical exercises, games and frills of all kinds (e.g. personal notepads); and they are computer-based, a thrill for the novice and a must for a guru. Additionally, some MRDs are partly authorable, mostly in that they offer some expansion facilities (but never allow actual editing of the firmware contents), and customizable in terms of some user interface options. They are also fashionable, and this factor in their popularity should not be taken lightly.

The variety of MRD types, hardware-, software- and content-wise, is amazing. In terms of hardware, there are those on CD-ROMs to be used on ordinary PC workstations and notebooks; there are handheld devices, both dedicated MRDs and those integrated into a larger system, usually running a scaled-down version of Windows; and then there are the virtual Internet dictionaries, with no physical carrier whatsoever (at least from the point of view of the user). Software-wise there are MRDs for all popular platforms: Windows, Mac, Unix (and especially Linux), and the remains of DOS. The client-server architecture, where the user's machine is just a terminal for a remotely located MRD, works not only for the Internet (mostly the World Wide Web), but also for a variety of intranet setups. Finally, and most interestingly from our point of view, the variety of lexicographic configurations reflects, and expands, that of traditional dictionaries. There are mono-, bi- and multilingual MRDs; there are those for native speakers and those for learners; there are "ordinary" dictionaries, which provide meanings for forms (semasiology), and thesauruses, which do the opposite (onomasiology); there are general dictionaries and special-purpose ones, and the latter can have coverage limited macrostructurally (e.g. a dictionary of fishing or of acronyms) or microstructurally (e.g. an etymology, a pronunciation or a picture dictionary); there are MRDs with minimal content (word-lists) and those whose lexicographic and encyclopedic richness far exceeds that of large multi-volume traditional word-books. Hartmann (2001) offers a readable and fairly comprehensive overview of the available plethora of lexical reference sources, with the unavoidable English focus.

With all this variety, coverage, multimediality, user-friendliness and on-line availability, one would be excused for thinking that contemporary MRDs have already reached a summit of functionality, with virtually (pardon the pun) no improvements possible. This is certainly the picture painted by reviewers of popular MRDs in the computer magazines that fill at least one shelf at newsstands the world over. Their gripes, if they have any, concern the absence of this or that lexical item from the entry list, some abstruse installation problems, one or two incorrect factual references, or the quality of the onboard multimedia.

And yet, there are crucial areas in MRD design where dramatic improvements are both possible and necessary if dictionaries are to reach a far higher level of functionality than they have achieved so far. The two areas which I will briefly sketch below, using a hypothetical case study, are (1) access flexibility and (2) user customization. The treatment is of necessity brief; interested readers are referred to my book on EFL MRDs (Sobkowiak 1999), chapter 3 of which is the source of the following material in revised form. There, I develop the concept of a Multi-Access Dictionary (MAD), in which virtually all lexicographic content is available to the user for active query, and to the system for intelligent customization to a dynamically constructed profile of the user. These ideas are grounded not only in good pedagogical and lexicographic theory and practice, but also flow directly from a rather uncontroversial conception of linguistic data as seen by computational linguists, a conception aptly encapsulated in the following quote: "The data are multidimensional, so the computing environment must be able to attach many kinds of analysis and interpretation to a single datum. The data are highly integrated, so the computing environment must be able to store and follow associative links between related pieces of data" (Simons 1998:24; my emphasis – WS; see also http://www.sil.org/computing/routledge/simons/summary.html).

Surprising as it may sound in view of the praise of MRDs above, a fair proportion of the content of contemporary electronic dictionaries is treated as one-dimensional in Simons's sense, and hence its actual multidimensionality is not available to the user. The integrated nature of lexicographic data is at best reflected in the option of a hypertext link from word X in the definition or example of an entry to entry X (and not all MRDs offer even this functionality). Thus, the user's access to the wealth of the "multidimensional" and "integrated" linguistic and multimedia content of an MRD is typically highly restricted. Similarly, only rather superficial customization options are offered, for example: (a) ignoring certain elements of the entry (micro)structure for screen display (e.g. phonetic transcription) or in full-text search (e.g. example sentences), (b) hiding certain word categories (e.g. compounds), (c) changing font size, style and colours, (d) manipulating toolbars, and the like. All of these must be deliberately toggled by the user; the system does not even attempt a more intelligent approach to customization, one which could capitalize on the user's observed exploitation of the many dimensions and associative links inherent in the dictionary.

These deficiencies of MRD design are not due to any hardware shortcomings, of course. Nor is there a lack of artificial intelligence, at least not of the AI needed to run the relatively simple user profile generator which better MRD customization requires (see for example Bielawski & Lewand, 1991, Shapiro et al., 1992, Prat, 1994, Tarantowicz-Gasiewicz, 2000 and the references therein). The main reason why no innovative MRD design is apparent (at least in the senses sketched above) appears to be the conservatism of dictionary makers and publishers on the one hand, and of users on the other. These two types of conservatism are, of course, mutually reinforcing: lexicographers claim that there is no demand for more powerful and flexible access systems with built-in artificially intelligent customization; users see no such dictionaries on the market, and are of necessity satisfied with what they have got. Realizing the hidden potential of a computer application takes a fair amount of practice, expertise and frustration with the unavailability of a useful functionality. And, as it turns out, there is very little MRD practice, at least in the Polish educational setting. In one questionnaire study, only 26 out of 712 EFL students (3.7%) in all types of schools in Poland had ever used an EFL MRD (Lew, forthcoming; personal communication).

In a highly competitive market, questions of capital investment risk and return are also, of course, of paramount importance to MRD publishers, even if they need not detain us here. Yet another reason why no multi-access self-customizing MRDs are available may be the scarcity of (meta)lexicographic research in the field, both theoretical and empirical.

This short paper, like the book chapter on which it is based, is a modest attempt to suggest new areas for such research. Its central part is a "case study" of a hypothetical Tom, a student of English as a foreign language (EFL) who uses his intelligent multi-access MRD every day. In terms of some of the variables mentioned above, the context is as follows: (a) intensive use of a learner's general mono/bilingual EFL MRD on an MS Windows platform, (b) intranet and Internet connection and full functionality, (c) an educational institutional setting, further circumscribed to academic-level English philology studies. It is by looking at Tom's interaction with his dictionary that I will try to answer the question "what can be, but is not, in learners' MRDs".

1. MRD access and customization

The maximally user-customized multiple-access dictionary will require a fair amount of artificial intelligence to organize a smooth interaction between the lexical database and the user. With so many access options built into the system, it would be dysfunctional to query the user every time about the desired search criteria or settings. Some of these will of course be fixed as defaults, to be changed from appropriate configuration menus. Others must indeed be input by the user every time to ensure the retrieval of just the right information at the right time. But there are areas of MRD-user interaction where the dictionary can dynamically adapt to the changing needs and activities of the user, which will be stored in his/her user profile file. Such adaptive systems were first prophesied in the late eighties and early nineties (e.g. Dodd, 1989, Jonassen, Mandl, 1989, Kay, 1991), and are now gradually being introduced in hypertext access software engineering, as is evident from the growing number of books (Brusilovsky, Kobsa, Vassileva, 1998), theses (Bontcheva, 2001), periodicals (User Modeling and User-Adapted Interaction) and conferences (flexible/adaptive hypertext/hypermedia workshops and conferences, for example in May 2002 at the University of Málaga, Spain) devoted to this subject.

These developments, although instigated to a considerable degree by arguments and forces outside the educational scene generally, and foreign language teaching and learning in particular, indirectly correspond to the contemporary learner-centred and learner-autonomy paradigms in language pedagogy. It is the learner who is supposed to formulate his/her own educational needs and preferences; who must (in collaboration with the teacher) take on the burden of designing his/her own syllabus and curriculum, selecting his/her own learning resources and materials (including dictionaries), fixing short- and long-term didactic aims, settling on preferred learning strategies, actively searching for information, explanation and advice, and carrying out self-evaluation and post-hoc analysis (see Wenden, Rubin, 1987, Nunan, 1988, O'Malley, Chamot, 1990, Wenden, 1991, Oxford, 1993, Rubin, Thompson, 1994, Reid, 1995, Tutor, 1996, Ely, Pease-Alvarez, 1996, Naiman et al., 1996, Benson, Voller, 1997). If such is the expectation of a (good) learner, computer-assisted foreign language resources should be adapted accordingly. Providing for maximum customization with quasi-intelligent computer assistance is one way of promoting learner autonomy.

There is hardly any limit to the data that can profitably be stored and manipulated in the MRD user profile file. Age, sex and proficiency in the foreign language are obvious choices. To users who only need the MRD for ad-hoc translation from L2, the system would show a "different face" of the dictionary than to those who use it as a learning resource for acquiring new vocabulary. Those who mainly need the dictionary for encoding would see it differently from those who mostly decode. Pronunciation-oriented learners would have a Phonetic-Access Dictionary (PAD; Sobkowiak, 1994, 1998, 1999) in front of them, whereas those who need a dictionary for writing in a foreign language would have one focused on spelling and style analysis and correction. Those users who customarily refer to one variety of the foreign language, say American English, would have this variety foregrounded across different levels of dictionary content and use: spelling, pronunciation, grammar, stylistics, examples, realia, exercise module, etc. The system would keep a running log of the different circumstances of use to 'guess' which MRD profile is currently best to present to the user.
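
The profile mechanism described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `UserProfile` class, its `record` and `preferred` methods, the event keys (`mode`, `variety`, `purpose`) and the defaults are illustrative assumptions, not the design of any existing MRD. The running log is reduced to a list of lookup events, and the 'guess' is simply the most frequent value logged for each setting.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    """Hypothetical MRD user profile: static data plus a running log of use."""
    age: Optional[int] = None
    proficiency: str = "unknown"               # e.g. "intermediate", "advanced"
    log: list = field(default_factory=list)    # one dict per lookup event

    def record(self, **event: str) -> None:
        """Log one lookup, e.g. record(mode="decode", variety="AmE")."""
        self.log.append(event)

    def preferred(self, key: str, default: str) -> str:
        """Most frequent logged value for `key`; `default` if never logged."""
        counts = Counter(e[key] for e in self.log if key in e)
        return counts.most_common(1)[0][0] if counts else default


def choose_face(profile: UserProfile) -> dict:
    """'Guess' which face of the dictionary to present from the logged use."""
    return {
        "mode": profile.preferred("mode", "decode"),        # decoding vs encoding view
        "variety": profile.preferred("variety", "BrE"),     # variety to foreground
        "purpose": profile.preferred("purpose", "lookup"),  # ad-hoc lookup vs learning
    }


tom = UserProfile(proficiency="advanced")
for _ in range(3):
    tom.record(mode="decode", variety="AmE")
tom.record(mode="encode", variety="BrE")
print(choose_face(tom))  # {'mode': 'decode', 'variety': 'AmE', 'purpose': 'lookup'}
```

A real system would presumably weight recent sessions more heavily and let explicit choices in the configuration menus override the guess; plain majority counting is used here only because it is the simplest defensible default.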

In a flash of foresight, Dodd (1989: 92) sketched the following customizable elements in his "personal dictionary": "Each of the various styles of definition that are stored could be to the liking of a given user or group"; the "profile would cover everything from the choice of colours used to pick out different elements displayed, to the sorts of information proffered by the machine and the order in which they were presented"; "some would want etymology, history and evolution of words; others would actively avoid this". Between 1989 and now, other elements have joined the customizable user profile thanks to developments in computer technology and programming. The following is my own vision, suited to the needs of a prototypical Tom.

2. Tom and his MAD: a case study

Tom is a first-year student of English as a foreign language in a neophilology department of a Polish university, a higher vocational school (wyzsza szkola zawodowa) or a teacher-training college. He uses the networked version of a customizable bilingual multi-access machine-readable dictionary of English to prepare his class assignments and the practical English exam at the end of the semester. Most of the time he needs to look up difficult words which he finds in the assigned reading, which comes from British magazines and newspapers as well as American literature readers and anthologies. From time to time he must write a narrative essay on an assigned topic. After a few sessions, Tom's user profile will start to adjust to his needs and preferences.

First, Tom is never interested in pronunciation, so this aspect of lexical information is switched off. Words appear without phonetic transcription and the audio icon is hidden. Phonetic access functions (for example, requesting words with a given number of syllables, with a given stress pattern, containing specified sounds, or differing between British and American accents) are backgrounded, as are phonetic drills in the exercise module. The phonetic difficulty index (Sobkowiak, forthcoming), which tags each headword in the dictionary for pronunciation problems, is unplugged from the exercise module. Tom can, however, be alerted to particularly high values of the index if he so wishes (and sets the index threshold appropriately).
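
Tom's pronunciation settings can be sketched as a rendering rule. In this minimal, hypothetical Python example the phonetic difficulty index is stood in for by an assumed 0.0-1.0 score (`phon_difficulty`); the actual index (Sobkowiak, forthcoming) may well be scaled differently. The `Entry` class and `render` function are illustrative names only. Transcription and the audio icon appear only when pronunciation is switched on; otherwise a warning appears only if Tom has set a threshold and the entry reaches it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    headword: str
    transcription: str
    has_audio: bool
    phon_difficulty: float  # hypothetical 0.0-1.0 stand-in for the difficulty index


def render(entry: Entry, show_pronunciation: bool,
           alert_threshold: Optional[float] = None) -> str:
    """Render an entry header according to the user's pronunciation settings."""
    parts = [entry.headword]
    if show_pronunciation:
        parts.append(f"/{entry.transcription}/")
        if entry.has_audio:
            parts.append("[audio]")
    elif alert_threshold is not None and entry.phon_difficulty >= alert_threshold:
        # Pronunciation is switched off, but the user asked to be warned about
        # words whose difficulty score reaches or exceeds his chosen threshold.
        parts.append("(!) pronunciation alert")
    return " ".join(parts)


colonel = Entry("colonel", "ˈkɜːnl", has_audio=True, phon_difficulty=0.9)
print(render(colonel, show_pronunciation=False))                       # colonel
print(render(colonel, show_pronunciation=False, alert_threshold=0.8))  # colonel (!) pronunciation alert
```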

Second, as Tom is a highly advanced learner, some of the more common and "easy" senses of most lexical items are hidden or demoted to the bottom of the entry [1]. Tom is unlikely ever to look up the senses of words like write, like, make or water which are normally listed at the top of their entries. On the other hand, he may need senses such as "to raise the par value of (issued capital stock) without a corresponding increase in the real value of assets" (the 28th sense of water in Collins) when reading an assigned Economist article, and he may need "any fluid secreted from the body, such as sweat, urine, or tears" (the 7th sense of water in Collins) to understand the graffito inscribed on the table where he is currently working: All is shit except water. These senses will, then, be retained.
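
Demoting basic senses amounts to a stable reordering of the entry. The Python sketch below assumes that each sense carries a hypothetical `basic` flag (in practice it might be derived from frequency or proficiency-level data); `Sense` and `arrange_senses` are illustrative names. The glosses for senses 7 and 28 are quoted from the text above, while sense 1 is only a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sense:
    number: int   # position of the sense in the full entry
    gloss: str
    basic: bool   # hypothetical flag marking common, "easy" senses


def arrange_senses(senses: List[Sense], advanced: bool,
                   hide_basic: bool = False) -> List[Sense]:
    """Demote (or hide) basic senses for advanced users, keeping relative order."""
    if not advanced:
        return list(senses)
    specialised = [s for s in senses if not s.basic]
    basic = [] if hide_basic else [s for s in senses if s.basic]
    return specialised + basic


water = [
    Sense(1, "(placeholder for the basic 'liquid' sense)", basic=True),
    Sense(7, "any fluid secreted from the body, such as sweat, urine, or tears", basic=False),
    Sense(28, "to raise the par value of (issued capital stock) ...", basic=False),
]
print([s.number for s in arrange_senses(water, advanced=True)])  # [7, 28, 1]
```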

Third, a complex syntactico-semantic network will be in place to assist Tom in his essay writing: (a) comprehensive coverage of collocations [2] (not only will mistake be listed, but also its left- and right-hand collocates: make a mistake, by mistake, serious mistake, mistakes creep in, mistakes abound, to mistake sb/sth for); (b) easy listing of words falling into particular morphosyntactic categories or "parts of speech": non-ly adverbs, pluralia tantum, hyphenated vs. spaced compounds, etc.; (c) prompts of useful lexico-semantic relations (see WordNet), such as antonyms (mistake ---> accuracy, precision), hypernyms (mistake ---> failure, dog ---> canine) or hyponyms (mistake ---> blunder, faux pas, goof, slip-up, oversight, typo).
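
The network of collocations and lexico-semantic prompts can be modelled as a graph of typed links. The toy Python class below is an assumption-laden sketch, not WordNet's data model or API; the class name `LexicalNetwork`, its methods and the relation labels are hypothetical, and the word lists are simply the examples given in the text.

```python
from collections import defaultdict
from typing import Dict, List


class LexicalNetwork:
    """Toy graph of typed links: (word, relation) -> related words."""

    def __init__(self) -> None:
        self._links: Dict[tuple, List[str]] = defaultdict(list)

    def add(self, word: str, relation: str, *targets: str) -> None:
        self._links[(word, relation)].extend(targets)

    def related(self, word: str, relation: str) -> List[str]:
        # .get() avoids creating empty entries for unknown pairs
        return list(self._links.get((word, relation), []))

    def prompts(self, word: str) -> Dict[str, List[str]]:
        """Everything the network can suggest for `word`, grouped by relation."""
        return {rel: list(ts) for (w, rel), ts in self._links.items() if w == word}


net = LexicalNetwork()
net.add("mistake", "collocation", "make a mistake", "by mistake", "serious mistake",
        "mistakes creep in", "mistakes abound")
net.add("mistake", "antonym", "accuracy", "precision")
net.add("mistake", "hypernym", "failure")
net.add("mistake", "hyponym", "blunder", "faux pas", "goof", "slip-up", "oversight", "typo")
net.add("dog", "hypernym", "canine")
print(net.related("mistake", "antonym"))  # ['accuracy', 'precision']
```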

Fourth, because Tom has often opened the British and American "culture component" of the dictionary to consult vocabulary entries of strictly contemporary relevance, e.g. militant, molestation, cloning, Bin Laden, Euro, this stratum of the dictionary will henceforth be highlighted: contemporary cultural items will be preferentially linked to properly formatted (keyword-in-context concordanced) text-corpus and on-line multimedia evidence, their multiply conditioned frequencies will be displayed, and the vocabulary exercise module will rate them as especially desirable in constructing tests and exercises (plus many other adjustments, of course).
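
A keyword-in-context (KWIC) concordance, the display format mentioned above, aligns every occurrence of a search word in one column between its left and right context. A minimal Python version follows; the function name `kwic`, the character-based context window and the sample sentence are illustrative choices, and a production concordancer would more likely work on tokenized, annotated corpora.

```python
import re
from typing import List


def kwic(text: str, keyword: str, width: int = 30) -> List[str]:
    """Keyword-in-context lines: left context right-aligned, the keyword in
    brackets in a fixed column, then the right context."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].strip()
        right = text[m.end():m.end() + width].strip()
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


corpus = ("Cloning is debated. Human cloning remains banned in many "
          "countries, although animal cloning is routine in some labs.")
for line in kwic(corpus, "cloning", width=20):
    print(line)
```

The `\b` word boundaries keep a search for Euro from also matching European.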

Fifth, the advanced L2-to-L1 decoding view of the MRD will be prioritized: only the monolingual English dictionary will appear by default, with no restriction on the definition language (Sobkowiak, Kuczynski, forthcoming) or on the grammar coding presented (Tom has rarely looked up words from the definitions or the part-of-speech and subcategorization codes). When Tom asks to see the Polish-English encoding view, woda ('water') will not be there (pending his decision to change the default), but there will be cross-references to Euro from zloty, zjednoczony ('united') and waluta ('currency').

Sixth, because Tom's special preoccupation is with British journalese and American literary language (which the system discovered from a number of sessions Tom had with it), the lexical frequency data will be biased accordingly whenever Tom requests them. The figures will be taken from contemporary British press corpora on the one hand, and from the American prose of the period Tom has mainly consulted through the built-in encyclopedia and literature reader on the other. The exceptions would be if the targeted corpus turned out to be too small to generate reliable frequencies for the requested lexical items, or if Tom wanted custom-weighted frequency figures; in either case the system would adjust the figures accordingly.
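
The frequency bias described above can be sketched as a weighted average of per-million frequencies over the corpora in Tom's profile. The Python function below is hypothetical: the corpus names, the reliability threshold (a minimum corpus size of one million tokens) and the fallback to a large general corpus are all assumptions, and judging reliability by overall corpus size is a simplification of the item-by-item reliability the text has in mind. Passing Tom's own weights instead of the profile-derived ones yields the custom-weighted figures.

```python
from typing import Dict

Counts = Dict[str, int]


def per_million(count: int, corpus_size: int) -> float:
    """Normalized frequency: occurrences per million running words."""
    return count * 1_000_000 / corpus_size


def biased_frequency(word: str,
                     corpora: Dict[str, Counts],   # corpus name -> word counts
                     sizes: Dict[str, int],        # corpus name -> size in tokens
                     weights: Dict[str, float],    # profile-derived or custom weights
                     min_size: int = 1_000_000,    # hypothetical reliability threshold
                     fallback: str = "general") -> float:
    """Weighted per-million frequency over the user's preferred corpora.

    Corpora below `min_size` tokens are skipped as unreliable; if none remain,
    the (assumed) large general corpus is used instead.
    """
    usable = {c: w for c, w in weights.items()
              if w > 0 and sizes.get(c, 0) >= min_size}
    if not usable:
        return per_million(corpora[fallback].get(word, 0), sizes[fallback])
    total = sum(usable.values())
    return sum(w / total * per_million(corpora[c].get(word, 0), sizes[c])
               for c, w in usable.items())


corpora = {"uk_press": {"euro": 300}, "us_fiction": {"euro": 10}, "general": {"euro": 50}}
sizes = {"uk_press": 2_000_000, "us_fiction": 1_000_000, "general": 10_000_000}
print(biased_frequency("euro", corpora, sizes, {"uk_press": 0.5, "us_fiction": 0.5}))  # 80.0
```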

Seventh, as Tom has not shown a special predilection for MRD multimedia elements in the past, the picture library option is dimmed, the videos showing the WTC blasts and Bin Laden's TV releases are not linked to the headword terrorism, and the animation explaining AIDS infection is not connected to HIV. The recorded (or text-to-speech synthesized) audio accompanying the animation in the multimedia view is displayed as plain text instead. All this is, of course, subject to deliberate override by Tom.

Eighth, because Tom has checked the advanced exercise module option, each word he looks up is linked to a number of appropriate exercises. For example, HIV appears in a word-formation exercise on Latinate forms with <o>-final prefixes (immuno-), in an irregular plurals exercise as a distractor (virus pluralizes regularly), and in an acronym deciphering exercise. Only advanced-level exercises are offered, with little Polish involvement and no phonetics, but with enhanced cultural content and a rich supply of mnemonic devices (imagery keywords, etc. [3]; see Hulstijn 1997 for an overview), all according to Tom's profile.
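
Linking looked-up words to exercises can be sketched as a filter over an exercise bank. The `Exercise` fields, skill labels, exercise titles and the `exercises_for` function below are hypothetical; they merely mirror the HIV example and the profile settings described above (advanced level only, no phonetics).

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Exercise:
    title: str
    level: str                                     # "basic", "intermediate" or "advanced"
    targets: Set[str]                              # headwords used as items or distractors
    skills: Set[str] = field(default_factory=set)  # e.g. {"phonetics"}, {"culture"}


def exercises_for(word: str, bank: List[Exercise], level: str = "advanced",
                  excluded: FrozenSet[str] = frozenset({"phonetics"})) -> List[str]:
    """Titles of exercises linked to `word` that match the user's profile."""
    return [ex.title for ex in bank
            if word in ex.targets
            and ex.level == level
            and not (ex.skills & excluded)]


bank = [
    Exercise("Latinate prefixes in -o (immuno-)", "advanced", {"HIV", "immunology"}, {"word formation"}),
    Exercise("Irregular plurals (with distractors)", "advanced", {"HIV", "virus", "cactus"}, {"grammar"}),
    Exercise("Deciphering acronyms", "advanced", {"HIV", "NATO"}, {"culture"}),
    Exercise("Stress in initialisms", "advanced", {"HIV"}, {"phonetics"}),
    Exercise("Health vocabulary (with Polish)", "basic", {"HIV"}, {"translation"}),
]
print(exercises_for("HIV", bank))
```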

Ninth, because Tom needs to make frequent notes about, and bookmarks to, the visited entries and MRD areas, this option is elaborated and always active: all searches take it into account; the notebook is interactively connected with Tom's favourite word-processing package and the Internet; and the line drawings and sketches which Tom makes there can be converted into search keys, so that ☺ will retrieve smiley (among other hits), the word which he had temporarily forgotten and whose Polish equivalent had escaped him, too. Likewise, entering search parameters such as "round and red on green background" will retrieve pictures and photographs of, among others, a rose, a cherry, a tomato, a beef patty, a blood drop or a ruby on green velvet (see, e.g., Blobworld or Google's image search facilities).

Tenth, if Tom ever needed on-line help, a text-oriented English-only facility would be activated, explaining the required topic in advanced English, with roughly the same amount of detail that Tom has always requested from the dictionary in his past sessions with it. American English would be used for help because this is the option Tom selected in his previous encounters with the help facility.

Eleventh, because the system is networked, Tom can access statistics on dictionary use in his school as a whole (and beyond [4]). Other users' preferred choices and shortcuts can be accessed, so that he can indirectly learn from his colleagues how to put the dictionary to even better use. Indeed, the system itself will be able to learn from a variety of user profiles. If it discovers that 87% of all student users in the school prefer to have phonetic transcription placed after the English equivalent in the Polish-English encoding view of the dictionary, it will duly be placed there (this is the actual proportion obtained in my questionnaire study of 645 students; see Sobkowiak 1999, insert after p. 148). If it takes most users longer to locate entry senses when they are arranged by etymology/chronology (as they are in the OED, for example), the system will reorder them by frequency of occurrence, or by whatever order has empirically proved to ensure the fastest lookup (with the current population of users).
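
The two network-wide adjustments described above reduce to simple aggregate statistics over user profiles. The Python sketch below is hypothetical: the function names, the majority threshold, the scheme names and the timings are invented for illustration, and the 87/13 split only mimics the questionnaire proportion cited in the text; it is not the actual survey data.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional


def majority_setting(choices: List[str], threshold: float = 0.5) -> Optional[str]:
    """Most popular setting across user profiles, if its share exceeds `threshold`."""
    if not choices:
        return None
    value, n = Counter(choices).most_common(1)[0]
    return value if n / len(choices) > threshold else None


def fastest_ordering(lookup_times: Dict[str, List[float]]) -> str:
    """Sense-ordering scheme with the lowest mean time-to-locate (seconds)."""
    return min(lookup_times, key=lambda scheme: mean(lookup_times[scheme]))


# Illustrative data only: 87 of 100 hypothetical users put transcription last.
placements = ["after_equivalent"] * 87 + ["before_equivalent"] * 13
print(majority_setting(placements))  # after_equivalent
print(fastest_ordering({"chronological": [9.1, 8.4], "frequency": [4.2, 5.0]}))  # frequency
```

A strict majority threshold keeps the system from flipping a default on an evenly split population; a real deployment might also require a minimum number of users before acting.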

Twelfth...

3. Why not?

Such MRDs do not exist yet. But the direction in which electronic lexicography is moving is exactly this: towards more content, more flexibility and customization, more user-friendliness, better access and more connectivity with other sources of knowledge, lexicographic and beyond.

If there is anything worrying in this generally optimistic picture, it is the tempo at which the changes are taking place. In his "after-cocktail fantasies" of 1984, David Crystal predicted voice-operated multimedia remote-access lexicopaedias with some of the functionalities which are now standard in EFL MRDs, and some which are still not. His "ideal users in their ideal lexicographical world" would access their lexical database, which

"is now available in electronic form, which their terminal allows them to access, and to which they can plug in one of several lexicographical computer games. If they wish to look something up, they have the option of referring to their lexicopaedias, or addressing the data base direct through their voice-activated terminal. They know their access code words. [...] 'Meaning', 'Pronunciation', 'Usage', 'History', 'Picture', 'Spelling', 'Idioms', or whatever, as required — the information to be made available in sound, on screen or in print, depending on which mode selection they make" (Crystal 1986:79; my hyperlink -- WS).

Similar prophecies were made by many other lexicographers and media specialists at that time, which saw the beginning of global computer network connectivity (e.g. McArthur, 1986: 174, 179). Ten years later, and two computer generations further down the line, in a skeptically titled paper of 1994, "Have we wasted our time?", Nancy Ide and Jean Véronis, two of the leading MRD lexicographers, prophesied that:

"future dictionaries will likely be very similar to linguistic workstations, and provide many of the same facilities [...] Computerization of dictionary-making at the semantic level could involve things such as the creation of explicit semantic links (hypernym, part, colour etc.) between words or entries in electronic (hypertextual) dictionaries with sophisticated navigation and query capabilities. Information could be linked to images and sounds, and displayed in template form; or ultimately, we could achieve real-time/on-line generation in natural language in any desired form (concise version, learner's version, full-blown version, etc.) from a common internal representation. The possibilities are endless" (Ide, Véronis, 1994: 1).

Eight years later, precious little of their vision has come true in popular marketed MRDs, or even in this paragon of all modernity, web-based dictionaries and encyclopedias. The problem does not appear to be a lack of lexicographic or computational expertise and advanced technology. Rather, as it turns out, language education - native and foreign alike - is not yet ready to apply machine-readable dictionaries and encyclopedias on a large scale. The potential of electronic lexicography remains unexplored because educators and educatees see no place for multiple-access electronic dictionaries of radically innovative design in the process of language acquisition, be it in school [5] or at home. Sadly, this conclusion is not terribly revelatory either. In her 1995 paper on "Machine-readable dictionaries and education", Kegl agreed that "little in the way of progress has been made" since a large policy-charting conference on educational uses of word processors with dictionaries had been held thirteen years before (Kegl, 1995: 271). Her closing line also remains valid today: "the best future applications of MRDs in education will be those most able to respond to the insights and the needs of their users" (ibidem: 280). It is predominantly with users in mind that I have, in this paper, sketched the shape of things to come.

Notes

1. Modern learner's dictionaries provide for an expanding window of proficiency: the more advanced the target user is, the larger the dictionary will be. Yet this is wasteful of space and resources, because the learner will gradually "grow out of" some basic lexicographic information, which can then be deleted. I believe the right metaphor to apply here is a dynamically moving proficiency window, where the discarded information does not burden the dictionary. The issue deserves a separate discussion, of course. See Béjoint, 1994: 95-7, 153 and 186, Scholfield, 1997: 281 and Perry, 1997 for recent short appraisals of this idea. As early as 1984, Kipfer noticed that "it would probably be best if some words were presented in chronological order, others were presented in decreasing order of frequency, and still others presented by grouping basic meanings together into subcategories" (ibidem: 108).

2. Because "users may simply wish to know which word, or words, function at one structural point other than that of the headword consulted" (Cowie, 1999: 137). For ingenious ways to extract and access collocations in an ordinary general bilingual dictionary, see Fontenelle, 1997.

3. Building mnemonics into "teaching" dictionaries has been suggested a few times, for example by Nation (1989: 69) or Scholfield (1997: 298): "...compiling L2>L1 bilingual dictionaries (or L1 specific monolingual dictionaries) with suggested keywords added to entries, so that when an item is looked up, a means of actually retaining the information is directly offered by the dictionary".

4. Compare this vision of Aust, Kelley, Roby (1993: 72): "Wide-area databases could then compile data on such variables as the most commonly looked-up words, which texts prompted the greatest number of consultations, and the percentage of consultations by part of speech. These data would assist educators in teaching reading and vocabulary more effectively...".

5. These trends extend more widely to any FLT computer use in a formal educational setting in Poland, as demonstrated in a number of empirical studies (see Sobkowiak, 2002).

References

Aust, R., Kelley, M.J., Roby, W. (1993) "The use of hyper-reference and conventional dictionaries". Educational Technology Research and Development vol. 41, no. 4, 63-73.

Béjoint, H. (1994) Tradition and innovation in modern English dictionaries [Oxford Studies in Lexicography and Lexicology 1]. Oxford: Oxford University Press.

Benson, P., Voller, P., eds. (1997) Autonomy and independence in language learning. London: Longman.

Bielawski, L., Lewand, R. (1991) Intelligent systems design: integrating expert systems, hypermedia and database technologies. New York: John Wiley and Sons.

Bontcheva, K.L. (2001) Generating adaptive hypertext. Unpublished PhD dissertation. University of Sheffield: Computer Science Department.

Brusilovsky, P., Kobsa, A., Vassileva, J. (1998) Adaptive hypertext and hypermedia. Dordrecht: Kluwer.

Coady, J., Huckin, T., eds. (1997) Second language vocabulary acquisition. Cambridge: Cambridge University Press.

Cowie, A.P. (1999) English dictionaries for foreign learners - a history. Oxford: Clarendon Press.

Crystal, D. (1986) "The ideal dictionary, lexicographer and user". In Ilson ed. (1986), 72-81.

Dodd, W.S. (1989) "Lexicomputing and the dictionary of the future". In James ed. (1989), 83-93.

Ely, C., Pease-Alvarez, L., eds. (1996) "Learning styles and strategies". TESOL Journal vol. 6, no. 1 [special issue].

Fontenelle, T. (1997) Turning a bilingual dictionary into a lexical-semantic database. Tübingen: Max Niemeyer Verlag.

Hartmann, R.R.K. (2001) Teaching and researching lexicography. London: Longman.

Hartmann, R.R.K., ed. (1984) LEXeter'83 proceedings. Papers from the International Conference on Lexicography at Exeter, 9-12 September 1983. Tübingen: Niemeyer.

Hulstijn, J.H. (1997) "Mnemonic methods in foreign language vocabulary learning". In J.Coady & T.Huckin, eds. (1997), 203-24.

Hunyadi, L. et al., eds. (1998) ALLC/ACH'98 Conference Abstracts. Debrecen: Lajos Kossuth University Press.

Ide, N., Véronis, J. (1994) "Have we wasted our time?". Cambridge Language Reference News vol. 4, no. 1.

Ilson, R.F., ed. (1986) Lexicography: an emerging international profession. Manchester: Manchester University Press.

James, G., ed. (1989) Lexicographers and their works. Exeter: University of Exeter Press.

Jonassen, D.H., Mandl, H. (1989) Designing hypermedia for learning. Berlin: Springer Verlag.

Kay, A.C. (1991) "Computers, networks and education". Scientific American (September). 138-48.

Kegl, J. (1995) "Machine-readable dictionaries and education". In D.E.Walker, A.Zampolli & N.Calzolari, eds. (1995), 249-84.

Kipfer, B.A. (1984) "Methods of ordering senses within entries". In R.R.K.Hartmann, ed. (1984), 101-8.

Lawler, J.M., Dry, H.A., eds. (1998) Using computers in linguistics. London: Routledge.

Lew, R. (forthcoming). Dictionary use by Polish learners of English.

McArthur, T. (1986) Worlds of reference. Lexicography, learning and language from the clay tablet to the computer. Cambridge: Cambridge University Press.

Naiman, N. et al. (1996) The good language learner. Clevedon, Avon: Multilingual Matters.

Nation, P. (1989) "Dictionaries and language learning". In M.L. Tickoo, ed. (1989), 65-71.

Nunan, D. (1988) The learner-centred curriculum. Cambridge: Cambridge University Press.

O'Malley, J.M., Chamot, A.U., eds. (1990) Learning strategies in second language acquisition. Cambridge: Cambridge University Press.

Oxford, R. (1993) Language learning strategies: what every teacher should know. Boston, Mass.: Heinle & Heinle Publishers.

Perry, B.C. (1997) "Electronic learners' dictionaries (ELDs): an overview of recent developments". CALL Electronic Journal vol. 1, no. 2. [http://www.lerc.ritsumei.ac.jp/callej/1-2/Perry.html].

Prat, I. (1994) Artificial intelligence. London: Macmillan.

Reid, J., ed. (1995) Learning styles in the ESL/EFL classroom. Boston: Heinle & Heinle.

Rubin, J., Thompson, I. (1994) How to be a more successful language learner. Boston: Heinle & Heinle.

Schmitt, N., McCarthy, M., eds. (1997) Vocabulary: description, acquisition and pedagogy. Cambridge: Cambridge University Press.

Scholfield, P. (1997) "Vocabulary reference works in foreign language learning". In N. Schmitt & M. McCarthy, eds. (1997), 279-302.

Shapiro, S.C. et al., eds. (1992) Encyclopedia of artificial intelligence. New York: Wiley Interscience.

Simons, G.F. (1998) "The nature of linguistic data and the requirements of a computing environment for linguistic research". In J.M. Lawler & H.A. Dry, eds. (1998), 10-25.

Sobkowiak, W. (1994) "Beyond the year 2000: phonetic access dictionaries (with word-frequency information) in EFL". System vol. 22, no. 4, 509-23.

Sobkowiak, W. (1998) "Phonetic access in OED2 on CD-ROM". In L. Hunyadi et al., eds. (1998), 158-161.

Sobkowiak, W. (1999) Pronunciation in EFL Machine-Readable Dictionaries. Poznan: Motivex.

Sobkowiak, W. (2002) "The challenge of electronic learners' dictionaries". Teaching English with Technology vol. 2, no. 1. [http://www.iatefl.org.pl//call/j_article7.htm]

Sobkowiak, W. (forthcoming) "Subjective phonetic difficulty of English words to Polish learners: does frequency matter?". Language Learning.

Sobkowiak, W., Kuczynski, M. (forthcoming) "Phonetics and ideology of defining vocabularies". Paper to be presented at the 10th Euralex International Congress, Copenhagen, 13-17 August 2002.

Tarantowicz-Gasiewicz, M. (2000) Student modelling in intelligent computer-assisted language learning. Pedagogical issues. Unpublished PhD dissertation. Wroclaw: Uniwersytet Wroclawski.

Tickoo, M.L., ed. (1989) Learners' dictionaries: state of the art. Singapore: SEAMEO Regional Language Centre.

Tudor, I. (1996) Learner-centredness as language education. Cambridge: Cambridge University Press.

Walker, D.E., Zampolli, A., Calzolari, N., eds. (1995) Automating the lexicon: research and practice in a multilingual environment [papers from a Marina di Grosseto workshop in 1986]. New York: Oxford University Press.

Wenden, A. (1991) Learner strategies for learner autonomy. London: Prentice Hall.

Wenden, A., Rubin, J., eds. (1987) Learner strategies in language learning. Hemel Hempstead: Prentice Hall.


MULTILINGUAL DATA ORGANISER (M.D.O.)
AN OVERVIEW OF A SMALL-SCALE PROJECT
by Leszek Bajkowski
Jagiellonian University Teacher Training College,
Cracow, Poland
lebajkow@merlin.in.uj.edu.pl

Introduction

This paper is an overview of an ongoing, small-scale project whose aim is to create a multi-modal Windows environment for handling foreign language material, tentatively called the Multilingual Data Organizer, or M.D.O. (further information and a trial version are available at http://merlin.in.uj.edu.pl/mdo/). It presents some aspects of the program's development, its functions and its possible applications.

The idea behind M.D.O. is to create a multi-function, multi-purpose, multi-language Windows environment. This environment would enable the user (ideally a teacher, a learner, a translator, a lexicographer or a linguist) to collect and organise linguistic data according to their individual needs. It would then be suitable for supporting teaching, self-study, or other professional or non-professional linguistic objectives. The central module of the application, and the one from which the work started, is the dictionary. Computational lexicography sometimes distinguishes between lexicons "for computers" and dictionaries "for humans"; in this paper the terms lexicon and dictionary are used interchangeably. A few other terms should also be regarded as loose concepts, since they describe the dynamic electronic entities under discussion only imprecisely (basic lexicographic definitions can be found in Burkhanov, 2000).

M.D.O. is designed to hold and represent language data differently from publishers' electronic dictionaries such as Webster's or Longman's (LDOCE). First and foremost, it is designed as an empty database that the user fills with lexical input. In fact, the Polish and English language files are not completely empty. In the course of the project several thousand entries have been added to the main tables, and approximately 16,000 English lexical items already exist. Helper functions make it possible to fill M.D.O. from existing ASCII files (texts or word lists). The program can either read in every word on the list or, in semi-automatic mode, first ask the user which words to add. This method was used to obtain the first 14,000 English database records.
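The import routine can be pictured with a minimal Python sketch. It is not the Clarion code of M.D.O. The function name `import_words`, the word-splitting pattern and the decision to lower-case every word are all assumptions made for illustration. Passing `confirm=None` corresponds to reading all the words off the list; passing a yes/no callback corresponds to the semi-automatic mode.

```python
import re
from typing import Callable, Optional

# Hypothetical word pattern: letters, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def import_words(text: str, existing: set[str],
                 confirm: Optional[Callable[[str], bool]] = None) -> list[str]:
    """Return the new headwords found in an ASCII text or word list.

    confirm=None adds every new word (automatic mode); otherwise the
    callback is asked about each candidate (semi-automatic mode).
    """
    seen = {w.lower() for w in existing}
    added: list[str] = []
    for token in WORD.findall(text):
        word = token.lower()  # assumption: headwords are stored in lower case
        if word in seen:
            continue
        seen.add(word)  # never ask twice about the same word
        if confirm is None or confirm(word):
            added.append(word)
    return added
```

In semi-automatic use the callback could simply prompt the user, for example `confirm=lambda w: input(f"Add {w}? ") == "y"`.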

The original idea dates back to 1997. The project grew out of the author's professional needs and experience as an English language teacher, as well as his personal interest in humanities computing. One of the primary motives was to create a flexible, customizable tool for organising English lexicographic data and, more generally, English language teaching material. Dissatisfaction with the available multimedia educational software and educationally oriented electronic dictionaries prompted an interest in database development environments and programming languages. The development of the author's own programming skills was a welcome spin-off.

The choice of the database structure

By its nature any Machine-Readable Dictionary (MRD) is a kind of database. Large-scale software engineering projects often use UNIX-based ASCII databases, together with tools that rely on text markup to re-format the information into human-readable text.

Standard Generalized Markup Language (SGML) is a well-known coding system widely accepted in lexicography. SGML itself is an ISO standard; the World Wide Web Consortium (http://www.w3.org) is responsible for the standards of its Web-oriented descendants, HTML and XML. In SGML, labels (tags) in angle brackets carry extra-textual information, which in lexicographic contexts marks the features of dictionary micro- and macrostructure. The best-known application of SGML is HTML, which is used to code WWW documents. Extensible Markup Language (XML), a simplified subset of SGML, has also been proposed as a flexible interchange format for describing the information in different dictionaries. The textual format of these solutions ensures that the same material remains readable and transferable across many systems and coding conventions. This is one reason why a relational database tied to a particular data file format and running on one platform is not regarded as suitable for electronic lexicons and MRDs, especially in computational lexicography. On the other hand, any existing Windows-based relational database management system can be used to build a functional lexicographic application for home use.
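For comparison, the sketch below shows what an XML-encoded entry might look like and how it can be read with Python's standard `xml.etree.ElementTree` module. The element names (`entry`, `hw`, `pos`, `pron`, `sense`, `def`, `ex`) are invented for this example and do not follow any particular dictionary's DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names are illustrative, not a published schema.
ENTRY_XML = """
<entry id="e1">
  <hw>design</hw>
  <pos>n</pos>
  <pron>dɪˈzaɪn</pron>
  <sense n="1">
    <def>a drawing or plan showing how something will be made</def>
    <ex>Has she made the design for her dress herself?</ex>
  </sense>
</entry>
"""

def parse_entry(xml_text: str) -> dict:
    """Turn one tagged entry into a plain dictionary of its fields."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "headword": root.findtext("hw"),
        "pos": root.findtext("pos"),
        "pron": root.findtext("pron"),
        "senses": [
            {"n": s.get("n"),
             "def": s.findtext("def"),
             "examples": [ex.text for ex in s.findall("ex")]}
            for s in root.findall("sense")
        ],
    }
```

Because the tags travel with the text, any system that can read XML can recover the same structure, which is the portability advantage described above.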

Computerised lexicographic databases are traditionally based on the relational model. In short, a relational database can be visualised as a set of tables that store either content records or the relations (links) between them. M.D.O. stores headwords, labels, word sense definitions, etc. in separate tables (files) of a pre-defined format, and each record is made up of fields. One of the basic requirements of a relational database is that every record has a unique ID number. Since an index of numbers can be searched and filtered faster than textual data, relational databases tend to return output faster than text-processing tools.
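The idea of tables linked by ID numbers can be illustrated with SQLite through Python's standard `sqlite3` module. SQLite is used here only as a convenient stand-in for the Clarion file system. The table and column names are hypothetical. Headwords and definitions live in separate tables, and a third table stores nothing but pairs of IDs.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with two content tables and one link table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE headword   (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
        CREATE TABLE definition (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        -- link table: each row relates one headword ID to one definition ID
        CREATE TABLE hw_def (hw_id  INTEGER NOT NULL REFERENCES headword(id),
                             def_id INTEGER NOT NULL REFERENCES definition(id));
        CREATE INDEX hw_def_by_hw ON hw_def(hw_id);
    """)
    con.executemany("INSERT INTO headword VALUES (?, ?)",
                    [(1, "design (n1)"), (2, "design (v1)")])
    con.executemany("INSERT INTO definition VALUES (?, ?)",
                    [(10, "a drawing or plan"), (11, "to plan and make something")])
    con.executemany("INSERT INTO hw_def VALUES (?, ?)", [(1, 10), (2, 11)])
    return con

def definitions_for(con: sqlite3.Connection, form: str) -> list[str]:
    # The headword text is matched once; every further step follows integer IDs.
    rows = con.execute("""
        SELECT d.body FROM headword h
        JOIN hw_def l     ON l.hw_id = h.id
        JOIN definition d ON d.id = l.def_id
        WHERE h.form = ? ORDER BY d.id""", (form,))
    return [body for (body,) in rows]
```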

As mentioned above, lexicographers' databases work on texts, and one possible way of structuring the data is annotation. Annotation (also called tagging) is a practice in which each headword is provided with descriptive codes (for examples of the internal structure of MRDs, see Gazdar's examples for LDOCE and COMLEX). This kind of representation inflates the amount of data to be filtered. The M.D.O. database, by contrast, follows the principle of a minimum number of fields per record, and the higher level of structure is achieved through linking. The term word entry is therefore used here as more or less synonymous with record. Entry blocks are virtual graphic representations, generated automatically by collecting and filtering the data sets on the basis of their links.
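A minimal sketch of such a "virtual" entry block follows. It assumes in-memory tables with one field per record and a separate link map; the class and field names are hypothetical. No finished entry is stored anywhere. The block is assembled on request by following the links.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    """Hypothetical minimal tables: record ID -> a single field value."""
    headwords: dict[int, str] = field(default_factory=dict)
    prons: dict[int, str] = field(default_factory=dict)
    defs: dict[int, str] = field(default_factory=dict)
    # Links: (table name, headword ID) -> IDs of linked records, in order.
    links: dict[tuple[str, int], list[int]] = field(default_factory=dict)

    def link(self, table: str, hw_id: int, rec_id: int) -> None:
        self.links.setdefault((table, hw_id), []).append(rec_id)

def entry_block(store: Store, hw_id: int) -> str:
    """Assemble an entry block on request by following the links."""
    lines = [store.headwords[hw_id]]
    for rid in store.links.get(("prons", hw_id), []):
        lines.append(f"  /{store.prons[rid]}/")
    for n, rid in enumerate(store.links.get(("defs", hw_id), []), start=1):
        lines.append(f"  {n}. {store.defs[rid]}")
    return "\n".join(lines)
```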

M.D.O. has been made using Clarion for Windows 2.0 (a product of SoftVelocity, http://www.softvelocity.com/core/default.html). Clarion is a rapid database development environment that can build complex, dynamic relational databases in an object-oriented language; it may be compared with the widely used Access, Paradox or FoxPro. In "wizard" mode Clarion generates the application's code from predefined templates, and in that mode creating a simple, conventional database is a matter of minutes. Any major alteration or move towards sophistication, however, unavoidably complicates matters by requiring some hand-coding. Hand-coding from scratch makes the task much more arduous and time-consuming. At the same time, it allows for invention limited only by the programmer's skills and the restrictions of the tool itself. The resulting executable is relatively small, and a well-built application runs fast.

Reusability of resources

Technically, the project primarily aims at a level of flexibility and manageability not possible with the existing monolingual and bilingual commercial products of this type. Its other objective is to provide means for extensive data sharing, referred to as reusability of resources. This can be achieved on two levels: (1) newly added modules can use the same data for a variety of purposes, and (2) the same data (entries) can be used in a variety of ways. The first is no different from any modern software development practice, since it allows designers to change user interfaces or add new features to existing applications. It is not clear, however, how common the second approach to database design is. Since M.D.O. entries must be entered manually, making extensive use of each of them is a simple way of avoiding redundancy. In practice, data reuse can take several forms. For example, the same first language headword can be linked with any number of foreign language headwords. Reusing sentences, on the other hand, means that every string of characters inside a stored sentence is a potential match, so the same sentence can illustrate the use of several of the words it contains.
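One way to let a single stored sentence serve many headwords is an inverted index from word forms to sentence IDs. The sketch below is an assumption and not M.D.O.'s actual mechanism, which matches strings inside sentences. Indexing whole words gives up some of that flexibility in exchange for speed. Each sentence is stored once and is referenced only by its ID.

```python
import re
from collections import defaultdict

def index_sentences(sentences: list[str]) -> dict[str, set[int]]:
    """Map every word form to the IDs of the sentences that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        # Lower-case words, keeping hyphenated forms such as 'telling-off' intact.
        for word in re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower()):
            index[word].add(sid)
    return index
```

With the sentences "An architect is designing a house for us." and "The house has a small garden.", the entry for `house` points to both IDs, while `architect` points only to the first.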

Overview of the M.D.O. basic features

The M.D.O. database consists of the following components: native language (L1, e.g. Polish) headwords, foreign language (L2, e.g. English) headwords, L2 pronunciations, L2 usage and user-defined labels, L2 sense definitions, L2 lexical co-occurrence notes, L2 etymology notes, and L2 example sentences. All records are automatically sorted in alphabetical order. Additionally, there are tables storing relational information on synonyms, antonyms, related headwords, irregular forms, etc. Some features link the database with the user's untagged corpus of text files (in .txt format) and with multimedia files. These include audio (.wav format), images (.jpg, .gif, .wmf, .bmp or .pcx formats) and movie clips (player-dependent formats). They can be collected and accessed from within the program through API calls to standard Windows tools or user-defined ones. The user can add, delete and change entries, which then become resources for the testing and material-building modules incorporated in the environment. Teacher-oriented modules might include a vocabulary test-maker (creating printable tests), a simple corpus-based sentence extractor (for finding relevant samples of language usage), or a text vocabulary extractor (for deriving a list of words from an electronic text). Student-oriented modules might include a word meaning tester (multiple-choice vocabulary test), a spelling game (for instance 'hangman'), a grammar book (again using the collected sentences to illustrate the structures covered), or a concordancer (to study word use and collocations).

M.D.O. is a Windows application with a main ('mother') window, within which all modal windows open. The modules are available from the menu of the 'mother' window. The program loads the necessary information at start-up and operates on tables held in RAM. The hard disk files are accessed only when the user changes the data, and the RAM-stored information is then updated. By default the dictionary module opens first, and the last-accessed item automatically becomes the first query.

Before the 'mother' window appears, the user is offered a choice of the foreign language s/he wants to work with. The choice list simply reflects the existing subdirectories in the program's main folder, so creating a new folder adds a new language to the start-up selection window. (In fact, adding a new language is a little more complicated. The procedure has not been automated, and the help file is needed to guide the user through it.)

The program opens two 'languages' at the same time, one of which must be the native language (L1). Since English is the designer's foreign language of choice, the application's metalanguage is English and most functions are geared towards it. Working with languages other than English is possible, although functionally restricted. Windows allows the user to switch keyboard layouts in order to type the characters needed for any of those languages. At this point, however, M.D.O. cannot cope with the morphological (inflectional) properties of those languages.

The same L1 table is used with all foreign languages. The dictionary works both ways: once a link has been made between headwords, both L1-L2 and L2-L1 queries are possible. Interestingly, publishers' MRDs are largely electronic versions of their printed counterparts. Perhaps for this reason, their developers do not seem to favour 'two-way' queries of the same database and tend to keep the L1-L2 and L2-L1 volumes separate. (It is not difficult to imagine that two look-ups would yield different results: searching for a Polish word among the definitions of a traditional English-Polish dictionary, and looking up the same Polish word in the Polish-English dictionary.)
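The two-way behaviour follows naturally if there is only one list of L1-L2 links and both lookup directions are derived from it. The minimal Python sketch below illustrates this; the sample pairs and the function name are illustrative assumptions.

```python
from collections import defaultdict

# One shared list of (L1, L2) links; the sample pairs are illustrative.
LINKS = [("rząd", "government"), ("rząd", "row"), ("rząd", "cabinet"),
         ("gabinet", "cabinet")]

def build_indexes(links: list[tuple[str, str]]):
    """Derive both lookup directions from the same link list."""
    l1_to_l2: dict[str, list[str]] = defaultdict(list)
    l2_to_l1: dict[str, list[str]] = defaultdict(list)
    for l1, l2 in links:
        l1_to_l2[l1].append(l2)
        l2_to_l1[l2].append(l1)
    return l1_to_l2, l2_to_l1
```

Adding a single pair updates both directions at once, so the L1-L2 and L2-L1 "volumes" cannot drift apart.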


The dictionary module

A screenshot of the dictionary window can be seen at this location. The drop-down combo list gives incremental access to any headword of the selected language (L1 or L2), and the query string can also be typed in without locating it in the list. The language (LG) button switches the combo box between L1 and L2.

The dictionary search is a two-step procedure. First, the main list of entries is checked against the query string and a set of matching lexical items is retrieved. For example, the query string 'design' retrieves 'design (n1)', 'design (n2)', 'design (v1)', 'by design', 'randomized block design', etc. These are put into a special list of pre-selected records. In the second step, the user may highlight any item on the list to bring up the related information; by default, the first entry is selected for display. One of the design principles is that the same data can be reused and exploited in a variety of ways. Accordingly, one L1 headword can be linked with numerous L2 headwords, which eliminates the need to repeat homonymous forms. Unless the user opts for precise L1 definitions, the same form is applied to a number of L2 entries, regardless of the sense differences between them. For instance, the Polish word 'rząd' can be linked to many English headwords ('government', 'administration', 'cabinet', 'rule', 'row', 'line', 'rank', 'tier', 'file', 'order', etc.) irrespective of the sense distinctions, which also saves retrieval time.
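The two-step search might be sketched as follows. The matching rule is an assumption: a whole-word match anywhere in the headword. It was chosen so that 'design' finds 'by design' but not 'designer', in line with the statement that 'designer' is not called up by a search for 'design'. The function names are hypothetical.

```python
import re
from typing import Optional

def preselect(query: str, headwords: list[str]) -> list[str]:
    """Step 1: headwords containing the query as a whole word."""
    pattern = re.compile(r"(?<![A-Za-z])" + re.escape(query) + r"(?![A-Za-z])",
                         re.IGNORECASE)
    return [h for h in headwords if pattern.search(h)]

def select(preselected: list[str], highlighted: int = 0) -> Optional[str]:
    """Step 2: the user highlights an item; the first one is the default."""
    if not preselected:
        return None
    return preselected[highlighted]
```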

Each L2 headword can be linked to one or more L2 definitions. Working with the program imperceptibly increases the time one spends consulting numerous paper and electronic dictionaries in search of data to feed the database. In the course of this work, the material gathered has allowed at least superficial comparisons of the sources. One of the author's most interesting observations concerns the non-native speaker of English. For such a speaker, the final understanding of an English item was in many cases built up gradually, through a series of complementary definitions, equivalents and illuminating examples accumulated from those sources. This suggests, among other things, that the information in such dictionaries is often confusing and inconsistent, having been drastically simplified and shortened for lack of print space. It may indeed be insufficient to give the learner full insight into the meaning of an item. With M.D.O. the user can collect and combine information from a variety of sources into a comprehensive compilation.

Synonyms, antonyms, etymology notes, and 'relatives' and 'friends' can also be linked to L2 headwords. In accordance with the principle of resource reusability, synonyms and antonyms come from the same file as the L2 headwords. Etymology notes can be more than pieces of information: learners can use them to consolidate items by making mental connections to their origins, and thus remember them better. Some entries can be set to display automatically together with a given headword. These can be 'relatives', that is, morphologically related forms such as derivatives that differ enough to be ignored by the look-up mechanism ('designer' is not called up when looking up 'design'). Alternatively, they can be 'friends', that is, items associated on any other basis (for instance 'turtle' and 'tortoise', and other words frequently confused because of their formal similarity).
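Forced display of 'relatives' and 'friends' amounts to two extra link tables that are consulted whenever an entry is shown. In this sketch the links are assumed to be symmetric, so linking 'turtle' to 'tortoise' also shows 'turtle' with 'tortoise'. The class name is hypothetical.

```python
from collections import defaultdict

class ForcedLinks:
    """Hypothetical store of entries always shown with a given headword."""

    def __init__(self) -> None:
        self.tables = {"relative": defaultdict(set), "friend": defaultdict(set)}

    def add(self, kind: str, a: str, b: str) -> None:
        table = self.tables[kind]  # raises KeyError for an unknown kind
        table[a].add(b)
        table[b].add(a)  # assumption: the relation is symmetric

    def shown_with(self, headword: str) -> dict[str, list[str]]:
        # .get() avoids creating empty entries in the defaultdicts.
        return {kind: sorted(table.get(headword, ()))
                for kind, table in self.tables.items()}
```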

Sentences and phrases - the use of annotation

The sentences collected and typed in by the user are displayed with an entry whenever the query string is found inside the sentence. A simple annotation system has been implemented to avoid confusion with homographs ('spelling look-alikes'), so that only the examples relevant to the active entry are retrieved. L2 entries can be labelled to distinguish between headwords identical in form but different in meaning or function. The form of the tags is up to the user, and the system works as long as they are applied consistently. In the existing demonstration version 'n' stands for a noun, 'v' for a verb, and so on. Thus 'design (n1)', noun sense one, is distinct from 'design (n2)', noun sense two, and from 'design (v1)', verb sense one. The same tags may be used in sentences. An example such as 'Has she made the design (n1) for her dress herself?' will therefore illustrate 'design (n1)', while the sentence 'An architect is designing (v1) a house for us' will appear if the highlighted entry is 'design (v1)'. English suffixes (-ing, -ed, -s, etc.) are also recognized by the retrieval procedure, so inflected forms such as 'designed', 'designing' or 'designs' will be found and displayed. The 'SENT' button offers an alternative display of long example sentences in a bigger font, wrapped in a box.
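The combination of tags and suffix recognition can be approximated with a regular expression. This is a sketch under stated assumptions. The suffix list is hypothetical and ignores spelling changes such as 'make'/'making', and a tagged headword only matches sentences carrying the same tag directly after the word.

```python
import re
from typing import Optional

# Hypothetical list of English inflectional endings (no spelling changes).
SUFFIXES = ("ing", "ed", "es", "s", "d")

def parse_headword(hw: str) -> tuple[str, Optional[str]]:
    """Split 'design (v1)' into ('design', 'v1'); untagged -> (hw, None)."""
    m = re.fullmatch(r"(.+?) \((\w+)\)", hw)
    return (m.group(1), m.group(2)) if m else (hw, None)

def matching_sentences(hw: str, sentences: list[str]) -> list[str]:
    word, tag = parse_headword(hw)
    # The word, optionally followed by one suffix, must end at a word boundary.
    pattern = r"\b" + re.escape(word) + "(?:" + "|".join(SUFFIXES) + r")?\b"
    if tag:
        pattern += r" \(" + re.escape(tag) + r"\)"  # the tag must follow the word
    rx = re.compile(pattern, re.IGNORECASE)
    return [s for s in sentences if rx.search(s)]
```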

Multi-word lexical items have always been a lexicographer's nightmare (cf. Baddorf, 1996). M.D.O. deals with them by avoiding the problem. For a sentence to illustrate a complex multi-word expression (a phrasal verb or idiom), it must be tagged with that expression in its canonical form, exactly as the expression appears in the L2 headword list. Thus, to illustrate 'give sb a telling-off', a sentence would have to be coded as '(give sb a telling-off) Diana gave the children a severe telling-off when she saw them playing near the road.' The bracketed phrase is detected by the search but not displayed.
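The canonical-form tag can be handled by checking for a leading parenthesised phrase and removing it before display. This is a minimal sketch. The function names are hypothetical, and the tag is assumed to open the sentence, as in the example above.

```python
import re
from typing import Optional

# A leading "(canonical form)" tag, e.g. "(give sb a telling-off) Diana ..."
PHRASE_TAG = re.compile(r"\((?P<phrase>[^)]+)\)\s*")

def split_phrase_tag(sentence: str) -> tuple[Optional[str], str]:
    """Return (tag, text to display); tag is None if the sentence has none."""
    m = PHRASE_TAG.match(sentence)  # match() only looks at the start
    if m is None:
        return None, sentence
    return m.group("phrase"), sentence[m.end():]

def examples_for_phrase(phrase: str, sentences: list[str]) -> list[str]:
    examples = []
    for s in sentences:
        tag, text = split_phrase_tag(s)
        if tag == phrase:
            examples.append(text)  # the tag is used for matching, not shown
    return examples
```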

Unlike sentence retrieval, the phrase/collocation search works by parsing the text entries of the phrases file. These entries require the headwords to be in the same form as in the main L2 list (tagged, if necessary). A plus sign (+) must be placed on one or both sides of the headword to indicate its possible collocational or syntactic context. A possible entry for the word 'faint' illustrates the method: 'be, feel + faint (adj) + voice, murmur, sound, idea, chance, hopes, traces of'. This reads: the adjective 'faint' can occur in the contexts be faint, feel faint, faint voice, faint idea, etc. The same information can be split into two entries, 'be, feel + faint (adj)' and 'faint (adj) + voice, idea, ...', but the plus sign must be present for the program to consider the entry at all.
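The '+' notation is simple enough to expand mechanically. The sketch below makes three assumptions: the headword is given exactly as it appears in the entry (tag included); the words on each side of it are comma-separated; and the tag is dropped when the collocations are displayed. The function name is hypothetical.

```python
def expand_collocations(entry: str, headword: str) -> list[str]:
    """Expand 'be, feel + faint (adj) + voice, idea' into individual contexts."""
    parts = [p.strip() for p in entry.split("+")]
    if len(parts) < 2 or headword not in parts:
        return []  # no plus sign (or no headword): the entry is ignored
    i = parts.index(headword)
    word = headword.split(" (")[0]  # display the headword without its tag
    contexts = []
    if i > 0:  # words that may precede the headword
        contexts += [f"{w.strip()} {word}" for w in parts[i - 1].split(",") if w.strip()]
    if i + 1 < len(parts):  # words that may follow it
        contexts += [f"{word} {w.strip()}" for w in parts[i + 1].split(",") if w.strip()]
    return contexts
```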

Multimedia

M.D.O. displays any multimedia file linked with the active headword, but the files must be stored in their respective folders, called 'SOUNDS', 'IMAGES' and 'MOVIES'. If audio files have been linked with the active entry, the 'play sound' button is enabled (otherwise it is dimmed). The linked audio (.wav) files are played one by one, one file per press of the button. If an image file or a video clip is related to the entry, the corresponding display button is enabled in the same way.
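The behaviour of the 'play sound' button can be modelled as a small state machine. The sketch makes two assumptions that the description does not specify. First, playback wraps around to the first file after the last one. Second, only .wav files in the 'SOUNDS' folder count. `PureWindowsPath` from the standard `pathlib` module is used because M.D.O. runs on Windows, and it accepts both slash styles.

```python
from itertools import cycle
from pathlib import PureWindowsPath
from typing import Optional

class SoundButton:
    """Hypothetical model of the 'play sound' button for one entry."""

    def __init__(self, linked_files: list[str]) -> None:
        self.files = [f for f in linked_files
                      if PureWindowsPath(f).parent.name.upper() == "SOUNDS"
                      and PureWindowsPath(f).suffix.lower() == ".wav"]
        self._queue = cycle(self.files)  # assumption: wraps around after the last file

    @property
    def enabled(self) -> bool:
        return bool(self.files)  # dimmed when no audio is linked

    def press(self) -> Optional[str]:
        """Return the file to play on this press, or None if the button is dimmed."""
        return next(self._queue) if self.files else None
```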

The corpus

Selecting the corpus option gives access to the text (*.txt) files stored in the 'CORPUS' folder and retrieves the passages that contain the query string. The text button opens Windows Notepad, showing the whole (con)text of a given corpus example. The mechanism does not look for tags in those texts and is therefore not a corpus-query tool proper. It tends to find the string rather than the word and, naturally, fails to distinguish between homonyms. A screenshot of the corpus search results can be accessed at this location.

The concordancer makes it possible to view all the examples in a structured, concordance-style layout. A 'history' list can be used at any time to return to any previously accessed entry.
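A plain-string concordance of the kind described can be sketched in a few lines of key-word-in-context (KWIC) code. Like the corpus search, it looks for the string rather than the word, so 'design' is also found inside 'designer'. The function name and the fixed context width are assumptions.

```python
import re

def kwic(query: str, texts: list[str], width: int = 20) -> list[str]:
    """Key-word-in-context lines for every occurrence of the query string."""
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in re.finditer(re.escape(query), flat, re.IGNORECASE):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            # Right-align the left context so the matches line up in a column.
            lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines
```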

Filtered searches

A number of search mechanisms are available, including a label filter and several 'only in' searches. Filtering means extracting only the subset of headwords that meets a specified criterion. One way of performing a filtered look-up is to scan for all headwords linked to a particular label, e.g. all nouns, all phrasal verbs or all adjectives. In the future it will be possible to combine criteria and search, for example, for all adjectives that refer to people and are used in American English. The 'only in definitions' procedure looks up only those records whose definitions contain the query string. 'Only in sentences' and 'only in corpus' search the respective resources for the query substring. The phonetic query ('only in pronunciations') accepts International Phonetic Alphabet (IPA) symbols, and only headwords whose pronunciations contain the IPA query string are displayed.
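Combined criteria, described above as a future feature, could work as in the sketch below. Every criterion that is supplied must hold, and omitted criteria are ignored. The `Entry` fields, the function name and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

@dataclass
class Entry:
    headword: str
    labels: set[str] = field(default_factory=set)
    definitions: list[str] = field(default_factory=list)
    pron: str = ""  # IPA transcription

def filter_entries(entries: Iterable[Entry], labels: Iterable[str] = (),
                   in_definitions: Optional[str] = None,
                   in_pron: Optional[str] = None) -> list[str]:
    wanted = set(labels)
    result = []
    for e in entries:
        if not wanted <= e.labels:  # label filter: all requested labels present
            continue
        if in_definitions and not any(in_definitions in d for d in e.definitions):
            continue  # 'only in definitions'
        if in_pron and in_pron not in e.pron:
            continue  # 'only in pronunciations' (IPA substring)
        result.append(e.headword)
    return result
```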

Category labels

One of the strongest features of M.D.O. is the table of user-defined category labels. Standard categories such as 'noun', 'verb', 'countable' or 'intransitive', as well as custom indicators, can be introduced into the database. Custom labels can be of any type, for example 'level: proficiency', 'from Masterclass Unit 10', 'refers to movement or way of walking', 'symbolizes healing, growth or newness', or 'synonym group: blame' (the last based on the idea of the Longman Language Activator). When English is the active foreign language, shortcuts on the dictionary window allow the active headword to be labelled with any of the main part-of-speech labels (N, V, ADJ, etc.). The 'Make Links' and 'Delete Links' functions, available from the dictionary menu, respectively create and delete relations between the selected items. Additionally, labels belong to groups, which are introduced only to help arrange and locate them in the display window. One obvious example is the 'POS' (part of speech) group, with such members as 'noun', 'adverb' or 'preposition'. The label groups are also user-defined.

Editing and selecting

The individual lists can be altered and updated through Editor windows available from the main menu. These enable the user to add new items, delete existing entries and change the currently highlighted ones. Any entry can be linked with a sound file (in .wav format), provided the user can supply one. Audio files are played using the standard Windows Sound Recorder (sndrec32.exe). The same windows are used both to edit and to select records. The screenshot available at this location shows the pronunciation list editor.

Some technical considerations

Basic computer skills are enough to operate M.D.O., but taking full advantage of its more intricate solutions requires intermediate computer knowledge. This includes managing the keystrokes for phonetic symbols and recognizing the importance of the space (the blank character) in entries. For example, the user must be aware of the difference between two strings. In 'design (n1)', the headword is followed by a space and then the tag. In 'design(n1)', the tag directly follows the headword, and the program treats the whole string as a single form. At this stage M.D.O. relies on the user's accuracy in creating entries.
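Because M.D.O. relies on the user's accuracy, a simple check for the missing-space case is easy to imagine. The validator below is hypothetical and is not part of the program. It accepts the form 'word (tag)' and flags a tag attached directly to the word.

```python
import re
from typing import Optional

# 'word (tag)': the word ends in a non-space character, then exactly one space.
WELL_FORMED = re.compile(r"^(?P<word>\S(?:.*\S)?) \((?P<tag>[A-Za-z]+\d*)\)$")
GLUED = re.compile(r"\S\(")  # a tag written directly after the word

def check_headword(entry: str) -> tuple[str, str, Optional[str]]:
    m = WELL_FORMED.match(entry)
    if m:
        return ("ok", m.group("word"), m.group("tag"))
    if GLUED.search(entry):
        return ("warning: missing space before tag", entry, None)
    return ("untagged", entry, None)
```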

The tag system and the hand-made annotation of example sentences make the user responsible for keeping the database consistent and coherent. The tags are user-defined, but there is no procedure to verify that they are applied consistently across the data sets. Referential integrity therefore requires care on the user's part, although the program itself will keep working even if the input is chaotic. Any inconsistency can be corrected the moment it is spotted, since the sentences are easily available for editing and the annotation is easy to add and change. Full automation, or the creation of a semi-automatic tagger, is beyond the scope of the project and the programming capabilities of its author.

Two further issues are the absence of syllabification indicators and the tool's inability to parse compounds. To make flowerpot, flower pot and flower-pot all retrievable, whatever form the user's query takes, the alternative spellings have to be provided inside square brackets. In this case the entry would have to read 'flowerpot [ flower-pot , flower pot ]'.
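The bracket convention for variant spellings can be expanded into a list of searchable forms. This is a minimal sketch. It assumes the exact format of the example above: the main form, followed by comma-separated variants in square brackets. The function names are hypothetical.

```python
import re

BRACKETED = re.compile(r"\s*(?P<main>[^\[]+?)\s*\[(?P<alts>[^\]]*)\]\s*")

def expand_variants(entry: str) -> list[str]:
    """'flowerpot [ flower-pot , flower pot ]' -> ['flowerpot', 'flower-pot', 'flower pot']"""
    m = BRACKETED.fullmatch(entry)
    if m is None:
        return [entry.strip()]  # no brackets: the entry is its only form
    alts = [a.strip() for a in m.group("alts").split(",") if a.strip()]
    return [m.group("main")] + alts

def find(query: str, entries: list[str]) -> list[str]:
    """Return the entries any of whose forms equals the query (case-insensitive)."""
    q = query.strip().lower()
    return [e for e in entries
            if q in (v.lower() for v in expand_variants(e))]
```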

Possible applications of M.D.O.

M.D.O. has a wide range of possible applications. It was first designed as an aid for language teachers. Sobkowiak (2001) surveyed the attitudes of Polish teachers of English to electronic dictionaries. The results show that more than half of the subjects know and use at least one computer dictionary, but none of them has used one in the classroom. The observation can, however, be turned around: despite their reluctance to use electronic dictionaries in the classroom, many teachers use them at home. It can safely be assumed that they also use word processors and other software to prepare, directly or indirectly, for work. Teachers who like to arrange and adapt their own teaching materials may clearly benefit from working with M.D.O.

Professional translators and lexicographers may also find it useful in building their own field-specific glossaries (a 'glossary of building tools and materials', or a 'glossary of roofing terms' are examples based on the list of online specialty dictionaries at YourDictionary.com - www.yourdictionary.com).

Learner-oriented features can make M.D.O. a valuable learner's aid, with computer-assisted vocabulary acquisition as only the most obvious example. Language learners are in constant need of revision. They spend a lot of time taking notes, highlighting words in printed texts, compiling glossary lists and otherwise working with new and old vocabulary. What has been done with paper and pencil can now be done with a screen and keyboard, and despite the change of medium, the practice of vocabulary note-taking need not disappear. It has to be borne in mind, however, that computers lend themselves better to home study than to classroom use. Blok et al. (2001) reviewed courseware, comparing the effectiveness of traditional and computer-assisted vocabulary acquisition, and found that both brought similar results. In conclusion, the researchers point out that

"... probably the most important advantage the computer has to offer is the fact that the computer allows for individually tailored learning. Students may choose the words they want to study, the way the words and word meanings are presented, the kind of learning activities they prefer, the pace in which they want to process the information and the frequency of processing, the moment they want to take a self test, the time of the day they want to study, and so on."

They also regret that "...most of these advantages have not been implemented in the courseware we have reviewed ...."

M.D.O. offers mechanisms to collect and accumulate lexical data. From the student's perspective, saving the word stock in a database makes it available for future exploitation and keeps the words from falling out of memory (at least out of the computer's memory). Recollection, consolidation and reinforcement techniques can be put into effect at any time through the testing and learning modules of the program. 'Writing' notes into an electronic database means authoring one's own lexical database, which turns a well-motivated learner into a sort of 'lexicographer'. This could increase the student's knowledge of foreign language vocabulary and awareness of the foreign language itself. The idea is close to the conclusions of a study by Nikolova (2002), which produced some evidence that student participation in authoring multimedia instructional materials has positive effects on vocabulary acquisition. It would be unrealistic to expect many learners to study on their own in this way, but learners with typically computer-related motivations are potentially among M.D.O.'s first users. Moreover, for learners of the computer generation, computer-assisted learning may be the first thing that comes to mind and a more natural choice.

A broader point of view has been expressed by Rundell (1996). He points to the general shift in pedagogy towards a more learner-centred paradigm, and to traditional distinctions losing their significance in a technologically diversified reality. He makes a comment that appears relevant in the present context:

"Liberation from the tyranny of print will have profound consequences because it completely invalidates a whole range of long-established binary notions about reference materials. In the electronic medium, many of the old distinctions - e.g. dictionary vs. grammar, dictionary vs. encyclopedia, onomasiological vs. semasiological dictionary, global-monolingual vs. bilingual dictionary - become much less relevant, and it is possible, for example, to envisage an all-purpose learners' dictionary that could be customized according to the specific first language, level of competence, and field of interest (e.g. Business English) of an individual user."

Other custom-oriented dictionaries

Very few non-commercial tools have been made that can serve the same purposes as M.D.O. The author of this article is aware of two such projects, namely KURA and the Customized Lexicon. KURA (Nepali for 'language') was made by Boudewijn Rempt (http://www.xs4all.nl/~bsarempt/linguistics/index.html). It is a UNIX/Windows MySQL database application for language description, different from and far more sophisticated than the Customized Lexicon, which is a vocabulary tool aimed at enriching and testing English vocabulary. The Customized Lexicon is a project by Cosmin Grigorescu (www.cs.rug.nl/~cosmin/) of the Faculty of Automatics and Computers at the University of Bucharest. The designers of the CALLE system take a very similar approach, building

"a generic environment for learning foreign languages, (...) a personal companion for the user, who starts to learn a new language from scratch together with the user. The language student can use CALLE in combination with a textbook to assist him in understanding and translating new texts as well as for the fast retrieval of the meaning of foreign words. Because of its symmetric system architecture CALLE provides also translations of words and sentences into the foreign language. Furthermore, CALLE can perform several types of exercises, e.g. inserting missing words in sentences, correcting the sequence of words..."
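The quoted description does not say how CALLE generates its exercises. As a hedged illustration only, the two exercise types it mentions (inserting missing words, correcting word order) can be sketched in Python as follows; all function names are hypothetical and do not come from CALLE.

```python
import random
import re


def make_gap_item(sentence, target, gap="_____"):
    """Gap-fill item: blank out the first whole-word, case-insensitive
    occurrence of target and return (prompt, expected answer)."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    match = pattern.search(sentence)
    if match is None:
        raise ValueError(f"{target!r} does not occur in the sentence")
    prompt = sentence[:match.start()] + gap + sentence[match.end():]
    return prompt, match.group(0)


def check_gap_answer(answer, response):
    """Accept the response regardless of case and surrounding spaces."""
    return response.strip().lower() == answer.lower()


def make_order_item(sentence, seed=None):
    """Word-order item: return (shuffled words, correct word sequence)."""
    words = sentence.split()
    shuffled = words[:]
    rng = random.Random(seed)
    # Reshuffle until the order differs; impossible if all words are identical.
    if len(set(words)) > 1:
        while shuffled == words:
            rng.shuffle(shuffled)
    return shuffled, words


def check_order_answer(words, response):
    """The learner's response must reproduce the original word sequence."""
    return response.split() == words


if __name__ == "__main__":
    prompt, answer = make_gap_item(
        "The student looked the word up in a dictionary.", "dictionary")
    print(prompt)  # The student looked the word up in a _____.
    shuffled, words = make_order_item("learners use electronic dictionaries", seed=1)
    print(" ".join(shuffled))
```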

Commercial dictionaries, many of which offer testing modules, games and other learner-oriented features, are increasing in number, and major publishers continue to develop their well-established products. For instance, the Oxford Advanced Learner's Dictionary offers pronunciation, grammar and vocabulary activities for advanced learners, as well as language-learning games 'providing hours of fun through countless permutations.'

Only a few of these products allow users to add their own data (for example Leksykonia, http://www.lexland.com.pl), and even these are more restricted in functionality than M.D.O. in other respects. Predictably, well-known publishers such as Merriam-Webster (www.m-w.com), Pearson Education Limited (http://www.pearsoned-ema.com) and Collins (http://titania.cobuild.collins.co.uk/) are determined to keep the original content of their learners’ dictionaries intact.

Final remarks

The next step in the development of M.D.O. is to continue improving the application, adding new modules and ensuring their reliability. A step further would be to create a similar application that combines the full client-server capability of an online resource browser (including access to corpora, dictionaries and distance learning centres) with the flexibility of a user-controlled database.

Since work on M.D.O. began and has continued without external funding, in the designer’s spare time only, and has been delayed by technical difficulties, it should not be surprising that such a small project has taken quite a long time to take shape. Despite the shortcomings of the still underdeveloped application (such as the lack of language content and the need to correct its weak or defective components), this simple Windows program has shown potential as a very effective teacher’s aid, and as such it has proved, in more than one respect, more useful than well-known professional applications. Perhaps more than a working tool, it could serve as an inspiration, or a working model, for better applications built in the future.


References

Baddorf, D.S. (1996) "What constitutes a well-formed phrasal entry in a computer lexicon?" Ph.D. thesis proposal. Chicago, IL: IIT.

Blok, H., van Daalen-Kapteijns, M.M., Otter, M.E., Overmaat, M. (2001) "Using Computers to Learn Words in the Elementary Grades: An Evaluation Framework and a Review of Effect Studies". Computer Assisted Language Learning, vol. 14, no. 2, 99-128.

Boguraev, B., Briscoe, E., Carroll, J., Copestake, A. (1992) "Database Models for Computational Linguistics". Proceedings of EURALEX '90. Barcelona: Biblograf/VOX, 59-78. Abstract at http://www.cogs.susx.ac.uk/lab/nlp/carroll/abs/92bbc.html

Burkhanov, I. (1998) Lexicography. A Dictionary of Basic Terminology. Rzeszow: Wydawnictwo Wyzszej Szkoly Pedagogicznej.

Corréard, M-H., Mangeot, M. (1999) "XML - A Solution For LDBs, Eds and MRDs?" Proceedings of COMPLEX'99, Pécs, Hungary, 16-19 June 1999, vol. 1/1. http://www-clips.imag.fr/geta/mathieu.mangeot/publis/complex99/complex99-MM-MHC.rtf

Nikolova, O.R. (2002) "Effects of Students’ Participation in Authoring of Multimedia Materials on Student Acquisition of Vocabulary". Language Learning & Technology, vol. 6, no.1, January 2002, 100-122, http://llt.msu.edu/vol6num1/NIKOLOVA/default.html

Rundell, M. (1996) "The corpus of the future, and the future of the corpus." Talk at Exeter conference on 'New Trends in Reference Science' http://www.ruf.rice.edu/~barlow/futcrp.html

Sobkowiak, W. (2001) The Challenge of the Electronic Learners' Dictionaries. Teaching English with Technology, vol. 2, no. 1, January 2002. http://www.iatefl.org.pl//call/j_article7.htm

Related links

CALLE Project Website, http://citeseer.nj.nec.com/update/411558

Carroll, J. (1992) The Cambridge/Acquilex Lexical Database System, http://www.cl.cam.ac.uk/Research/NL/acquilex/ldb.html

Grigorescu, C. Home page, http://www.cs.rug.nl/~cosmin/

Gibbon, D. (1998) Computational Lexicography. An online course. http://coral.lili.uni-bielefeld.de/Classes/Winter99/GSdictionary/CompLex/node1.html

Gazdar, G. (1999) Web course pages, http://www.cogs.susx.ac.uk/lab/nlp/gazdar/teach/nlp/

M.D.O. website, http://merlin.in.uj.edu.pl/mdo/



Produced in Poland by IATEFL PL (c) 2002
Last Updated: May 10, 2002