IATEFL Poland
A Journal for Teachers of English
ISSN 1642-1027, Vol. 3, Issue 3 (July 2003)

Software
SPEECH RECOGNITION SOFTWARE

A little girl picks up a ringing telephone and says, “Hello?” Three businessmen are seen and heard on the other end of the phone line speaking Japanese. At the same time, the sounds coming out of the phone the girl is holding are recognizably English. The little girl leans away from the phone and asks her father, in English, the question that the men are asking. He yells the answer from another room and she relays it in English; the men hear her answer in Japanese. The men happily end the conversation and hang up. This scenario is based on a recent television commercial.

Speech recognition technology works in the following way: the user speaks into a microphone, and a computer uses acoustic analysis to identify the phonemes (individual sounds) uttered. The computer then searches the available vocabulary database and chooses the words that seem most likely to have been produced. Accuracy improves when words are spoken slowly and individually, the range of possible vocabulary is small, background noise is low, utterances are repeated, and/or the computer is familiar with the speaker’s voice. Under these conditions speech recognition accuracy can reach 99%; without them, 87% is the best that can be achieved (Ordinate, 2002).

History

Speech recognition technology has had an interesting history. According to
Christensen, Maurer, Miranda and Vanlandingham (2002), the first speech recognition product ever offered on the commercial market was a toy dog. When the dog’s name, “Rex,” was spoken, the acoustic energy of the vowel sound broke an electromagnetic field and caused the dog to come out of its house. During the 1940s the U.S. Department of Defense searched for a way to translate messages sent in Russian into English automatically. Although that program failed, it led the government to fund more successful research in speech recognition. Bell Laboratories had early success with speech recognition technology, producing in 1952 a system that could recognize the spoken digits 0 through 9 and in 1959 a system that could recognize English vowel sounds with 93% accuracy. Today’s technology has progressed greatly: it can now handle increasingly varied vocabularies, dialects and rates of speech, which are the keys to future progress (Kewley-Port, 1994). (For more specific information about the development of speech recognition technology, visit http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html.) However, the technology needed to carry out the task in the opening scenario has not yet been developed.

Social Context

In the consumer
market, most of us have encountered speech recognition technology on the telephone when using directory assistance. Several telephone companies use a speech recognition server that recognizes the names of cities spoken by customers and then connects those customers with the correct operator. (For an audio demonstration of this type of application, visit http://www.nsc.co.il/.) Those working in the medical field use speech recognition software for medical dictation rather than sending tapes out to transcriptionists, a process that can take days and several drafts to eliminate errors. Many people who are unable to use a keyboard because of disabilities can enter data or surf the Web with the help of speech recognition technology. This technology entered the military landscape recently when a hand-held device, the Phraselator, was used by …

Educational Context

Since
technologies usually find their way from the consumer market into the educational arena, it is worth considering the inevitable impact of any developing technology on education. Speech recognition technology most often shows up in schools as an assistive device for students with disabilities. Two commonly used programs are ViaVoice Pro USB Edition (2003) by IBM (http://www-3.ibm.com/software/speech/) and Naturally Speaking Preferred 7.0 (2003) by Dragon Systems (http://www.1st-dragon.com/dragnatspeak.html). (For an evaluation of ViaVoice and Naturally Speaking, visit http://www.webreference.com/new/991108.html.) In addition, some schools are beginning to use speech recognizers to assist students as they read aloud. Videos describing …

Across student populations, speech recognition technology may hold the most promise for those learning languages or needing to communicate across language barriers. This promise is evident, for example, in the television commercial described earlier. How might this technology affect the language learning classroom?

Speech recognition software has already begun to make an impact on language learning. One example is language testing, or grading. In an intersection between psychology and linguistics, Ordinate
carried out research on how native speakers of English rate the understandability of non-native speakers, and then used speech recognition software to create a test in which a non-native speaker of English places a phone call to the Ordinate testing number, listens to prompts in English, answers the questions in English and receives a rating from the software on fluency, listening, vocabulary and pronunciation. (A demonstration of this test is available at http://www.ordinate.com.) Interestingly, Ordinate claims that its software judges non-native speakers’ speaking abilities more accurately than human raters do (Ordinate, 2002). Another
educational application is pronunciation training for the profoundly deaf. Projects such as the Tucker-Maxon Project, which is supported by a National Science Foundation grant, involve creating software that allows deaf users to give oral commands to the computer (Center for Spoken Language Understanding, 2002). If a minimum level of understandability is not reached, the computer will not carry out a command. (For a description of these speech recognition applications, visit http://cslu.cse.ogi.edu/asr/.) It is worth noting that such applications appear to take communicative competence, rather than accuracy of language use, as their goal.

Currently, a few
educational software packages for English language learners take advantage of speech recognition technology. DynEd has produced New Dynamic English (2001) for adult learners (http://www.101language.com/dyned-nde.html) and Let’s Go (2001) for child learners (http://www.esl.net/dyned-lgfeatures.html). The children’s version allows the user to say a single word at a time, while the adult version allows the user to say either a single word or an entire sentence in response to video or graphic cues and then receive feedback on his or her pronunciation. If a minimum level of understandability is not reached, the program encourages the user to try again.
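The accept-or-retry behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not how DynEd's or any other product works: it assumes speech has already been turned into phoneme symbols (the ARPAbet-style transcriptions are hand-written), scores an attempt against a target with a normalized edit-distance similarity, picks the closest vocabulary word (mirroring the "most likely word" step), and accepts the attempt only if the score clears an adjustable minimum. The function names and the 0.8 threshold are hypothetical; real recognizers use statistical acoustic and language models rather than plain edit distance.

```python
# A toy model of "minimum level of understandability" feedback.
# Assumption: the recognizer has already turned speech into phoneme symbols;
# the ARPAbet-style transcriptions below are hand-written for illustration.


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # insert a phoneme into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(heard, target):
    """1.0 for an exact match, falling toward 0.0 as the sequences diverge."""
    longest = max(len(heard), len(target), 1)
    return 1.0 - edit_distance(heard, target) / longest


def best_match(heard, vocabulary):
    """Pick the vocabulary entry that most likely produced what was heard."""
    return max(((word, similarity(heard, phones))
                for word, phones in vocabulary.items()),
               key=lambda pair: pair[1])


def check_answer(heard, expected, threshold=0.8):
    """Accept the attempt if it clears the (hypothetical) minimum similarity."""
    score = similarity(heard, expected)
    return ("accept" if score >= threshold else "try again"), score


if __name__ == "__main__":
    vocabulary = {"shop": ["SH", "AA", "P"],
                  "chop": ["CH", "AA", "P"],
                  "ship": ["SH", "IH", "P"]}
    print(best_match(["SH", "AA", "P"], vocabulary))  # ('shop', 1.0)

    # Intended answer: "the shop isn't open today" (18 phonemes).
    expected = ["DH", "AH", "SH", "AA", "P",
                "IH", "Z", "AH", "N", "T",
                "OW", "P", "AH", "N", "T", "AH", "D", "EY"]
    # The learner says "is" instead of "isn't": 3 phonemes are missing.
    said_is = expected[:7] + expected[10:]
    print(check_answer(said_is, expected))  # accepted, score 15/18
```

The second call illustrates the drawback discussed next: dropping "n't" reverses the meaning, yet because only 3 of 18 phonemes differ, a purely sound-based score still clears the threshold. Raising the threshold for an individual student narrows this gap, but no threshold on its own can tell a meaning-changing slip from a harmless one.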
One current drawback of New Dynamic English is that if the spoken sentence is very close in sound to the intended answer, the program may not catch an error. For example, if the learner says “is” instead of “isn’t”, a serious difference in meaning, the program may not alert the learner to the difference. Auralog has also developed programs using speech recognition: TeLL me More Pro (2000) for adults (http://www.multilingualbooks.com/aura-tellp.html) and TeLL me More Kids (2000) for children (http://multilingualbooks.com/aura-tellk.html). With these programs, the minimum level of understandability can be adjusted for each student. In addition, TeLL me More Pro allows the user to view the acoustic patterns of an utterance. However, there are two problems with offering learners acoustic patterns as evidence of their pronunciation ability. First, most language learners are not linguists, and some background in linguistics is practically necessary to interpret these waveforms. Second, even native speakers have difficulty reproducing the exact waveforms produced by the speakers on the software.

One possible
application of speech recognition software for beginning language learners is as a scaffolding device for building literacy. If learners can produce spoken English much more readily than written English, it might be useful for them to bridge into writing by, for example, telling stories to the computer and then seeing their own stories in print. The problem with this scenario is that learners would probably need such literacy assistance only for a short time, whereas a program such as ViaVoice, which a native speaker can train to his or her voice in minutes, might take many hours to adapt to a non-native speaker’s voice well enough to type the spoken words accurately. This would most likely place an added burden on the teacher as well, whose efforts might be better spent on other literacy-building activities.

One issue that
instructors of adult English language learners often grapple with is the special spelling problems of students whose first language is Arabic or Hebrew. Since written Arabic and Hebrew usually omit short vowels, these students often face seemingly insurmountable spelling difficulties in English; they often spell words so unusually that even spell checkers cannot find the intended words. Speech recognition software would allow these students to sidestep this serious writing problem. Once again, the time that it takes the technology to adapt to a non-native user’s voice is an issue here, although less so than with a child learner. Also, the technology might actually stand in the way of a learner eventually overcoming spelling problems; rather than using the tool as a scaffolding device, a learner could become dependent upon it.

Speech recognition software shows promise for assisting language learners with
pronunciation issues. Pronunciation is an area in which few language teachers have expertise, yet many learners need or demand help with it in order to gain communicative competence. Although quality pronunciation training based on the most recent research would be optimal, software using this technology may be able to help learners recognize when they have reached a level of general understandability, especially as the technology continues to improve in its ability to respond to learners’ utterances. Referring back to the example at the beginning of this paper, although it is most likely far in the future, speech recognition software with accompanying translation technology might allow those with little or no speaking ability in a foreign language to carry on telephone conversations with speakers of that language. For example, a middle-school EFL class in Hong Kong could brainstorm questions about some aspect of British culture, arrange a phone conference with a native of England, plan what they are able to say in English, and then let the translation software pick up where the learners’ abilities to speak and understand English break down.

Deeper Issues

In the study “Media in the Classroom: An Alternative History,” Fabos (2001) stated that although all new
technologies introduced into the classroom over the last century have been greeted with the same initial enthusiasm and hope that they would solve administrative problems and enhance the teaching process, teachers have eventually rejected them to some degree. Fabos suggested that the problem has often been the content that consciously or unconsciously enters the classroom along with the medium. Whenever a technology is brought into a learning environment, it always alters that environment slightly, although the differences may be difficult to discern at first (Postman, 1992). So, how might our use of speech recognition software with language learners influence our classrooms? What would we (possibly unknowingly) be teaching our learners about the world, about language and about communication with others? Using speech recognition technology in combination with software that includes role plays based on authentic situations would teach our students that oral interaction with others is the goal of language learning and that pronunciation is one aspect of communicative competence. Using this technology to assist those who have problems with writing would teach that we can draw on our strengths in language learning to compensate for our weaknesses. It might, however, also teach learners that they can rely on their strengths without having to improve the areas that most challenge them. By using the technology as a translating
device, we would be giving our students the messages that language learning is not essential and that communication is simply a matter of translating vocabulary items and grammar. Monke (2001) asked in response to educational choices such as this one:

Just how small do we want our children to believe the world to be? How much of the illusion of next-doorness do we want to give a student who hasn’t traveled much beyond the borders of his or her state, or city for that matter? What kinds of misunderstandings about the world does this kind of undifferentiated communication give a young person? (Monke, 2001: 66)

Mastering a second or foreign language is a huge task; successfully negotiating meaning with native speakers is an enormous accomplishment. By using speech recognition technology in ways such as this, we may be obscuring this reality from our students.

In addition, if technology reaches a point
at which we no longer need to learn a second or foreign language in order to communicate with others, we will need to rethink our reasons for acquiring another language. Research has pointed toward a link between language learning and cognitive development. Although some researchers caution against drawing strong conclusions about a causal link, there does seem to be a positive relationship between bilingualism and linguistic, metalinguistic and cognitive abilities that reach far into other areas of language learners’ lives (Diaz, 1985; Hakuta, Ferdman & Diaz, 1986). Any such gains from language learning could be lost, however, if the government no longer sees a need to fund foreign language teaching or programs for language minority students because advanced speech recognition and translation technology is available. Moreover, since this technology is fairly inexpensive and could be adopted by many intensive English programs as a pronunciation aid, for example, it may undermine a recently improved aspect of M.A. TESOL programs. In the early and mid-1990s, few M.A. TESOL programs trained pre-service teachers in pronunciation. In the last five years, however, such preparation has become more widespread. Although software can never replace the teacher in pronunciation training, it may be viewed as capable of doing so, and this aspect of communicative competence may once again go uncovered for Master’s degree students.

Salaberry (2001) suggested that we approach new technologies with cautious and reflective interest rather than overenthusiasm. Many of the issues raised above point to the need to consider carefully the impact that speech recognition technology might have on the language learning classroom. Readers are encouraged to explore the possibilities and implications of speech recognition critically for themselves by downloading some examples of current technologies. Several examples can be found at http://www.speechtechnology.com/free/links.html.
References

British Educational Communications and Technology Agency. (2001). Speech recognition: Information and advice. Retrieved June 9, 2003, from http://www.becta.org.uk/technology/speechrecog/information/software2.html
Carnegie Communication Aids for Language and Learning (CALL) Centre. (2001). Resources to view or download. Retrieved …
Center for Spoken Language Understanding. (2002). Automatic speech recognition at CSLU. Retrieved …
Christensen, B., Maurer, J., Miranda, N., & Vanlandingham, E. (2002). Accessing the internet via the human voice. Retrieved …
Diaz, R. M. (1985). The intellectual power of bilingualism. ERIC Document Reproduction Service No. ED 283368.
Fabos, B. (2001). Media in the classroom: An alternative history. Proceedings of the American Educational Research Association.
Hakuta, K., Ferdman, B. M., & Diaz, R. M. (1986). Bilingualism and cognitive development: Three perspectives and methodological implications.
Kewley-Port, D. (1994). Speech recognition. In A. Syrdal, R. Bennet & …
Mieszkowski, K. (2003). How do you say “regime change” in Arabic? Salon.com. Retrieved …
Monke, L., & Burniske, R. W. (2001). Breaking down the digital walls: Teaching in a post-modem world.
Natural Speech Communication. (2002). NSC speech recognition demo. Retrieved …
Ordinate. (2002). Set 10 Demo Test. Retrieved …
Postman, N. (1992). Technopoly.
Salaberry, M. R. (2001). The use of technology for second language learning and teaching: A retrospective. Modern Language Journal, 85(1), 39-56.
Terry, R. (2002, April 16). The Phraselator: Translation system put to the test in …
Zue, V., Cole, R., & Ward, W. (1996). 1.2: Speech recognition. Retrieved …
Software references

Let’s Go: English Language Learning. (2001).
Naturally Speaking Preferred 4.0. (2001).
New Dynamic English. (2001).
TeLL me More Kids. (2000).
TeLL me More Pro. (2000).
ViaVoice Pro Millennium Edition. (2001).

QUICK PLACEMENT TEST ON CD

Publisher:
Product type: Interactive English language placement test on CD-ROM
Language: English by default (instructions in the following languages can be set from the supervisor's mode: Spanish, French, German, Dutch, Italian, Portuguese and spoken Japanese)
Level: pre-intermediate to advanced
Operating system: Windows 95 and above
Hardware requirements: Pentium PC with a minimum of 16 MB RAM, sound card, CD-ROM drive (at least 8x transfer rate), 10 MB free hard disk space (650 MB for full installation)
Availability: commercial

Overview

Quick Placement Test on CD-ROM (referred to as QPT
later on in this review) is a multimedia test package offering quick and reliable assessment of the testee's English language proficiency. It successfully combines the most recent developments in testing theory with many of the blessings of computer technology, such as multimedia; its unique format allows it to evaluate grammar, reading and listening, while its banks of carefully graded exercises are accessed selectively to fine-tune the test to the testee's current proficiency level (this also contributes to a feeling of accomplishment that was sometimes lacking in earlier tests of this kind). Test results can be made available to the supervisor only and are presented in a number of 'understandable formats' (i.e. in accordance with Council of Europe or ALTE specifications).

Description

The electronic version of QPT (a traditional paper-and-pen version is also available) makes use of Computer-Adaptive Testing (CAT), a technique that enables the program to adjust automatically to the taker's actual language proficiency level on the basis of his or her previous responses. The CD contains banks of items (activities) ordered by difficulty: if the taker fails a question, s/he is given an easier one; if s/he succeeds, a more difficult one is posed (the initial item is, naturally, of medium difficulty). The test consists of about 25 questions.
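The adaptive rule described above can be sketched as a simple "staircase". This is an illustration under stated assumptions, not QPT's actual algorithm, which the review does not describe: item difficulty is reduced to a single scale from 1 to 9, level 5 stands for "medium", the test always ends after 25 items, and the function and parameter names are hypothetical.

```python
# Hypothetical sketch of the "easier after a miss, harder after a hit" rule.
# Assumptions: difficulty is a single scale from 1 (easiest) to 9 (hardest),
# level 5 is "medium", and the test stops after a fixed 25 items.


def run_staircase_test(answers_correctly, top_level=9, start_level=5, n_items=25):
    """Present n_items items, stepping difficulty up or down after each answer.

    answers_correctly(level) -> bool simulates the taker's response to an
    item of the given difficulty. Returns (levels_presented, final_level).
    """
    level = start_level
    presented = []
    for _ in range(n_items):
        presented.append(level)
        if answers_correctly(level):
            level = min(level + 1, top_level)  # success: pose a harder item
        else:
            level = max(level - 1, 1)          # failure: give an easier item
    return presented, level


if __name__ == "__main__":
    # A simulated taker who can handle anything up to level 6, nothing above it.
    def taker(level):
        return level <= 6

    presented, final = run_staircase_test(taker)
    print(presented[:6])  # [5, 6, 7, 6, 7, 6]
    print(final)          # 6
```

After two items the simulated taker settles into alternating between levels 6 and 7, the boundary of his or her ability, which is why an adaptive test can be much shorter than one that walks every taker through the whole bank. Operational CAT systems usually replace this fixed one-step rule with a statistical estimate of ability, commonly based on item response theory; the review does not say which model QPT uses.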
Such a procedure saves a lot of time (the test takes 15-20 minutes and results are available instantly), and complex statistical formulae ensure its reliability. QPT was initially validated by more than 5,000 students in 20 countries, and supervisors are encouraged to take part in the ongoing validation procedure by sharing their testees' results with the test makers to make it even more reliable (one of the floppy disks included with the program can be used for this purpose). The results of the test are given either as an Association of Language Testers in Europe (ALTE) level or in points (out of 100). The ALTE level can easily be translated (using the Chart of Equivalent Levels) into a) Council of Europe levels and b) Cambridge Examination levels.

The program offers a special password-protected mode
for supervisors in which they can customize:
· the language of instruction (nine options)
· the amount of personal information they want to obtain from the taker (which is stored on the hard drive and can be accessed from the supervisor mode)
· whether to reveal test results to the testee (by default, test results are available only to supervisors).

QPT evaluates listening, reading and the use of
English (including grammar and vocabulary), mostly through multiple-choice or cloze formats (suggestions for assessing writing and speaking can be found in the manual). The program can be installed on standalone computers or on a network; in the latter case more than one testee can take the test at the same time, and each taker is given different items to work with.

Evaluation

The electronic version has some obvious advantages
over the paper-and-pen one:
· it checks listening comprehension, which is a major problem for many, even quite advanced, students
· it instantly adapts to the testee, offering gradually more challenging activities (constant challenge and high motivation guaranteed!)
· it is more interactive and looks more attractive, which makes taking the test significantly less tiring for the testee.

The few problems noticed while evaluating the program were:
· it did not let the user choose the drive or directory in which to install it
· Polish characters did not display properly (in the supervisor mode one enters institutional data, and each testee provides some basic personal information at the beginning of the test)
· Polish was not one of the nine languages (or language varieties) available to users in the help mode (the help mode can be run either before the test starts or accessed during the test by means of a special button).

It is hoped that these problems will be dealt with in the new
versions of the program. A much more serious issue the author of this review had to deal with was his inability to recover the remaining user counts after his system crashed and he had to reformat the hard drive. User counts are supplied on a floppy disk (called the Authorisation Disk), and 50-, 250- and 1000-use floppies are currently available. All the counts are transferred to the hard disk during installation (some or all of them can be transferred back to the floppy later on). Since the author's hard disk crashed before he had transferred the remaining counts back to the floppy, he lost them (the CD-ROM is useless without them). The good news, though, was that when the author contacted the online help service, he was immediately offered a free replacement (they should be praised here for a very prompt reply!). It might be safer if the uses were 'debited' from the floppy one at a time rather than transferred to the hard drive all at once, or obtained gradually over the Internet. Another drawback may be the price (see: prices in PLN), which may discourage individual teachers (floppies with more user counts are much better value, though).

Recommendation

I do recommend the program to schools, educational
institutions and individual teachers for the following reasons:
· it is easy to install and run, and user-friendly; the interface is simple but appealing; the user manual is detailed and online help is available (see the e-mail address below); it can be installed on several computers and/or on a network and accessed simultaneously by more than one user
· its assessment is quick and accurate (the result is available as soon as the test has been taken), allowing many takers to be placed in appropriate groups relatively quickly
· the test is fun to take, as it checks several skills in a variety of ways, and it can adjust to virtually any level (with the possible exception of elementary students, who are not encouraged to take it anyway)
· testees can find out instantly (thanks to the Chart of Equivalent Levels) what their current level is and which of the Cambridge exams they are 'ready for'; teachers can assign students to appropriate groups quickly and accurately and have a way of dealing with latecomers who join groups as the course progresses.

For additional information and resources on QPT, go to its official webpage (if your browser directs you to the main OUP page and prompts you to choose your country, simply ignore the message and click on the ELT International Site link at the bottom). There you can also find a free sample of the paper-and-pen version (PDF) and an interactive presentation of the CD-ROM version (Flash Player). The program has its own support and information service (qpt@ucles.org.uk), which, as my own case shows, is very quick and helpful.

Note

This article is a significantly extended and modified version of the review prepared for the IATEFL Poland webpage.

Last Updated: July 10, 2003