Speech Recognition


Lisa Wadors Verne
Senior Program Manager: Education Research and Partnerships
Benetech

And

Charles LaPierre
Technical Lead: DIAGRAM and Born Accessible
Benetech

What Is Speech Recognition?

Speech recognition is the process of converting human speech into a machine-readable format. Today many people interact with their devices, such as phones and computers, through a variety of speech recognition mechanisms. The technology in this space is moving very quickly as more and more consumers use their voices to complete a variety of tasks, from sending text messages to ordering groceries.

“Speech recognition works using algorithms through acoustic and language modeling. Acoustic modeling represents the relationship between linguistic units of speech and audio signals; language modeling matches sounds with word sequences to help distinguish between words that sound similar” (Syntony, n.d.).

Simply put, a user speaks into a microphone that is connected to a computer, and, through complex algorithms, the sound waves are converted into numbers that can be interpreted by the computer (Geitgey, 2016), which can then convert them into something like plain text or respond to an interpreted command. The example below depicts a single word being analyzed; however, modern speech recognition programs examine whole sentences and take into account speaker identity to improve speech recognition accuracy.
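As a rough illustration of how sound becomes numbers, the sketch below samples a pure tone many times per second and stores each reading as an integer. The 16,000-samples-per-second rate, the 440 Hz tone standing in for a voice, and the function name sample_tone are assumptions made for this example; they are not taken from the report or from any particular product.

```python
import math

SAMPLE_RATE = 16_000  # readings per second (assumed; a rate often used for speech)
TONE_HZ = 440.0       # a pure tone standing in for a voice (illustrative only)


def sample_tone(duration_s: float) -> list[int]:
    """Sample a sine wave and store each reading as a signed 16-bit integer."""
    n_samples = int(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE                              # time of this reading, in seconds
        amplitude = math.sin(2 * math.pi * TONE_HZ * t)  # wave height between -1.0 and 1.0
        samples.append(round(amplitude * 32767))         # scale to the 16-bit integer range
    return samples


samples = sample_tone(0.01)  # ten milliseconds of "sound"
print(len(samples))          # how many numbers represent those ten milliseconds
print(samples[:5])           # the first few numbers the computer would work with
```

A real recognizer would read these numbers from a microphone rather than generate them, and would then pass them to acoustic and language models; this sketch covers only the first step.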

Also known as “automatic speech recognition” (ASR), “computer speech recognition,” or just “speech to text” (STT), this technology utilizes machine learning and big data to adapt and increase accuracy. Note: while the terms are often used interchangeably, speech recognition and voice recognition are different. Speech recognition detects and identifies spoken words while voice recognition can detect patterns of an individual speaker. For the purposes of this report, we focus on speech recognition.

 

Why Is This Important?

Speech recognition dates back to the early 1950s (Davies et al., 1952), and although it has been in use for over half a century, it has only recently reached the mainstream. In 2017, people are inundated with speech recognition products. They are built into our phones, tablets, smart watches, cars, and homes. We are familiar with the names Siri (Apple) and Alexa (Amazon) and command them to do a variety of tasks ranging from calling friends to reading books. Now, for the first time, people are controlling their environment, such as turning on the lights or asking for directions, by using words instead of physically flipping a switch or typing in addresses.

Andrew Ng, professor at Stanford University, predicts that as the technology nears 99% accuracy, it will become the primary way we interact with computers. Today it is estimated that this technology performs at a 95% accuracy rate for persons without disabilities, but the remaining 4-5% margin “is the difference between annoyingly unreliable and incredibly useful” (Geitgey, 2016). Furthermore, Deng and Huang (2004) find that people still want human-like, free-flowing speech that is spontaneous.

 

Who Is Working on This?

According to a market research report by Marketsandmarkets.com (Speech and Voice Recognition Market, n.d.), the speech recognition industry is expected to grow from $3.73 billion in 2015 to $9.97 billion by 2022. Grand View Research estimates that by 2024, the global voice recognition market could reach $127.58 billion (Grandview Research, n.d.). These predictions are supported by Morgan Stanley, which estimated that Amazon sold 11 million units of its speech technology device, Echo, between mid-year 2015 and November 2016. An additional nine million units were sold during the 2016 holiday season (González, 2017). The Echo rivals similar voice-controlled devices from other technology giants that are predicted to be the future of personalized computing. Below are examples of products that are already available.

  • Amazon Echo – A hands-free talking computer speaker that you control with your voice, Echo can answer questions, read the news, report traffic and weather, read audiobooks from Audible, provide information on local businesses, provide sports scores and schedules, and more using the Alexa Voice Service. Echo claims to be the first hands-free screen reader for persons with visual impairments. Special features include:
    • Ability to decipher human speech from across the room while music is playing.
    • Ability to control lights, fans, TVs, switches, thermostats, garage doors, sprinklers, locks, and more with compatible connected devices from WeMo, Philips Hue, Sony, Samsung SmartThings, Nest, and others.
  • Apple Siri – A digital personal assistant that comes preinstalled on the iPhone, iPad, MacBook, Apple Watch, and Apple TV. It recognizes speech and can complete tasks such as reporting the weather or controlling devices in your home. It can also “learn” to pronounce words and names correctly using machine-learning algorithms.
  • Dragon NaturallySpeaking – Speech recognition software that writes emails and can be used with all Microsoft programs. It uses Bluetooth technology for hands-free operation and allows you to create custom word lists and voice commands.
  • Google
    • Voice Search – A Google Chrome feature that allows the user to navigate web pages without the use of a mouse. iOS users can utilize the Google translation features while using Voice Search.
    • Google Home – Similar to Echo and Siri, this home assistant can control fixtures in the home as well as answer questions. Home uses voice recognition to provide a personalized interaction with the user.
  • TalkTyper – Free speech-to-text dictation software that works in your browser using the Google voice algorithm. It lets you copy text to the clipboard, email it, print it, tweet it, and translate it into another language.
  • Tazti – Speech recognition and voice recognition software that allows users to control PC commands and games with their voice. If a command does not exist, users can easily create one without needing special technical skills. Tazti is designed for gamers who want to use their voice to control online and digital games.
  • Windows Speech Recognition Dictation – A feature that comes preinstalled on Vista, Windows 7, and Windows 8 PCs. Once the speech recognition preference is turned on, users can activate the feature by saying “start listening.” This easy-to-use feature can turn a computer on and off, receive dictation, and open browsers.

 

How Is Speech Recognition Applied in Education?

 

In its 2011 report on reading, the National Assessment of Educational Progress (NAEP) reported that only one in three U.S. students is able to read and understand grade-level material (NAEP, 2011). While these statistics have remained constant over the years, the results are tightly correlated to formal education (Adams, 2016). Research shows that just ten minutes per day of oral reading can improve reading outcomes (Pinnell et al., 1994).

For early readers, one-to-one instruction has been shown to improve reading scores in fluency, comprehension, and word recognition (National Reading Panel, 2000). Unfortunately, in many classrooms, teachers may not have the time for this kind of interaction with all of their students (Moody et al., 1997; Derwing et al., 2000). Speech recognition technology can not only address the time issue but also provide the teacher with detailed reports of the students’ progress (Adams, 2005). Students can use this technology to read passages and receive feedback to improve their reading scores. Teachers are able to review the feedback and intervene with more personalized instruction as necessary.

Students can use this technology not only to get feedback on their reading fluency but also to decrease the amount of time they spend writing. The average person can speak 120 words per minute but can type only 40 words per minute (Raskind & Higgins, 1999). Speech recognition technology can now bridge that gap, making it easier and quicker to complete assignments. Teachers also benefit from the ability to make comments on assignments that students can listen to, eliminating the potential difficulty of decoding handwriting. Lastly, students can record and get transcripts of their lectures and not miss anything in class because they got distracted or were busy taking notes.
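To make the speaking and typing rates concrete, the short sketch below works out how long a draft would take at each rate. The 300-word assignment length is a hypothetical figure, and the calculation deliberately ignores time spent correcting recognition errors or typos, so it should be read as a best-case comparison only.

```python
SPEAKING_WPM = 120  # words per minute when speaking, as cited in the text
TYPING_WPM = 40     # words per minute when typing, as cited in the text


def minutes_needed(word_count: int, words_per_minute: int) -> float:
    """Time to produce word_count words at a steady rate, in minutes."""
    return word_count / words_per_minute


assignment_words = 300  # hypothetical assignment length

dictation_minutes = minutes_needed(assignment_words, SPEAKING_WPM)
typing_minutes = minutes_needed(assignment_words, TYPING_WPM)

print(f"Dictating: {dictation_minutes} minutes")
print(f"Typing:    {typing_minutes} minutes")
print(f"Minutes saved: {typing_minutes - dictation_minutes}")
```

At these rates, dictation is three times faster than typing regardless of the assignment length; only the absolute number of minutes saved changes.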

 

Challenges and Opportunities for Students with Disabilities

Speech recognition technology has been used with students with certain disabilities for some time. Many researchers have studied the effects of using speech recognition software on persons with traumatic brain injuries, learning disabilities, and dysarthria, among other disabilities.

 

Challenges

It is important to understand that “smart” technology solutions out of the box are more effective after the software has been trained to respond to an individual’s unique speech pattern and vocabulary. In a study of three different speech recognition programs, Hux et al. (2000) explored the use of speech recognition for persons with dysarthric speech and other physical disabilities that make it challenging to use keyboards. They found that the technology needed to be flexible to account for differences in a person’s speech patterns that may vary depending on stress and fatigue levels, or to account for variations in speech production related to articulation, vocalization, and respiration.

To understand the complexity of the software, Hux et al. (2000) also compared a student with a traumatic brain injury to a neurotypical student using three different software programs. They learned that the students needed significant user training to operate the three programs and to increase the accuracy of the outputs. While neither subject made large gains using the software, the authors hypothesized that output could improve with additional time spent training the technology to respond to the individual user’s speech patterns and vocabulary.

Noted above are just some of the issues and opportunities for using speech recognition software for students with disabilities. Additional considerations are needed when evaluating this technology for students whose oral language ability is better than their writing skills because of motor issues, traumatic brain injury, or cultural accents. We will explore some of these issues and opportunities in the “Opportunities” section below.

As mentioned in the section “Who Is Working on This?” above, there are a number of specialized and mainstream technologies employing speech recognition that can be used by students with disabilities. While some are free and built into programs like Word and Google Chrome, many of these programs have limited functionality. Hux et al. (2000) compared Dragon NaturallySpeaking with Microsoft Dictation and VoicePad Platinum (a Kurzweil product) and found large disparities between the technologies; all three required training to increase the accuracy of the outputs. For educators, these issues raise two barriers: cost and time. While VoicePad ($15.99) and Microsoft Dictation (free) are low-cost options compared to Dragon NaturallySpeaking ($299), Hux et al. (2000) report that Dragon outperformed the other two after five training sessions. Even though it is hypothesized that both VoicePad and Dictation might perform better with more training, teachers face significant budgetary and time constraints. In addition to the time needed to train the software to be responsive to students’ speech and vocabulary, the studies conducted by Derwing et al. (2000), Higgins and Raskind (1995, 2000), and MacArthur and Cavalier (2001) found that students also needed a significant amount of time to be trained on the software.

Another challenge is that the technology still struggles to capture words from people with accents or speech impediments and from speakers of languages other than English. Derwing et al. (2000) explored the use of speech recognition technology with high-proficiency English speakers whose native languages were not English to see if the software could identify pronunciation variants in Cantonese, Canadian English, and Spanish accents. They found that the software recognized native English speech at a 90% accuracy rate, compared to 71% for Cantonese-accented and 73% for Spanish-accented speech. Human listeners had a much greater ability to comprehend the accents than the software did, and the authors suggest that for this software to be implemented successfully in the classroom, the technology must be improved to understand human speech.
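One common way to put a number on recognition accuracy is to count how many word-level edits (substitutions, insertions, and deletions) separate what was said from what the software wrote down. The sketch below shows that general idea; it is not necessarily the scoring method used by Derwing et al. (2000), and the function names and sample sentences are invented for illustration.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the hypothesis into the reference (Levenshtein distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # a spoken word the software missed
            insertion = current[j - 1] + 1  # an extra word the software added
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_accuracy(spoken_text: str, recognized_text: str) -> float:
    """Share of spoken words recognized correctly, between 0.0 and 1.0."""
    reference = spoken_text.lower().split()
    hypothesis = recognized_text.lower().split()
    if not reference:
        raise ValueError("spoken_text must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, (len(reference) - errors) / len(reference))


# Hypothetical example: one of five spoken words is misrecognized.
accuracy = word_accuracy("the quick brown fox jumps", "the quick brown box jumps")
print(f"{accuracy:.0%}")
```

Measured this way, a 90% accuracy rate means roughly one word in ten needs correcting, while 71% means nearly three in ten, which helps explain why the gap matters so much in classroom use.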

To address the issue with the software not recognizing accented language, Mozilla, a nonprofit dedicated to making the internet accessible to all, has created Project Common Voice. This crowdsourcing project is gathering voices from over 10,000 people with a wide range of accents so that the speech recognition software will have more voice samples for increased accuracy.

 

Opportunities

As the technology improves, so do the opportunities for students with disabilities in the evolving classroom. More students with disabilities are being included in mainstream classrooms, and teachers have to find creative ways for students to participate in all aspects of the class. When writing tasks take disproportionately longer for some students, teachers must accommodate students’ needs while providing supports for success. Students with learning disabilities often struggle with writing and spelling tasks. While speech recognition software will not improve students’ spelling, it can remove barriers that make the writing process difficult (NCIP, 2003).

In a group of studies exploring the effects of speech recognition technology for students with learning disabilities and non-native speakers, the authors found that speech recognition software can compensate for some deficits. The software improved word recognition and comprehension and increased the complexity of the writing by allowing students to use their oral vocabulary, which is often larger than their written vocabulary (Higgins & Raskind, 1995; Higgins & Raskind, 2000).

In another study, MacArthur and Cavalier (2001) looked at speech recognition technology as an accommodation in large-scale assessments to determine whether dictation technology gave students support to overcome their disability or an unfair advantage over their peers on the assessments. They found that students with learning disabilities who used the dictation accommodations greatly improved their scores compared to the scores they received for handwritten work, but they still scored lower than their peers, suggesting that the technology is an accommodation and not a tool that would help any student improve.

Some students with attention disorders struggle with typing on a keyboard because of how long it takes them to communicate their thoughts. The use of speech recognition software speeds up the writing process and can help these students achieve their maximum potential. Students with some physical disabilities also have difficulty typing on a standard keyboard. Speech recognition software helps students with physical limitations or limited hand and motor coordination produce written assignments and navigate web pages (NCIP, 2003).

In the future, speech recognition technology has the potential to be used as a diagnostic tool. When students read aloud into the microphone, the computer can evaluate their reading fluency, accuracy, and speed. While the technology still requires human interaction for instruction, the ability to capture reading abilities helps teachers personalize coaching to meet the unique needs of each learner.

 

Stories from the Field

Students with disabilities are increasingly being educated with their peers without disabilities. This push for inclusion provides opportunities for students with disabilities to have access to advanced concepts, but also requires additional supports. Speech recognition software was a game changer for Erin Winkles, a law student with dyslexia. Her speech recognition software helped her articulate her thoughts coherently by speaking instead of struggling to formulate her ideas on paper. As Winkles said in a 2017 interview:

“Since using speech recognition software, my life has changed. I’m number one in my class and I think the software is the reason for this. It helps me put my thoughts down in a more coherent manner. I think it is wonderful and it helped me reach the top of the class” (Robbie, 2017).

One teacher in Danville, California, also extolled the virtues of speech recognition software for seventh-grader Kevin O’Brien, who saw big gains as he attached a wireless device with software to his wheelchair.

“Kevin can be far more involved in group activities. He can converse with his peers, participate in class discussions, and do his homework, no matter where he is. This has increased his ability to be an independent member of the school and the community” (Hayes, 2013).

 

Conclusions / Actions

As the technology improves and becomes integrated further into our lives, we may one day move from simply telling a device to do something to a place where that device can interpret our comments and provide suggestions based on its own ideas. For example, a student with executive function issues may tell her calendar to set an alarm for 7:00 a.m. to leave for school. The software in the device can determine that the student needs to pack her lunch and get dressed, so it may suggest that she set an alarm for 6:00 a.m. to get up, 6:10 a.m. to take a shower, 6:35 a.m. to get dressed, 6:50 a.m. to make a lunch, and 7:00 a.m. to leave for school. The potential for these communications and interpretations is here today, and they may one day soon be commonly available.
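The alarm example above amounts to working backward from a departure time. A minimal sketch of that reasoning follows; the task durations are hypothetical values chosen so that the output matches the times in the example, and the function name suggest_alarms and the calendar date are invented for illustration rather than drawn from any existing assistant.

```python
from datetime import datetime, timedelta

# Hypothetical durations, in minutes, chosen to reproduce the example schedule.
MORNING_TASKS = [
    ("get up", 10),
    ("take a shower", 25),
    ("get dressed", 15),
    ("make a lunch", 10),
]


def suggest_alarms(leave_at: datetime) -> list[tuple[str, str]]:
    """Work backward from the departure time to a start time for each task."""
    alarms = [("leave for school", leave_at)]
    start = leave_at
    for task, minutes in reversed(MORNING_TASKS):
        start -= timedelta(minutes=minutes)  # this task must begin this much earlier
        alarms.append((task, start))
    alarms.reverse()                         # earliest alarm first
    return [(when.strftime("%H:%M"), task) for task, when in alarms]


# The date is arbitrary; only the 7:00 a.m. departure time matters here.
schedule = suggest_alarms(datetime(2017, 9, 1, 7, 0))
for time_of_day, task in schedule:
    print(time_of_day, task)
```

The hard part for a real assistant is not this arithmetic but inferring which tasks the student needs and how long each takes, which is where the interpretation described above would come in.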

 

Educators

  1. Teachers and students need to work together to make sure that they both understand how to use speech recognition technology.
  2. If speech recognition training is part of your curriculum, help students identify which computers in your school are equipped with voice recognition technology by pairing enabled computers with custom imprinted mouse pads featuring tips for using the software.
  3. Allow time to train the software to improve accuracy.
  4. Ensure that the students who have this accommodation on their Individualized Education Programs (IEPs) are using the technology for all writing tasks.

 

Parents

  1. Visit your local Parent Technical Assistance Center to evaluate various speech recognition tools before making a purchase.
  2. Your child’s school may have assistive technology (AT) tools available to test. Even if your child’s school does not provide and pay for your child’s AT, do not hesitate to use it as a resource before you purchase AT tools for your child.
  3. Some software publishers, like Dragon, have websites that offer demonstration versions. Other publishers offer fully operable programs for a thirty-day trial. Check if free-trial offers are available for the products you are interested in.

 

Students

  1. Allow time to become familiar with the software tools.
  2. Some community colleges have assistive technology centers where you may be able to try different types of AT tools.
  3. Try a variety of products to find the ones that work best for you.

 

References

  • Adams, M. J. (2016, May 11). Bringing Speech Recognition to Reading Instruction. Retrieved July 23, 2017, from http://www.edweek.org/ew/articles/2011/11/29/13adams.h31.html
  • Adams, M. J. (2005). The Promise of Automatic Speech Recognition for Fostering Literacy Growth in Children and Adults. In M. McKenna, L. Labbo, R. Kieffer, & D. Reinking (Eds.), Handbook of Literacy and Technology, Volume 2. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Davies, K. H., Biddulph, R., & Balashek, S. (1952). Automatic speech recognition of spoken digits. Journal of the Acoustical Society of America, 24(6), 637-642.
  • Deng, L., & Huang, X. (2004). Challenges in adopting speech recognition. Communications of the ACM, 47(1), 69-75.
  • Derwing, T. M., Munro, M. J., & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34(3), 592-603.
  • Geitgey, A. (2016, December 23). Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning. Retrieved July 26, 2017, from https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a
  • González, A. (2017, January 18). Amazon has sold more than 11 million Echo devices, Morgan Stanley says. Retrieved July 23, 2017, from http://www.seattletimes.com/business/amazon/amazon-has-sold-more-than-11-million-echo-devices-morgan-stanley-says/
  • Grandview Research. (n.d.). Voice Recognition Market To Reach $127.58 Billion By 2024. Retrieved August 7, 2017, from http://www.grandviewresearch.com/press-release/global-voice-recognition-industry
  • Higgins, E. L., & Raskind, M. H. (1995). Compensatory effectiveness of speech recognition on the written composition performance of postsecondary students with learning disabilities. Learning Disability Quarterly, 18, 407-418.
  • Raskind, M. H., & Higgins, E. L. (1999). Speaking to read: The effects of speech recognition technology on the reading and spelling performance of children with learning disabilities. Annals of Dyslexia, 251-281.
  • Higgins, E. & Raskind, M. (2000). Speaking to read: the effects of continuous vs. discrete speech recognition systems on the reading and spelling of children with learning disabilities. Journal of Special Education Technology, 15, 19-30.
  • Hayes, H. B. (2013, March 28). How Technology Is Helping Special-Needs Students Excel. Retrieved July 23, 2017, from https://edtechmagazine.com/k12/article/2013/03/how-technology-helping-special-needs-students-excel
  • Hux, K., Rankin-Erickson, J., Manasse, N., & Lauritzen, E. (2000). Accuracy of three speech recognition systems: Case study of dysarthric speech. Augmentative and Alternative Communication, 16(3), 186-196.
  • MacArthur, C. & Cavalier, A. (2001). Dictation and speech recognition technology as accommodations in large-scale assessments for students with learning disabilities. Data from study, unpublished.
  • Moody, S. W., Vaughn, S., & Schumm, J. S. (1997). Instructional grouping for reading: Teachers’ views. Remedial and Special Education, 18, 347-356.
  • Mostow, J. & Aist, G. (2001). Evaluating tutors that listen: An overview of Project LISTEN. In K. Forbus & P. Feltovich (Eds.), Smart machines in education. Cambridge, MA: The MIT Press.
  • NAEP Report Cards – Home. (n.d.). Retrieved July 23, 2017, from https://www.nationsreportcard.gov/
  • National Center to Improve Practice (NCIP). (2003, February 6). Use of Voice Recognition in Special Education. Retrieved August 7, 2017, from http://www.rehabtool.com/forum/discussions/97.html
  • National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Rockville, MD: National Institutes of Health.
  • Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29, 8-39.
  • Robbie, A. (2017, April 18). For students with special education needs, speech recognition technology has the ability to unlock new opportunities. Retrieved July 23, 2017, from http://www.itproportal.com/features/for-students-with-special-education-needs-speech-recognition-technology-has-the-ability-to-unlock-new-opportunities/
  • Speech and Voice Recognition Market by Technology (Speech Recognition, Voice Recognition), Vertical (Automotive, Consumer, Banking, Financial Services and Insurance (BFSI), Retail, Education, Healthcare & Government) and Geography – Global Forecast to 2023. (n.d.). Retrieved August 7, 2017, from http://www.marketsandmarkets.com/Market-Reports/speech-voice-recognition-market-202401714.html
  • Syntony. (n.d.). Speech Recognition, IVR & CTI. Retrieved July 23, 2017, from http://syntony.co.uk/speech-recognition-ivr-cti/
  • WhatIsIt? (n.d.). What is speech recognition? – Definition from WhatIs.com. Retrieved July 23, 2017, from http://searchcrm.techtarget.com/definition/speech-recognition

 


Published: 2017-08-31

The DIAGRAM Center is a Benetech initiative supported by the U.S. Department of Education, Office of Special Education Programs (Cooperative Agreement #H327B100001). Opinions expressed herein are those of the authors and do not necessarily represent the position of the U.S. Department of Education.


  Copyright 2019 I Benetech
