Practice Innovations – Managing in a changing legal environment
January 2013 | VOLUME 14, NUMBER 1
Voice Activated Computing  –  Does It Really Work?

Don Philmlee, Legal Technology Consultant, Washington, D.C.

The world of fiction is full of amazing examples of talking interactive computers, from the misbehaving HAL 9000 in the movie "2001: A Space Odyssey," to Deep Thought in "The Hitchhiker's Guide to the Galaxy," to the extremely interactive computer in "Star Trek." For many years the vocally interactive computer was relegated to the realm of science fiction, leaving us mere mortals in the real world to push buttons on a keyboard, click a mouse, or touch a screen.

In the past few years voice technologies have advanced and we appear to be on the verge of a tectonic change.

Where are the voice-enabled computers today?

Speech technology is undergoing a renaissance. Today you can pick up your smartphone and say, "Send text to Don – 'Meet you in an hour.'" And what happens? Your phone doesn't miss a beat. It will quickly comply and accurately generate a message for you to send. No fuss. No muss. It is astonishing to have a computer interpret and comprehend the amorphous blob of natural language quickly and accurately, yet today's newest smartphones and tablets often manage to do this in just seconds, on a device that typically costs only a few hundred dollars.

Today even the lowly automobile GPS can talk and give directions. Mobile phones, tablets, computers, and cars are now talking. Not only have computers gained a "voice," but they are also starting to interact with us in our daily lives in real and dynamic ways.

Inherent Problems

Getting to this technological plateau has not been easy. To appreciate this technology, it helps to know some of the problems that have been overcome.

The world is a noisy place – For a computer to interpret what you are saying, it needs to hear your voice. While this is easy in a closed office, a hotel room, or at home, it is much more challenging in today's world of users who talk as they walk, work at a noisy Starbucks, or inhabit a cubicle hive. Computers today often include noise-canceling microphones to filter out extraneous noise, and software that can focus more tightly on your voice when you are speaking.

Computers are too slow – The human brain can process speech quickly, and so we can talk very quickly to communicate what we want. In the past, computers were not geared for this task and could not keep up. However, in the past few years this has changed, as speech processing is now "baked" right into both the underlying hardware and mainstream operating systems like Windows, Apple Macintosh, and Android.

Varieties of Human Speech and Language – Controlling a computer with a human voice is tough. A computer not only has to differentiate the words being spoken, but it must discern the correct context and meaning of what is being said. This can be an exceedingly difficult task as voices and language can vary in timbre and tone. A single language is fluid, and meaning can vary with dialects, new slang, and local phrases. This issue comes down to sampling. For instance, to correctly recognize the word "aluminum" a computer needs to know that it is English and be able to differentiate between the distinctive ways various people will pronounce the word. If it has vocal samples to choose from, it can quickly identify the word correctly. The typical solution for this problem is to limit the languages the computer will recognize and have the user laboriously train the computer.

Training is tedious – Training the computer to recognize your voice can be time-consuming and frustrating, and has been a major stumbling block to the acceptance of voice technology. However, a new solution has arisen that doesn't involve training. It involves "modeling" large samples of spoken language to create a high-quality language model. Extensive sampling results in superior voice recognition, with no training. As these models tend to be large and keep growing, they are often hosted on and accessed from the Internet. Much of today's consumer-grade voice recognition technology is built upon this concept, most notably Apple's personal assistant Siri (see below) and Google's Voice Search.
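
To make the idea of a language model built from samples a little more concrete, here is a minimal sketch in Python. It is not how Siri, Google, or any other vendor actually works; the tiny corpus, the function names, and the bigram (word-pair) approach are all assumptions chosen for illustration. It shows how counts gathered from sample sentences can help choose between two transcriptions that sound alike.

```python
from collections import Counter

# Tiny invented corpus standing in for the very large collections of
# sampled speech a vendor would gather (an assumption for this sketch).
CORPUS = [
    "meet you in an hour",
    "see you in an hour",
    "meet me in an hour",
    "our meeting is in an hour",
    "send a text to don",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs across the samples."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def score(sentence, unigrams, bigrams):
    """Return a plausibility score using add-one smoothing; higher is better."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    result = 1.0
    for prev, cur in zip(words, words[1:]):
        result *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return result


def pick_transcription(candidates, unigrams, bigrams):
    """Choose the candidate word sequence the model finds most plausible."""
    return max(candidates, key=lambda c: score(c, unigrams, bigrams))


unigrams, bigrams = train_bigrams(CORPUS)
# Two hypothetical candidates that sound alike when spoken aloud.
candidates = ["meet you in an hour", "meat ewe inn an our"]
best = pick_transcription(candidates, unigrams, bigrams)
print(best)
```

The more sample sentences the model has seen, the better its counts reflect how people really talk, which is one reason such models grow large enough to be hosted online rather than on the device.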

Privacy Issues – As mentioned above, to improve the speed and efficacy of voice recognition, the language model often resides on and is accessed from the Internet. As such, anything you dictate is recorded and sent to the vendor (Apple, Google, etc.) to convert what you say into text. Certain metadata might also be sent, such as your name, the names of your address book contacts, or more. While all of this data is ostensibly used solely to help the device (such as a smartphone or tablet) better recognize what you say, it does pose a potential privacy issue for your data.

What can you do today?

Broad advances in the underlying speech recognition algorithms and in processing power on both mobile devices and the servers that they access have given us a new generation of voice technology that is responsive, interactive, and requires little or no training. The technologies discussed below are a small sample of what is going on in the world of voice recognition.

Microsoft

Microsoft has voice recognition baked right into its Windows operating system. You can use specific voice commands to control your computer, and it can be used for dictation, but it requires a modicum of training. Despite being a fairly powerful feature of Windows, it does not get the publicity it deserves. Also, phones running Windows Phone 8 include a conversational and interactive speech recognition system that is similar to the Siri feature on Apple's iPhone or iPad.

Google

Google's voice offerings are wide, varied and sometimes confusing, but they work very well. They currently do not provide conversational interaction. Their applications include:

  1. Voice search: Vocally search your desktop and mobile device.
  2. Voice actions: Search and control your phone including GPS and SMS.
  3. Voice input: Why type? This feature lets you just say what you want on your Google Android phones and tablets.

Apple

Apple's Macintosh operating system can translate spoken words into text. As with Windows, this capability is not a separate application but is baked right into the operating system, and it works in any application where text can be entered. No training is required.

Another Apple product, Siri, is a marquee application available for the iPhone and iPad. Remarkably it goes beyond just understanding your voice and interacts with you using natural language. Using a conversational style of intuitive responses, Siri almost seems to have a personality. It represents a paradigm shift in how we interact with computers.

Nuance

The 800-pound gorilla of voice technology is a company called Nuance Communications. They are the purveyors of the long-popular and effective voice recognition software Dragon NaturallySpeaking. They are very deeply involved in voice technology, from user applications, to corporate and cloud applications, to military applications. They are rumored to provide the servers that power Apple's Siri application, which is mentioned above.

They provide voice dictation and voice search applications for mobile phones, computers, tablets, cars, and more. Their products go beyond basic dictation and can help you vocally format a document, search the Web, send e-mail, and more.

Where is this technology going?

As the voice technology matures, the online vocal "sampling" datasets collected by companies like Apple, Microsoft, or Google will continue to grow and mature. This will allow voice technology to expand into new and unpredictable places in our work and home lives.

When voice technology proliferates, a voice-enabled system might always be ready and waiting to tell you what you need to know. It could synchronize with other computers and provide a uniform and unbroken "conversation" as you move from home to car to office (or other locations) and interact with it.

As voice recognition improves you may see it interpret more than the meaning of your words:

  • Emotional or mood recognition – Imagine calling a support line and having the system detect if you are angry.
  • Footstep recognition – As you move through your office or home, the computer can identify you and prepare for your arrival.

Developers are creating a mirror for the bathroom that talks to you, shows you news headlines, updates your prescriptions, schedules appointments, and even helps you match articles of clothing and more, all by simply asking it to do so.

Voice technology will likely be infused with more and more artificial intelligence. This technology will involve much, much more than just good voice recognition. The computer must interact and reason with you. This level of technology is not commonly available to corporations or consumers yet, but it is only a matter of time before it is.

Today, applications like Apple's Siri often seem "intelligent" but the app is merely recognizing your words and then outputting a guessed response based on those words. It does not truly understand you, but it is recognizing a pattern in your words. So your smartphone is not so smart, quite yet.
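
The kind of pattern recognition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rules, the canned responses, and the `respond` function are all hypothetical, and real assistants use far more elaborate (and unpublished) methods. The point is that the program maps recognized words onto a fixed set of responses without understanding any of them.

```python
import re

# Hypothetical command patterns paired with response templates.
# A real assistant would have many more rules; these are invented.
RULES = [
    (re.compile(r"^send (?:a )?text to (?P<name>\w+)\W+(?P<body>.+)$", re.I),
     "Texting {name}: '{body}'"),
    (re.compile(r"^(?:what is|what's) the weather", re.I),
     "Here is the forecast."),
    (re.compile(r"^call (?P<name>\w+)$", re.I),
     "Calling {name}."),
]


def respond(utterance):
    """Return the response for the first matching pattern, or a fallback."""
    text = utterance.strip()
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Fill the template with whatever words the pattern captured.
            return template.format(**match.groupdict())
    return "Sorry, I did not understand that."


print(respond("Send text to Don - Meet you in an hour"))
print(respond("Why is the sky blue?"))
```

Anything outside the listed patterns falls through to the fallback line, which is why a person can quickly exhaust what such a system is able to say.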

But the question of whether a voice app is truly "intelligent" may soon be moot. Today a human can easily exhaust the conversational possibilities of apps like Siri, which may have only 100, or possibly 1,000, available responses. We can easily recognize that it is not an artificial intelligence. But what happens when such a voice recognition app has over a million possible responses and becomes extremely adept at interacting with you? That is when voice technology will become truly amazing.

Sources

"Intel Says Future Ultrabooks Will Come with Touchscreens, Voice Recognition," PCWorld, January 9, 2012.

"Apple's Siri and the Future of Artificial Intelligence," Forbes, October 15, 2011.

"Many Cars Tone Deaf To Women's Voices," AOL Autos, May 31, 2011.

"NY Time's R&D Lab Brings Voice-Activated Computing To The Bathroom Mirror," Maximum PC, September 21, 2011.

"Toronto Trio Develops Voice-Activated Computer," The Globe and Mail, September 7, 2012.

"With Siri, Apple Could Eventually Build a Real AI," Wired Magazine Cloudline, October 17, 2011.

"Chinese Room Argument," Internet Encyclopedia of Philosophy.

"Windows Phone 8 Lets You Have 'Conversations with Apps,'" The Verge, June 20, 2012.
