The Acceleration of Touchless Voice Technology
We all know voice-enabled technology has grown by leaps and bounds over the last several years. Just last year, almost 75% of homes had a voice device. Now, as ABI Research recently reported, global shipments of smart home voice devices will increase 30% in 2020 alone. For the most part, this growth has centered on the smart home, as people go beyond using voice to shop, gather information, or play their favorite songs. Voice now controls many devices around the home: smartphones, TV remotes, light switches, thermostats, faucets, door handles, and more. We work on this stuff every day. The growth we’re going to see moving forward is in the “why” and the “where.”
The bottom line is this: in the near future, there will be a huge acceleration in touchless technology everywhere people need to interact with a shared surface. Think commercial buildings, elevators, offices, hotels, hospitals, industrial spaces, banks, restaurants, and retail. The list goes on. Yes, this trend was already in the works, but with COVID-19 and other possible viruses in the future, voice control shifts from a “nice-to-have” option to a clear necessity because the public will demand it. Once people experience the first elevator they can speak to, the first vending machine they can order from, and the first iPad they can sign in on without touching any buttons or screens, consumers will push the new normal forward. The question won’t come from brands, and it won’t be “Why voice?” It will come from consumers, and they’ll be asking, “Why can’t I just use my voice for this?”
Going Touchless in the Public Square
Just consider all the public spaces where people come in contact with surfaces:
- Bank ATMs
- Commercial lobbies and elevators
- Kiosks and vending machines for ordering food or buying tickets in transit stations and airports
- Drive-through transactions
- Medical equipment and healthcare facilities
- Industrial facilities and factory floors
- Gyms, bathrooms, conference rooms, hotels, lunchrooms, classrooms, grocery store self-checkouts, and more
In many cases, smartphone apps or motion-sensing devices will drive the touchless interaction, but not in all of them. Not everyone is going to want to fiddle with an app to activate a device. In those cases, precision voice, coupled with AI and visual cues, will play an important role.
For voice to achieve its full potential, it will need to operate from greater distances and in many different challenging environments.
Voice Will Need to Get Better
As demand grows, brands, product developers, IoT companies and other consumer-facing businesses will need to adopt better voice technology than what is currently being offered and used in the marketplace. What is out there now is just not good enough.
Current devices work well in many situations, but they still have limitations. For example:
- Many devices are still push-to-talk and don’t listen for commands until you physically push a button
- Users often need to be close to the device or its mic for commands to register
- Trigger words are still required
- External ambient noise and other people speaking can mask commands
- Performance diminishes in large spaces and around corners
People are going to demand better. The user experience will be paramount and brands will have to deliver. Safe interaction will be a priority. Using voice control and audio feedback to create a more touchless experience at home, in public, and in the workplace raises the bar for smart voice solutions.
Much like in the home, voice in public spaces will need to address all these requirements to deliver a touch-free yet frustration-free experience. The challenging acoustics of public places, where voice-controlled vending machines, kiosks, POS systems, or elevators would be located, must be taken into account. Devices will have to function in highly reverberant environments and learn and suppress persistent noise sources such as air conditioners, piped music, and PA systems. They will also need to capture voice commands and support barge-in (letting a user interrupt the device while it is speaking or playing audio) at longer distances while suppressing the many loud noises of public spaces.
The Next Generation of Far-Field Technology
Fortunately, changes have already been made to address these issues. Far-field technology with an advanced mic-array configuration offers a clear performance advantage, capturing voice commands from twice the standard distance (or farther), around corners, and in noisy and reflective environments. It does this by identifying, memorizing, and suppressing persistent noise sources. It performs significantly better than the existing technology from leading OEMs.
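To make "identifying, memorizing, and suppressing" a little more concrete, here is a minimal sketch of one generic, textbook-style approach: track a slowly updated noise level per frequency band, then turn down any band that looks like the remembered noise. This is only an illustration under simplifying assumptions, not how any particular product works. The function name, parameters, and the assumption that the first frame contains only background noise are all hypothetical.

```python
def suppress_persistent_noise(frames, learn_rate=0.1, gain_floor=0.1):
    """Return per-band gains for each frame of per-band energies.

    frames: list of frames, each a list of non-negative band energies.
    Assumption (illustrative only): the first frame is background noise.
    """
    noise = list(frames[0])  # the "memorized" noise level per band
    gains = []
    for frame in frames:
        frame_gains = []
        for band, energy in enumerate(frame):
            # Identify: treat a band as background when its energy stays
            # close to what we already remember for that band.
            if energy < 2.0 * noise[band]:
                # Memorize: nudge the stored level toward what we hear now.
                noise[band] += learn_rate * (energy - noise[band])
            # Suppress: spectral-subtraction style gain, never below a floor.
            if energy > 0:
                gain = max(gain_floor, 1.0 - noise[band] / energy)
            else:
                gain = gain_floor
            frame_gains.append(gain)
        gains.append(frame_gains)
    return gains


# Band 0 carries a steady hum (think air conditioner); band 1 is quiet
# until a voice command arrives in the final frame.
frames = [[1.0, 0.01]] * 5 + [[1.0, 1.0]]
gains = suppress_persistent_noise(frames)
hum_gain, voice_gain = gains[-1]
```

In this toy run, the hum band is held at the gain floor while the band carrying the voice passes almost untouched. A real system would work on actual spectra from a mic array and use far more careful noise tracking.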
Far-field solutions like our own advanced ArkX EveryWord™ audio/voice modules, featuring Cirrus Logic technology, view sound in three dimensions, not just two. They work at longer range and distinguish the user from other voices and noises.
Unlike the two-dimensional planar view of other technologies, EveryWord™ adds a new dimension to far-field voice by analyzing sound in 3D, increasing range and precision while capturing voice and suppressing noise. It uses 12 acoustic echo cancellers (AECs) versus three or four, allowing many more noise sources to be canceled. Additionally, it can map a room in three dimensions, giving more resolution for identifying the person giving the command.
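For readers unfamiliar with echo cancellation, the sketch below shows what a single canceller does in its simplest textbook form, using a normalized least-mean-squares (NLMS) adaptive filter. The canceller is given a reference signal (for example, the music a device is playing), learns how that signal shows up at the mic, and subtracts its estimate. This is a generic illustration, not the EveryWord™ implementation. The function name, tap count, step size, and the simulated two-tap echo path are assumptions made for the example.

```python
import random


def nlms_echo_cancel(reference, mic, taps=4, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps  # current estimate of the echo path
    history = [0.0] * taps  # recent reference samples, newest first
    output = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what is left after cancelling
        power = sum(x * x for x in history) + eps
        # NLMS update: move the weights along the error, scaled by the
        # reference power so loud and quiet passages adapt at similar rates.
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        output.append(error)
    return output


# Simulated setup (illustrative): the mic hears only an echo of the
# reference through a made-up two-tap path, with no speech present.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
       for n in range(len(reference))]
residual = nlms_echo_cancel(reference, mic)

tail_echo = sum(s * s for s in mic[-100:])
tail_residual = sum(s * s for s in residual[-100:])
```

Once the filter has adapted, the leftover energy at the end of the run is a tiny fraction of the original echo. Each canceller handles one reference signal, which is why having more of them allows more known sound sources to be removed.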
EveryWord™ also solves one of the biggest challenges of mic placement and configuration, allowing devices to be placed virtually anywhere. This enables wider coverage over large living or public spaces, whether devices sit on tables and counters or are mounted in walls or ceilings, away from the splatter and obstruction they would otherwise face. The result is significantly better performance than existing technology from leading OEMs.