Can you hear me now?: Voice picks up steam as bot hype fades

By Max Willens • April 11, 2017

In 2016, an unusually high number of bright, shiny objects were waved in the faces of media executives: Virtual reality! Live video! Artificial intelligence! Personalization! Voice! Bots!

Like all good media executives, many of them pounced, and by the end of 2016, publishers big and small were declaring they’d opened VR studios, built bot teams and unleashed crack squads of live video mavens to help them win the internet.

But now, reality has set in. By some estimates, augmented and virtual reality are a decade away from becoming mainstream technologies. Bots and the artificial intelligence that powers them are much more valuable to retailers than they are to media companies. Live video barely works for anything besides sports, and by the time you read this, Facebook and YouTube will probably have changed their minds about what kind of video they’d like media companies to make (again).

There is, however, one exception to this rule. Even though the number of voice-powered devices like the Echo, Dot and Google Home sold by the end of 2017 is expected to be far below the number of people who, say, go fishing every year (about 33 million), voice will have more effect on media, both positive and negative, than any of the other whizbang technologies that have grabbed so many headlines over the previous two years.

Here’s why:

Voice works
While people expect artificial intelligence and virtual reality will be ready for prime time eventually, the speech recognition technology that makes voice possible is already here. In 2010, machines could understand about a million words at about 70 percent accuracy. By 2015, that number had risen past 10 million, and at about 90 percent accuracy, according to Google research that Kleiner Perkins Caufield Byers analyst Mary Meeker shared in 2016.

Getting to 100 percent, where Google or Alexa (or something else we haven’t met yet) understands words we mispronounce, or utter by mistake, is probably another few years away. But today, it’s possible to ask an assistant about almost anything, and people are asking: according to Hound, a voice-enabled platform built by Shazam competitor Soundhound, its active users pose Hound multiple questions every day, on a wide variety of topics.

Andrew Ng, chief scientist at Baidu, says that half of all internet queries by 2020 will be done through either speech or images.

The hardware is already here
Another big piece of this puzzle that’s already been solved is hardware. It could be years before the Oculus Rift (which costs $600 and requires a powerful PC with serious graphics processors) becomes widely affordable and before smartphones that can handle live video streaming are ubiquitous.

Meanwhile, an Echo Dot will set you back just $50, and pretty much every smartphone on the market can handle cloud-powered voice queries. And if you don’t have your phone on you, that’s not necessarily a problem, either.

“It’s not just your phone, your Amazon Echo,” says Beerud Sheth, founder of voice and chat development platform Gupshup. In addition to the Echo and Google Home, Ford, VW and BMW are all working on cars that have Amazon’s voice platform, Alexa, inside them. Google Assistant, which is expected to become standard on high-end Android smartphones, is also expected to make its way into Android Auto, an operating system available in a growing number of connected automobiles.

And that, Sheth says, is just the beginning. Soon, voice platforms will be accessible from smart devices that we would never consider technologically advanced. “It’s your toaster oven, maybe even your table and chair,” Sheth says.

The advertising infrastructure is there
Whenever a new medium emerges, it takes a while to figure out what its advertising will look like (or, in this case, sound like). But voice has a head start. Thanks to nearly a century of terrestrial radio advertising, publishers including The Washington Post are already monetizing their flash briefings on Alexa.

And once Google and Amazon add the ability to programmatically target listeners, the money is going to flow. Audio advertising is already projected to account for nearly 12 percent of marketers’ budgets and programmers’ ad placements by the end of 2017, more than double the share it claimed in 2015, according to a survey conducted by Ad Age and The Trade Desk.

Add in the ability to claim an offer or make a purchase directly through a media company’s skill, which is expected to be possible by the end of 2017, and it’s on: Suddenly, voice becomes a way to drive transactions, subscriptions and other meaningful revenue sources.

“What radio’s always been missing is a direct back channel,” said Pat Higbie, the founder of XAPP Media, a digital ad developer and a top Alexa developer. “What we have here is the intimacy of radio as well as the instantaneous feedback from users.”

The money for future innovation is already there
Amazon has set aside up to $100 million to invest in companies that it thinks could boost voice. An accelerator program, created in partnership with the mentorship-focused accelerator firm Techstars, will launch in July. And while Google does not have a similar voice-oriented fund, partners at its venture capital arm, GV, have said that they think voice is going to be the future.

“They are very invested in identifying use cases they’re not thinking about internally,” Cody Simms, who heads accelerator programs for Techstars, said of Amazon.

The platforms are already there
If there’s one thing guaranteed to inject rocket fuel into a new idea or technology, it’s goliaths like Google, Amazon or Microsoft battling to own its ascent. Consider what competition between Facebook and Google did to increase the profile of streaming video.

That fight will be good for speeding the innovation that’s sure to occur on Google Assistant, Alexa et al. But what will be most interesting is seeing what happens when that innovation starts to really distinguish these nascent platforms from one another.

For now, there is very little that separates Google from Amazon. But once Google allows users to control things like Gmail, or Gcal, or YouTube from Assistant, or starts using people’s search histories to personalize each listener’s voice experience, it’s going to be very different from Amazon, which in turn will have access to an enormous trove of user purchase history, intent and other data.

“People aren’t going to want to have to interact differently, depending on what microphone they’re talking to,” said David Beisel, a partner and co-founder at NextView Ventures. “There’s a lot of complication there.”

But that’s not a problem right now. For now, the stage is set for voice to take off. And while Amazon, Google and Microsoft’s long-term strategic visions for voice may differ, they’re all going to be focused on the same thing in the near term.

“If they want to foster a rich ecosystem,” Beisel said, “they’re going to have to reward folks for it.”
