This week saw the release of Mary Meeker’s annual Internet Trends report, packed full of data and insights into the development of the internet and digital technology across the globe.
Of particular interest to us here at Search Engine Watch is a 21-page section on the evolution of voice and natural language as a computing interface, titled ‘Re-Imagining Voice = A New Paradigm in Human-Computer Interaction’.
It looks at trends in recognition accuracy, voice assistants, voice search and sales of devices like the Amazon Echo to build up an accurate picture of how voice interfaces have progressed over the past few years, and how they are likely to progress in the future.
So what can we learn from the report and Meeker’s data about the role of voice in internet trends for 2016?
Voice search is growing exponentially
We know that voice is a fast-growing trend in search, as the proliferation of digital assistants and advances in interpreting natural language queries make voice searching easier and more accurate.
But the figures in Meeker’s report show exactly how far voice search has grown over the past eight years, since Google Voice Search arrived on the iPhone in 2008. Google voice queries have risen more than 35-fold from 2008 to today, according to Google Trends, with “call mom” and “navigate home” being two of the most commonly used voice commands.
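Here is a rough back-of-the-envelope check on the ‘exponential’ label. For illustration only, it assumes that growth was steady across the eight years from 2008 to 2016; the report gives only the overall multiple. On that assumption, a 35-fold rise works out to an annual growth factor g of:

```latex
g = 35^{1/8} = e^{(\ln 35)/8} \approx e^{0.444} \approx 1.56
```

In other words, that is roughly 56% compound growth per year. Because the report says ‘more than’ 35-fold, the true average rate would be at least this high.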
Tracking the rise of voice-specific queries such as “call mom”, “call dad” and “navigate home” is an unexpected but surprisingly accurate way to map the growth of voice search and voice commands. As an aside, anyone can track this data for themselves by entering the same phrases into Google Trends. It’s interesting to think what the signature voice commands might be for tracking the use of smart home hubs like Amazon Echo in a few years’ time.
Google is, of course, by no means the only search engine experiencing this growth, and the report goes on to illustrate the rise in speech recognition and text-to-speech usage for the Chinese search engine Baidu. Meeker notes that “typing Chinese on a small cellphone keyboard [is] much more difficult than typing English”, leading to “rapidly growing” usage of voice input across Baidu’s products.
Meeker also plots a timeline of key milestones in the progress of voice search since 2014. She notes that 10% of Baidu search queries were made by voice in September 2014 and that Amazon Echo was the fastest-selling speaker in 2015. She also notes that Andrew Ng, Chief Scientist at Baidu, has predicted that by 2020, 50% of all searches will be made with either images or speech.
Developments in image search haven’t made as much of a splash as developments in voice, but image search shouldn’t be ignored: the technology that will allow us to ‘search’ objects in the physical world is advancing in leaps and bounds. In April, Bing rolled out an update to its iOS app that lets users search the web with images from their phone camera. The feature is limited to users in the United States, as they’re the only ones who can download the app.
The visual search app CamFind, which has been around since 2013, also has an uncanny ability to identify objects in the physical world and call up product listings, which holds huge potential for both search and marketing.
Why do people use voice?
The increase in voice search and voice commands isn’t only down to improved technology; the most advanced technology in the world still wouldn’t see widespread adoption if it wasn’t useful. So what are voice input adopters (at least in the United States) using it to do?
The most common setting for using voice input is the home, which explains the popularity of voice-controlled smart home hubs like Amazon Echo. In second place is the car, which tallies with the most popular motivation for using voice input: “Useful when hands/vision occupied”.
30% of respondents found voice input faster than typing text, which also makes sense. Meeker observes elsewhere in the report that humans can speak almost four times as quickly as they can type, at an average of 150 words per minute (spoken) versus 40 words per minute (typed). This has always been the case. What is really starting to make voice input faster and more convenient than text is technology’s ability to parse those words accurately and deliver a response quickly.
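Meeker’s two speeds also show where ‘almost four times’ comes from and what it could mean in practice. The 40-word message below is a hypothetical example, and the calculation ignores any recognition or response time:

```latex
\frac{150\ \text{words/min (spoken)}}{40\ \text{words/min (typed)}} = 3.75,
\qquad
\frac{40\ \text{words}}{150\ \text{words/min}} \times 60\ \tfrac{\text{s}}{\text{min}} = 16\ \text{s}
```

So a 40-word message that takes a full minute to type could, in principle, be spoken in about 16 seconds.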
As Andrew Ng said, in a quote reproduced on page 117 of the report, “No one wants to wait 10 seconds for a response. Accuracy, followed by latency, are the two key metrics for a production speech system…”
The third-most popular reason for using voice input, “Difficulty typing on certain devices”, is a reminder of the essential role that voice has always played, and continues to play, in making technology more accessible. The least popular setting for using voice input is at work. This could be because it is hard to pick out an individual user’s voice in a work environment, or because people are socially reluctant to talk to a device in front of colleagues.
Meeker’s report also looks into the usage of one digital assistant in particular: Hound, an assistant app developed by the audio recognition company SoundHound. The same technology was also recently used to add voice search capabilities to SoundHound’s music search engine of the same name.
What’s interesting about the usage breakdown for Hound, at least across the four fairly broad categories the report divides it into, is that no single use type dominates overwhelmingly. The most popular use for Hound is ‘general information’, at 30%, above even ‘personal assistant’ (which is what Hound was designed to do) at 27%.
Put together with the share of queries for ‘local information’, more than half of voice queries to Hound are information queries, suggesting that many users still see voice primarily as a gateway into search. It would be interesting to see similar graphs for usage of Siri, Cortana and Google’s assistants to determine whether this trend is borne out across the board.
A tipping point for voice?
Towards the end of the section, Meeker looks at the evolution and ownership of the Amazon Echo. Smartphones had voice capabilities built into them, but the Echo was designed specifically to be used with voice, which makes it perhaps the most useful product case study for the adoption of voice commands.
Meeker notes on one slide that computing industry inflection points are “typically only obvious with hindsight”. On the next, she sets the peak of iPhone sales in 2015, and the start of their estimated decline in 2016, alongside the take-off of Amazon Echo sales over the same period. She seems to suggest either that one caused the other, or that one device is giving way to the other in the smart device market.
I’m not sure I would agree that the Amazon Echo is taking over from the iPhone (or from smartphones), since they’re fundamentally different devices. One is designed to be home-bound, the other portable; one is visual and the other is not; and as I pointed out above, the Amazon Echo is designed to work solely with voice, whereas the iPhone merely has voice capabilities.
But it’s interesting to view the trend as part of a shift in the computing market towards a different kind of technology: an ‘always-on’, Internet of Things-connected device specifically designed to work with voice. Perhaps that’s the point Meeker is making here.
Meeker points to how quickly third-party developers have moved to build platforms that integrate the Alexa voice assistant into different devices, as evidence of the expansion of “voice as computing interface”. I think we’ll always depend on a visual interface for many things. Even so, this could be the start of a tipping point where voice commands take over from buttons and text as the primary input method for many devices and machines.
Hopefully Meeker will revisit this topic in future trends reports so that we can see how things play out over the next few years.