When we hear the phrase “AI Voice” or “robot chatbot”, most of us immediately think of an irritatingly static voice urging us to “dial 1 for customer service.” That’s because most consumers’ experience with synthetic speech, still a nascent technology, comes from these isolated interactions. However, you’ll probably be surprised to learn that the AI Voice industry is one of the most rapidly growing and intensely researched fields in modern engineering. The reason is simple: when it comes to presenting your product or marketing to consumers, there is no better form of engagement than speech. We previously reported on the current status of the global TTS market and the influence of factors like the emergence of APAC as a key region. With the growing adoption of voice technology in enterprise and consumer solutions, the larger voice and speech recognition market is expected to grow at a 17.2% compound annual growth rate to reach $26.8 billion by 2025.
Let’s explore some interesting facts that you might not have known about AI Voices.
1. Voice is faster than typing
Every other benefit listed in this post is premised upon one simple, foundational understanding: voice is the most natural form of communication we have, and as such, is our most efficient method of conveying a message. While the average person can only type 38 to 40 words per minute, they can speak 110 to 150 words per minute. In fact, a study conducted by Stanford University found that speech was up to 3x faster than typing. Comparing Baidu’s Deep Speech 2 cloud-based speech recognition software against Apple’s built-in keyboard, the researchers also found that the error rate of speech-to-text conversion was 20.4% lower than typing in English, and a whopping 63.4% lower than typing in Chinese.
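For the curious, here is a quick back-of-the-envelope check of those averages. This is only a rough sketch: the function name is made up for illustration, and it simply divides the speaking range by the typing range quoted above. It does not reproduce the Stanford study's methodology.

```python
def speedup_range(typing_wpm, speaking_wpm):
    """Return the (lowest, highest) speaking-to-typing speed ratio.

    Both arguments are (min, max) words-per-minute ranges.
    """
    type_lo, type_hi = typing_wpm
    speak_lo, speak_hi = speaking_wpm
    # Worst case: slowest speaker vs. fastest typist.
    # Best case: fastest speaker vs. slowest typist.
    return speak_lo / type_hi, speak_hi / type_lo


# Averages quoted in this post: 38-40 WPM typing, 110-150 WPM speaking.
low, high = speedup_range((38, 40), (110, 150))
print(f"Speaking is roughly {low:.2f}x to {high:.2f}x faster than typing")
```

With the figures quoted here, the ratio lands between roughly 2.75x and 3.95x, which is in the same ballpark as the study's finding.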
That’s why when it comes to quick Google searches or keyword entries, nothing beats the comfort of simply pressing a microphone icon and speaking into your phone. According to a study by Perficient analyzing mobile voice usage trends in 2020, 78% of teenagers and 63% of adults preferred voice search over typing. A report by Just AI showed how versatile voice search and smart speakers can be: 20% of total internet searches were conducted through voice search, 51% of consumers have used voice search to find restaurants, 58% of consumers have used voice search to research local business information, and when browsing through the Amazon marketplace, 85% of users selected the product that was recommended to them through voice search. It’s very telling just how much influence a smart speaker, and in turn synthetic speech, has on our personal financial choices. Not only is it an irreplaceable way to inform ourselves and make better purchases, but it has now become the very mechanism we use to purchase our daily items.
2. Businesses are leveraging AI Voice
Businesses worldwide are beginning to understand the technical and financial advantages of voice & audio marketing, and are quickly moving to refine their voice search optimization strategy. With voice-controlled devices becoming a common form of communication in households, 68% of enterprise respondents are said to be adopting marketing tactics to directly appeal to this growing consumer base. According to Speechmatics, 2021 saw an 18% year-over-year increase in enterprise adoption of voice marketing, and of the respondents that don’t have a voice strategy yet, 60% said it was something they were considering implementing in the next 5 years.
Meanwhile, the automation of CRM (customer relationship management) services has been catalyzed by the introduction of highly scalable AI Voiceover solutions. Rather than having a dedicated voice actor for every customer interaction, enterprises can utilize easily customizable VoiceBots to staff their IVR (interactive voice response) stations. These VoiceBots can be active 24/7 and can handle repetitive support tickets, so customer service representatives can spend their time more productively on more complex requests.
Speechmatics conducted a comprehensive survey that compiled the experiences of current users of voice technology. The report mentioned that the most commonly used applications of voice technology included closed captions, automated customer experience and analytics, digital asset management, eDiscovery, VoiceBots, medical transcription, and others. For these reasons, the finance, insurance, education and healthcare sectors are predicted to be the main verticals of growth.
3. Consumers trust smart speaker ads more than other major media formats
In a world where personalized advertising and search algorithms are being heavily scrutinized, audio ads have been relatively more successful in escaping the public media storm. Adobe found that 58% of consumers found smart speaker ads to be less intrusive than ads on legacy media outlets and social media. As the cherry on top for the audio industry, 52% of consumers also said that they found smart speaker ads to be more engaging, and 57% said the ads were more “relevant to their needs and interests.”
At the beginning of this post, we mentioned the perception that VoiceBots are universally disliked by consumers. After all, who wouldn’t want to engage in a long talk with a fellow human about their personal struggles? As it turns out, quite a few people. A consumer study by Capgemini analyzing conversational commerce found that 49% of consumers actually preferred VoiceBots to their human counterparts, simply because they got the job done faster. Consumers can rely on the speedy response of the automated service to quickly receive the basic assistance they require.
4. Users can use “natural language” when speaking to their AIs
Advancements in natural language processing, such as language models like OpenAI’s GPT-3 (Generative Pre-trained Transformer 3) and Google’s LaMDA (Language Model for Dialogue Applications), are allowing people to speak to their voice assistant as they would speak to a friend. In fact, 70% of search requests made through Google Assistant are made in natural, everyday language. This means that human-like speech is not only an “output” of voice technology, but is also accepted as an “input”.
In conjunction with the increasing quality of voice technology, US consumers are bringing AI Voices into their homes. While smart speakers haven’t become a ubiquitous part of the idyllic modern American home just yet, OC&C Strategy Consultants report that smart speakers will reach 55% of US households by 2022, in a market dominated by key technology players like Amazon, Google, and Apple.
The very concept of an AI Voice can seem unsettlingly novel to many. But it is a sign of tremendous technological progress with endless applications. The ability to convert text to human-like speech is something straight out of a science fiction novel, and we at LOVO can’t help but feel a little excited about the future to come.
Learn more about the advances we’ve made in the field of synthetic speech, and give it a go yourself for free with Genny!
Additional readings for you:
– Freelance Voice Actor vs AI Voice Actor
– Finding the Right Voice for Your Content