In my first article, I discussed biometrics, data security, and my thoughts on AI. Then in my last article, I talked about the meaning, technology, usage, concerns, and my two cents around voice conversion. Now that the table is set, who does what?
Let’s discuss the broader Speech and Voice Recognition market first.
- Grand View Research, in November of last year, asserted that the “Speech and Voice Recognition Market” will be worth $31.82bn by 2025, with a CAGR of 17.2%.
- Markets and Markets, on January 9th, 2019, reported that the “Speech and Voice Recognition Market” will be worth $21.5bn by 2024, growing at a CAGR of 19.18%.
- IDTechEx Research, on January 15th, claimed that the Smart Speech / Voice-Based Technology Market will reach $15.5bn by 2029.
This market encompasses every technology related to voice and speech, whether it’s recognition, synthesis, conversion, etc. (e.g., everything from automated voice recordings and IoT kitchenware to smart speakers and talking robots).
Now for markets that hit closer to home for us:
1) Text-to-Speech Market
- Markets and Markets, in 2017, estimated that the TTS market was valued at $1.3bn in 2016 and would reach $3.03bn by 2022, a CAGR of 15.21%.
- Other reports from this year point to a similar CAGR of roughly 15.9% to 16%.
2) Voice Conversion (Voice-Over) & Cloning Market
- Voices.com found in 2017 that the Voice-Over Market was worth $4.4bn in 2015.
- BusinessWire wrote that the Voice Cloning Market would reach $1.74bn by 2023, up from $456mm in 2018, a CAGR of 30.7%.
- TikTok saw 150mm+ searches and 34bn+ views on content related to voice changing and voice conversion in 2018 alone.
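If you want to sanity-check growth rates like these yourself, CAGR is simple to compute: divide the ending value by the starting value, take the n-th root for n years, and subtract one. Here is a minimal Python sketch using the Voice Cloning figures quoted above. The function name `cagr` and the choice of a five-year window (2018 to 2023) are my own assumptions for illustration, not something taken from the reports.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Voice Cloning Market figures quoted above, in $mm:
# $456mm in 2018 growing to $1.74bn ($1,740mm) by 2023.
# Assumption: the growth window is the 5 years between those dates.
voice_cloning_cagr = cagr(456, 1740, 2023 - 2018)
print(f"Voice Cloning CAGR: {voice_cloning_cagr:.1%}")
```

Run as written, this lands on roughly the 30.7% quoted in the report. Keep in mind that research firms sometimes measure growth over a slightly different forecast window, so small gaps between a published CAGR and your own back-of-the-envelope number are normal.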
3) Speech-to-Speech Market
- There isn’t much data here because this is a new and challenging technology, and quite frankly, nobody has really produced a successful commercial-grade service yet!
I probably shouldn’t give free plugs for these companies, but in the spirit of fairness, I shall provide objective information here. No bias. (Scout’s honor!)
- LOVO (US)
That’s it. LOVO is the only player in the market.
… Just Kidding.
- Wellsaid Labs (US)
- Murf.AI (US)
- Play.ht (UAE)
- Descript (US)
- …and more!
Some of these companies have been around since 2014, while others are as recent as 2020. Each company has a different proprietary tech stack, target audience, and branding, and, most importantly, the quality of the output varies from one to the next.
LOVO’s main product Genny, whose name comes from “Gen”erative “AI”, is a full-force content creation platform powered by synthetic speech and other generative AI technologies. From individual YouTubers and podcasters to Fortune 500 corporations looking to create engaging adverts and e-learning materials, anyone can easily use our AI voiceovers and video dubbing features at a fraction of the cost and time spent going back and forth with human voice actors.
Unlike basic TTS tools that simply convert written words into speech, we provide a suite of applications and features to bolster your everyday workflow. This not only helps you produce more authentic content, but also saves you a lot of headaches and money, so you can focus on what’s important, not the logistics involved with content creation.
Check out Genny for yourself for free!
Take a deeper dive into these tools here:
– 7 Essential Qualities of a Good Text to Speech Platform
– Top 10 Text to Speech Tools to Help You Create Enthralling Content