
Voicemod tools up with $14.5M to ride the generative AI (sonic)boom

The first question we ask Voicemod CEO and co-founder Jaime Bosch, when he picks up the phone to talk about a new round of funding, isn’t one we’re used to asking.

But it may become the norm in the generative AI future that is rushing toward us: Is this your real voice?


Bosch’s startup has been tinkering with sound effects for nearly a decade, working in the field of digital signal processing (DSP), where it initially focused on creating fun “sound emoticon” effects and reactions that gamers could use to spice up their voice chats.

And gamers remain its main user base (for now).

But the audio field is abuzz with developments in AI – which the Voicemod team hopes will lead to entirely new use cases and many more users for its tools.

Technology

So where DSP technology used to be about applying effects to a (real) human voice, developments in artificial intelligence allow startups like Voicemod to offer tools that create completely synthesized (unreal) voices, and even let users “wear” these voices in real time, so they can speak in a voice that isn’t their own. Think of it as the sonic equivalent of a Snapchat lens, a viral teen TikTok filter or a Reface celebrity face swap.

AI voice technology can even let you change your voice into another person’s (real) voice. And not just for talking about the weather or shooting the breeze, but for what is known as sing-to-sing voice conversion. This means you could sing in someone else’s voice: supercharge your karaoke game by, say, singing Bohemian Rhapsody in Freddie Mercury’s literal voice. You could even switch between Mercury, May and Taylor for the full mock-opera effect, if you have enough trained AI models (and microphones) on hand. Mamma Mia!

Ethics

AI makes all of this possible, though legal and ethical questions may give pause to the rush to release real-time voice shifting in a world that still relies heavily on fixed identities. (Banks pushing customers to record a “unique voiceprint” to use as a password definitely need to sit up and start listening.)

Last year Voicemod acquired another audio effects startup, Voctro Labs, whose technology Bosch says the company is working to merge with its own to create an amplified hybrid platform. That combination has already let it expand its offering: last December it launched a text-to-song feature that uses generative AI to turn your own lyrics into a vocal composition. He tells us more is in the works, including the aforementioned sing-to-sing feature.

Musician

Voctro’s technology may be familiar: it was involved in developing musician Holly Herndon’s voice clone, which appeared in a viral TED Talk last year, where her AI voice could be heard duetting in real time with the real voice of another musician (Pher). If you haven’t seen it yet, it’s quite an audiovisual spectacle, as well as a mouthful to explain. It’s also a taste of what Voicemod may be bringing to a keyboard near you.

“We will definitely bring more products to market and more ways for people to express themselves with generative AI technology,” says Bosch. “Not all Voctro Labs technology is music-related – but they have a lot of singing-related technology, from this text-to-song technology to real-time sing-to-sing technology. So we have a lot of new projects and new products.

Hybrid Technology

“We’re going to boost our speech technology to real-time AI speech because we’re basically merging our technology with their technology. We’re basically creating a hybrid technology that will be better than ours or theirs, a combination of both… [So their sing-to-sing technology will be] combined with our DSP technology, which we could use for auto-tuning. So we could potentially help artists with their voice and tone. So this is going to be really, really interesting.”

In addition to providing audio tools directly to consumers and creators, Voicemod offers its technology to third parties through SDKs and APIs for integration into their own products, from games and apps to hardware. That positions it to distribute its technology across the gamer and creator ecosystem, wherever it finds demand.

Generative AI-powered audio disruption of course mirrors (albeit in a distorting funhouse-mirror way) developments we’re seeing elsewhere: in the visual realm of graphics and illustration, thanks to deep learning and the advent of prompt-based image generation interfaces (such as DALL-E and Stable Diffusion); in the written word, through the large language models that underpin generative AI chatbots like ChatGPT, which can produce song lyrics or an entire essay on demand; and indeed in music composition, where Google recently demonstrated a prompt-based generative AI music composer that can apparently produce arrangements to fit the musical mood you describe (although it said it wasn’t releasing that particular model, someone else surely will).

Clearly, AI is bending the rules of what one human can create. And, like any open-ended freedom, that is both exciting and terrifying, because what counts is what you do with it.

The coming years will be all about finding out what people do with such powerful AI tools at their fingertips.

Source: TechCrunch
