Stability AI Unveils Its First Text-to-Audio AI Platform, Letting Users Generate Songs from Text

Stability AI, a London-Based Generative AI Company, Launches Its First Text-to-Audio Platform, “Stable Audio,” Letting Users Create Custom Audio Tracks.

London-based generative AI company Stability AI has introduced its first text-to-audio AI platform, “Stable Audio.” The platform marks the company’s first venture into music and sound synthesis and can generate songs of up to 90 seconds, making it suitable for a wide range of applications, including commercials, audiobooks, and video games.

Stability AI has been a prominent player in the AI landscape, recognized until now primarily for AI-generated visual content. With the launch of its first text-to-audio generative platform, it now competes directly with industry giants such as OpenAI, Google, and Meta.

Stable Audio reportedly employs a diffusion model, the same class of AI model that powers the company’s better-known image platform, Stable Diffusion. For Stable Audio, however, the model has been trained on audio data rather than images, allowing users to generate songs or background audio of varying lengths for a wide variety of projects.
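To make the idea concrete, here is a toy sketch (not Stable Audio's actual pipeline; all names and values are illustrative) of the forward half of diffusion training: a model learns to reverse this gradual noising, and training it on waveforms instead of image pixels turns the same procedure into an audio generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_schedule(steps: int) -> np.ndarray:
    """Variance added at each diffusion step (a common linear schedule)."""
    return np.linspace(1e-4, 0.02, steps)

def forward_diffuse(audio: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """Progressively corrupt a waveform with Gaussian noise.
    A diffusion model is trained to undo these steps one at a time."""
    x = audio.copy()
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)
    return x

# One second of a 440 Hz sine tone at 16 kHz stands in for training audio.
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = forward_diffuse(clean, noise_schedule(50))
print(noisy.shape)  # (16000,)
```

Generation then runs the learned process in reverse, starting from pure noise and denoising step by step into a waveform.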

Furthermore, Stable Audio addresses a limitation of conventional audio diffusion models, which can generate only fixed-duration clips and so struggle to produce complete songs. Through music-specific training and conditioning on text metadata that specifies a song’s start and end times, Stable Audio lets users set the length of the generated track, giving them more flexibility and control over the creative process.
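The timing conditioning described above can be sketched roughly as follows: timing values are encoded as features and concatenated with the text-prompt features before being fed to the model. This is an illustrative guess at the mechanism, not Stability AI's code; every function and parameter name here is hypothetical.

```python
import numpy as np

def timing_embedding(seconds_start: float, seconds_total: float,
                     dim: int = 8) -> np.ndarray:
    """Encode a clip's start time and total length as a small
    sinusoidal feature vector (dim must be divisible by 4)."""
    freqs = 2.0 ** np.arange(dim // 4)
    feats = []
    for t in (seconds_start, seconds_total):
        feats.append(np.sin(t * freqs))
        feats.append(np.cos(t * freqs))
    return np.concatenate(feats)

def conditioning_vector(text_embedding: np.ndarray,
                        seconds_start: float,
                        seconds_total: float) -> np.ndarray:
    """Join prompt features with timing features, so the diffusion
    model sees the desired length as part of its conditioning."""
    return np.concatenate(
        [text_embedding, timing_embedding(seconds_start, seconds_total)])

# Request a 90-second clip starting at 0 s, with a toy 16-dim "text embedding".
cond = conditioning_vector(np.zeros(16), seconds_start=0.0, seconds_total=90.0)
print(cond.shape)  # (24,)
```

Because the length request travels with the prompt, the same trained model can be asked for a 30-second jingle or a 90-second track without retraining.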

Stability AI remarked in a statement, “Stable Audio represents the cutting-edge audio generation research by Stability AI’s generative audio research lab, Harmonai,” as reported by The Verge. “We continue to improve our model architectures, datasets, and training procedures to improve output quality, controllability, inference speed, and output length.”

According to the company’s statement, the Stable Audio platform has been trained using an extensive dataset comprising over 800,000 audio files, encompassing music, sound effects, and individual instrument stems. This dataset also incorporates text metadata from AudioSparx, a stock music licensing company, covering an extensive 19,500 hours of diverse sounds. Stability AI emphasizes that it has secured the requisite permissions to employ copyrighted materials through its partnership with a licensing company.

For users interested in the platform, Stable Audio offers three pricing tiers:

  1. The free version allows users to generate up to 45 seconds of audio for a maximum of 20 tracks per month. However, users are prohibited from using the generated audio for commercial purposes in the free version.
  2. The Professional tier, priced at $11.99, permits users to create 500 tracks per month, each up to 90 seconds long.
  3. The Enterprise subscription is available for companies seeking customized usage plans and pricing structures.

Text-to-audio generation is not a novel concept, and several prominent players in the generative AI field are exploring it. In August, Meta unveiled AudioCraft, a suite of generative AI models that create natural-sounding speech, sound, and music from prompts; however, AudioCraft is currently restricted to researchers and select audio professionals. Google has also introduced MusicLM, which lets individuals generate audio from text, but it too is aimed primarily at researchers.
