Nvidia has been instrumental in the current AI boom, primarily as the maker of the GPUs that power next-generation AI workloads. But the company doesn’t just sell shovels to the diggers: it has now joined the fray with an AI model of its own that does something genuinely new.
As reported by Ars Technica, Nvidia’s new AI model is called Fugatto. It combines new AI training methods and technologies to transform music, voices, and other sounds in ways never previously possible, creating soundscapes that have never been heard before.
Fugatto is based on an advanced AI architecture with 2.5 billion parameters, trained on over 50,000 hours of annotated audio data. The model uses a technique called Composable ART (Audio Representation Transformation), which can combine and control different sound properties based on text or audio prompts. The result is completely new sound combinations that were not present in the training material.
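To illustrate the idea of composing and weighting different sound properties at inference time, here is a minimal sketch. This is an assumption about how such composition could work, not Nvidia’s published implementation: one common approach is weighted classifier-free guidance, where the model’s unconditional prediction is blended with several conditional predictions, one per prompt, each scaled by its own weight.

```python
import numpy as np

def compose_guidance(uncond, conds, weights):
    """Blend per-prompt predictions around an unconditional baseline.

    uncond:  model output with no conditioning, shape (d,)
    conds:   list of outputs, one per text prompt, each shape (d,)
    weights: per-prompt guidance weights (higher = stronger influence)
    """
    out = uncond.copy()
    for cond, w in zip(conds, weights):
        # Push the output toward each prompt's direction, scaled by its weight.
        out += w * (cond - uncond)
    return out

# Toy example with 4-dimensional "latent audio" vectors (hypothetical values).
uncond = np.zeros(4)
violin = np.array([1.0, 0.0, 0.0, 0.0])   # hypothetical "violin" direction
laugh  = np.array([0.0, 1.0, 0.0, 0.0])   # hypothetical "child laughing" direction
mix = compose_guidance(uncond, [violin, laugh], [0.7, 0.3])
```

Varying the weights would let a user dial an attribute up or down, which matches the fine-grained control the article describes.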
For example, Fugatto can generate the sound of a violin that sounds like a child laughing, or a factory machine screaming in metallic pain. The model also lets you fine-tune specific features, such as strengthening or softening a French accent, or adjusting the degree of sadness in a voice recording.
In addition to combining and transforming sounds, Fugatto can perform classic AI audio tasks, such as changing the emotion of a voice, isolating individual voices in a piece of music, or adapting musical instruments to new sound sources.
For all the essential details, you can learn more about Fugatto in Nvidia’s official white paper (PDF). Otherwise, check out the Fugatto page with examples of emergent sounds and emergent tasks.
This article was originally published on our sister publication M3 and has been translated and localized from Swedish.