Nvidia (NVDA, Financials) introduced Fugatto, a new artificial intelligence model designed to generate and modify music, voices and sounds; the stock is down 3.1% in early morning trading.
The company said Monday that the model is aimed at professionals in music production, film and video game development.
Foundational Generative Audio Transformer Opus, or Fugatto for short, lets users generate or modify audio from prompts supplied as text or audio. According to Nvidia, the model can generate completely new sounds, change the instruments in a song, turn written descriptions into musical excerpts and even alter the accent or emotion of a voice.
Rafael Valle, head of applied audio research at Nvidia, said: “We wanted to create a model that understands and generates sound like humans do.”
The model has practical applications in many fields. Advertising agencies, for example, can edit voiceovers with different accents or emotions to suit campaigns in different regions. Video game creators can use Fugatto to change audio assets dynamically, in real time, to reflect in-game activity.
To demonstrate its adaptability, Nvidia highlighted the model’s ability to produce unusual sound transformations, such as making a trumpet sound like a barking dog or a saxophone like a meowing cat. Fugatto can also create high-quality singing voices from text inputs with minimal fine-tuning and small amounts of singing data, although it was not specifically trained for that task.
Fugatto has 2.5 billion parameters and was trained on Nvidia’s DGX systems with 32 H100 Tensor Core GPUs. The company said that developing the model took more than a year of effort.
Nvidia has not said when Fugatto will be available for public or commercial use.
This article first appeared on GuruFocus.