Meta has released AudioCraft, a new open-source generative AI framework that produces high-quality, realistic audio and music from simple text prompts. It aims to revolutionize audio and music generation by empowering professional musicians to explore new compositions, indie game developers to enhance their virtual worlds with sound effects, and small business owners to add soundtracks to their Instagram posts, all with ease.
AudioCraft is a collection of three robust models: MusicGen, AudioGen, and EnCodec. MusicGen generates music from text-based user inputs, while AudioGen performs a similar role for ambient sounds and sound effects. MusicGen was trained on Meta-owned and specifically licensed music, and AudioGen on public sound effects. Alongside the framework, the company is releasing an improved version of its EnCodec decoder, which enables higher-quality music generation with fewer artifacts, together with pre-trained AudioGen models and all of the AudioCraft model weights and code.
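For readers who want to try the models, the snippet below is a minimal sketch of generating a short clip with MusicGen through the audiocraft Python package, following the usage documented in the project's repository. The checkpoint name 'facebook/musicgen-small', the prompt text, and the output filename are illustrative choices, and the package assumes a working PyTorch installation.

```python
# Minimal sketch: text-to-music with MusicGen via the audiocraft package.
# Assumes `pip install audiocraft`; the checkpoint name, prompt, and
# output filename below are illustrative, not prescribed by Meta.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pre-trained MusicGen checkpoint (the small variant is the
# least demanding on hardware).
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Generate 8 seconds of audio per text prompt.
model.set_generation_params(duration=8)

descriptions = ['upbeat acoustic folk with hand claps']
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Write each generated clip to disk, normalizing loudness as in the
# project's own examples.
for idx, one_wav in enumerate(wav):
    audio_write(f'musicgen_sample_{idx}', one_wav.cpu(), model.sample_rate,
                strategy='loudness')
```

AudioGen exposes a similar interface through audiocraft.models.AudioGen, with prompts describing environmental sounds rather than music.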