Microsoft’s AI App VASA-1 Makes Photographs Talk and Sing with Believable Facial Expressions

A team of AI researchers at Microsoft Research Asia has developed an AI application that takes a still image of a person and an audio track and turns them into an animation of that person speaking or singing the audio, with matching facial expressions.

The team has published a paper on the arXiv preprint server describing how they built the app; video samples are available on the research project page.

The researchers set out to make still images appear to talk or sing along to any supplied audio track while displaying believable facial expressions. The result is VASA-1, an AI system that turns static images, whether photographs, drawings, or paintings, into what the team describes as “exquisitely synchronized” animations.
