Microsoft's AI leaps forward: Animating photos with realistic audio

Microsoft has unveiled an artificial intelligence model that animates photographs of people in sync with an audio track, producing striking yet potentially risky results.

VASA-1
Images source: © Microsoft

Advances in machine learning have significantly expanded what artificial intelligence can do. Microsoft's new AI model, which can animate static images of people, is a striking example.

Imagine a simple image suddenly starting to speak. The Microsoft VASA-1 model makes this possible by animating human portraits and syncing them with audio recordings. This technology can transform ordinary photos into realistic videos of people talking or singing.

From a photo to a realistic movie

In its experiments, Microsoft used portraits of non-existent people generated with StyleGAN2 and DALL-E 3. The model works well on both realistic photos of people and cartoon avatars, and the demonstrations even include the iconic Mona Lisa.

Beyond syncing lip movements with the audio, the VASA-1 model reproduces a wide range of facial expressions and natural head movements, which greatly enhances the realism of the animations.

According to Microsoft, the model generates video at a resolution of 512 x 512 pixels at 45 frames per second in offline mode, and at nearly 40 frames per second in live streaming mode with a delay of roughly 170 milliseconds. These figures were measured on a desktop computer equipped with an NVIDIA GeForce RTX 4090 graphics card.
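To put those frame rates in perspective, here is a rough back-of-the-envelope calculation of how much time the model has to produce each frame, and how many frames the reported delay corresponds to. The function names are illustrative only and are not part of any Microsoft tooling; the inputs are simply the figures quoted above.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce a single frame, in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """How many frame intervals fit into the given delay."""
    return latency_ms * fps / 1000.0


# Figures as quoted in the article (approximate values)
offline_budget = frame_budget_ms(45)         # offline mode, 45 fps
streaming_budget = frame_budget_ms(40)       # live mode, nearly 40 fps
delay_frames = latency_in_frames(170, 40)    # ~170 ms delay at ~40 fps

print(f"Offline: {offline_budget:.1f} ms per frame")
print(f"Live:    {streaming_budget:.1f} ms per frame")
print(f"Delay:   about {delay_frames:.1f} frames")
```

In other words, at these rates the model has roughly 22 to 25 milliseconds to render each frame, and the reported delay is equivalent to about seven frames of video.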

The potential threat of new technology

Although the research focuses on generating animations for virtual portraits and is not intended to produce misleading content, Microsoft acknowledges that the technology could be misused to impersonate people.

In a statement on its official site, the company says it opposes any use of the new model to create misleading or harmful content involving real people. Microsoft does not plan to make a demonstration version, an API, or a complete product publicly available, but it is interested in applying the technology to improve the detection of fake content.

© Daily Wrap