Microsoft's AI leaps forward: Animating photos with realistic audio
Microsoft has unveiled an artificial intelligence model that animates photographs of people to match an audio recording, producing stunning yet potentially risky results.
The development of advanced machine learning technologies has significantly expanded artificial intelligence's capabilities. Microsoft's new AI model, which can animate static images of people, is a shining example.
Imagine a simple image suddenly starting to speak. The Microsoft VASA-1 model makes this possible by animating human portraits and syncing them with audio recordings. This technology can transform ordinary photos into realistic videos of people talking or singing.
From a photo to a realistic movie
In its experiments, Microsoft used portraits of non-existent people generated with StyleGAN2 and DALL-E 3. The model works well on realistic photos of people and on cartoon avatars, and the demonstrations even include the iconic Mona Lisa.
The VASA-1 model can sync lip movements and reproduce the full range of facial expressions and natural head movements, greatly enhancing the realism of the animations.
The model generates video at a resolution of 512 x 512 pixels and 45 frames per second in offline mode. In real-time streaming mode, it reaches nearly 40 frames per second with a starting delay of roughly 170 milliseconds. These figures were measured on a desktop computer equipped with an NVIDIA GeForce RTX 4090 graphics card.
The potential threat of new technology
Although the research focuses on generating animations for virtual portraits and is not intended to produce misleading content, Microsoft acknowledges that the technology could be misused for impersonation.
In a statement on its official site, the company says it opposes any use of the new model to create misleading or harmful content featuring real people. Microsoft does not plan to make a demonstration version, API, or complete product publicly available, but it is keen to apply the technology to improve the detection of fake content.