OpenAI's GPT‑4o: Revolutionizing AI with instant sound and image analysis

OpenAI has unveiled its latest model, GPT-4o, which can analyse sound, images, and text in real time. Most notably, the model responds to audio input remarkably quickly.

Images source: © Unsplash

Dominik Adamczyk

14 May 2024 09:14

Artificial intelligence enthusiasts eagerly awaited the OpenAI Spring Update, a presentation by the creators of ChatGPT. Anticipation ahead of the event was heightened by widespread industry speculation that the company might present a new AI-based internet search engine. This time, however, the focus was on the latest model.

GPT-4o operates in real time

OpenAI introduced the GPT-4o model, which enables more natural interactions. According to the company, GPT-4o can respond to audio input in as little as about a quarter of a second, with an average response time of around a third of a second. This is comparable to human response time in a conversation. In terms of performance, the model is similar to GPT-4 Turbo when analysing text in English and performs even better in other languages.

OpenAI claims that GPT-4o is also significantly better at interpreting images and audio than existing models. So, what can this new tool do? One of the moments that impressed me most was a recording in which GPT-4o was asked to count from one to ten.

GPT-4o responded instantly to commands to change the pace. Equally interesting was another recording in which GPT-4o took on the role of a Spanish language teacher, analysing objects seen through the camera.

When can we expect access to GPT-4o? OpenAI says that the model's text and image capabilities are available in ChatGPT starting today. The new model is available to free users, and paying subscribers get message limits up to five times higher. OpenAI also plans to release an alpha of a new GPT-4o voice mode to ChatGPT Plus users in the coming weeks.

Remember, OpenAI isn't just ChatGPT. The upcoming Sora model will allow users to create videos, which many creators are particularly excited about.