Google's Gemini 1.5 Pro leaps forward: a stride in AI accessibility and innovation
During the Google Next event, the company announced a significant upgrade to Gemini 1.5 Pro, its AI model. This enhancement broadens the tool's functionality.
7 May 2024 19:17
One of the strengths of Gemini, an advanced AI solution developed by Google, is its multimodality. Designed to process a variety of data types, including text, images, audio, video, and code, simultaneously and seamlessly, it can effectively tackle complex reasoning and problem-solving tasks across numerous domains.
As reported by The Verge, the recent update will allow the model to "listen," enabling it to analyze uploaded audio files and extract information without requiring written transcripts. This expands its audio processing capacity, making it useful for analyzing recordings of business meetings or movie soundtracks, for example.
Gemini 1.5 Pro will become more accessible
Google also shared plans to make Gemini 1.5 Pro more accessible through Vertex AI, its AI application development platform. This mid-range offering in the Gemini lineup already outperforms Gemini Ultra, the lineup's largest and most sophisticated model. Crucially, it achieves this without requiring intricate customization, which Google views as a notable advancement in making AI technology more accessible and user-friendly.
Google updates other AI models
In addition, Google revealed updates to another key AI model, Imagen 2, which generates images from text. New inpainting and outpainting features will allow users to modify images by adding or removing elements. Furthermore, Google has introduced SynthID, a digital watermarking technology designed to embed imperceptible watermarks in images created by Imagen models, making it easier to trace their origins.
Google works on integrating AI with its search engine
The Verge has also reported on Google's effort to make its AI more relevant by integrating it with its search engine, enabling the AI to deliver responses backed by the most current information. This move is especially significant given the previous constraints on language model responses: Gemini deliberately avoided topics such as the 2024 US elections and faced criticism for generating historically inaccurate images.