Google draws criticism for its demo after the long-awaited release of ‘Gemini’
Shortly after news spread that Google was pushing back the release of its long-awaited AI model, Gemini, the company announced its launch.
As part of the launch, Google published a demo showcasing Gemini’s impressive – downright incredible – capabilities. Well, you know what they say about things that are too good to be true.
Let’s take a look at what went wrong with the demo and how Gemini stacks up against OpenAI’s GPT-4.
What is Google Gemini?
Rivaling OpenAI’s GPT-4, Gemini is a multi-modal AI model, meaning it can process text, image, audio and code inputs.
(For a long time, ChatGPT was unimodal, dealing only with text, until it moved to multimodality this year.)
Gemini is available in three versions:
Nano: This is the least powerful version of Gemini, designed to run on mobile devices like phones and tablets. It’s ideal for simple, everyday tasks, like summarizing an audio file or writing copy for an email.
Pro: This version can handle more complex tasks such as language translation and marketing campaign ideation. This is the version that now powers Google’s AI tools like Bard and Google Assistant.
Ultra: The largest and most powerful version of Gemini, with access to large data sets and processing power to accomplish tasks like solving scientific problems and creating advanced AI applications.
Ultra is not yet available to consumers, with a rollout planned for early 2024, while Google conducts final testing to ensure its safety for commercial use. Gemini Nano will power Google’s Pixel 8 Pro phone, which comes with built-in AI capabilities.
Gemini Pro, meanwhile, will power Google tools like Bard starting today and is accessible via API through Google AI Studio and Google Cloud Vertex AI.
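To give a sense of what that API access looks like, here is a minimal sketch using Google’s google-generativeai Python SDK with a key from Google AI Studio. The model name and method calls reflect the SDK as documented around launch, and the prompt and API-key placeholder are illustrative assumptions, so check the current documentation before relying on it.

```python
import google.generativeai as genai

# Configure the client with a key created in Google AI Studio
# (placeholder value - substitute your own key).
genai.configure(api_key="YOUR_API_KEY")

# Load the Gemini Pro text model.
model = genai.GenerativeModel("gemini-pro")

# Send a simple text prompt and print the generated reply.
response = model.generate_content("Summarize the Gemini launch in two sentences.")
print(response.text)
```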
Was Google’s Gemini demo misleading?
Google released a six-minute YouTube demo showcasing Gemini’s skills in language, game creation, logic and spatial reasoning, cultural understanding, and more.
If you watch the video, it’s easy to be impressed.
Gemini is able to recognize a duck from a simple drawing, understand sleight of hand, and solve visual puzzles – just to name a few tasks.
However, after the video gained over 2 million views, a Bloomberg report revealed that it had been cut and spliced together, inflating Gemini’s apparent performance.
Google shared a disclaimer at the start of the video: “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”
However, Bloomberg points out that Google left out some important details:
The video wasn’t recorded in real time or via voice output, which suggests conversations won’t be as smooth as the demo makes them look.
The model used in the video is Gemini Ultra, which is not yet available to the public.
Gemini actually handled input in the demo through still images and written prompts.
It’s like showing everyone your dog’s best trick.
You share the video via text message and everyone is impressed. But when they see the dog in person, they realize it actually takes a whole bunch of treats, petting, patience, and 100 repetitions before the trick happens.
Let’s do a side-by-side comparison.
In this 8-second clip, a person’s hand is seen gesturing as if they are playing the game used to settle all friendly disputes. Gemini responds, “I know what you’re doing. You’re playing rock-paper-scissors.”
But what really happened behind the scenes involved a lot more spoon-feeding.
In the real demo, the user submitted each hand gesture individually and asked Gemini to describe what it saw.
From there, the user combined all three images, asked Gemini again, and included a huge hint.
While Gemini’s ability to process images and understand context is still impressive, the video minimizes the amount of steering needed for Gemini to generate the correct response.
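For comparison, here is a rough sketch of what that behind-the-scenes workflow might look like with the image-and-text model exposed in the same SDK. The file names and the hint wording are hypothetical stand-ins for the still frames and prompt Google actually used.

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key from Google AI Studio

# The multimodal model that accepts images plus text.
model = genai.GenerativeModel("gemini-pro-vision")

# Hypothetical still frames of the three hand gestures.
frames = [PIL.Image.open(p) for p in ("rock.png", "paper.png", "scissors.png")]

# All three images are sent together with an explicit hint,
# rather than a live video stream with voice.
response = model.generate_content(
    frames + ["What game am I playing? Hint: it's a game."]
)
print(response.text)
```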
While this has led to a lot of criticism of Google, some point out that it’s not uncommon for companies to use editing to present more fluid, idealized use cases in their demos.
Gemini vs. GPT-4
So far, OpenAI’s GPT-4 has been the most powerful AI model on the market, and since its release, Google and other AI players have worked hard to build a model that can beat it.
Google first teased Gemini back in September, suggesting it would beat GPT-4 – and technically, it delivered on that promise.
Gemini outperforms GPT-4 on a number of benchmarks defined by AI researchers.
However, the Bloomberg article points out something important.
For a model that took this long to come out, the fact that it’s only slightly better than GPT-4 isn’t the huge win Google was aiming for.
OpenAI released GPT-4 in March. Google is now launching Gemini, which outperforms it, but only by a few percentage points.
So how long will it take for OpenAI to release an even bigger and better version? Judging by last year, it probably won’t be long.
For now, Gemini seems like the best option, but that won’t be clear until early 2024 when Ultra rolls out.