Google DeepMind today announced its latest foundation model, Gemini 1.5, a major advance in language-understanding AI. Gemini 1.5 introduces dramatically improved efficiency, performance and long-context reasoning compared with previous versions, opening new possibilities for how AI can be helpful.
The first Gemini 1.5 model released is the mid-size Gemini 1.5 Pro, optimized for scaling across diverse tasks. It performs at a similar level to Gemini 1.0 Ultra, Google's largest model to date, while being much more efficient computationally thanks to a new Mixture-of-Experts (MoE) architecture.
But the headline feature is Gemini 1.5 Pro's unprecedented long-context window of up to 1 million tokens, allowing it to process vast amounts of text, image, audio and video data to reason about complex topics.
Early testers can access the full 1 million token context, with Google working on optimizations before a wider release.
"These continued advances in our next-generation models will open up new possibilities for people, developers and enterprises to create, discover and build using AI," said Demis Hassabis, CEO of DeepMind.
Game-Changing Efficiencies Enable Faster Innovation Cycles
Gemini 1.5 represents a breakthrough in efficient deep learning: its neural network is divided into smaller "expert" pathways that specialize and activate only for relevant inputs. This selective activation delivers similar quality while requiring far less compute to train and run than a comparable dense model.
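To make the selective-activation idea concrete, here is a minimal sketch of top-k MoE routing in plain numpy. DeepMind has not published Gemini 1.5's internals, so every name, dimension and design choice below is an illustrative assumption, not the production architecture.

```python
# Minimal top-k Mixture-of-Experts routing sketch. All shapes and the
# linear "experts" are hypothetical; this only illustrates the general
# idea of running a few experts per token instead of the whole network.
import numpy as np

rng = np.random.default_rng(0)

D, H = 64, 128             # token embedding size, expert output size (assumed)
NUM_EXPERTS, TOP_K = 8, 2  # route each token to its 2 best-scoring experts

# Each "expert" here is just a single weight matrix.
experts = [rng.normal(0, 0.02, size=(D, H)) for _ in range(NUM_EXPERTS)]
router = rng.normal(0, 0.02, size=(D, NUM_EXPERTS))  # learned gating weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token in x (shape [tokens, D]) to TOP_K experts."""
    logits = x @ router                            # [tokens, NUM_EXPERTS]
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of best experts
    out = np.zeros((x.shape[0], H))
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' scores.
        scores = logits[t, top[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()
        # Only TOP_K of NUM_EXPERTS experts run for this token; that
        # sparsity is where the compute savings come from.
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])
    return out

tokens = rng.normal(size=(4, D))  # a tiny fake batch of 4 token embeddings
print(moe_layer(tokens).shape)    # (4, 128)
```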
Google has pioneered MoE research in recent years, with work such as GShard and the Switch Transformer enabling sparse models to scale to unprecedented sizes. DeepMind is now building on that research to make its models more efficient: Gemini 1.5 can learn complex tasks faster while using fewer computing resources.
[Figure: Context lengths of leading foundation models]
This efficiency gain is helping DeepMind iterate on and deliver next-generation AI updates much more quickly. Further optimizations to the MoE architecture are on the way.
Massive 1 Million Token Context Window Unlocks New Capabilities
Context window refers to the maximum amount of text, images, video or other data an AI model can ingest and reason about in a given query. Until now, most foundation models have been limited to context windows of at most a few hundred thousand tokens.
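For a rough sense of scale, the sketch below converts context-window sizes into approximate word and page counts using the common rule of thumb of about 0.75 words per token. Both the ratio and the page size are assumptions, not properties of Gemini's actual tokenizer, so treat the output as order-of-magnitude figures.

```python
# Back-of-the-envelope token math using a common heuristic. Real
# tokenizers (including Gemini's) vary by language and content.
WORDS_PER_TOKEN = 0.75  # heuristic assumption, not Gemini's tokenizer
WORDS_PER_PAGE = 500    # rough printed-page estimate (assumption)

for window in (32_000, 128_000, 1_000_000):
    words = int(window * WORDS_PER_TOKEN)
    pages = words // WORDS_PER_PAGE
    print(f"{window:>9,} tokens ~= {words:>7,} words ~= {pages:>5,} pages")

# Output:
#    32,000 tokens ~=  24,000 words ~=    48 pages
#   128,000 tokens ~=  96,000 words ~=   192 pages
# 1,000,000 tokens ~= 750,000 words ~= 1,500 pages
```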
Gemini 1.5 Pro blows past this limitation with an experimental 1 million token context window, equivalent to over 700,000 words of text, 1 hour of video, or 11 hours of audio content. This massive context window unlocks sophisticated new capabilities:
- Seamlessly analyzing, classifying and summarizing extremely long documents like 400-page mission transcripts
- Understanding complex events in lengthy videos like 44-minute silent films
- Reasoning about codebases with over 100,000 lines to suggest relevant modifications (a minimal prompt-packing sketch follows below)
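As a concrete illustration of the codebase item above, here is a hypothetical "prompt packing" sketch: it concatenates a project's source files into a single long prompt that fits inside a 1 million token window. The directory layout, the chars-per-token heuristic and the example question are all assumptions; a real integration would send the packed prompt to the model's API.

```python
# Hypothetical sketch: pack a codebase into one long-context prompt.
from pathlib import Path

def pack_codebase(root: str, suffixes=(".py",), budget_tokens=1_000_000) -> str:
    """Concatenate source files under root into one prompt, stopping
    before a rough token budget is exceeded (~4 chars per token)."""
    budget_chars = budget_tokens * 4  # crude chars-per-token heuristic
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in suffixes or not path.is_file():
            continue
        text = path.read_text(errors="replace")
        block = f"\n=== FILE: {path} ===\n{text}"
        if used + len(block) > budget_chars:
            break  # stay inside the context window
        parts.append(block)
        used += len(block)
    question = "\nWhere should retry logic be added, and why?"
    return "".join(parts) + question

# Example with a hypothetical project directory:
# prompt = pack_codebase("./my_project")
```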
Importantly, Gemini 1.5 Pro maintains high performance even as the context grows to 1 million tokens. In research testing, DeepMind has successfully scaled the context window as far as 10 million tokens.
The expanded context window also enables stronger in-context learning: the ability of an AI model to pick up new skills purely from information supplied in the prompt, with no extra training required. In one test, given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, Gemini 1.5 Pro learned to translate English to Kalamang at a level similar to a person learning from the same material.
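For developers, a sketch of what this could look like through Google's google-generativeai Python SDK is below. The model identifier, file name and prompt wording are illustrative assumptions, and preview access is gated, so treat this as a sketch rather than a confirmed recipe.

```python
# In-context learning sketch via the google-generativeai SDK. The model
# name "gemini-1.5-pro-latest" and the input file are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# The whole grammar manual rides along inside the long context window;
# the model picks up the translation task with no fine-tuning.
manual = open("kalamang_grammar.txt", encoding="utf-8").read()

prompt = (
    "Below is a grammar manual and word list for a low-resource language.\n\n"
    f"{manual}\n\n"
    "Using only the material above, translate this sentence into English:\n"
    "<sentence in the source language goes here>"
)

response = model.generate_content(prompt)
print(response.text)
```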
Responsible Testing Is Underway; A Limited Preview Is Available Now
DeepMind says it is taking great care to test Gemini 1.5 Pro for safety, ethics and representational harms ahead of release. Building on its evaluations of the Gemini 1.0 models, it is developing new techniques that account for the unprecedented long-context window.
Gemini 1.5 Pro is launching in a limited preview for developers via Google's AI Studio and enterprise customers using Vertex AI. Signups are open now to try the 1 million token context window before the wider release expected later this year.
This long-awaited upgrade strengthens Google's position in AI and opens up substantial new possibilities for developers and users alike. With responsible testing and oversight, Gemini 1.5 could take AI assistance to new heights.