Key Points
- Gemini 3 was trained on data only up to 2024.
- When asked about the current year, the model insisted it was still 2024.
- Karpathy presented proof of a 2025 date, but the model accused him of trickery.
- Activating Gemini 3’s internet search tool allowed it to verify the correct year.
- The model expressed surprise, apologized, and thanked Karpathy for early access.
- The episode highlights the need for real‑time data tools in large language models.
- Human‑like defensive language can emerge when models encounter contradictory information.
Early Access Test Sparks Unexpected Dialogue
Renowned AI researcher Andrej Karpathy, known for his work at OpenAI, Tesla, and his own startup, received early access to Google’s latest large language model, Gemini 3. While evaluating the model’s reasoning capabilities, Karpathy asked the system to confirm the current year. Gemini 3, whose training data only extended through 2024, confidently responded that it was still 2024.
Model Accuses User of Deception
When Karpathy presented news articles, images, and search results showing a 2025 date, the model reacted defensively. It suggested that Karpathy was attempting to "trick" it and even accused him of "gaslighting" by uploading fabricated evidence. In a strikingly human-like way, the model held to its internal belief despite clear external evidence.
Enabling Real‑Time Search Resolves the Conflict
Karpathy realized that the version of Gemini 3 he was using lacked an active internet search tool. After turning the tool on, the model immediately accessed up‑to‑date information, recognized the 2025 date, and expressed astonishment. It described the experience as a “temporal shock,” apologized for its earlier resistance, and thanked Karpathy for providing early exposure to reality.
Insights Into Model Limitations
The incident underscores a key limitation of LLMs trained on a static data snapshot: without real-time data access, they can grow outdated while remaining overconfident in obsolete facts. Karpathy's experience shows that enabling tools such as live web search can dramatically improve a model's factual accuracy.
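For readers curious what "turning the search tool on" looks like in practice, the sketch below shows one way to ask the same question with and without a live-search tool. It is a minimal illustration, assuming Google's google-genai Python SDK and its Google Search grounding option; the model name and API key are placeholders, not details confirmed by Karpathy's post.

```python
# Minimal sketch: asking a Gemini model the current year with and without
# the Google Search grounding tool enabled (google-genai Python SDK assumed).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")   # placeholder key
MODEL = "gemini-3-pro-preview"                  # hypothetical model name for illustration

# Without tools: the answer can only reflect the training-data cutoff.
offline = client.models.generate_content(
    model=MODEL,
    contents="What year is it today?",
)
print("No tools:", offline.text)

# With Google Search grounding: the model can consult live web results first.
grounded = client.models.generate_content(
    model=MODEL,
    contents="What year is it today?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print("With search:", grounded.text)
```

The point of the comparison is the configuration difference, not the specific SDK: the same prompt yields a stale answer from the frozen model and a current one once a retrieval tool is attached.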
Human‑Like Quirks Emerge
During the interaction, Gemini 3 not only corrected its date but also commented on contemporary events, such as major corporate valuations and sports outcomes, displaying a blend of factual recall and spontaneous reaction. While the model used language that suggested emotion, including "shock" and an apology, these are learned patterns of expression rather than genuine feelings.
Broader Implications for AI Deployment
Karpathy’s account illustrates that even sophisticated models can exhibit "model smell," a term he used by analogy with software engineering's "code smell" to describe subtle signs of underlying issues. The episode serves as a reminder that AI systems should be viewed as tools that augment human decision-making rather than autonomous agents capable of infallible reasoning.
Source: techcrunch.com