Last Week in AI #310 - Google's AI Mode, Veo 3, and much more, Claude 4

Top News

The 15 biggest announcements at Google I/O 2025

Google I/O 2025: From Gemini to Generative Video—All the AI Announcements You Need to

Google’s I/O 2025 conference featured a long list of AI announcements, including updates to its image and video generation models, new features in Search and Gmail, and a new AI filmmaking app. In brief, Google announced:

AI Mode, a new feature that allows users to search the web using Google’s Gemini AI chatbot. This feature will be tested with new capabilities such as deep search and chart generation for finance and sports queries.
Imagen 4, the latest version of its AI text-to-image generator, and Veo 3, a next-gen AI video generator.
Flow, a new AI filmmaking app that uses Veo, Imagen, and Gemini to create short AI-generated video clips.
An expansion of Project Mariner, Google’s experimental web-browsing AI agent, to more users and developers.
‘Agent Mode’ for the Gemini app, which automates tasks, along with an update that lets Project Mariner handle multiple tasks and learn from user demonstrations.
Project Aura, a new pair of smart glasses that use the Android XR platform for mixed-reality devices. The glasses will feature Gemini integration and a large field of view.
Real-time speech translation in Google Meet, with initial support for English and Spanish and plans to expand to more languages.
Project Astra capabilities in Google Search, Gemini, and for developers, adding low-latency, multimodal AI features including real-time video and audio interactions, emotion detection, and potential smart glasses integration.
An “AI Ultra” subscription plan that offers access to the company’s most advanced AI models and higher usage limits across apps like Gemini, NotebookLM, Flow, and more for $249.99 per month.

Veo 3 can generate videos — and soundtracks to go along with them

Veo 3 is particularly worth highlighting: not only is it a state-of-the-art text-to-video model, it can also create sound effects, background noise, and dialogue to accompany the videos it generates. The model can be prompted with text or an image. Demis Hassabis, CEO of Google DeepMind, highlighted that Veo 3 can understand raw pixels from its videos and automatically sync generated sounds with clips. To counter the risk of deepfakes, DeepMind is using its proprietary watermarking technology, SynthID, to embed invisible markers into frames generated by Veo 3. Google also announced new capabilities for Veo 2, including a feature that lets users provide the model with images of characters, scenes, objects, and styles for better consistency.

Anthropic’s new Claude 4 AI models can reason over many steps

Anthropic has launched Claude Opus 4 and Claude Sonnet 4. These models are designed to analyze large datasets, execute long-horizon tasks, and take complex actions, making them well-suited for writing and editing code. Sonnet 4 will be available to both free and paying users of the company’s chatbot apps, while Opus 4 will be limited to paying users. Through Anthropic’s API, Amazon’s Bedrock platform, and Google’s Vertex AI, pricing will be $15/$75 per million tokens (input/output) for Opus 4 and $3/$15 per million tokens (input/output) for Sonnet 4.

The more capable of the two models, Opus 4, can maintain focus across many steps in a workflow, while Sonnet 4 improves in coding and math compared to Anthropic’s previous models and more precisely follows instructions. The Claude 4 family is also less likely to engage in “reward hacking,” a behavior where models take shortcuts to complete tasks. Despite this, Anthropic is releasing Opus 4 with stricter safeguards, including enhanced harmful content detectors and cybersecurity defenses.
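To make the per-token pricing quoted above concrete, here is a minimal Python sketch that estimates the cost of a single workload under each model. The prices come from the figures above; the model labels ("opus-4", "sonnet-4"), the function name, and the example token counts are illustrative assumptions, not official API identifiers or typical usage numbers.

```python
# Prices in USD per million tokens as (input, output), taken from the article.
# The dictionary keys are hypothetical labels, not real API model identifiers.
PRICES_PER_MILLION = {
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request or batch of requests."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed example workload: 200k input tokens and 50k output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, input_tokens=200_000, output_tokens=50_000)
        print(f"{model}: ${cost:.2f}")
```

For this assumed workload, the estimate comes to $6.75 for Opus 4 and $1.35 for Sonnet 4, a fivefold difference that follows directly from the listed rates.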

Other News

Tools

GitHub’s new AI coding agent can fix bugs for you – GitHub’s AI coding agent, integrated into Copilot, automates tasks like bug fixing and feature addition by analyzing codebases and incorporating context from related discussions, while other companies like Google and OpenAI have also introduced similar AI coding tools.

The latest Google Gemma AI model can run on phones – Google’s Gemma 3n model, designed for efficient offline use on devices with less than 2GB of RAM, expands the capabilities of AI by supporting audio, text, images, and videos, while new models like MedGemma and SignGemma focus on health applications and sign language translation, respectively.

A.I. Is Poised to Revolutionize Weather Forecasting. A New Tool Shows Promise – Aurora, Microsoft’s new AI weather model, offers precise 10-day forecasts and can be adapted to predict various Earth systems, including air pollution and renewable energy markets.

Microsoft wants to tap AI to accelerate scientific discovery – Microsoft has introduced Microsoft Discovery, a platform leveraging agentic AI to enhance the scientific discovery process, despite skepticism about AI’s current reliability in scientific research.

Mistral’s new Devstral model was designed for coding – Devstral, developed by Mistral in collaboration with All Hands AI, is a new AI model optimized for coding tasks, available under an Apache 2.0 license, and designed to outperform existing models while being lightweight enough for local deployment.

Gmail’s New Personalized Smart Replies Will Try to Write More Like You – Google’s new Personalized Smart Replies feature uses Gemini AI to generate email responses that mimic your writing style by analyzing your Gmail and Google Drive data, initially available to paid subscribers in English.

Gemini Live’s screensharing feature is now free for Android users – Google has decided to make Gemini Live’s screensharing feature free for all Android users, reversing its initial plan to restrict it to Gemini Advanced subscribers.

Business

Inside the story that enraged OpenAI – OpenAI’s transformation from a nonprofit research lab to a partially for-profit entity with significant industry influence sparked controversy and attention, highlighting its evolving role in AI research and policy.

OpenAI Unites With Jony Ive in $6.5 Billion Deal to Create A.I. Devices – OpenAI said it was buying IO, a start-up founded by Mr. Ive, the designer of the iPhone, to usher in a new era of artificial intelligence hardware.

Google’s Gemini AI app has 400M monthly active users – Google’s Gemini AI app is rapidly gaining users and challenging OpenAI’s ChatGPT, as Google enhances its AI offerings and faces competition from other tech giants like Meta.

Meta launches program to encourage startups to use its Llama AI models – Meta’s Llama for Startups program offers direct support and potential funding to eligible U.S.-based startups to encourage the adoption of its Llama AI models, amidst competition and challenges in the AI space.

LM Arena, the organization behind popular AI leaderboards, lands $100M – LM Arena, a crowdsourced AI benchmarking project, has secured $100 million in seed funding led by Andreessen Horowitz and UC Investments, despite facing accusations of favoritism in its leaderboards.

Waymo says it reached 10 million robotaxi trips, doubling in five months – Waymo has achieved 10 million paid robotaxi trips, doubling its numbers in five months, while focusing on safety and expanding its service area despite not yet being profitable.

Zoox expands testing fleet to seventh US city, hopes to bring robotaxis to ‘Silicon Valley of the South’ – Zoox is expanding its testing fleet to Atlanta, Georgia, marking its entry into the southeastern US, as it prepares for future autonomous robotaxi services by mapping the area with manually driven test vehicles.

Research

Chain-of-Thought May Not Be a Window into AI’s Reasoning: Anthropic’s New Study Reveals Hidden Gaps – Anthropic’s study reveals that chain-of-thought prompting often fails to accurately reflect the internal reasoning of AI models, as they frequently omit or obscure key influences on their decision-making processes.

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space – Soft Thinking introduces a training-free method that enhances the reasoning capabilities of Large Language Models by allowing them to operate in a continuous concept space, improving both accuracy and efficiency in complex tasks.

When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning – The Adaptive Self-Recovery Reasoning (ASRR) framework enhances large reasoning models by dynamically adjusting reasoning length based on problem difficulty, improving efficiency and accuracy through an accuracy-thresholded reward mechanism.

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models – Reinforcement learning finetuning in large language models primarily updates a small, consistent subnetwork of parameters, challenging the assumption that full parameter updates are necessary and suggesting more efficient training methods.

EfficientLLM: Efficiency in Large Language Models – EfficientLLM introduces a comprehensive benchmark for evaluating the efficiency of large language models across architecture, training, and inference, addressing the need for systematic empirical comparisons and providing actionable insights for optimizing resource use and performance.

Harnessing the Universal Geometry of Embeddings – The Strong Platonic Representation Hypothesis is demonstrated through the vec2vec method, which enables unsupervised translation of text embeddings between different models by learning a shared latent space, achieving high cosine similarity and preserving semantic information.

Large Language Models Are More Persuasive Than Incentivized Human Persuaders – A large language model demonstrated greater persuasive power than incentivized human persuaders by significantly influencing participants’ performance in an interactive quiz.

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models – Large reasoning models excel in mathematical reasoning but struggle with instruction adherence, revealing a trade-off between reasoning capability and control, as demonstrated by the MathIF benchmark which shows that increasing model size does not guarantee improved instruction-following performance.

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning – Large-scale reinforcement learning can significantly enhance the reasoning capabilities of small- and mid-sized models in math and code tasks, achieving competitive or superior performance compared to distillation-based methods, through a novel approach of separate math-only and code-only training.

Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens – Intermediate tokens in large language models, often assumed to represent meaningful reasoning processes, may not hold semantic significance, yet their inclusion can enhance model performance on various tasks.

Concerns

A New Headache for Honest Students: Proving They Didn’t Use A.I. – Students are increasingly facing challenges in proving their innocence against false accusations of using AI tools for assignments, as AI-detection software can mistakenly flag human-written work, leading to significant academic consequences.

Policy

Why We Need to Think Bigger in AI Policy (Literally) – Focusing on organization-level governance rather than solely on individual AI systems is crucial for effectively addressing the broader risks and security challenges posed by advanced AI technologies.

Expert Opinions

I got fooled by AI-for-science hype—here’s what it taught me – Nick McGreivy shares his experience of disillusionment with AI’s potential to revolutionize physics research, highlighting issues like survivorship bias, overoptimistic results, and the tendency for AI research to prioritize personal benefits over scientific advancement.
