Last Week in AI #307 - GPT 4.1, o3, o4-mini, Gemini 2.5 Flash, Veo 2

Top News

OpenAI’s new GPT-4.1 AI models focus on coding

OpenAI has launched a new family of AI models, GPT-4.1, which includes GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models are designed to excel at coding and instruction following, with a 1-million-token context window, allowing them to process approximately 750,000 words at once. The models are part of OpenAI’s ambition to create AI coding models capable of complex software engineering tasks, including programming entire apps end-to-end. The GPT-4.1 models have been optimized for real-world use, with improvements in areas such as frontend coding, format adherence, and consistent tool usage. However, OpenAI acknowledges that the models become less reliable as the number of input tokens increases, and they often require more specific, explicit prompts.
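
The 750,000-word figure follows from a common rule of thumb of roughly 0.75 English words per token. A quick sketch of that conversion, assuming that ratio (the real value depends on the tokenizer and the text; `approx_words` and `approx_tokens` are illustrative helpers, not part of any OpenAI API):

```python
# ASSUMPTION: ~0.75 English words per token is a rule of thumb, not a fixed
# property of GPT-4.1's tokenizer; the real ratio varies with text and language.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


def approx_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a given number of English words needs."""
    return round(words / words_per_token)


print(approx_words(1_000_000))  # 750000
print(approx_tokens(750_000))   # 1000000
```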

OpenAI launches a pair of AI reasoning models, o3 and o4-mini

OpenAI has launched two new AI reasoning models, o3 and o4-mini, which are designed to pause and work through questions before responding. The o3 model is touted as OpenAI’s most advanced reasoning model, outperforming previous models in tests measuring math, coding, reasoning, science, and visual understanding capabilities. The o4-mini model offers a balance between price, speed, and performance. Both models can generate responses using tools in ChatGPT such as web browsing, Python code execution, image processing, and image generation. These models, along with a variant of o4-mini called “o4-mini-high”, are now available for subscribers to OpenAI’s Pro, Plus, and Team plans. The launch of these models is part of OpenAI’s efforts to compete with other tech giants in the global AI race.

Google’s newest Gemini AI model focuses on efficiency

Google is set to launch its new AI model, Gemini 2.5 Flash, on its AI development platform, Vertex AI. The model is designed for efficiency and dynamic computing, allowing developers to adjust processing time based on the complexity of queries. Like OpenAI’s o3-mini and DeepSeek’s R1, Gemini 2.5 Flash is a reasoning model, so it takes somewhat longer to answer questions because it fact-checks itself. Google positions it for high-volume, real-time applications such as customer service and document parsing. Google also plans to bring Gemini models like 2.5 Flash to on-premises environments starting in Q3, making them available on Google Distributed Cloud (GDC) in collaboration with Nvidia.

Google rolls out its latest AI video generator to Gemini Advanced subscribers

Google has introduced Veo 2, an advanced text-to-video AI model, to its Gemini Advanced subscribers. The model can generate eight-second, 720p videos from a text prompt, subject to a monthly limit on the number of videos a user can create. The videos are output as MP4 files and can be uploaded directly to TikTok and YouTube from mobile devices. Google claims that Veo 2 has an improved understanding of real-world physics and human motion, resulting in more lifelike scenes and fluid character movements. Alongside Veo 2, Google is also offering Whisk Animate, a tool that transforms images into videos, to Google One AI Premium subscribers.

Other News

Tools

Google just Launched Agent2Agent, an Open Protocol for AI agents to Work Directly with Each Other – Agent2Agent Protocol (A2A) enables secure, cross-platform communication between AI agents, allowing them to collaborate and function as cohesive digital teams across various enterprise environments.

OpenAI debuts Codex CLI, an open source coding tool for terminals – OpenAI’s Codex CLI is an open source tool that integrates AI models with command-line interfaces to assist in coding tasks, while also offering API grants to encourage its adoption.

xAI preparing updates for Grok, including Grok 3.5 release and new features – xAI is rapidly advancing its Grok product with upcoming releases of Grok 3.5 and Grok 4, new features like Vision in voice mode, memory reference capabilities, Google Drive integration, and an image editing tool, all while closing the feature gap with competitors.

Elon Musk’s AI company, xAI, launches an API for Grok 3 – Elon Musk’s AI company, xAI, has launched an API for its Grok 3 model, offering it in two versions with reasoning capabilities, but facing criticism for its pricing, context window limitations, and political biases.

WordPress.com is offering a new AI site builder – WordPress.com’s new AI-powered site builder allows users to quickly create basic websites with AI-generated content and design, though it currently lacks capabilities for complex ecommerce sites and requires a hosting plan for full functionality.

Microsoft is about to launch Recall for real this time – Microsoft is gradually rolling out the Recall feature, which captures screenshots for later retrieval, to Windows Insiders in the Release Preview channel, indicating an imminent wider launch after addressing security concerns.

Anthropic rolls out a $200-per-month Claude subscription – Anthropic introduces a new subscription plan for its AI chatbot Claude, offering higher usage limits and priority access to new features, with the potential to boost revenue through expensive subscriptions and educational offerings.

Canva is now in the coding and spreadsheet business – Canva is expanding its platform with generative AI-powered tools, including coding, spreadsheets, and an AI chatbot, to offer a comprehensive suite that integrates design and productivity features for seamless team collaboration.

Business

Ironwood is Google’s newest AI accelerator chip – Google’s Ironwood, the seventh-generation TPU optimized for inference, offers significant advancements in computing power, memory, and energy efficiency, positioning it as a formidable competitor in the AI accelerator market.

Ilya Sutskever taps Google Cloud to power his AI startup’s research – Ilya Sutskever’s new AI startup, Safe Superintelligence, has partnered with Google Cloud to utilize its TPU chips for advancing research in safe, superintelligent AI systems, with significant financial backing and a focus on improving AI model performance.

OpenAI co-founder Ilya Sutskever’s Safe Superintelligence reportedly valued at $32B – Safe Superintelligence, founded by Ilya Sutskever after leaving OpenAI, has secured significant funding to develop a safe superintelligence product, though details remain sparse.

Wayve’s self-driving tech is headed to Nissan vehicles – Nissan plans to integrate Wayve’s self-learning AI software into its ProPilot system by 2027, enhancing its driver assistance capabilities with advanced collision avoidance and adaptability across various environments.

Ex-OpenAI staffers file amicus brief opposing the company’s for-profit transition – Ex-OpenAI employees have filed an amicus brief supporting Elon Musk’s lawsuit against OpenAI’s transition to a for-profit model, arguing it contradicts the company’s mission and could compromise safety and ethical standards.

Access to future AI models in OpenAI’s API may require a verified ID – OpenAI plans to implement a Verified Organization process requiring government-issued ID verification to access advanced AI models, aiming to enhance security and prevent misuse or IP theft.

Hugging Face buys a humanoid robotics startup – Hugging Face’s acquisition of Pollen Robotics aims to expand its robotics efforts by selling the humanoid robot Reachy 2 and encouraging developers to enhance its open-source code.

OpenAI’s Countersuit of Elon Musk Alleges Harassment and ‘Sham’ Takeover Bid – OpenAI has filed a countersuit against Elon Musk, accusing him of harassment and undermining the company through various means, including a rejected takeover bid and public attacks, as part of an ongoing legal battle set to go to trial in 2026.

Research

No elephants: Breakthroughs in image generation – Google and OpenAI’s recent advancements in multimodal image generation allow AI to directly create images with greater precision and creativity, raising important questions about creative ownership and the future of visual media.

OpenAI Open Sources BrowseComp: A New Benchmark for Measuring the Ability for AI Agents to Browse the Web – BrowseComp is a new benchmark by OpenAI designed to evaluate AI agents’ ability to persistently browse the web and retrieve complex information, revealing significant performance gaps in current models compared to human capabilities.

Liquid: Language Models are Scalable and Unified Multi-modal Generators – Liquid introduces a scalable, decoder-only architecture for multi-modal generation and understanding, demonstrating that large language models can efficiently handle both visual and language tasks with shared vocabulary space, achieving superior performance in image generation and visual understanding while maintaining strong linguistic capabilities.

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding – FUSION introduces a novel framework for deep integration of vision and language in multimodal learning, utilizing text-guided vision encoding, context-aware alignment, and a synthesized QA dataset to enhance performance and address embedding misalignment.

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens – OLMoTrace is a system that efficiently traces language model outputs back to their training data by using verbatim matching and a novel parallel algorithm, providing users with an interactive tool for exploring the origins of specific word sequences in model responses.
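
The idea behind verbatim matching can be illustrated on a toy scale: find the longest spans of a model's output that also appear word-for-word in the training corpus. This is a minimal sketch under stated assumptions, not OLMoTrace's implementation. It splits on whitespace instead of using the model's tokenizer, and it swaps in a brute-force in-memory n-gram set where the real system reportedly queries an infini-gram suffix-array index over trillions of tokens. The helpers `ngram_index` and `verbatim_spans` are hypothetical names.

```python
from typing import List, Set, Tuple


def ngram_index(docs: List[List[str]], max_n: int) -> Set[Tuple[str, ...]]:
    """Collect every n-gram (n <= max_n) from a toy corpus.

    ASSUMPTION: brute-force set for illustration only; it does not scale.
    """
    index: Set[Tuple[str, ...]] = set()
    for doc in docs:
        for i in range(len(doc)):
            for n in range(1, max_n + 1):
                if i + n > len(doc):
                    break
                index.add(tuple(doc[i:i + n]))
    return index


def verbatim_spans(output: List[str], index: Set[Tuple[str, ...]],
                   min_len: int, max_n: int) -> List[Tuple[int, int]]:
    """Return maximal (start, end) spans of `output` found verbatim in the corpus."""
    spans = []
    for i in range(len(output)):
        end = i
        # Grow the span while it still occurs verbatim (capped at max_n tokens).
        while end < len(output) and end - i < max_n and tuple(output[i:end + 1]) in index:
            end += 1
        if end - i >= min_len:
            spans.append((i, end))
    # Keep only spans not contained in another, longer span.
    return [s for s in spans
            if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]


corpus = ["the quick brown fox jumps over the lazy dog".split()]
output = "a quick brown fox jumps high".split()
idx = ngram_index(corpus, max_n=8)
for start, end in verbatim_spans(output, idx, min_len=3, max_n=8):
    print(start, end, " ".join(output[start:end]))  # 1 5 quick brown fox jumps
```

Because every prefix of a verbatim span is itself verbatim, each span can be grown greedily from its start position; the final filter keeps only maximal spans, which are the ones worth highlighting to a user.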

Sample, Don’t Search: Rethinking Test-Time Alignment for Language Models – QAlign, a novel test-time alignment method, improves language model performance by sampling from an optimal aligned distribution without requiring access to model weights, outperforming existing methods such as best-of-n (BoN), majority voting (MV), and weighted majority voting (WMV) across various benchmarks.
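
For reference, the three baselines QAlign is compared against, best-of-n (BoN), majority voting (MV), and weighted majority voting (WMV), all aggregate a fixed set of sampled answers. The sketch below shows only those baselines, assuming each sample has already been reduced to a final answer plus a reward-model score. QAlign itself reportedly takes a different route, using Markov chain Monte Carlo to draw samples from the aligned distribution, and is not implemented here. The sample data is made up for illustration.

```python
from collections import Counter, defaultdict
from typing import List, Tuple

# ASSUMPTION: each sample is (final_answer, reward_score); in practice answers
# would be sampled from the language model and scored by a reward model.
Sample = Tuple[str, float]


def best_of_n(samples: List[Sample]) -> str:
    """BoN: return the answer of the single highest-scoring sample."""
    return max(samples, key=lambda s: s[1])[0]


def majority_vote(samples: List[Sample]) -> str:
    """MV: return the most frequent answer, ignoring scores."""
    return Counter(answer for answer, _ in samples).most_common(1)[0][0]


def weighted_majority_vote(samples: List[Sample]) -> str:
    """WMV: sum the scores of samples sharing an answer; return the best total."""
    totals = defaultdict(float)
    for answer, score in samples:
        totals[answer] += score
    return max(totals, key=totals.get)


samples = [("42", 0.9), ("17", 0.6), ("17", 0.5), ("17", 0.4), ("42", 0.8)]
print(best_of_n(samples))               # 42
print(majority_vote(samples))           # 17
print(weighted_majority_vote(samples))  # 42
```

With these toy samples the rules disagree: BoN and WMV pick "42" while MV picks "17", which is why the aggregation rule matters when comparing test-time methods.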

One-Minute Video Generation with Test-Time Training – Test-Time Training (TTT) layers, integrated into a pre-trained Diffusion Transformer, enable the generation of coherent one-minute videos with complex, multi-scene stories by efficiently handling long context lengths and dynamic motion, outperforming existing RNN-based methods.
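
A TTT layer behaves like a recurrent layer whose hidden state is itself a small model, updated by a gradient step on a self-supervised loss for every incoming token. The sketch below is a naive, per-token version of the simplest variant (a linear inner model with a reconstruction loss), written with NumPy under stated assumptions: the projection matrices are random stand-ins for learned parameters, and real implementations add mini-batched updates, normalization and residual connections. The video work reportedly uses an MLP inner model (TTT-MLP) rather than this linear one, and `ttt_linear` is a hypothetical helper name.

```python
import numpy as np


def ttt_linear(x, theta_k, theta_v, theta_q, lr=0.1):
    """Naive TTT-Linear forward pass over a sequence (illustrative sketch).

    x: (T, d) input tokens; theta_*: (d, d) projections that would normally
    be learned in the outer training loop.
    """
    T, d = x.shape
    W = np.zeros((d, d))          # hidden state = weights of the inner linear model
    outputs = np.empty_like(x)
    for t in range(T):
        k = theta_k @ x[t]        # "training view" of the token
        v = theta_v @ x[t]        # reconstruction target
        q = theta_q @ x[t]        # "test view" used to produce the output
        # Inner loss ||W k - v||^2 has gradient 2 (W k - v) k^T with respect to W.
        grad = 2.0 * np.outer(W @ k - v, k)
        W = W - lr * grad         # one test-time gradient step updates the state
        outputs[t] = W @ q        # output from the freshly updated inner model
    return outputs


rng = np.random.default_rng(0)
d, T = 4, 6
x = rng.normal(size=(T, d))
# ASSUMPTION: random stand-ins for the learned outer-loop projections.
theta_k, theta_v, theta_q = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
y = ttt_linear(x, theta_k, theta_v, theta_q)
print(y.shape)  # (6, 4)
```

Because the hidden state is a fixed-size weight matrix rather than a growing cache of past tokens, the cost per token stays constant as the sequence gets longer, which is what makes this family of layers attractive for long videos.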

TransMamba: Flexibly Switching between Transformer and Mamba – TransMamba introduces a novel framework that flexibly switches between Transformer and Mamba models using shared parameters, optimizing performance and efficiency across varying sequence lengths and layers.

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model – Seaweed-7B demonstrates that a medium-sized video generation model can achieve competitive performance and cost-efficiency by optimizing design choices, training strategies, and architectural considerations, challenging the notion that only large models can excel in this domain.

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models – M1, a hybrid reasoning model, achieves comparable performance to large transformer models on math benchmarks while offering a 3x speedup in inference throughput by efficiently transferring reasoning capabilities and optimizing memory usage.

S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models – S1-Bench evaluates the system 1 thinking capabilities of large reasoning models, revealing their inefficiency and accuracy issues on simple tasks despite their advanced reasoning abilities.

Google’s newest AI model is designed to help study dolphin ‘speech’ – Google DeepMind’s DolphinGemma AI model, trained with data from the Wild Dolphin Project, aims to decipher and generate dolphin vocalizations, enhancing research on dolphin communication and enabling real-time interaction using Google’s Pixel smartphones.

Concerns

Phase Two of Military AI Has Arrived – The Pentagon’s integration of generative AI into military operations raises concerns about the effectiveness of human oversight, challenges in data classification, and the potential for AI to influence critical decision-making processes.

Generative AI Is Learning to Spy for the US Military – Generative AI tools developed by Vannevar Labs are being used by the US military to efficiently collect, interpret, and analyze vast amounts of intelligence data, enhancing decision-making capabilities in dynamic situations.

‘An Overwhelmingly Negative And Demoralizing Force’: What It’s Like Working For A Company That’s Forcing AI On Its Developers – AI technology is being increasingly forced upon video game developers, leading to demoralization and resistance as it threatens their creativity, expertise, and job security.
