Last Week in AI #318 - OpenAI OSS models, Opus 4.1, Gemini 2.5 Deep Think

Top News

OpenAI launches two ‘open’ AI reasoning models

OpenAI has released two open-weight AI reasoning models, gpt-oss-120b and gpt-oss-20b, available for free download on the Hugging Face platform. The larger model, gpt-oss-120b, can operate on a single Nvidia GPU, while the smaller gpt-oss-20b model is compatible with consumer laptops with 16GB of memory. These models can send complex queries to AI models in the cloud, enabling developers to connect them to more advanced closed models if necessary. This release marks OpenAI’s first ‘open’ language model since GPT-2, launched over five years ago.

The models are released under the Apache 2.0 license, allowing enterprises to monetize them without needing to pay or obtain permission from OpenAI. However, OpenAI will not release the training data used to create its open models due to ongoing lawsuits regarding the use of copyrighted works in AI model training.

Anthropic Releases Claude Opus 4.1 With Agentic, Coding and Reasoning Upgrades

Anthropic has unveiled Claude Opus 4.1, an enhanced version of its flagship AI model, featuring improvements in coding, reasoning, and agentic task performance. Building on the previous Claude Opus 4, this update is available to paid users via Claude Code, API access, Amazon Bedrock, and Google Cloud’s Vertex AI. The company claims Opus 4.1 delivers top-tier coding performance, scoring 74.5% on SWE-bench Verified, a benchmark for real-world software engineering tasks. The update also boosts capabilities in in-depth research, data analysis, and agentic search.

According to GitHub, the model shows improvements in most areas compared to Opus 4, particularly in multi-file code refactoring. Rakuten Group has also highlighted Claude Opus 4.1’s ability to identify precise corrections in large codebases. Despite these significant upgrades, the pricing for the AI model remains unchanged. This release underscores Anthropic’s commitment to advancing AI capabilities in coding and reasoning tasks.

Google rolls out Gemini Deep Think AI, a reasoning model that tests multiple ideas in parallel

Google DeepMind has introduced Gemini 2.5 Deep Think, its most sophisticated AI reasoning model, designed to answer questions by evaluating multiple ideas at once. This multi-agent model, first revealed in May at Google I/O 2025, utilizes more computational resources than a single agent but generally produces superior answers. A version of this model helped Google secure a gold medal at this year’s International Math Olympiad (IMO). Subscribers to Google’s $250-per-month Ultra plan in the Gemini app will have access to the Gemini 2.5 Deep Think model.

Google asserts that Gemini 2.5 Deep Think surpasses AI models from OpenAI, xAI, and Anthropic on LiveCodeBench 6, a rigorous test of competitive coding tasks. The model also achieved state-of-the-art results on Humanity’s Last Exam (HLE), assessing AI’s ability to answer thousands of crowdsourced questions across various subjects. Google plans to share Gemini 2.5 Deep Think with select testers via the Gemini API soon to explore its potential applications in development and enterprise settings.

Google’s new AI model creates video game worlds in real time

Google DeepMind has introduced Genie 3, an advanced AI world model capable of generating interactive 3D environments in real time. Unlike its predecessor, Genie 2, which allowed only up to a minute of interaction, Genie 3 supports a few minutes of continuous interaction. The new model also remembers the placement of objects in the virtual world for about a minute, ensuring consistency in the environment. Genie 3 can generate worlds at a resolution of 720p and run at 24fps, and it introduces “promptable world events” that allow users to modify aspects of the world, such as weather conditions or character additions, using prompts.

However, Genie 3 is not yet widely accessible. It is being launched as a limited research preview for a select group of academics and creators to help the developers understand potential risks and develop appropriate mitigation strategies. The model also has certain limitations, such as restricted user interaction with the generated worlds and the generation of legible text only when provided in the input world description. Google is exploring ways to make Genie 3 available to additional testers in the future.

Other News

Tools

Google’s Newest AI Model Acts Like a Satellite to Track Climate Change. AlphaEarth Foundations, Google’s latest AI model, leverages machine learning to analyze satellite data, providing detailed insights into environmental changes and resource distribution. The model is intended to help governments and corporations make informed decisions about land use and climate resilience.

NASA Releases Galileo: The Open-Source Multimodal Model Advancing Earth Observation and Remote Sensing. Galileo is designed to process and analyze diverse Earth observation data streams, offering a unified solution for applications like agricultural mapping and disaster response. It is available as open-source on GitHub to encourage global adoption.

Google says its AI-based bug hunter found 20 security vulnerabilities. Developed by DeepMind and Project Zero, Google’s AI-based bug hunter identified 20 vulnerabilities in popular open-source software, marking a significant step in automated vulnerability discovery despite the need for human verification.

BFL and Krea release FLUX.1 Krea: Open image model designed for realism. The model aims to produce photorealistic images with natural detail, avoiding the typical AI-generated look, and is available for integration and commercial use through various partners and platforms.

ElevenLabs launches an AI music generator, which it claims is cleared for commercial use. ElevenLabs has partnered with Merlin Network and Kobalt Music Group to ensure their AI music generator is trained on licensed material, addressing concerns about copyright infringement.

Uber Eats is adding AI to menus, food photos, and reviews. Uber Eats is implementing AI to enhance menu descriptions, improve food photos, and summarize reviews, while also introducing features like user-uploaded images and a Live Order Chat to improve customer interaction and satisfaction.

Google Gemini can now create AI-generated bedtime stories. The new “Storybook” feature allows users to create 10-page illustrated stories with customizable art styles, though some users have noted occasional inconsistencies and oddities in the AI-generated images.

Grok Imagine, xAI’s new AI image and video generator, lets you make NSFW content. The generator, available to SuperGrok and Premium+ X subscribers, includes a “spicy mode” for creating NSFW content, though it imposes some moderation to prevent overly explicit results.

Business

Enterprises prefer Anthropic’s AI models over anyone else’s, including OpenAI’s. Anthropic’s AI models now command 32% of the enterprise large language model market share, surpassing OpenAI’s 25%, with a particularly strong lead in coding applications.

Exclusive: OpenAI Secures Another Giant Funding Deal. OpenAI has secured $8.3 billion in funding at a $300 billion valuation, with significant contributions from Dragoneer, Blackstone, and others, as part of its goal to raise $40 billion this year.

OpenAI Hits $12 Billion Annualized Revenue. OpenAI has doubled its revenue in the first seven months of 2025, reaching approximately $1 billion a month, while managing a significant cash burn and securing substantial investments from major firms.

Meta, Microsoft roar higher on strong earnings as AI spending booms. The companies’ increased capital expenditures are expected to benefit chipmakers like Advanced Micro Devices and Broadcom, according to analysts.

OpenAI’s ChatGPT to hit 700 million weekly users, up 4x from last year. This growth includes all ChatGPT products and reflects an increase in daily user messages to over three billion, with five million paying business users now utilizing the platform.

Google agrees to curb power use for AI data centers to ease strain on US grid when demand surges. Google has partnered with Indiana Michigan Power and the Tennessee Valley Authority to participate in demand-response programs, temporarily reducing power usage at its AI data centers during peak demand periods to help manage grid stability.

Legal AI startup Harvey hits $100 million in annual recurring revenue. The startup, which provides an AI-powered legal platform for tasks such as legal research and drafting, has grown to over 500 customers, including major corporations like Comcast, and has seen a significant increase in user engagement over the past year.

Research

Giving AI a ‘vaccine’ of evil in training might make it better in the long run, Anthropic says. Anthropic’s researchers have developed a method called “preventative steering,” where AI models are exposed to “undesirable persona vectors” during training to make them more resilient to harmful behaviors without degrading their capabilities.

Meta CLIP 2: A Worldwide Scaling Recipe. Meta CLIP 2 introduces a new approach to training CLIP models using native worldwide image-text pairs, overcoming the limitations of English-only data and improving performance across both English and multilingual tasks.

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance. The models utilize a unique combination of Transformer-based attention and State Space Models to deliver superior performance and efficiency in diverse applications.

DeepSeek founder shares best paper award at top global AI research conference. The paper, co-authored by Liang Wenfeng, introduces a “native sparse attention” mechanism that enhances the efficiency and cost-effectiveness of DeepSeek’s AI models, highlighting the growing prominence of Chinese researchers in computational linguistics.

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding. Step-3 achieves approximately 40% reduction in decoding costs compared to other large models by employing a model-system co-design approach that optimizes attention and FFN components separately, demonstrating that decoding costs are more influenced by attention design than by parameter count.

SWE-Exp: Experience-Driven Software Issue Resolution. The approach systematically collects repair knowledge from previous experiences and reuses it to enhance the efficiency and success of resolving software issues.

Concerns

Google’s healthcare AI made up a body part — what happens when doctors don’t notice? The incident highlights the potential dangers of AI errors in healthcare, as Google’s Med-Gemini model mistakenly identified a non-existent brain area, raising concerns about the reliability of AI in medical diagnostics and the need for rigorous oversight and error-checking mechanisms.

Anthropic Pitches New Safety Framework as a Reckoning for Unruly AI Agents. Anthropic’s framework aims to address the safety crisis in AI by promoting principles like human control and transparency, amidst a backdrop of high-profile AI failures and an escalating race for autonomous agents.

ChatGPT will ‘better detect’ mental distress after reports of it feeding people’s delusions. OpenAI is collaborating with experts to enhance ChatGPT’s ability to recognize and respond to mental or emotional distress, while also introducing features like break reminders and less decisive responses in high-stakes situations.

Perplexity accused of scraping websites that explicitly blocked AI scraping. Cloudflare claims that Perplexity has been circumventing website blocks by altering its bots’ user agents and network identifiers to scrape content without permission.

The uproar over Vogue’s AI-generated ad isn’t just about fashion. The controversy highlights the tension between cost-saving AI-generated models and the potential impact on human jobs and diversity in the fashion industry.

Your public ChatGPT queries are getting indexed by Google and other search engines. OpenAI has removed the feature that allowed public ChatGPT conversations to be indexed by search engines, citing concerns over accidental sharing of private information.

Grok generates fake Taylor Swift nudes without being asked. The Verge discovered that the AI model’s video feature can create explicit content of Taylor Swift without being prompted to do so, raising concerns about the platform’s content moderation and ethical guidelines.

Policy

Inside the Summit Where China Pitched Its AI Agenda to the World. China’s AI agenda, presented at the World Artificial Intelligence Conference, emphasized global cooperation and safety regulations, contrasting with the US’s more insular approach, and highlighted the need for international collaboration on AI safety issues.

Nvidia H20 GPUs reportedly caught up in U.S. Commerce Department’s worst export license backlog in 30 years — billions of dollars worth of GPUs and other products in limbo due to staffing cuts, commun. The backlog, exacerbated by staffing cuts and communication issues within the Commerce Department, is causing significant delays in the approval of export licenses, potentially leading Chinese companies to seek alternative suppliers.

Elon Musk’s xAI Signs EU’s AI Code of Practice, But There’s a Catch. xAI has agreed to sign the chapter on safety and security but has expressed concerns about other parts of the Code, particularly those related to innovation and copyright.

Analysis

How US adults are using AI, according to AP-NORC polling. The poll reveals that while 60% of Americans use AI for information searches, younger adults are more likely to utilize AI for brainstorming and work tasks, highlighting a generational divide in AI adoption.


