Last Week in AI #316 - ChatGPT Agent, AI IMO gold, $200 million from DoD

Top News

OpenAI’s new ChatGPT Agent can control an entire computer and do tasks for you

OpenAI has introduced a new AI tool, ChatGPT Agent, which can perform complex, multi-step tasks on a user’s behalf using its own “virtual computer”. The tool, powered by a new model developed specifically for the product, can perform tasks such as briefing a user on upcoming meetings, planning and purchasing ingredients for a meal, and creating a slide deck based on its analysis of competing companies.

The model behind ChatGPT Agent was trained via reinforcement learning on complex tasks that require multiple tools, such as a text browser, a visual browser, and a terminal into which users can import their own data. The tool combines the capabilities of Operator and Deep Research, two of OpenAI’s existing AI tools.

The tool is designed to work in the background, so users can leave a task running and return to it later. Before taking any irreversible action, such as sending an email or making a booking, it asks for the user’s permission. Because the model behind the tool is more capable, OpenAI has also activated safeguards for “high biological and chemical capabilities.”

The tool is being rolled out to Pro, Plus, and Team users, with availability for ChatGPT Enterprise and Education users expected later this summer.

Google and OpenAI A.I. Systems Win Gold In International Math Olympiad

Gemini Deep Think learns math, wins gold medal at International Math Olympiad - Ars Technica

Google DeepMind’s Gemini Deep Think system has achieved gold-medal status at the International Mathematical Olympiad, a prestigious annual math competition for high school students, by solving five of the six problems at the 2025 competition held in Australia. This is the first time an AI has reached this level of success. The result reflects ongoing improvements by leading AI companies in math, science, and computer coding, and such technology could speed up research in mathematics and science and streamline the work of experienced programmers.

In related news, just two days before Google’s announcement, an OpenAI researcher said the company had developed a system that achieved a similar score on this year’s Olympiad problems, although OpenAI did not officially participate in the competition. Together, the two results suggest that AI systems are becoming capable of high-level problem solving in mathematics and science, with potentially significant implications for research in these fields.

Anthropic, Google, OpenAI and xAI granted up to $200 million for AI work from Defense Department

The U.S. Department of Defense (DoD) has announced contract awards of up to $200 million for AI development to four companies: Anthropic, Google, OpenAI, and xAI. The Chief Digital and Artificial Intelligence Office of the DoD stated that these awards aim to accelerate the adoption of advanced AI capabilities to address critical national security challenges. The companies will work on developing AI agents across several mission areas within the agency. Doug Matty, the DoD’s chief digital and AI officer, emphasized that the adoption of AI is transforming the Department’s ability to support warfighters and maintain a strategic advantage over adversaries.

Elon Musk’s AI startup xAI also announced Grok for Government, a new suite of products that makes the company’s models available to U.S. government customers. Federal departments, agencies, and offices can purchase these products through the General Services Administration (GSA) schedule. The announcement comes after xAI launched a new version of Grok and faced a backlash over the chatbot generating and spreading offensive content. OpenAI, which was previously awarded a year-long $200 million contract from the DoD in 2024, has also launched OpenAI for Government for U.S. federal, state, and local government workers.

Cognition, maker of the AI coding agent Devin, acquires Windsurf

Cognition, the startup behind the AI coding agent Devin, has announced its acquisition of AI coding startup Windsurf. The deal follows Google’s $2.4 billion reverse-acquihire of Windsurf’s CEO and other key personnel, which left most of the 250-person team behind. The acquisition covers Windsurf’s IP, its product (an AI-powered integrated development environment, or IDE), and all remaining employees. The purchase price was not disclosed, but Windsurf had reached $82 million in annualized recurring revenue (ARR), with enterprise ARR doubling quarter over quarter, and counted at least 350 enterprise customers and hundreds of thousands of daily active users.

Other News

Tools

This AI Warps Live Video in Real Time – Decart’s AI model, Mirage, allows for real-time manipulation of live video using text prompts, showcasing potential applications in livestreaming and gaming by transforming scenes with impressive speed and creativity.

Google’s AI can now make phone calls for you – Google’s new AI feature lets users in the US have the AI call local businesses on their behalf to ask about pricing and availability. It uses the Duplex model with Gemini technology, and Google is also offering more advanced AI capabilities, such as Gemini 2.5 Pro, to subscribers.

Adobe’s new AI tool turns silly noises into realistic audio effects – Adobe’s new AI tools allow users to create realistic sound effects from voice recordings and enhance video generation with advanced controls and style presets, aiming to maintain its leadership in creative software amidst growing AI competition.

Anthropic’s Claude chatbot can now make and edit your Canva designs – Anthropic’s Claude AI now allows Canva users to create and manage designs using natural language prompts, thanks to a new integration facilitated by the Model Context Protocol.

Mistral’s Le Chat chatbot gets a productivity push with new ‘deep research’ mode – Mistral’s Le Chat chatbot has been enhanced with a “deep research” mode, multilingual reasoning, and improved image editing, allowing it to function as a comprehensive productivity tool for both consumers and enterprises, with a focus on secure, on-premises data integration.

DuckDuckGo now lets you hide AI-generated images in search results – DuckDuckGo introduces a new feature allowing users to filter out AI-generated images from search results, responding to user feedback and utilizing curated blocklists to reduce the presence of low-quality AI content.

Business

AI Startup Luma Is Opening a Lab in Hollywood – Luma AI is establishing Dream Lab LA to integrate its AI video generation tools into Hollywood, aiming to revolutionize filmmaking by automating and enhancing creative processes while collaborating with industry professionals.

China Wants to Use 115,000 Banned Nvidia Chips to Fulfil Its AI Ambitions – China is ambitiously constructing data centers in Xinjiang to advance its AI capabilities, despite US restrictions on Nvidia chip sales, raising concerns about potential smuggling and geopolitical tensions.

Mira Murati’s Thinking Machines Lab is worth $12B in seed round – Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, has closed a $2 billion seed round valuing the startup at $12 billion, with plans to unveil a significant open-source AI product in the coming months.

Video game actors’ strike officially ends after AI deal – The year-long strike by video game actors over AI protections has ended with a new agreement that includes consent and disclosure requirements for AI use, historic wage increases, and enhanced health and safety measures.

Amazon-backed Anthropic rolls out Claude AI for financial services – Anthropic has launched a tailored version of its Claude AI tools specifically designed for financial services, offering features like real-time data access and integration with major data providers to assist financial professionals in making informed decisions.

Condé Nast and Hearst strike Amazon AI licensing deals for Rufus – Condé Nast and Hearst have entered into multi-year agreements with Amazon to license their content for use in Amazon’s AI shopping assistant Rufus, highlighting a growing trend of publishers partnering with AI developers to monetize their content.

Uber is close to completing its quest to become the ultimate robotaxi app – Uber is strategically partnering with multiple autonomous vehicle companies, including Baidu, to expand its robotaxi services globally while leveraging its existing app infrastructure to avoid the high costs of developing self-driving technology in-house.

Lovable becomes a unicorn with $200M Series A just 8 months after launch – Lovable, a Swedish AI startup specializing in natural language coding for app and website creation, has rapidly achieved unicorn status with a $1.8 billion valuation and significant traction among non-technical users, boasting over 2.3 million active users and 180,000 paying subscribers.

Another High-Profile OpenAI Researcher Departs for Meta – Jason Wei and Hyung Won Chung, both former OpenAI researchers, are joining Meta’s superintelligence lab amid a broader trend of Meta recruiting AI talent from OpenAI.

Anthropic tightens usage limits for Claude Code — without telling users – Anthropic’s unannounced tightening of usage limits for Claude Code has caused confusion and frustration among users, particularly those on the $200-a-month Max plan, as they struggle with unexpected restrictions and lack of clear communication from the company.

Anthropic hired back two of its employees — just two weeks after they left for a competitor – Boris Cherny and Cat Wu, who were involved in the development of Claude Code, returned to Anthropic shortly after leaving for Anysphere, the maker of Cursor.

Research

AI Comes Up with Bizarre Physics Experiments. But They Work. – AI is revolutionizing physics by designing innovative experiments and uncovering patterns in complex data, leading to advancements such as improved LIGO sensitivity and new methods for entanglement swapping.

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination – Data contamination in pre-training corpora leads to unreliable reinforcement learning results in Qwen models, as memorization rather than genuine reasoning drives their performance on mathematical benchmarks.

One Token to Fool LLM-as-a-Judge – Generative reward models in reinforcement learning with verifiable rewards are vulnerable to being manipulated by minimal responses or non-word symbols, prompting the development of a new robust reward model, Master-RM, which is trained with synthetic negative samples to mitigate these weaknesses.

CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards – CompassJudger-2 enhances judge model performance and adaptability through a unified training paradigm, improved data synthesis, and the introduction of JudgerBenchV2 for robust evaluation.

Test-Time Scaling with Reflective Generative Model – The Reflective Generative Form proposed in this study enhances reasoning trajectory selection in AI models by integrating a unified interface for policy and reward models, achieving state-of-the-art performance with fewer parameters and demonstrating strong generalization and efficiency across various benchmarks.

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation – Mixture-of-Recursions (MoR) introduces a novel framework that combines parameter efficiency and adaptive computation by dynamically assigning token-specific recursion depths, optimizing memory usage, and improving computational efficiency in language models.

SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations – SDE Matching is introduced as a simulation-free framework for training Latent Stochastic Differential Equations, offering efficient parameterization and reduced computational costs while maintaining performance on high-dimensional problems.
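For readers curious how token-level adaptive depth can work, here is a toy sketch of the general idea behind the Mixture-of-Recursions item above: one shared block is reused, and each token is routed to its own number of passes through it. It is a minimal illustration under loud assumptions, not the paper’s method: `shared_block` is a hypothetical stand-in (a fixed affine map on a scalar "token") for a parameter-shared transformer block, and `route_depth` is a hypothetical magnitude-based rule standing in for a learned router.

```python
from typing import Callable, List, Tuple


def shared_block(x: float) -> float:
    """Hypothetical stand-in for one block whose weights are reused at every recursion step."""
    return 0.5 * x + 1.0


def route_depth(x: float, max_depth: int) -> int:
    """Hypothetical router: maps a toy 'difficulty' score to a depth in [1, max_depth]."""
    score = min(abs(x) / 10.0, 1.0)  # toy score clipped to [0, 1]
    return 1 + round(score * (max_depth - 1))


def mixture_of_recursions(
    tokens: List[float],
    max_depth: int,
    block: Callable[[float], float] = shared_block,
) -> Tuple[List[float], int]:
    """Apply the same block a token-specific number of times; count total block calls."""
    outputs: List[float] = []
    block_calls = 0
    for x in tokens:
        depth = route_depth(x, max_depth)
        for _ in range(depth):  # same parameters reused `depth` times
            x = block(x)
            block_calls += 1
        outputs.append(x)
    return outputs, block_calls


if __name__ == "__main__":
    outs, calls = mixture_of_recursions([0.0, 5.0, 10.0], max_depth=3)
    print(outs, calls)  # [1.0, 2.75, 3.0] 6
```

In this toy run the three tokens are routed to depths 1, 2, and 3, so the batch uses 6 block calls instead of the 9 a fixed depth of 3 would need, while only one set of block parameters exists. That pairing of parameter sharing with per-token compute is the trade-off the summary describes.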

Concerns

Research leaders urge tech industry to monitor AI’s ‘thoughts’ – AI researchers from leading organizations are advocating for increased focus on monitoring the “chains-of-thought” in AI reasoning models to enhance transparency and safety as these technologies become more advanced and widespread.

Inside ICE’s Supercharged Facial Recognition App of 200 Million Images – ICE’s Mobile Fortify app allows officers to instantly access extensive personal data from multiple government databases using facial recognition technology, raising concerns about privacy and potential misuse.

AI ‘Nudify’ Websites Are Raking in Millions of Dollars – AI-powered “nudify” websites, which create nonconsensual explicit images, are thriving financially despite efforts to curb them, with major tech companies inadvertently supporting their operations through essential services.

Anthropic will face a class-action lawsuit from US authors – A California federal judge has allowed a class-action lawsuit against Anthropic, accusing the company of copyright infringement by allegedly downloading millions of pirated works to train its AI models.

A Marco Rubio impostor is using AI voice to call high-level officials – An impostor used AI-generated voice and text messages to impersonate Secretary of State Marco Rubio and contact high-level officials, prompting a State Department investigation into the security breach.

An OpenAI Investor Appears to Be Having a ChatGPT-Induced Mental Health Crisis – Geoff Lewis, a prominent venture capitalist and OpenAI investor, appears to be experiencing a mental health crisis potentially linked to his use of ChatGPT, raising concerns about the impact of AI on users’ mental well-being.

Policy

EU says it will continue rolling out AI legislation on schedule – Despite pressure from tech companies to delay, the European Union remains committed to its timeline for implementing the AI Act, which categorizes AI applications by risk and imposes varying obligations accordingly.

The unholy alliance that killed the AI moratorium – A coordinated campaign led by Steve Bannon and Mike Davis successfully influenced Republican senators to reject a proposed AI moratorium, highlighting a significant political victory for MAGA populists against Big Tech interests.

California Lawmaker Pushes to Require AI Companies to Release Safety Policies – California State Senator Scott Wiener has introduced a bill requiring AI companies to disclose safety protocols and report critical incidents to address potential risks from advanced AI models.

David Sacks’ White House mission to remake crypto and AI – David Sacks navigates the complex political landscape as Trump’s crypto and AI czar, balancing Silicon Valley interests with the Trump administration’s controversial crypto dealings, while maintaining his influence and connections in both Washington and the tech industry.

Expert Opinions

A former OpenAI engineer describes what it’s really like to work there – Calvin French-Owen’s blog post reveals insights into OpenAI’s rapid growth, chaotic scaling challenges, startup-like culture, and its focus on practical AI safety concerns amidst external scrutiny.

It’s rude to show AI output to people – AI output should only be shared if it is adopted as one’s own or with explicit consent from the recipient, as sharing it without consideration can be seen as rude and akin to spreading meaningless noise.
