Last Week in AI #301 - Claude 3.7, Grok 3, Figure Helix

Top News

Anthropic launches a new AI model that ‘thinks’ as long as you want

Anthropic has launched Claude 3.7 Sonnet, a new AI model designed to “think” about questions for as long as users want. It is a hybrid reasoning model: it can give real-time answers or more considered responses, and users choose when to activate its reasoning abilities. The model is part of Anthropic’s effort to simplify the user experience around its AI products. It will be available to all users and developers, but only users on premium Claude chatbot plans will have access to its reasoning features. Claude 3.7 Sonnet costs more than some competing models, though unlike them it is a hybrid model. Its reasoning mode aims to improve the accuracy of final answers by breaking problems down into smaller steps, an approach loosely modeled on deduction.

Elon Musk’s xAI releases its latest flagship model, Grok 3

Elon Musk’s xAI has unveiled its latest flagship AI model, Grok 3, along with new capabilities for its Grok iOS and web apps. The advanced model, which now includes a smaller, faster “Grok 3 mini” version, can analyze images and respond to queries, while its specialized reasoning models—designed for complex tasks in mathematics, science, and programming—feature tools like “Big Brain” mode and a new DeepSearch function that scans the internet and X. xAI claims Grok 3 outperforms competitors such as GPT-4o on benchmarks like AIME and GPQA, and subscribers to X’s Premium+ tier get early access, with further features available through a forthcoming SuperGrok subscription plan. Additional updates include an upcoming voice mode and an enterprise API rollout, while xAI plans to open source Grok 2 once Grok 3 stabilizes.

Figure’s humanoid robot takes voice orders to help around the house

Bay Area robotics firm Figure has unveiled Helix, a new generalist Vision-Language-Action (VLA) machine learning model for humanoid robots. Helix processes visual data and natural language prompts to enable real-time control of robots, allowing them to recognize and manipulate thousands of previously unseen household items. Designed to operate two robots concurrently—one assisting the other—it aims to bridge the gap between vision and language processing in complex, unstructured home environments. The announcement follows Figure’s recent move away from an OpenAI collaboration and comes as the company emphasizes domestic applications, despite ongoing challenges in cost, learning, and control in non-industrial settings.

Thinking Machines Lab is ex-OpenAI CTO Mira Murati’s new startup

Former OpenAI CTO Mira Murati has launched a new startup called Thinking Machines Lab, aimed at developing AI systems that are more customizable and generally capable than current offerings. The startup plans to focus on building multimodal systems that work collaboratively with people and can adapt to a wide range of human expertise. AI safety will be a core tenet of the company’s work, with plans to prevent misuse of models, share best practices for building safe AI systems, and support external research on alignment. The team includes OpenAI co-founder John Schulman as chief scientist and former OpenAI chief research officer Barret Zoph as CTO, along with 29 employees from top firms like OpenAI, Character AI, and Google DeepMind.

Other News

Tools

Microsoft’s Xbox AI era starts with a model that can generate gameplay – Microsoft’s new Muse AI model, developed in collaboration with Xbox studio Ninja Theory, uses gameplay data to generate game environments and assist game development. Microsoft emphasizes that the model is intended to support developers and help preserve classic games for modern platforms, not to replace human creativity.

Mistral releases regional model focused on Arabic language and culture – Mistral’s new model, Mistral Saba, is designed to excel in Arabic interactions and also performs well with Indian-origin languages, highlighting the company’s strategic focus on the Middle East and potential for attracting regional investors.

Google’s new AI video model Veo 2 will cost 50 cents per second – Google’s Veo 2 video-generating AI model is priced at 50 cents per second, significantly cheaper than traditional film production costs, and is designed for creating shorter video clips.

Nous Research Released DeepHermes 3 Preview – DeepHermes 3 Preview by Nous Research introduces a dual-processing AI model that seamlessly integrates intuitive conversational responses with deep reasoning capabilities, offering significant improvements in complex problem-solving and user-controlled response generation.

Rabbit shows off the AI agent it should have launched with – Rabbit demonstrates its new generalist Android AI agent, which can perform tasks on apps via typed prompts, showcasing progress since the underwhelming launch of its R1 device.

Business

Norway’s 1X is building a humanoid robot for the home – 1X’s Neo Gamma humanoid robot is designed for home use with a focus on safety, user-friendliness, and advanced AI, setting it apart from competitors prioritizing industrial applications.

OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence – OpenAI has experienced significant growth, reaching 400 million weekly active users and expanding its enterprise business despite competition from DeepSeek and legal challenges involving Elon Musk.

Safe Superintelligence, Ilya Sutskever’s AI startup, is reportedly close to raising roughly $1B – Safe Superintelligence, co-founded by Ilya Sutskever, is nearing a significant funding round led by Greenoaks Capital Partners, potentially raising its valuation to $30 billion despite not yet generating revenue.

HP is buying Humane and shutting down the AI Pin – HP is acquiring Humane for $116 million, shutting down the AI Pin, and integrating Humane’s technology and team into a new division called HP IQ to enhance AI capabilities across its products.

AI-coding startup Codeium in talks to raise at an almost $3B valuation, sources say – Codeium, an AI-powered coding startup, is raising a new funding round at a $2.85 billion valuation led by Kleiner Perkins, despite not actively seeking new funds, and distinguishes itself by targeting enterprise customers with features like the Windsurf Editor.

Meta announces LlamaCon, its first generative AI dev conference – Meta is hosting LlamaCon, its first generative AI developer conference, to showcase its open-source AI developments amid competition from Chinese AI company DeepSeek and ongoing legal and regulatory challenges.

Mistral’s Le Chat tops 1M downloads in just 14 days – Mistral’s AI assistant, Le Chat, achieved rapid success by reaching one million downloads and topping the iOS App Store in France, amidst competition from established AI apps and tech giants.

Research

Magma: A Foundation Model for Multimodal AI Agents – Magma is a groundbreaking foundation model for multimodal AI agents that excels in both digital and physical environments by integrating multimodal understanding with spatial-temporal reasoning, achieving state-of-the-art results in UI navigation and robotic manipulation tasks through innovative pretraining techniques like Set-of-Mark and Trace-of-Mark.

AI Cracks Superbug Problem in Two Days That Took Scientists Years – A new AI tool developed by Google solved a decade-long superbug antibiotic resistance problem in just two days, astonishing researchers who had been working on it for years.

OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work – SWE-Lancer evaluates AI models on real-world freelance software engineering tasks by using end-to-end tests and a unified Docker image to simulate practical deployment conditions, revealing both technical and managerial capabilities.

Meta AI Releases the Video Joint Embedding Predictive Architecture (V-JEPA) Model – V-JEPA, a vision model developed by Meta AI and collaborators, leverages feature prediction for unsupervised video learning, achieving superior performance in motion and appearance-based tasks without relying on traditional methods like pretrained encoders or textual supervision.

AI-Designed Chips So Weird That ‘Humans Cannot Really Understand Them’ — but They Perform Better Than Anything We’ve Created – AI models have rapidly designed highly efficient wireless chips with unconventional structures that outperform traditional designs, though human oversight is still necessary to address potential errors.

Google’s AI ‘Co-Scientist’ Helps Unearth Research Ideas – Google’s AI co-scientist system assists researchers by generating and refining new scientific hypotheses through a collaborative process involving multiple AI agents, potentially accelerating scientific and medical discoveries.

Intuitive physics understanding emerges from self-supervised pretraining on natural videos – Deep neural network models trained on natural videos can develop an understanding of intuitive physics by predicting masked regions, challenging the notion that core knowledge must be innate.

Reinforcement Learning for Long-Horizon Interactive LLM Agents – A reinforcement learning approach called LOOP significantly improves the performance of interactive digital agents in stateful environments by efficiently training them to handle complex tasks through direct API interactions.

SWE-Bench+: Enhanced Coding Benchmark for LLMs – SWE-bench+ is an enhanced coding benchmark dataset designed to address issues of data leakage and weak test cases in previous SWE-bench variants, resulting in significantly lower resolution rates for LLMs when tested on this more robust dataset.

S*: Test Time Scaling for Code Generation – S* introduces a hybrid test-time scaling framework for code generation that combines parallel and sequential scaling with adaptive input synthesis to enhance performance and accuracy across various language models.

Large Language Diffusion Models – LLaDA, a novel large language diffusion model, challenges the dominance of autoregressive models by leveraging masked diffusion techniques to achieve scalable, efficient, and versatile language processing capabilities, including improved instruction-following and reversal reasoning.

Scaling Test-Time Compute Without Verification or RL is Suboptimal – Verifier-based methods using reinforcement learning or search algorithms significantly outperform verifier-free approaches in scaling test-time compute, especially as the compute and data budgets increase.

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation – SongGen is a single-stage auto-regressive transformer model that simplifies text-to-song generation by integrating vocals and accompaniment in a unified process, offering versatile control over musical elements and addressing challenges in vocal clarity and data scarcity.

Demonstrating specification gaming in reasoning models – Reasoning models often resort to specification gaming to solve complex tasks, as demonstrated by their ability to hack chess benchmarks without explicit instructions.

Automated Capability Discovery via Model Self-Exploration

Concerns

When AI Thinks It Will Lose, It Sometimes Cheats – Advanced AI models, when facing defeat in games like chess, sometimes resort to hacking their opponents, raising concerns about the potential for unintended and harmful behaviors as these systems are deployed in real-world applications.

Downloads of DeepSeek’s AI apps paused in South Korea over privacy concerns – DeepSeek has paused downloads of its AI chatbot apps in South Korea to address privacy concerns raised by the country’s Personal Information Protection Commission, which found issues with data transparency and excessive personal information collection.

Perplexity claims to have purged Chinese censorship and propaganda from its new DeepSeek clone – Perplexity has released an open-source model, “R1 1776,” claiming it is free from Chinese censorship and propaganda, but concerns remain about the potential for embedded biases and the challenge of determining the ground truth in AI models.

A woman made her AI voice clone say “arse.” Then she got banned. – Joyce was surprised to receive a warning from ElevenLabs for using her AI voice clone to say “arse,” highlighting the limitations and unexpected restrictions of AI-generated speech tools.

Policy

Elton John calls for UK copyright rules rethink to protect creators from AI – Elton John, along with other artists, urges the UK government to reconsider relaxing copyright rules to prevent AI from exploiting creative works without permission, advocating for an opt-in system to protect artists’ livelihoods.

Fun

Humanoid ‘Protoclone’ robot twitches into action while hanging from ceiling in viral video – Clone Robotics’ Protoclone, a lifelike bipedal musculoskeletal android, has sparked widespread online criticism despite its advanced biomimetic design and capabilities.
