Last Week in AI #304 - OpenAI Audio, Ernie 4.5, Claude Websearch

Top News

OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever

OpenAI has introduced a suite of new audio models aimed at making AI voice agents sound more human-like and responsive. The release includes two new speech-to-text models, GPT-4o-transcribe and GPT-4o-mini-transcribe, which outperform previous models in transcription accuracy across multiple languages, even in challenging scenarios such as understanding different accents and filtering background noise. The new GPT-4o-mini-tts text-to-speech model allows developers to control the tone and delivery of the AI’s speech, a feature OpenAI refers to as “steerability”. Additionally, an updated Agents SDK simplifies the conversion of text agents into voice agents.

Baidu launches two new versions of its AI model Ernie

Chinese tech giant Baidu has introduced two new versions of its artificial intelligence model, Ernie – Ernie 4.5 and Ernie X1. The company claims that Ernie X1 performs at the same level as DeepSeek R1 but at half the cost, while Ernie 4.5 has been enhanced to understand memes and satire due to its “high EQ”. Both models possess multimodal capabilities, meaning they can process video, images, audio, and text. Despite being an early competitor to OpenAI’s ChatGPT, Baidu has faced challenges in achieving widespread adoption. The company plans to launch Ernie 5 later this year, promising further multimodal enhancements.

Anthropic adds web search to its Claude chatbot

Anthropic’s AI chatbot, Claude, has been upgraded with a web search feature, allowing it to search the internet for information to inform its responses. The feature is currently available to paid users in the U.S., with plans to extend it to free users and other countries. Web search works with the latest model, Claude 3.7 Sonnet, and provides direct citations for fact-checking. However, the feature does not trigger consistently for questions about current events. This update brings Claude in line with other AI chatbots like OpenAI’s ChatGPT, Google’s Gemini, and Mistral’s Le Chat, despite previous claims that Claude was designed to be self-contained.

Meta AI is finally coming to the EU, but with limitations

Meta has announced the launch of its AI-powered virtual assistant, Meta AI, in the European Union, despite ongoing regulatory issues with European privacy authorities. The tool, which has been available in the U.S. since 2023, will be rolled out across Meta’s social platforms, including WhatsApp in the U.K., but with a more limited feature set due to the EU’s stringent privacy regulations. Meta AI, capable of chatting, answering questions, and generating images, has not been trained on local users’ data in the EU, so Meta will not be notifying users or seeking their consent. The launch represents Meta’s first step in bringing more AI to Europe, despite the company’s criticism of Europe’s AI regulations.

Other News

Tools

Example objects created by Roblox’s Cube AI model.

Roblox’s new AI model can generate 3D objects – Roblox’s Cube 3D model, which is open-sourced, aims to enhance 3D creation efficiency by generating 3D models from text prompts and will eventually support multimodal inputs like images and videos.

Allen Institute for AI (AI2) Releases OLMo 32B: A Fully Open Model to Beat GPT 3.5 and GPT-4o mini on a Suite of Multi-Skill Benchmarks – OLMo 2 32B, released by the Allen Institute for AI, is a fully open large language model that surpasses GPT-3.5 Turbo and GPT-4o mini on a suite of multi-skill benchmarks.

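NVIDIA Launches Family of Open Reasoning AI Models for Developers and Enterprises to Build Agentic AI Platforms – NVIDIA’s Llama Nemotron models are a family of open models, enhanced for reasoning and decision-making, that developers and enterprises can use to build agentic AI platforms.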

Stability AI’s new AI model turns photos into 3D scenes – Stability AI’s Stable Virtual Camera model allows users to create immersive 3D videos from 2D images by generating novel views and dynamic camera paths, although it may struggle with complex scenes and certain textures.

Google brings a ‘canvas’ feature to Gemini, plus Audio Overview – Google has introduced a new Canvas feature to its Gemini chatbot, allowing users to collaboratively create and refine writing and coding projects, alongside an Audio Overview feature that generates podcast-style audio summaries of documents.

Canopy Labs Releases Orpheus, a Permissively-Licensed LLM for Convincing Text to Speech – Canopy Labs has launched Orpheus, a family of large language models for text-to-speech generation, capable of conveying emotions and performing zero-shot voice cloning, with the three-billion-parameter model available under an open-source license.

xAI launches an API for generating images – xAI’s new image generation API, featuring the “grok-2-image-1212” model, offers competitive pricing and limited customization options as the company seeks to expand its revenue streams and investor interest.

Business

1X will test humanoid robots in ‘a few hundred’ homes in 2025 – 1X plans to test its humanoid robot, Neo Gamma, in a few hundred homes in 2025, using teleoperators to compensate for the robot’s current limitations, while addressing privacy concerns and collecting data to improve its AI capabilities.

Mark Zuckerberg says that Meta’s Llama models have hit 1B downloads – Meta’s Llama models have reached 1 billion downloads despite facing legal and competitive challenges, with plans for new model releases and significant investment in AI development.

Elon Musk’s AI company, xAI, acquires a generative AI video startup – xAI’s acquisition of Hotshot suggests plans to develop competitive video generation models, potentially integrating them into its Grok chatbot platform.

Perplexity is reportedly in talks to raise up to $1B at an $18B valuation – Perplexity, an AI-powered search startup, is reportedly in early talks to raise up to $1 billion, doubling its valuation to $18 billion, amid increasing competition and expansion into new areas like enterprise solutions and an “agentic” browser.

Apple Shuffles AI Executive Ranks in Bid to Turn Around Siri – Apple is restructuring its AI leadership by appointing Vision Pro creator Mike Rockwell to lead Siri development, aiming to address delays and improve its AI technology, which has been lagging behind competitors.

OpenAI’s o1-pro is the company’s most expensive AI model yet – OpenAI’s o1-pro model, despite its high cost and increased computational power, has received mixed reviews for its performance improvements over the standard o1 model, particularly in solving complex problems.

BotQ: US firm’s factory where humanoids will build robots, deliver 12,000 units a year – The BotQ factory will use vertical integration and software systems such as MES, PLM, and ERP to manage high-quality, efficient production of humanoid robots.

Research

Measuring AI Ability to Complete Long Tasks – AI performance, measured by the length of tasks it can complete, has been exponentially increasing with a doubling time of around 7 months, suggesting that within a few years, AI could autonomously handle tasks currently requiring weeks of human effort.

EXAONE Deep: Reasoning Enhanced Language Models – EXAONE Deep models, developed by LG AI Research, are fine-tuned for enhanced reasoning tasks using techniques like Supervised Fine-Tuning, Direct Preference Optimization, and Online Reinforcement Learning, outperforming several existing models across different scales.

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers – Vamba, a hybrid Mamba-Transformer model, enhances hour-long video understanding by reducing computational complexity and memory usage through efficient modules like Mamba-2 blocks and cross-attention layers, achieving superior performance on benchmarks such as LVBench.

FlowTok: Flowing Seamlessly Across Text and Image Tokens – FlowTok introduces a streamlined framework for seamless flow matching between text and image tokens, achieving efficient and state-of-the-art multimodal generation without complex conditioning mechanisms.

CoRe^2: Collect, Reflect and Refine to Generate Better and Faster – CoRe^2 is a novel, plug-and-play sampling framework that enhances generative models’ performance by efficiently refining image quality and semantic faithfulness without being architecture-specific, achieving superior results across various benchmarks.

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification – Scaling up sampling-based search with random sampling and self-verification enhances model performance, revealing that larger response pools improve verification accuracy and highlighting the need for better out-of-box verification capabilities in frontier models.
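
The 7-month doubling time quoted in the "Measuring AI Ability to Complete Long Tasks" item above can be turned into a quick back-of-the-envelope projection. The Python sketch below is only an illustration of the arithmetic: the function names are made up for this example, and the 1-hour starting horizon and 80-hour (two working weeks) target are assumptions, not figures from the paper.

```python
import math

# Doubling time quoted in the summary above (about 7 months).
DOUBLING_MONTHS = 7.0


def months_until(current_hours: float, target_hours: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months for the task-length horizon to grow from current to target,
    assuming the exponential trend simply continues."""
    if current_hours <= 0 or target_hours <= 0:
        raise ValueError("task lengths must be positive")
    # Number of doublings needed, times the length of one doubling.
    return doubling_months * math.log2(target_hours / current_hours)


def horizon_after(current_hours: float, months: float,
                  doubling_months: float = DOUBLING_MONTHS) -> float:
    """Projected task-length horizon (in hours) after the given number of months."""
    return current_hours * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical inputs: a 1-hour horizon today, an 80-hour target.
    print(round(months_until(1.0, 80.0), 1))  # 44.3 months, a little under 4 years
    print(horizon_after(1.0, 14.0))           # two doublings: about 4 hours
```

Under these assumed inputs the trend reaches multi-week tasks in a little under four years, which matches the "within a few years" framing. The result is sensitive to the assumed starting horizon and to whether the trend holds.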

Concerns

ChatGPT hit with privacy complaint over defamatory hallucinations – OpenAI faces a privacy complaint in Europe over ChatGPT’s generation of false and defamatory information, highlighting concerns about compliance with GDPR’s accuracy requirements and the potential reputational damage caused by AI hallucinations.

Policy

Ben Stiller, Mark Ruffalo and More Than 400 Hollywood Names Urge Trump to Not Let AI Companies ‘Exploit’ Copyrighted Works – Hollywood creative leaders are urging the Trump administration to maintain strong copyright protections against AI companies like OpenAI and Google, which seek to use copyrighted works for AI training without permission or compensation.

A.I. Art Generated With Text Prompts Cannot Be Copyrighted, U.S. Rules – Art generated by artificial intelligence (A.I.) from a text prompt cannot be copyrighted even if an artist uses long, targeted inputs or creates multiple iterations of a work before they are satisfied with the final output, according to new guidance from the U.S. Copyright Office.
