Last Week in AI #269: Better evals for multimodal AI, new OpenAI lawsuits, Meta's AI ads tool troubles, AI startups focus on enterprise, and more!

Top News

Vibe-Eval: A new open and hard evaluation suite for measuring progress of multimodal language models

Reka AI introduces Vibe-Eval, a new evaluation suite designed to measure the progress of multimodal language models. Researchers from the company have created a set of challenging prompts to test the capabilities of these models, particularly focusing on their inability to perform certain multimodal reasoning tasks. The evaluation setup is designed to be a more comprehensive alternative to the MMMU multiple choice benchmark, focusing on controlled and consistent experiments. The authors also discuss the challenges of creating hard prompts and the trade-offs between human and model-based automatic evaluation. They propose a lightweight automatic evaluation protocol based on Reka Core, which they found to strongly correlate with human judgement. With current models, the authors find that GPT-4V and Gemini 1.5 Pro are comparable and “are in a tier above all the other models.” The authors also share their findings on the difficulty of creating and evaluating hard prompts, and the phenomenon of inverse scaling, where larger models fail tasks that smaller models can complete.
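To make "correlates with human judgement" concrete, here is a minimal sketch of how agreement between an automatic judge and human raters could be checked. It is not Reka's protocol: the 1-5 rating scale, the score lists, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings (assumed 1-5 scale) for the same six model responses:
# one list from human raters, one from an automatic model-based judge.
human_scores = [5, 4, 2, 1, 3, 5]
judge_scores = [5, 3, 2, 1, 4, 5]

r = pearson(human_scores, judge_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would suggest the automatic judge ranks responses much as humans do. A rank-based measure such as Spearman correlation may be a better fit when ratings are ordinal.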

Eight newspaper publishers sue OpenAI over copyright infringement

Eight U.S. newspaper publishers, all under the ownership of hedge fund Alden Global Capital, have filed a lawsuit against Microsoft and OpenAI, alleging copyright infringement. The publishers, which include the New York Daily News, the Chicago Tribune, and The Denver Post, among others, claim that OpenAI’s language models, GPT-2 and GPT-3, and Microsoft’s Copilot assistant have been using their copyrighted articles without permission or payment. The lawsuit also states that Microsoft uses information from their newspapers for the Bing search index, which informs answers in Copilot, but often fails to provide links back to the original articles. This lawsuit follows a similar one filed by The New York Times against OpenAI four months prior, and comes amidst OpenAI’s recent partnerships with media companies like Axel Springer and the Financial Times to improve its AI models.

Meta’s ‘set it and forget it’ AI ad tools are misfiring and blowing through cash

Meta’s automated ad tool, Advantage Plus, has been overspending ad budgets and failing to deliver sales, causing frustration among marketers and businesses. The tool, which was pitched as a more efficient alternative to manual ad campaigns, has been unpredictable, with the cost per thousand impressions (CPM) inflating significantly. Despite complaints and evidence of the tool’s malfunctioning, Meta insists that Advantage Plus is functioning as intended. The company’s lack of transparency and accountability has led to uncertainty and dissatisfaction among users, with some businesses completely halting their use of the tool. Despite these issues, Meta’s ad revenue for the first quarter amounted to $35.64 billion, a 27% increase from the same period in 2023.

The Unsexy Future of Generative AI Is Enterprise Apps

AI startups that initially garnered attention with innovative generative AI products are now shifting their focus towards enterprise customers to enhance revenue streams. Tome, a San Francisco startup offering presentation software infused with generative AI, faced revenue challenges despite substantial venture capital backing. To adapt, they downsized staff and repositioned their product towards enterprise clients, particularly targeting sales teams with a premium pricing model. This strategic pivot mirrors a broader trend among AI startups, including Perplexity and Sierra, which are refining their offerings for business applications to offset soaring cloud API costs and attract steady recurring revenue. The move towards enterprise solutions presents both opportunities and challenges, requiring startups to navigate issues of accuracy, privacy, and security while also contending with potential competition from larger AI companies like OpenAI. Despite these complexities, startups like Tome remain committed to serving specific customer needs within the enterprise market.


Other News

Tools

China unveils Sora challenger able to produce videos from text similar to OpenAI tool, though much shorter – China has developed a new AI tool called Vidu, which can produce 16-second videos from text prompts, aiming to catch up with global AI leaders like OpenAI’s Sora.

Siri for iOS 18 to gain massive AI upgrade via Apple’s Ajax LLM – Apple is incorporating generative AI into its upcoming operating systems to enhance Siri, Safari, Spotlight Search, and Messages with features like text summarization, document analysis, and AI-enhanced search options, all while prioritizing user privacy and on-device processing.

Meeting Astribot: the AI Humanoid that Can Cook – Astribot S1, a new AI robot developed by Stardust, is a humanoid robot with a wheeled base that can perform complex tasks like cooking, folding clothes, and sorting items, and is designed to interact safely with humans and its environment.

Copilot Workspace is GitHub’s take on AI-powered software engineering – GitHub introduces Copilot Workspace, an AI-powered dev environment that aims to reduce the friction for developers in getting started and collaborating on code, despite concerns about the quality and security of AI-generated code.

Nvidia’s AI chatbot now supports Google’s Gemma model, voice queries, and more – Nvidia updates its ChatRTX chatbot with new AI models, including Google’s Gemma, ChatGLM3, and OpenAI’s CLIP, to provide a powerful search tool for RTX GPU owners.

Anthropic launches a free Claude iOS app and Team, its first enterprise plan – Anthropic launches a free iOS app and Team enterprise plan for its Claude family of large language models, offering features like chat history synchronization, picture uploads, and access to all three versions of the Claude 3 AI model, with plans to add more collaboration features in the future.

Gemini shortcut coming to Chrome, mobile app expands language support – Chrome introduces @gemini shortcut for faster access to gemini.google.com, while the Gemini mobile app expands language support and availability in more countries.

‘I will never go back’: Ontario family doctor says new AI notetaking saved her job – AI notetaking software has revolutionized the workload of family physicians, allowing them to spend less time on paperwork and more time with patients.

Business

Read the email to Satya Nadella and Bill Gates that shows Microsoft’s CTO was ‘very worried’ about Google’s AI progress in 2019 – Microsoft’s CTO expressed concern about Google’s AI capabilities in an email to Satya Nadella and Bill Gates, prompting Microsoft to invest in OpenAI.

Financial Times announces strategic partnership with OpenAI – Financial Times partners with OpenAI to enhance ChatGPT with FT journalism, improve AI models, and develop new AI products for readers.

Musk’s China trip ends with Tesla-Baidu partnership for FSD launch, Bloomberg reports – Elon Musk’s surprise visit to China results in a partnership between Tesla and Baidu for the launch of Full Self Driving (FSD) in China, overcoming regulatory barriers and competition from local EV startups.

Eric Schmidt-backed Augment, a GitHub Copilot rival, launches out of stealth with $252M – AI-powered coding platform Augment, backed by Eric Schmidt, emerges with $252 million in funding, aiming to revolutionize the market for generative AI coding technologies and improve software quality and team productivity.

Cloud Computing Startup CoreWeave Nears $8.6 Billion in Funding – Cloud computing startup CoreWeave is close to securing $8.6 billion in funding, with a significant portion earmarked for advancing its position in the competitive artificial intelligence industry.

Microsoft Boosts Responsible AI Team From 350 to 400 Personnel – Microsoft expanded its responsible AI team from 350 to 400 personnel to ensure the safety of its artificial intelligence products.

Ads on Facebook, Instagram for explicit ‘AI girlfriends’ prompt Meta crackdown – Meta is cracking down on tens of thousands of racy ads for AI-generated “girlfriends” on its platforms, removing explicit content violating its policies.

AI Startup Anthropic Debuts Claude Chatbot as an iPhone App – Anthropic introduces its first smartphone app, Claude chatbot, signaling a push to make AI more accessible to users.

Friends From the Old Neighborhood Turn Rivals in Big Tech’s A.I. Race – Childhood friends from a rough London neighborhood become powerful tech executives leading rival companies in the race to develop artificial intelligence.

Unauthorized AI Voice Clones of Taylor Swift Face Removal From TikTok – AI-generated Taylor Swift voice clones on TikTok are facing removal due to an agreement with Universal Music Group NV.

JPMorgan Unveils IndexGPT in Next Wall Street Bid to Tap AI Boom – JPMorgan introduces IndexGPT, a series of investment baskets developed with OpenAI’s GPT-4 model, as part of their efforts to leverage AI advancements in finance.

Fei-Fei Li, Computer Expert, Builds AI Startup To Create Advanced Artificial Intelligence – Fei-Fei Li, a prominent computer science expert, is building a startup that uses human-like visual data processing to create artificial intelligence capable of continuing reasoning, aiming to address the limitations of current AI technology.

Research

AI that determines risk of death helps save lives in hospital trial – AI system trained on heart’s electrical activity reduces deaths in high-risk patients by 31% in hospital trial, proving its potential to save lives.

With huge patient dataset, AI accurately predicts treatment outcomes – AI model accurately predicts treatment outcomes for preventing stroke in people with heart disease by emulating randomized clinical trials using a massive patient dataset, outperforming existing models and showing potential to accelerate clinical trials and personalize patient care.

OpenVoice V2: Evolving Multilingual Voice Cloning with Enhanced Style Control and Cross-Lingual Capabilities – OpenVoice V2 is a groundbreaking text-to-speech model that enables voice cloning across languages, offering enhanced style control and zero-shot cross-lingual capabilities, surpassing its predecessor and providing its source code and model weights for future research.

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models – Evaluating the output of large language models (LLMs) with a Panel of LLM evaluators (PoLL), composed of a larger number of smaller models, outperforms a single large judge, exhibits less intra-model bias, and is over seven times less expensive.

Paint by Inpaint: Learning to Add Image Objects by Removing Them First – AI model learns to add image objects by first removing them, leveraging segmentation mask datasets and inpainting models to train a diffusion model that effectively adds objects into images based on textual instructions.

A Careful Examination of Large Language Model Performance on Grade School Arithmetic – Large language models’ success on mathematical reasoning benchmarks may be influenced by dataset contamination, as evidenced by accuracy drops and signs of overfitting on the Grade School Math 1000 (GSM1k) evaluation.
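The panel-of-judges idea from the PoLL item above can be sketched in a few lines. This is only an illustration under stated assumptions: the three "judges" here are hypothetical string-matching functions standing in for calls to different LLMs, and majority voting is one simple way to pool verdicts, not necessarily the pooling the paper's authors use.

```python
from collections import Counter


def panel_verdict(votes):
    """Pool per-judge verdicts by majority vote; a tie for first place returns None."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical judges: in practice each would wrap a different, smaller LLM.
def strict_judge(answer, reference):
    return "correct" if answer.strip() == reference.strip() else "incorrect"


def lenient_judge(answer, reference):
    return "correct" if reference.lower() in answer.lower() else "incorrect"


def casefold_judge(answer, reference):
    same = answer.strip().lower() == reference.strip().lower()
    return "correct" if same else "incorrect"


PANEL = [strict_judge, lenient_judge, casefold_judge]


def evaluate(answer, reference, panel=PANEL):
    """Collect one verdict per judge and pool them into a panel decision."""
    votes = [judge(answer, reference) for judge in panel]
    return panel_verdict(votes), votes


verdict, votes = evaluate("paris", "Paris")
print(verdict, votes)
```

Using an odd number of judges avoids ties for binary verdicts. Drawing the judges from different model families is what is meant to reduce the bias a single judge has toward its own style of output.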

Concerns

AI engineers report burnout, rushed rollouts as ‘rat race’ to stay competitive hits tech industry – AI engineers in the tech industry are experiencing burnout and rushed rollouts due to the intense competition and pressure to stay ahead in the generative AI race, leading to a lack of concern for real-world effects and ethical risks.

Flood of AI-Generated Submissions ‘Final Straw’ for Small 22-Year-Old Publisher – Small publisher closing after 22 years due to influx of AI-generated submissions and other challenges.

Policy

Japan’s Kishida unveils a framework for global regulation of generative AI – Japan’s Prime Minister Fumio Kishida unveils an international framework for the regulation and use of generative AI, emphasizing the need to address the potential risks and promote cooperation for safe and trustworthy AI.

NIST launches a new platform to assess generative AI – NIST launches NIST GenAI to assess and address the growing issue of AI-generated misinformation and disinformation, with a focus on developing systems to detect deepfakes and promote information integrity.

Microsoft bans U.S. police departments from using enterprise AI tool for facial recognition – Microsoft bans U.S. police from using enterprise AI tool for facial recognition, including real-time technology on mobile cameras, due to concerns about potential pitfalls and racial biases.

Google urges US to update immigration rules to attract more AI talent – Google urges US to update immigration rules to attract more AI talent, emphasizing the need for flexibility and faster updates to policies like Schedule A to meet the demand in technologies like AI and cybersecurity.

Analysis

Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts – A mysterious new chatbot named “gpt2-chatbot” has appeared, sparking speculation that it could be a secret test version of OpenAI’s upcoming GPT-4.5 or GPT-5, but initial testing suggests it may not represent a significant leap beyond GPT-4.

Fun

Washed Out Shares New Song “The Hardest Part” And Video Made With OpenAI’s Sora – Washed Out releases a new song and video made with OpenAI’s Sora, exploring the surreal and hallucinatory aspects of AI and its potential to supplement artists’ ideas.

