Last Week in AI #266: 2024 AI Index Report, Devin's misleading demo, Texas to use AI to grade exams, the future of robotics, and more!

Top News

AI Index Report

The AI Index Report 2024 highlights significant trends and milestones in the AI sector. It points out that while AI has surpassed human performance on some benchmarks, such as image classification and English language understanding, it still falls short on more complex tasks like advanced math and commonsense reasoning. The report emphasizes the dominance of industry over academia in producing cutting-edge AI models: industry leads in both model creation and investment, especially in the increasingly important field of generative AI. The cost of training frontier AI models has soared, with the report estimating that Google’s Gemini Ultra cost about $191 million in compute to train. The United States remains the leading source of notable AI models, far outpacing other regions like the EU and China. However, the report criticizes the lack of standardized evaluations for responsible AI, which makes it hard to compare the limitations and risks of different models. On a societal level, AI is found to improve worker productivity and the quality of work, though its broader impacts are drawing growing public concern and regulatory attention. Lastly, AI’s role in accelerating scientific progress continues to grow, demonstrating its transformative potential across many sectors.

Debunking Devin: “First AI Software Engineer” Upwork lie exposed!

The video critiques a claim by Cognition Labs that its “AI software engineer”, Devin, could autonomously complete freelance software engineering tasks on Upwork. The video argues that this claim is a lie: on closer inspection of the Devin demo, one can see that Devin only partially completed a cherry-picked task, failed to address the customer’s actual requirements, and wrote nonsensical code. A series of strange, basic mistakes made by Devin calls its software engineering capabilities into question, and the video contends that the AI has, at best, only a superficial understanding of how to read and write code. Most importantly, Cognition Labs’ video demo did not actually show Devin “make money on messy Upwork tasks.” The video concludes with a call for AI product companies and influencers to be transparent and truthful about AI capabilities, and for users to remain skeptical of hyped claims surrounding AI technology.

Texas is replacing thousands of human exam graders with AI

The Texas Education Agency (TEA) is implementing an AI-powered scoring system to grade the State of Texas Assessments of Academic Readiness (STAAR) exams. The system, which reportedly uses natural language processing technology similar to that behind chatbots like ChatGPT, is expected to save $15-20 million annually by reducing the need for human graders. The AI system was trained on 3,000 exam responses that had already been graded by humans, and safety measures have been put in place, such as having a quarter of all AI-graded results rescored by humans. However, some educators have expressed concerns about the system after a significant increase in zero scores was observed during a limited trial in December 2023.

Is robotics about to have its own ChatGPT moment?

Is robotics about to experience its own ChatGPT moment, where a broadly capable AI system becomes widely deployed in real-world use cases? The Evans family, who have hosted experimental household robots for years because Henry Evans is paralyzed, illustrate the slow but steady growth of home robotics. Progress was historically hindered by the limitations of robotics hardware and the difficulty of programming machines to adapt to unpredictable settings. However, AI has transformed this space in recent years. Instead of pre-programming robots for specific tasks, engineers are using deep learning, reinforcement learning, and generative AI to let robots learn new skills and adapt to their environments. Though robots are still clumsy and lack common sense, progress is accelerating, and some researchers believe a general-purpose home robot is now within reach. Such a robot would not only assist people but also mark a major milestone toward human-level machine intelligence.

Other News

Tools

Google announces the Cloud TPU v5p, its most powerful AI accelerator yet – Google announces the launch of its new Gemini large language model (LLM) and Cloud TPU v5p, an updated version of its Cloud TPU v5e, promising significantly faster performance and cost-effectiveness for training large language models.

Google’s Gemini Pro 1.5 enters public preview on Vertex AI – Gemini Pro 1.5, Google’s most capable generative AI model, is now available in public preview on Vertex AI, offering a large context window for tasks such as analyzing code libraries, reasoning across lengthy documents, and holding long conversations with a chatbot.

Google’s Gemini 1.5 Pro can now hear – Google’s Gemini 1.5 Pro update allows the model to process audio files directly and extract information from them without relying on a written transcript, reportedly outperforming its predecessor. Google is also introducing new inpainting and outpainting features for image generation.

Mistral AI Stuns With Surprise Launch of New Mixtral 8x22B Model – Mistral AI surprises with the launch of its new Mixtral 8x22B model, boasting 176 billion parameters and a context length of 65,000 tokens, expected to outperform its predecessors and revolutionize various industries.

Google announces Axion, its first custom Arm-based data center processor – Google Cloud announces its first custom-built Arm processor, Axion, which offers better performance and energy efficiency than competitors’ instances, with technical documentation and availability details to come later this year.

Introducing Shop the look: eBay curating personalized outfits with AI – eBay introduces ‘Shop the look’, a generative AI-powered feature that curates personalized outfits based on customers’ shopping history, offering an immersive and tailored fashion experience while also fostering a circular fashion economy.

AI editing tools are coming to all Google Photos users – Google Photos is making AI-powered editing tools available to all users, allowing them to enhance their pictures without pro-level editing skills.

Meta Employs New On-Device AI to Blur Nude Photos – Meta introduces new on-device AI that automatically detects and blurs nude photos in Instagram direct messages, along with additional features to protect users against sextortion and show them safety reminders.

Business

AI-Music Arms Race: Meet Udio, the Other ChatGPT for Music – Developers of AI models that generate high-fidelity songs from text prompts are in an arms race. Udio, a new competitor, produces music comparable to Suno’s, possibly with crisper sound, and aims to give musicians a tool to create and profit from their music.

Intel unveils latest AI chip as Nvidia competition heats up – Intel unveils its latest AI chip, Gaudi 3, to compete with Nvidia’s dominance in the AI chip market, offering improved power efficiency and speed for training and deploying big AI models.

Meta unveils its newest custom AI chip as it races to catch up – Meta unveils its newest custom AI chip, the “next-gen” Meta Training and Inference Accelerator (MTIA), which is designed to complement GPUs and deliver better performance, as the company races to catch up with its rivals in the generative AI space.

Nvidia and Georgia Tech announce first AI supercomputer for students – Nvidia and Georgia Tech have collaborated to introduce the first AI supercomputer for student use, aiming to democratize access to supercomputing resources and train the next-generation workforce on AI.

Microsoft AI gets a new London hub fronted by former Inflection and DeepMind scientist Jordan Hoffmann – Microsoft announces a new London hub for its consumer AI division, led by former Inflection and DeepMind scientist Jordan Hoffmann, as part of a larger investment in the U.K.’s AI economy.

Waymo will launch paid robotaxi service in Los Angeles on Wednesday – Waymo is launching a paid robotaxi service in Los Angeles, despite pushback from unions and local officials, as the company seeks to expand its autonomous taxi operations.

Kaiser Permanente uses AI to redirect ‘simple’ patient messages from physician inboxes – AI categorization of patient messages at Kaiser Permanente helps reduce physician workload by diverting simple inquiries to a regional team, allowing doctors to focus on more complex patient care.

Collaborative Robotics raises $100 mln amid robots funding boom – U.S. startup Collaborative Robotics (Cobot) raises $100 million in a Series B funding round, led by General Catalyst, to develop general-purpose robots with AI capabilities for various industries.

Adobe Is Buying Videos for $3 Per Minute to Build AI Model – Adobe is purchasing videos at $3 per minute to build an AI text-to-video generator, aiming to catch up to competitors like OpenAI.

Now Hiring: Sophisticated (but Part-Time) Chatbot Tutors – Training artificial intelligence models for a website called Data Annotation Tech has become a lucrative side hustle for individuals like Chelsea Becker, who can earn over $10,000 in a few months by interacting with A.I.-powered chatbots.

Archetype AI Introduces Foundation Model to Pioneer Physical AI – Archetype AI introduces Newton, a foundation model that understands the physical world by fusing multimodal temporal data and natural language, aiming to solve real-world problems and empower organizations with a new level of understanding.

Axios Sees A.I. Coming, and Shifts Its Strategy – Axios’s CEO believes AI will transform media and argues that outlets will need trusted content and human connection to survive.

Research

AI makes retinal imaging 100 times faster, compared to manual method – AI significantly improves retinal imaging, making it 100 times faster and enhancing image contrast, which could revolutionize the diagnosis and treatment of retinal diseases.

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models – Transformer-based language models can learn to dynamically allocate FLOPs to specific positions in a sequence, optimizing the allocation across layers at different depths of the model. In each block, a learned router picks a fixed number of tokens to process while the rest skip the block through the residual connection, so compute is spent more efficiently.
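To make the routing idea concrete, here is a minimal pure-Python sketch of a single Mixture-of-Depths-style block, assuming a linear router and a per-token stand-in for the block's computation. The names `mod_block`, `router_w`, `capacity`, and `block_fn` are hypothetical. In the actual method the selected tokens go through self-attention and an MLP together; this sketch only illustrates the top-k selection, the router-scaled update, and the residual skip.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mod_block(
    tokens: List[Vector],
    router_w: Vector,
    capacity: int,
    block_fn: Callable[[Vector], Vector],
) -> List[Vector]:
    """Simplified Mixture-of-Depths-style block (illustrative sketch).

    A linear router scores every token, and only the `capacity` highest-scoring
    tokens are sent through `block_fn` (a hypothetical stand-in for attention +
    MLP). Their update is scaled by the router score; all other tokens skip the
    block via the residual connection and cost no block compute.
    """
    scores = [dot(x, router_w) for x in tokens]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:capacity])
    out: List[Vector] = []
    for i, x in enumerate(tokens):
        if i in chosen:
            update = block_fn(x)
            # Scaling by the router score keeps the router on the gradient path.
            out.append([xi + scores[i] * ui for xi, ui in zip(x, update)])
        else:
            out.append(list(x))  # residual path only
    return out
```

Because only `capacity` tokens pass through `block_fn`, the block's compute scales with the capacity rather than the sequence length. Note that choosing the top-k over the whole sequence looks at future tokens, which is a complication the paper has to handle separately for autoregressive sampling.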

RULER: What’s the Real Context Size of Your Long-Context Language Models? – Long-context language models are evaluated using the RULER benchmark, which expands on the needle-in-a-haystack (NIAH) test to cover more diverse types and quantities of needles, revealing significant performance drops as context length increases.
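The basic needle-in-a-haystack setup that RULER builds on can be sketched as follows: key-value "needles" are hidden at random positions in filler text, the model is asked for one key's value while the other needles act as distractors, and the answer is scored by string matching. This is a simplified illustration, not RULER's actual task generator; the filler sentence, the needle template, and the function names are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical filler text; real benchmarks use essays or repeated noise sentences.
FILLER = "The grass is green. The sky is blue. The sun is yellow."


def build_haystack(num_filler: int, needles: Dict[str, str], seed: int = 0) -> str:
    """Hide each key/value needle at a random position among filler sentences."""
    rng = random.Random(seed)  # fixed seed keeps the prompt reproducible
    chunks: List[str] = [FILLER] * num_filler
    for key, value in needles.items():
        pos = rng.randint(0, len(chunks))  # inclusive, so the end is allowed
        chunks.insert(pos, f"The special magic number for {key} is {value}.")
    return " ".join(chunks)


def make_query(haystack: str, key: str) -> str:
    """Ask about one key; any other needles in the haystack act as distractors."""
    return f"{haystack}\n\nWhat is the special magic number for {key}?"


def recall_score(answer: str, expected_values: List[str]) -> float:
    """Fraction of expected values that appear verbatim in the model's answer."""
    hits = sum(1 for v in expected_values if v in answer)
    return hits / len(expected_values)
```

Increasing `num_filler` and tracking `recall_score` as the prompt grows is one simple way to probe the kind of degradation with context length described above.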

Concerns

Teen Girls Confront an Epidemic of Deepfake Nudes in Schools – Teen girls at Westfield High School discovered that boys in their class used AI to create and circulate fake sexually explicit images of them, prompting concerns about the school’s response and policies regarding exploitative AI use.

How Tech Giants Cut Corners to Harvest Data for A.I. – Tech giants are using innovative methods, such as creating a speech recognition tool to transcribe YouTube videos, to harvest more data for training their AI systems, despite potential conflicts with platform rules.

A.I. Made These Movies Sharper. Critics Say It Ruined Them. – AI has been used to remove imperfections from classic films, leading to debate over whether the enhanced versions are an improvement or a distortion of the original.

Early Reviews of Humane AI Pin Aren’t Impressed – Early reviews of the long-hyped Humane AI Pin are unimpressed, citing its slow response, lack of features, and occasional misinformation, leading to concerns about its high price and incomplete state.

Policy

Trudeau Unveils $1.8 Billion Plan to Boost AI Sector in Canada – Canada unveils a $1.8 billion plan to boost its artificial intelligence sector, including funding for computing capabilities and a new AI safety institute.

Analysis

I tried two demos of machine learning AI NPCs, and they didn’t convince me AI will lead to anything that immersive sims like Deus Ex haven’t already done better – AI-powered NPCs in video games, while impressive, may not offer the same level of immersive and meaningful interactions as the meticulously crafted world of immersive sims like Deus Ex.

Expert Opinions

What War by A.I. Actually Looks Like – The use of artificial intelligence in military operations, including autonomous weapons systems, is already a reality and poses significant ethical and strategic challenges.
