Top News
TII Unveils Falcon Mamba 7B, A New Open-Source State Space Language Model
The Technology Innovation Institute (TII) has introduced Falcon Mamba 7B, a new large language model that uses a State Space Language Model (SSLM) architecture, marking a shift from traditional transformer-based designs. SSLMs, a newer approach to natural language processing, can process longer text sequences more efficiently, require less memory, and maintain consistent performance regardless of input size. Falcon Mamba 7B has been independently verified by Hugging Face as the top-performing open-source SSLM globally, outperforming established transformer-based models in benchmark tests. Available on the Hugging Face platform, Falcon Mamba 7B represents a significant step in language model development, potentially offering more efficient and capable AI systems for a wide range of text-based tasks.
Introducing Falcon Mamba 7B: A New Open-Source Model for Text Generation Tasks
Figure’s new humanoid robot leverages OpenAI for natural speech conversations
Figure has introduced its latest humanoid robot, Figure 02, which is designed to work alongside humans in a factory setting. The robot, developed in partnership with OpenAI, is equipped with speakers and microphones for natural language conversations, making it easier for humans to instruct the robot and understand its actions. The robot also features six RGB cameras, an onboard visual language model, improved CPU/GPU computing, and improved hands with 16 degrees of freedom. Figure has begun pilot programs with BMW, and while there is no timeline for a wider rollout, the company hints at future applications beyond the factory floor, including in the home.
BMW tests Figure 02 humanoid on production line – BMW Group tests humanoid robots in production for the first time, successfully fitting sheet metal parts into precise fittings during a two-week pilot at a plant in Spartanburg, South Carolina.
AI-driven technique can generate quality 3D assets from 2D images ‘in seconds’ — VFusion3D aims to transform VR, gaming, and digital design
A research paper by scientists from Meta and Oxford University introduces VFusion3D, an AI-driven technique capable of generating high-quality 3D models from 2D images in seconds. The technology, which could revolutionize the gaming, VR, and design industries, is trained on text, images, and videos, rather than existing 3D models. The VFusion3D pipeline uses a small amount of 3D data to fine-tune a video diffusion model, with videos providing various angles of an object for accurate 3D reproductions. The team used a video model called EMU Video, trained with a variety of videos, to create VFusion3D, which can generate 3D assets from a single image, regardless of the viewing angle. The researchers have also compared VFusion3D’s performance and quality against other 3D generative models.
Artists’ lawsuit against generative AI makers can go forward, judge says
A judge has allowed part of a class action lawsuit against AI companies Stability, Runway, and DeviantArt to proceed. The artists who filed the suit allege copyright infringement, accusing the companies of illegally training their AI systems on copyrighted works. While some of the plaintiffs’ claims were dismissed, others were allowed to continue, potentially leading to a trial. This development is unfavorable for the AI companies, as even a victory would entail a costly and lengthy legal process, and it highlights a broader issue of copyright claims faced by many companies in the AI industry.
Other News
Tools
OpenAI Launches Structured Outputs, Slashes Prices for GPT-4o – OpenAI introduces Structured Outputs, a new API feature ensuring adherence to developer-supplied JSON schemas in model responses, along with price reductions for the latest GPT-4o model.
Reddit to test AI-powered search result pages – Reddit plans to test AI-powered search result pages to provide AI-generated summaries at the top of search results, aiming to help users “dive deeper” into content and discover new Reddit communities.
Good news — your Google Meet call will soon be able to take notes for you – Google Meet will soon have an AI-powered tool that automatically takes notes during meetings, aiming to improve productivity and efficiency.
Audible is testing an AI-powered search feature – Audible is testing an AI-powered search feature called “Maven” to provide tailored audiobook recommendations based on users’ specific requests.
Shadows of Doubt, the procgen private-eye immersive sim, is leaving early access next month – Shadows of Doubt, a cyberpunk detective simulation game, is leaving early access after 17 months of improvements and additions, offering procedurally generated cities and intricate crime-solving puzzles.
Sakana AI Releases AI Scientist which Writes Scientific Papers for $15 – Sakana AI introduces ‘The AI Scientist,’ an innovative system that enables LLMs to autonomously conduct scientific research and write papers for less than $15.
Business
China’s autonomous vehicle startup WeRide seeks US IPO at $5B valuation – WeRide, a Chinese autonomous vehicle company, is seeking a $5.02 billion valuation in its U.S. IPO, aiming to raise about $96 million from the offering and attracting investments from various sources.
AI Startup ProRata Inks Major Media Partnerships With Promise of Compensation and Accurate Attribution – AI startup ProRata.ai secures major media partnerships, promising accurate attribution and revenue sharing with publishers, aiming to revolutionize the industry standard for content compensation and ethical AI use.
Sonova launches hearing aid with real-time AI, first in market – Sonova launches the first hearing aid with real-time AI to improve speech clarity from background noise, introducing a new platform and product that is expected to accelerate growth in the market.
JPMorgan Chase is giving its employees an AI assistant powered by ChatGPT maker OpenAI – JPMorgan Chase has rolled out a generative AI assistant to tens of thousands of its employees, designed to be as ubiquitous as Zoom, and is using the technology for tasks like writing emails, creating marketing content, and preventing fraud.
Waymo Is Unleashing Robotaxis on Bay Area Freeways This Week – Waymo is set to test its self-driving vehicles on Bay Area freeways, with plans to eventually expand to other cities, despite some challenges and controversies.
Anthropic and Caylent partner to slash AI deployment times in half – The partnership aims to cut enterprise AI implementation times in half by combining Caylent’s cloud expertise with Anthropic’s AI models.
Britain cancels $1.7 billion of computing projects in setback for global AI ambitions – The cancelled projects include AI infrastructure initiatives.
As Alexa turns 10, Amazon looks to generative AI – Amazon is shifting its focus to generative AI as it faces significant financial losses from its Alexa division, aiming to improve the assistant’s conversational skills and revitalize customer interest.
Research
Body Transformer: Leveraging Robot Embodiment for Policy Learning – The Body Transformer (BoT) architecture leverages the robot’s embodiment and outperforms vanilla transformers and multilayer perceptrons in robot learning tasks.
New supercomputing network could lead to AGI, scientists hope, with 1st node coming online within weeks – The worldwide network of powerful computers aims to accelerate the development of artificial general intelligence (AGI), with the first node set to come online in September.
Trinity-2-Codestral-22B and Tess-3-Mistral-Large-2-123B Released: Pioneering Open Source Advances in Computational Power and AI Integration – Migel Tissera has recently unveiled two groundbreaking projects on Hugging Face: Trinity-2-Codestral-22B and Tess-3-Mistral-Large-2-123B. These projects represent a leap forward in advanced computational systems and AI-driven technologies.
Language Model Can Listen While Speaking – A language model has been developed to enable real-time interaction in speech-based conversational AI, allowing for interruption and turn-taking in spoken scenarios.
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases – CodexGraph integrates LLM agents with graph database interfaces to enable precise context retrieval and code navigation, demonstrating competitive performance in both academic and real-world environments.
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters – Scaling test-time compute optimally can significantly improve the performance of language models, with implications for LLM pretraining and the tradeoff between inference-time and pre-training compute.
ControlNeXt: Powerful and Efficient Control for Image and Video Generation – A new method called ControlNeXt is proposed for controllable image and video generation, offering powerful and efficient control with reduced computational resources and improved training stability.
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents – Introducing VisualAgentBench, a benchmark designed to train and evaluate Large Multimodal Models as visual foundation agents across diverse scenarios, showcasing their considerable yet developing capabilities.
VITA: Towards Open-Source Interactive Omni Multimodal LLM – VITA is the first open-source Multimodal Large Language Model (MLLM) with advanced capabilities in processing and analyzing Video, Image, Text, and Audio modalities, as well as providing a strong multimodal interactive experience.
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities – ToolSandbox is a stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities.
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling – Visual reasoning in AI requires rethinking beyond scaling, as scaling training data or model size may not significantly improve reasoning or relations, and more precise interventions such as data quality or tailored-learning objectives offer more promise.
BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion – BRAT introduces a new method for textual inversion using bonus tokens and a vision transformer, improving adherence to source images and prompts without relying on the UNet.
Agent Q – Agent Q combines guided Monte Carlo Tree Search and AI self-critique with iterative fine-tuning, using reinforcement learning from human feedback (RLHF) methods such as the Direct Preference Optimization (DPO) algorithm to improve generalization in multi-step reasoning tasks.
Concerns
OpenAI is worried that ChatGPT-4o users are developing feelings for the chatbot – OpenAI is concerned about users developing emotional attachments to the GPT-4o chatbot, warning of potential negative impacts on human interactions and the potential for misuse.
Generative AI’s Slop Era – Tech companies are racing to replace traditional search engines with generative AI bots, but concerns arise about the use of copyrighted material and the potential for false information.
Policy
The FCC wants the AI voice calling you to say it’s a deepfake – FCC proposes regulations to combat AI-generated robocalls by requiring disclosure of AI-generated voices and words, aiming to protect consumers from falling victim to AI-generated scams.
Colorado schools have AI roadmap to guide students and teachers into brave new world – Colorado Education Initiative releases AI roadmap to help integrate AI into education policy and curriculums, focusing on teaching and learning, advancing equity, and developing policy for transparent and ethical use.
Analysis
Can You Be Emotionally Reliant on an A.I. Voice? OpenAI Says Yes. – OpenAI’s report reveals the potential for users to form an emotional reliance on its new, humanlike voice mode, featured in the popular artificial intelligence chatbot, ChatGPT.
Expert Opinions
Replika CEO Eugenia Kuyda says it’s okay if we end up marrying AI chatbots – Replika CEO Eugenia Kuyda discusses the company’s AI companion app, its multimodal features, and the decision-making process behind adding and removing certain conversation capabilities, as well as addressing concerns from regulators and the public.
Here’s how people are actually using AI – We’re seeing a giant, real-world experiment unfold, and it’s still uncertain what impact these AI companions will have either on us individually or on society as a whole, argue Robert Mahari, a joint JD-PhD candidate at the MIT Media Lab and Harvard Law School, and Pat Pataranutaporn, a research