Last Week in AI #290 – OpenAI’s massive VC round and DevDay, Flux 1.1, NotebookLM

Top News

OpenAI closes the largest VC round of all time

OpenAI has raised $6.6 billion in a new funding round, led by Thrive Capital, valuing the company at $157 billion. Major investors include Microsoft, Khosla Ventures, Nvidia, and SoftBank. Microsoft contributed $750 million on top of its previous $13 billion investment. The funding will drive AI research and expand OpenAI’s computing capacity as the company builds on its success with tools like ChatGPT, which has gained 250 million weekly active users.

This deal positions OpenAI among the largest venture-backed startups, alongside SpaceX and ByteDance, and highlights the tech industry’s belief in AI’s potential. OpenAI is also exploring a transition from a nonprofit to a for-profit structure, potentially awarding CEO Sam Altman equity. The company faces growing competition from rivals such as Google and Amazon, but has discouraged investors from supporting competitors like Anthropic and xAI.

OpenAI’s DevDay brings Realtime API and other treats for AI app developers

OpenAI has announced several new tools at its 2024 DevDay, including a public beta of its “Realtime API” for building apps with low-latency, AI-generated voice responses. The Realtime API allows developers to create nearly real-time, speech-to-speech experiences in their apps, with six distinct voices provided by OpenAI. The company also introduced vision fine-tuning in its API, enabling developers to use images and text to fine-tune their applications of GPT-4o, and a model distillation feature to improve the performance of smaller AI models. Despite these advancements, OpenAI did not announce any new AI models during DevDay this year, and developers awaiting the release of OpenAI o1 or the video generation model, Sora, will have to wait longer.

People are using Google study software to make AI podcasts, and they’re weird and amazing

Users are turning to Google’s study software, NotebookLM, to create AI-generated podcasts: the tool generates a podcast called Deep Dive in which realistic male and female voices discuss the uploaded content. The AI system is designed to create engaging audio in an “upbeat, hyper-interested tone,” according to Raiza Martin, the product lead for NotebookLM. The company is now working on adding more customization options, such as changing the length, format, voices, and languages. Despite its success, the tool is not immune to issues that affect generative AI, such as hallucinations and bias. Notably, Andrej Karpathy, a member of OpenAI’s founding team and former director of AI at Tesla, used NotebookLM to create his own AI podcast series, Histories of Mysteries, in just two hours.

Black Forest Labs releases Flux 1.1 Pro and an API

Black Forest Labs (BFL), the startup founded by creators of the Stable Diffusion AI image generation model, has released a new, faster text-to-image model called Flux 1.1 Pro, along with a paid application programming interface (API). The new model, which is six times faster than its predecessor, Flux 1.0 Pro, improves image quality, prompt adherence, and diversity, and is available through partners like together.ai, Replicate, fal.ai, and Freepik. The BFL API allows developers to integrate the company’s generative capabilities into their own applications, with pricing starting at 4 cents per image. This release comes amid a contentious legal landscape, with generative AI companies like Stability AI and Midjourney facing lawsuits over their training datasets, and positions BFL as a major player in the AI-driven media space.

Other News

Tools

OpenAI launches new ‘Canvas’ ChatGPT interface tailored to writing and coding projects – The interface opens a separate workspace window where users can generate writing or code, and then select sections for the AI model to edit. Canvas is currently in beta and is being rolled out to ChatGPT Plus and Team users, with Enterprise and Edu users to follow.

Microsoft gives Copilot a voice and vision in its biggest redesign yet – Microsoft’s Copilot AI assistant is undergoing a major redesign, adding voice and vision capabilities to create a more personalized and interactive experience.

Meta announces Movie Gen, an AI-powered video generator – Movie Gen produces high-definition footage complete with sound and can edit existing footage or still images using text inputs.

Pika 1.5 is now live: AI video generator just got major upgrades – Pika Labs has launched Pika 1.5, an advanced AI video generation model with a strong focus on hyper-realism, offering lifelike human and creature movements and sophisticated camera techniques to enhance video creation.

Google Photos is rolling out AI-powered search now – and it could be its biggest upgrade in years – Google Photos is rolling out a Gemini-powered upgrade that allows users to search their photo library with natural language questions, potentially replacing traditional search methods and raising privacy concerns.

Pinterest rolls out genAI tools for product imagery to advertisers – Pinterest introduces genAI tools for advertisers to enhance product imagery, attract more clicks, and create campaigns with less input, resulting in higher clickthrough rates and lower cost-per-click.

Google’s Visual Search Can Now Answer Even More Complex Questions – Google’s visual search tool, Google Lens, has evolved to support multimodal searches, expanded shopping features, and real-time video capture, potentially paving the way for a new kind of smart glasses.

Hacking Generative AI for Fun and Profit – Generative AI can be powerful for prototyping new tools, as demonstrated by a group at Sundai Club using it to build a tool to help journalists identify interesting research papers.

Business

What the Heck Is Going On At OpenAI? – Executives leaving OpenAI express concerns about the company’s accelerationist approach to AI development and its shift to a for-profit model, raising worries about safety and control of artificial general intelligence.

Anthropic hires OpenAI co-founder Durk Kingma – Durk Kingma, co-founder of OpenAI, has announced his move to Anthropic, expressing excitement to contribute to the development of responsible AI systems.

Cerebras, an A.I. Chipmaker Trying to Take On Nvidia, Files for an I.P.O. – Cerebras, a chip company, is set to debut on the stock market, aiming to challenge Nvidia in the artificial intelligence industry.

AI coding startup Poolside raises $500M from eBay, Nvidia and others – The round brings Poolside’s total raised to $626 million and its valuation to $3 billion; the company plans to use the funding to train future AI models and bolster go-to-market and R&D efforts.

‘We have to build a very strong revenue engine’: Poolside AI gets ready to release product after a year of secrecy – AI startup Poolside, after securing significant funding, is preparing to release its GenAI model to the market and focus on building a strong revenue engine, while also expanding its team across Europe and the US.

Y Combinator is being criticized after it backed an AI startup that admits it basically cloned another AI startup – Y Combinator-backed AI startup PearAI faces criticism for cloning another AI project and initially using a closed license, sparking controversy over open source principles and YC’s selection process.

The Race to Block OpenAI’s Scraping Bots Is Slowing Down – AI companies are striking deals with publishers to prevent their web crawlers from being blocked, with OpenAI scoring a clear win as its crawlers are no longer getting blocked at the rate they once were.

Waymo to add Hyundai EVs to robotaxi fleet under new multiyear deal – Waymo and Hyundai have formed a strategic partnership, with Waymo adding Hyundai’s Ioniq 5 electric vehicle to its robotaxi fleet and integrating its autonomous technology, with initial on-road testing to begin by late 2025.

Waymo hires Tesla’s head of vehicle programs ahead of Robotaxi unveiling – Daniel Ho, Tesla’s former head of vehicle programs, is joining Waymo to help accelerate its autonomous vehicle technology, just ahead of Tesla’s Robotaxi unveiling.

Character.ai leaves LLM building behind due to expense – Character.ai shifts focus from building larger LLMs to enhancing consumer products.

India to fabricate its first chip in two years as Nvidia, AMD and Micron pledge to expand to the country – India’s commerce minister, Piyush Goyal, said that Indian behemoth Tata and other domestic companies are working to make India’s semiconductor dream a reality.

Research

AI simulation gives people a glimpse of their potential future self – The simulation lets users interact with a virtual version of their future selves, which offers insights and advice based on the user’s input, helping to improve their sense of future self-continuity and reduce anxiety about the future.

Apple releases Depth Pro, an AI model that rewrites the rules of 3D vision – Apple’s new AI model, Depth Pro, revolutionizes 3D vision by generating high-resolution depth maps from single 2D images in just 0.3 seconds, without relying on camera data, and excels in accuracy and detail, making it versatile for applications like augmented reality and autonomous vehicles.

‘In awe’: scientists impressed by latest ChatGPT model o1 – OpenAI’s latest large language model, OpenAI o1, impresses scientists with its detailed and coherent responses in the field of quantum physics, its ability to beat PhD-level scholars in challenging tests, and its potential to accelerate scientific research.

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions – EMOVA empowers large language models with end-to-end speech capabilities and achieves state-of-the-art performance on vision-language and speech benchmarks while supporting omni-modal spoken dialogue with vivid emotions.

Were RNNs All We Needed? – Decade-old RNNs like LSTMs and GRUs can be efficiently trained in parallel by removing the hidden-state dependencies from their gates, yielding minimal versions (minLSTM and minGRU) that achieve performance comparable to recent sequence models.

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models – Large language models struggle to generate responses of specific lengths, but a new model-agnostic approach called Ruler, using Meta Length Tokens, enhances their ability to adhere to length constraints, demonstrating versatility and generalization.

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning – MM1.5, a new family of multimodal large language models (MLLMs), is designed to enhance capabilities in text-rich image understanding, visual referring, and multi-image reasoning through careful data curation and training strategies.

MIO: A Foundation Model on Multimodal Tokens – Introducing MIO, a foundation model capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner, showcasing competitive and superior performance compared to previous baselines.

Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation – Flex3D is a two-stage framework that leverages a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, curating high-quality and reliable views for 3D reconstruction, and employing a Flexible Reconstruction Model (FlexRM) to achieve superior performance in 3D generation tasks.

Concerns

AI Safety Culture Confronts Capitalism – Leading AI labs are grappling with the challenge of prioritizing safety over profit in the development and deployment of AI systems, leading to cultural clashes, high-profile exits, and regulatory debates.

OpenAI’s newest creation is raising shock, alarm, and horror among staffers: a new logo – OpenAI’s staff is shocked and alarmed by the proposed new logo, preferring to keep the current hexagonal flower symbol, as the company undergoes a redesign and corporate restructuring.

Policy

A Narrow Path – To ensure the safe development of artificial intelligence, “A Narrow Path” proposes a three-phase approach involving international governance, safety measures, and scientific understanding, aiming to prevent the emergence of superintelligent AI within the next two decades.

Judge blocks California’s new AI law in case over Kamala Harris deepfake Musk reposted – California’s new AI law, AB 2839, which aimed to target the spread of AI deepfakes on social media, has been temporarily blocked by a federal judge due to concerns about its broad and potentially unconstitutional nature.

Artificial intelligence presented in California does not come to European soil: severely limited options for Google Pixel 9 and iPhone 16 – European legislation severely limits the advanced AI features of the iPhone 16 and Google Pixel 9, restricting access to personal data and hindering functions like Siri Suggestions and Google Assistant Gemini, making it questionable whether these devices are worth investing in for European users.

Analysis

What Kind of Writer Is ChatGPT? – Using ChatGPT for writing assistance raises questions about originality and plagiarism, as it serves more as a sounding board than a perfect plagiarism tool, allowing writers to explore ideas and sharpen prose.
