As generative AI adoption accelerates across enterprises, maintaining safe, responsible, and compliant AI interactions has never been more critical. Amazon…
Machine Learning
Retrieval-augmented generation (RAG) has emerged as a powerful paradigm for enhancing the capabilities of large language models (LLMs). By combining…
Today, we are excited to announce that the NeMo Retriever Llama3.2 Text Embedding and Reranking NVIDIA NIM microservices are available…
This post is cowritten with Abdullahi Olaoye, Akshit Arora and Eliuth Triana Isaza at NVIDIA. As enterprises continue to push…
VLMs have shown notable progress in perception-driven tasks such as visual question answering (VQA) and document-based visual reasoning. However, their…
Multimodal reasoning is an evolving field that integrates visual and textual data to enhance machine intelligence. Traditional artificial intelligence models…
Machine Translation (MT) has emerged as a critical component of Natural Language Processing, facilitating automatic text conversion between languages to…
Lowe’s, a leading home improvement retailer with 1,700 stores and 300,000 associates, is establishing itself as a pioneer in AI…
At NVIDIA GTC25, Gnani.ai experts unveiled groundbreaking advancements in voice AI, focusing on the development and deployment of Speech-to-Speech Foundation…
Reinforcement learning (RL) has become central to advancing Large Language Models (LLMs), empowering them with improved reasoning capabilities necessary for…
Large language models (LLMs) have revolutionized the field of natural language processing, enabling machines to understand and generate human-like text…
Optical Character Recognition (OCR) is a powerful technology that converts images of text into machine-readable content. With the growing need…
Machine learning has expanded beyond traditional Euclidean spaces in recent years, exploring representations in more complex geometric structures. Non-Euclidean representation…
Modern VLMs struggle with tasks requiring complex visual reasoning, where understanding an image alone is insufficient, and deeper interpretation is…
Stereo depth estimation plays a crucial role in computer vision by allowing machines to infer depth from two images. This…
Artificial Neural Networks (ANNs) have revolutionized computer vision with great performance, but their “black-box” nature creates significant challenges in domains…
In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and…
Residual transformations enhance the representational depth and expressive power of large language models (LLMs). However, applying static residual transformations across…
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via self-supervised learning (SSL).…
This post was co-authored with Johann Wildgruber, Dr. Jens Kohl, Thilo Bindel, and Luisa-Sophie Gloger from BMW Group. The BMW Group—headquartered…