Mixedbread.ai recently introduced Binary MRL, a 64-byte embedding format that addresses the challenge of scaling embeddings in natural language processing (NLP) applications. Embeddings play a vital role in NLP tasks such as recommendation systems, retrieval, and similarity search, but their memory requirements pose a significant challenge, particularly when dealing with massive datasets. Binary MRL aims to decrease the memory footprint of embeddings while maintaining their utility and effectiveness.
Currently, state-of-the-art models produce high-dimensional embeddings (e.g., 1024 dimensions) encoded in float32 format, requiring substantial memory for storage and retrieval. To address these limitations, the researchers at mixedbread.ai drew on two main approaches: Matryoshka Representation Learning (MRL) and Vector Quantization. MRL reduces the number of output dimensions of an embedding model while preserving accuracy: the model is trained to concentrate the most important information in the earlier dimensions of the embedding, so the later, less important dimensions can be truncated. Vector Quantization, in contrast, reduces the size of each dimension by representing it as a binary value instead of a floating-point number.
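To make the two ideas concrete, here is a minimal NumPy sketch; the 512-dimension cut-off and the sign-based threshold are illustrative assumptions, not necessarily mixedbread.ai's exact recipe:

```python
import numpy as np

# Toy 1024-dim float32 embedding (in practice produced by an embedding model).
rng = np.random.default_rng(0)
emb = rng.standard_normal(1024).astype(np.float32)

# --- Matryoshka Representation Learning (MRL) truncation ---
# An MRL-trained model front-loads information, so keeping only the
# first k dimensions (and re-normalizing) preserves most of the signal.
k = 512  # assumed cut-off for illustration
truncated = emb[:k]
truncated = truncated / np.linalg.norm(truncated)

# --- Binary quantization ---
# Each float dimension is collapsed to a single bit via its sign.
bits = (truncated > 0).astype(np.uint8)

print(emb.nbytes)                  # 4096 bytes (1024 x float32)
print(np.packbits(bits).nbytes)    # 64 bytes (512 bits)
```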
The proposed approach, Binary MRL, combines both methods to achieve simultaneous dimensionality reduction and compression of embeddings. By integrating MRL and Vector Quantization, Binary MRL aims to retain the semantic information encoded in embeddings while significantly reducing their memory footprint.
Binary MRL achieves compression in two steps. First, the number of output dimensions of the embedding model is reduced using MRL: the model is trained to preserve important information in fewer dimensions, allowing the less relevant dimensions to be truncated. Then, Vector Quantization represents each dimension of the reduced embedding as a single binary value. This binary representation significantly reduces the memory usage of embeddings while retaining their semantic information. Evaluations of Binary MRL on various datasets show that the method retains over 90% of the original model's performance while using significantly smaller embeddings.
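The arithmetic behind the 64-byte claim: a 1024-dimension float32 embedding occupies 1024 × 4 = 4096 bytes, while 512 binary dimensions pack into 512 / 8 = 64 bytes, a 64× reduction. The sketch below assumes this pipeline (MRL truncation to 512 dimensions, sign-based binarization, Hamming-distance search); the function names and parameters are hypothetical illustrations, not mixedbread.ai's API:

```python
import numpy as np

def binary_mrl(embeddings: np.ndarray, k: int = 512) -> np.ndarray:
    """Truncate MRL embeddings to k dims, binarize by sign, pack to bytes."""
    truncated = embeddings[:, :k]
    # Re-normalization is unnecessary here: binarization only reads the sign,
    # which positive rescaling cannot change.
    bits = (truncated > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)  # shape: (n, k // 8), dtype uint8

def hamming_search(query: np.ndarray, corpus: np.ndarray, top: int = 5):
    """Rank packed binary codes by Hamming distance (XOR, then count bits)."""
    xor = np.bitwise_xor(corpus, query)            # differing bits per byte
    dists = np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per row
    return np.argsort(dists)[:top]

rng = np.random.default_rng(0)
corpus = binary_mrl(rng.standard_normal((10_000, 1024)).astype(np.float32))
query = binary_mrl(rng.standard_normal((1, 1024)).astype(np.float32))[0]
print(hamming_search(query, corpus))  # indices of the nearest neighbors
```

A practical benefit of the binary codes is that Hamming distance reduces to XOR plus a popcount, which is far cheaper at retrieval time than float32 dot products over the full-size embeddings.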
In conclusion, Binary MRL represents a novel approach to the scalability challenges of embeddings in NLP applications. By combining techniques from MRL and Vector Quantization, it achieves significant compression of embeddings while preserving their utility and effectiveness. This not only reduces the cost of large-scale retrieval but also enables embeddings-based applications that were previously infeasible due to memory limits.