Media and entertainment companies serve multilingual audiences with a wide range of content that caters to diverse audience segments. These enterprises have access to massive amounts of data collected over their many years of operations, much of it unstructured text and images. Conventional approaches to analyzing unstructured data for search and content generation rely on keyword or synonym matching. These approaches don’t capture the full semantic context of a document, making them less effective for search, content creation, and several other downstream tasks.
Text embeddings use machine learning (ML) capabilities to capture the essence of unstructured data. These embeddings are generated by language models that map natural language text into numerical representations and, in the process, encode the contextual information in the document. Generating text embeddings is the first step in many natural language processing (NLP) applications powered by large language models (LLMs), such as Retrieval Augmented Generation (RAG), text generation, entity extraction, and several other downstream business processes.
Despite the rising popularity and capabilities of LLMs, the language most often used to converse with the LLM, often through a chat-like interface, is English. And although progress has been made in adapting open source models to comprehend and respond in Indian languages, such efforts fall short of the English language capabilities displayed among larger, state-of-the-art LLMs. This makes it difficult to adopt such models for RAG applications based on Indian languages.
In this post, we showcase a RAG application that can search and query across multiple Indian languages using the Cohere Embed – Multilingual model and Anthropic Claude 3 on Amazon Bedrock. This post focuses on Indian languages, but you can use the approach with other languages that are supported by the LLM.
Solution overview
We use the Flores dataset [1], a benchmark dataset for machine translation between English and low-resource languages. This also serves as a parallel corpus, which is a collection of texts that have been translated into one or more languages.
With the Flores dataset, we can demonstrate that the embeddings, and subsequently the documents retrieved from the retriever, are relevant for the same question asked in multiple languages. However, given the sparsity of the dataset (approximately 1,000 lines per language across more than 200 languages), the nature and number of questions that can be asked against it are limited.
After you have downloaded the data, load it into a pandas DataFrame for processing. For this demo, we restrict ourselves to Bengali, Kannada, Malayalam, Tamil, Telugu, Hindi, Marathi, and English. If you want to adopt this approach for other languages, make sure the language is supported by both the embedding model and the LLM used in the RAG setup.
Load the data with the following code:
import pandas as pd

# Each Flores dev file contains one sentence per line for a single language
df_ben = pd.read_csv('./data/Flores/dev/dev.ben_Beng', sep='\t')
df_kan = pd.read_csv('./data/Flores/dev/dev.kan_Knda', sep='\t')
df_mal = pd.read_csv('./data/Flores/dev/dev.mal_Mlym', sep='\t')
df_tam = pd.read_csv('./data/Flores/dev/dev.tam_Taml', sep='\t')
df_tel = pd.read_csv('./data/Flores/dev/dev.tel_Telu', sep='\t')
df_hin = pd.read_csv('./data/Flores/dev/dev.hin_Deva', sep='\t')
df_mar = pd.read_csv('./data/Flores/dev/dev.mar_Deva', sep='\t')
df_eng = pd.read_csv('./data/Flores/dev/dev.eng_Latn', sep='\t')

# Choose fewer/more languages if needed
df_all_Langs = pd.concat([df_ben, df_kan, df_mal, df_tam, df_tel, df_hin, df_mar, df_eng], axis=1)
df_all_Langs.columns = ['Bengali', 'Kannada', 'Malayalam', 'Tamil', 'Telugu', 'Hindi', 'Marathi', 'English']

df_all_Langs.shape  # (996, 8)

df = df_all_Langs
stacked_df = df.stack().reset_index()  # one row per (language, sentence) pair, for ease of handling

# Select only the required columns and rename them
stacked_df = stacked_df.iloc[:, [1, 2]]
stacked_df.columns = ['language', 'text']
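As a quick sanity check (an optional step, not part of the original walkthrough), you can confirm that each of the eight languages contributes the same number of sentences to the stacked DataFrame:

# Every language should contribute the same number of rows (996 with this setup)
print(stacked_df['language'].value_counts())
# Inspect a few random (language, text) pairs
print(stacked_df.sample(3))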
The Cohere multilingual embedding model
Cohere is a leading enterprise artificial intelligence (AI) platform that builds world-class LLMs and LLM-powered solutions that allow computers to search, capture meaning, and converse in text. They provide ease of use and strong security and privacy controls.
The Cohere Embed – Multilingual model generates vector representations of documents for over 100 languages and is available on Amazon Bedrock. With Amazon Bedrock, you can access the embedding model through an API call, which eliminates the need to manage the underlying infrastructure and makes sure sensitive information remains securely managed and protected.
The multilingual embedding model groups text with similar meanings by assigning them positions in the semantic vector space that are close to each other. Developers can process text in multiple languages without switching between different models. This makes processing more efficient and improves performance for multilingual applications.
Text embeddings turn unstructured data into a structured form, which allows you to objectively compare, dissect, and derive insights from all these documents. Cohere’s embedding models take a required input parameter, input_type, which must be set for every API call to one of the following four values, aligned with the most frequent use cases for text embeddings:
input_type="search_document" – Use this for texts (documents) you want to store in your vector database
input_type="search_query" – Use this for search queries to find the most relevant documents in your vector database
input_type="classification" – Use this if you use the embeddings as input for a classification system
input_type="clustering" – Use this if you use the embeddings for text clustering
Using these input types provides the highest possible quality for the respective tasks. If you want to use the embeddings for multiple use cases, we recommend using input_type="search_document".
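To make this concrete, the following is a minimal sketch (not part of the original walkthrough) that embeds two documents with input_type="search_document" and a Hindi query with input_type="search_query", then compares them with a dot product. It assumes you already have Amazon Bedrock model access (covered in the next section) and that boto3 and numpy are installed; the example sentences are illustrative.

# Minimal sketch: cross-lingual semantic similarity with the Cohere embed model
import json
import boto3
import numpy as np

bedrock_runtime = boto3.client('bedrock-runtime')

def embed(texts, input_type):
    body = json.dumps({"texts": texts, "input_type": input_type})
    response = bedrock_runtime.invoke_model(body=body,
                                            modelId="cohere.embed-multilingual-v3",
                                            accept="*/*",
                                            contentType="application/json")
    return np.array(json.loads(response.get('body').read())['embeddings'])

docs = embed(["The Indus Valley Civilisation was a Bronze Age civilisation.",
              "Paris is the capital of France."], "search_document")
query = embed(["मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए"], "search_query")

# The document about the Indus Valley should typically score higher than the
# unrelated one, even though the query and documents are in different languages
print(docs @ query[0])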
Prerequisites
To use the Anthropic Claude 3 Sonnet LLM and the Cohere multilingual embedding model on this dataset, make sure you have been granted access to both models in your AWS account on the Amazon Bedrock Model access page, then install the following packages. The following code has been tested to work with the Amazon SageMaker Data Science 3.0 image, backed by an ml.t3.medium instance.
! apt-get update
! apt-get install build-essential -y # for the hnswlib package below
! pip install hnswlib
Create a search index
With all of the prerequisites in place, you can now convert the multilingual corpus into embeddings and store them in hnswlib, a header-only C++ implementation of Hierarchical Navigable Small World (HNSW) graphs with Python bindings that supports incremental insertions and updates. HNSWLib is an in-memory vector store that can be saved to a file, which is sufficient for the small dataset we are working with. Use the following code:
import hnswlib
import os
import json
import botocore
import boto3

boto3_bedrock = boto3.client('bedrock')
bedrock_runtime = boto3.client('bedrock-runtime')

# Create a search index (inner-product space, 1,024-dimensional embeddings)
index = hnswlib.Index(space='ip', dim=1024)
index.init_index(max_elements=10000, ef_construction=512, M=64)

all_text = stacked_df['text'].to_list()
all_text_lang = stacked_df['language'].to_list()
Embed and index documents
To embed and store the small multilingual dataset, use the Cohere embed-multilingual-v3.0 model, which creates embeddings with 1,024 dimensions, using the Amazon Bedrock runtime API:
modelId = "cohere.embed-multilingual-v3"
contentType = "application/json"
accept = "*/*"

df_chunk_size = 80  # number of texts sent to the embed API per call

chunk_embeddings = []
for i in range(0, len(all_text), df_chunk_size):
    chunk = all_text[i:i + df_chunk_size]
    body = json.dumps(
        {"texts": chunk, "input_type": "search_document"}  # search documents
    )
    response = bedrock_runtime.invoke_model(body=body,
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)
    response_body = json.loads(response.get('body').read())
    index.add_items(response_body['embeddings'])
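Because hnswlib holds the index in memory, you may want to persist it to disk once the documents are indexed. The following optional sketch shows one way to do that; the file name is illustrative.

# Optionally persist the populated index and reload it in a later session
index.save_index("flores_multilingual.idx")

# To reuse it, recreate an index object with the same space and dimension,
# then load the saved file
restored = hnswlib.Index(space='ip', dim=1024)
restored.load_index("flores_multilingual.idx", max_elements=10000)
print(restored.get_current_count())  # should equal len(all_text)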
Verify that the embeddings work
To test the solution, write a function that takes a query as input, embeds it, and finds the top N documents most closely related to it:
# Retrieval of closest N docs to query
def retrieval(query, num_docs_to_return=10):
    modelId = "cohere.embed-multilingual-v3"
    contentType = "application/json"
    accept = "*/*"
    body = json.dumps(
        {"texts": [query], "input_type": "search_query"}  # search query
    )
    response = bedrock_runtime.invoke_model(body=body,
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)
    response_body = json.loads(response.get('body').read())
    doc_ids = index.knn_query(response_body['embeddings'],
                              k=num_docs_to_return)[0][0]
    print(f"Query: {query} \n")
    retrieved_docs = []
    for doc_id in doc_ids:
        # Append results
        retrieved_docs.append(all_text[doc_id])  # original vernacular language docs
        # Print results
        print(f"Original Flores Text {all_text[doc_id]}")
        print("-" * 30)
    print("END OF RESULTS \n\n")
    return retrieved_docs
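The stacked DataFrame also gave us a parallel list of language labels (all_text_lang) that the retrieval function doesn’t surface. As an optional variation, shown here only as a sketch, you can return each hit together with its source language to make the cross-lingual retrieval explicit:

# Variant of retrieval() that also returns the source language of each hit
def retrieval_with_language(query, num_docs_to_return=10):
    body = json.dumps(
        {"texts": [query], "input_type": "search_query"}
    )
    response = bedrock_runtime.invoke_model(body=body,
                                            modelId="cohere.embed-multilingual-v3",
                                            accept="*/*",
                                            contentType="application/json")
    response_body = json.loads(response.get('body').read())
    doc_ids = index.knn_query(response_body['embeddings'],
                              k=num_docs_to_return)[0][0]
    # all_text_lang is aligned index-for-index with all_text
    return [(all_text_lang[doc_id], all_text[doc_id]) for doc_id in doc_ids]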
You can explore what the RAG stack does with a couple of queries in different languages, such as Hindi:
queries = [
    "मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए",
]
# translation: tell me about the Indus Valley Civilization

for query in queries:
    retrieval(query)
The index returns documents relevant to the search query from across languages:
Query: मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए

Original Flores Text सिंधु घाटी सभ्यता उत्तर-पश्चिम भारतीय उपमहाद्वीप में कांस्य युग की सभ्यता थी जिसमें आस-पास के आधुनिक पाकिस्तान और उत्तर पश्चिम भारत और उत्तर-पूर्व अफ़गानिस्तान के कुछ क्षेत्र शामिल थे.
------------------------------
Original Flores Text सिंधु नदी के घाटों में पनपी सभ्यता के कारण यह इसके नाम पर बनी है.
------------------------------
Original Flores Text यद्यपि कुछ विद्वानों का अनुमान है कि चूंकि सभ्यता अब सूख चुकी सरस्वती नदी के घाटियों में विद्यमान थी, इसलिए इसे सिंधु-सरस्वती सभ्यता कहा जाना चाहिए, जबकि 1920 के दशक में हड़प्पा की पहली खुदाई के बाद से कुछ इसे हड़प्पा सभ्यता कहते हैं।
------------------------------
Original Flores Text సింధు నది పరీవాహక ప్రాంతాల్లో నాగరికత విలసిల్లింది.
------------------------------
Original Flores Text सिंधू संस्कृती ही वायव्य भारतीय उपखंडातील कांस्य युग संस्कृती होती ज्यामध्ये आधुनिक काळातील पाकिस्तान, वायव्य भारत आणि ईशान्य अफगाणिस्तानातील काही प्रदेशांचा समावेश होता.
------------------------------
Original Flores Text সিন্ধু সভ্যতা হল উত্তর-পশ্চিম ভারতীয় উপমহাদেশের একটি তাম্রযুগের সভ্যতা যা আধুনিক-পাকিস্তানের অধিকাংশ ও উত্তর-পশ্চিম ভারত এবং উত্তর-পূর্ব আফগানিস্তানের কিছু অঞ্চলকে ঘিরে রয়েছে।
------------------------------
…..
You can now use these documents retrieved from the index as context while calling the Anthropic Claude 3 Sonnet model on Amazon Bedrock. In production settings with datasets that are several orders of magnitude larger than the Flores dataset, we can make the search results from the index even more relevant by using Cohere’s Rerank models.
Use the system prompt to outline how you want the LLM to process your query:
# Retrieval of docs relevant to the query
def context_retrieval(query, num_docs_to_return=10):
    modelId = "cohere.embed-multilingual-v3"
    contentType = "application/json"
    accept = "*/*"
    body = json.dumps(
        {"texts": [query], "input_type": "search_query"}  # search query
    )
    response = bedrock_runtime.invoke_model(body=body,
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)
    response_body = json.loads(response.get('body').read())
    doc_ids = index.knn_query(response_body['embeddings'],
                              k=num_docs_to_return)[0][0]
    retrieved_docs = []
    for doc_id in doc_ids:
        retrieved_docs.append(all_text[doc_id])
    return " ".join(retrieved_docs)
def query_rag_bedrock(query, model_id='anthropic.claude-3-sonnet-20240229-v1:0'):
    system_prompt = '''
    You are a helpful, empathetic, multilingual assistant.
    Identify the language of the user query, and respond to the user query in the same language.
    For example,
    if the user query is in English your response will be in English,
    if the user query is in Malayalam, your response will be in Malayalam,
    if the user query is in Tamil, your response will be in Tamil,
    and so on...
    If you cannot identify the language: say you cannot identify the language.
    You will use only the data provided within the <context> </context> tags, that matches the user's query's language, to answer the user's query.
    If there is no data provided within the <context> </context> tags, say that you do not have enough information to answer the question.
    Restrict your response to a paragraph of less than 400 words and avoid bullet points.
    '''
    max_tokens = 1000
    messages = [{"role": "user", "content": f'''
    query : {query}
    <context>
    {context_retrieval(query)}
    </context>
    '''}]
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }
    )
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
    return response_body['content'][0]['text']
Let’s pass in the same query in English and in multiple Indian languages:
queries = ["tell me about the indus river valley civilization",
           "मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए",
           "मला सिंधू नदीच्या संस्कृतीबद्दल सांगा",
           "సింధు నది నాగరికత గురించి చెప్పండి",
           "ಸಿಂಧೂ ನದಿ ಕಣಿವೆ ನಾಗರಿಕತೆಯ ಬಗ್ಗೆ ಹೇಳಿ",
           "সিন্ধু নদী উপত্যকা সভ্যতা সম্পর্কে বলুন",
           "சிந்து நதி பள்ளத்தாக்கு நாகரிகத்தைப் பற்றி சொல்",
           "സിന്ധു നദീതാഴ്വര നാഗരികതയെക്കുറിച്ച് പറയുക"]

for query in queries:
    print(query_rag_bedrock(query))
    print('_' * 20)
The query is in English, so I will respond in English.
The Indus Valley Civilization, also known as the Harappan Civilization, was a Bronze Age civilization that flourished in the northwestern regions of the Indian subcontinent, primarily in the basins of the Indus River and its tributaries. It encompassed parts of modern-day Pakistan, northwest India, and northeast Afghanistan. While some scholars suggest calling it the Indus-Sarasvati Civilization due to its presence in the now-dried-up Sarasvati River basin, the name “Indus Valley Civilization” is derived from its development along the Indus River valley. This ancient civilization dates back to around 3300–1300 BCE and was one of the earliest urban civilizations in the world. It was known for its well-planned cities, advanced drainage systems, and a writing system that has not yet been deciphered.
____________________
(The remaining responses, in Hindi, Marathi, Telugu, Kannada, Bengali, Tamil, and Malayalam respectively, each answer in the language of the corresponding query and describe the Indus Valley Civilization using the retrieved context.)
Conclusion
This post presented a walkthrough for using Cohere’s multilingual embedding model along with Anthropic Claude 3 Sonnet on Amazon Bedrock. In particular, we showed how the same question asked in multiple Indian languages is answered using relevant documents retrieved from a vector store.
Cohere’s multilingual embedding model supports over 100 languages. It removes the complexity of building applications that require working with a corpus of documents in different languages. The Cohere Embed model is trained to deliver results in real-world applications. It handles noisy data as inputs, adapts to complex RAG systems, and delivers cost-efficiency from its compression-aware training method.
Start building with Cohere’s multilingual embedding model and Anthropic Claude 3 Sonnet on Amazon Bedrock today.
References
[1] Flores Dataset: https://github.com/facebookresearch/flores/tree/main/flores200
About the Author
Rony K Roy is a Sr. Specialist Solutions Architect, Specializing in AI/ML. Rony helps partners build AI/ML solutions on AWS.