
    Fine-Tuning NVIDIA NV-Embed-v1 on Amazon Polarity Dataset Using LoRA and PEFT: A Memory-Efficient Approach with Transformers and Hugging Face

    February 23, 2025

    In this tutorial, we explore how to fine-tune NVIDIA’s NV-Embed-v1 model on the Amazon Polarity dataset using LoRA (Low-Rank Adaptation) with PEFT (Parameter-Efficient Fine-Tuning) from Hugging Face. By leveraging LoRA, we efficiently adapt the model without modifying all its parameters, making fine-tuning feasible on low-VRAM GPUs.
    The implementation in this tutorial can be broken into the following steps:

    1. Authenticating with Hugging Face to access NV-Embed-v1  
    2. Loading and configuring the model efficiently  
    3. Applying LoRA fine-tuning using PEFT  
    4. Preprocessing the Amazon Polarity dataset for training  
    5. Optimizing GPU memory usage with `device_map="auto"`  
    6. Training and evaluating the model on sentiment classification  

    By the end of this guide, you’ll have a fine-tuned NV-Embed-v1 model optimized for binary sentiment classification, demonstrating how to apply efficient fine-tuning techniques to real-world NLP tasks.

    from huggingface_hub import login
    
    
    login()  # Enter your Hugging Face token when prompted
    
    
    import os
    HF_TOKEN = "...."  # Replace with your actual token
    os.environ["HF_TOKEN"] = HF_TOKEN
    
    
    import torch
    import torch.distributed as dist
    from transformers import AutoModel, AutoTokenizer, TrainingArguments, Trainer
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model

    First, we log in to the Hugging Face Hub using an API token, set the token as an environment variable, and import the libraries needed for distributed training and for fine-tuning transformer models with techniques like LoRA.
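
    If you are running the notebook non-interactively, huggingface_hub's login() also accepts the token directly, so the prompt can be skipped. A minimal sketch, assuming HF_TOKEN has already been exported as shown above:

    from huggingface_hub import login
    import os
    
    
    login(token=os.environ["HF_TOKEN"])  # Authenticates without an interactive prompt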

    MODEL_NAME = "nvidia/NV-Embed-v1"
    HF_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # Replace with your actual token
    
    
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=HF_TOKEN)
    model = AutoModel.from_pretrained(
        MODEL_NAME,
        trust_remote_code=True,  # NV-Embed-v1 ships custom modeling code
        device_map="auto",  # Enable efficient GPU placement
        torch_dtype=torch.float16,  # Use FP16 for efficiency
        token=HF_TOKEN
    )
    

    This snippet sets a specific model name and authentication token, then loads the corresponding pretrained tokenizer and model from Hugging Face’s model hub. It also configures the model to use automatic GPU allocation and FP16 precision for improved efficiency.
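
    If you want to confirm how the weights were distributed, models loaded with device_map="auto" expose an hf_device_map attribute. A quick, optional sanity check (assumes at least one CUDA device is available):

    print(model.hf_device_map)  # Layer-to-device placement chosen by device_map="auto"
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")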

    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["self_attn.q_proj", "self_attn.v_proj"],  
        lora_dropout=0.1,
        bias="none",
        task_type="FEATURE_EXTRACTION",
    )
    
    
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    

    With the above code, we configure a LoRA setup with specified parameters (like r=16, lora_alpha=32, and a dropout of 0.1) targeting the self-attention mechanism’s query and value projection layers. It then integrates this configuration into the model using PEFT so that only these LoRA layers are trainable for feature extraction, and finally, the trainable parameters are printed.
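
    To double-check which tensors LoRA actually made trainable, you can filter the model's parameters by requires_grad. This optional snippet lists a few of them; the exact names depend on the model's internal module layout:

    trainable = [name for name, param in model.named_parameters() if param.requires_grad]
    print(f"{len(trainable)} trainable tensors, for example:")
    print(trainable[:4])  # Expect LoRA A/B matrices attached to q_proj and v_proj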

    dataset = load_dataset("amazon_polarity")
    
    
    def tokenize_function(examples):
        return tokenizer(examples["content"], padding="max_length", truncation=True)
    
    
    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    Here, we load the Amazon Polarity dataset, define a function that tokenizes its “content” field with padding and truncation, and apply this function to convert the dataset into a tokenized format for model training.
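
    Amazon Polarity is large (roughly 3.6 million training reviews), so a full epoch may be impractical on a low-VRAM GPU. As an optional shortcut, you can train on a shuffled subset; the sizes below are arbitrary and only meant for a quick run. If you use these subsets, pass them to the Trainer in the next step in place of the full splits.

    small_train = tokenized_datasets["train"].shuffle(seed=42).select(range(20000))
    small_eval = tokenized_datasets["test"].shuffle(seed=42).select(range(2000))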

    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        save_strategy="epoch",
        save_total_limit=1,
        logging_dir="./logs",
        logging_steps=10,
        fp16=True,  # Mixed precision
    )
    
    
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["test"],
    )
    
    trainer.train()
    

    With the above code, we set up training parameters—like output directories, batch sizes, logging, and FP16 mixed precision—using TrainingArguments, create a Trainer with the model and tokenized train/test datasets, and finally initiate the training process.
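
    If a per-device batch size of 4 still exceeds your GPU's memory, gradient accumulation keeps the effective batch size while lowering the peak footprint. A hedged variant of the arguments above (per-device batch of 1, accumulated over 4 steps, for the same effective batch size of 4):

    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=4,  # 1 x 4 = effective batch size of 4
        num_train_epochs=1,
        save_strategy="epoch",
        save_total_limit=1,
        logging_dir="./logs",
        logging_steps=10,
        fp16=True,
    )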

    model.save_pretrained("./fine_tuned_nv_embed")
    tokenizer.save_pretrained("./fine_tuned_nv_embed")
    print("✅ Training Complete! Model Saved.")

    Finally, we save the fine-tuned model and its tokenizer to the specified directory and then print a confirmation message indicating that training is complete and the model is saved.
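
    Because save_pretrained on a PEFT-wrapped model stores only the LoRA adapter weights by default, reloading the fine-tuned model later means loading the base model first and then attaching the adapter. A minimal sketch, reusing the paths and settings from above:

    from transformers import AutoModel, AutoTokenizer
    from peft import PeftModel
    import torch
    
    
    base_model = AutoModel.from_pretrained(
        "nvidia/NV-Embed-v1",
        trust_remote_code=True,
        device_map="auto",
        torch_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained("./fine_tuned_nv_embed")
    model = PeftModel.from_pretrained(base_model, "./fine_tuned_nv_embed")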

    By the end of this tutorial, we have successfully fine-tuned NV-Embed-v1 on the Amazon Polarity dataset using LoRA and PEFT, keeping memory usage low and adaptation scalable. This highlights the power of parameter-efficient fine-tuning, enabling domain adaptation of large models without requiring massive computational resources. The approach extends to other transformer-based models, making it valuable for custom embeddings, sentiment analysis, and other NLP-driven applications. Whether you are working on product review classification, AI-driven recommendation systems, or domain-specific search engines, this method lets you fine-tune large-scale models efficiently on a budget.


