
    Fine-Tuning of Llama-2 7B Chat for Python Code Generation: Using QLoRA, SFTTrainer, and Gradient Checkpointing on the Alpaca-14k Dataset

    February 8, 2025

    In this tutorial, we demonstrate how to efficiently fine-tune the Llama-2 7B Chat model for Python code generation using advanced techniques such as QLoRA, gradient checkpointing, and supervised fine-tuning with the SFTTrainer. Leveraging the Alpaca-14k dataset, we walk through setting up the environment, configuring LoRA parameters, and applying memory optimization strategies to train a model that excels in generating high-quality Python code. This step-by-step guide is designed for practitioners seeking to harness the power of LLMs with minimal computational overhead.

    !pip install -q accelerate
    !pip install -q peft
    !pip install -q transformers
    !pip install -q trl

    First, install the required libraries for our project. They include accelerate, peft, transformers, and trl from the Python Package Index. The -q flag (quiet mode) keeps the output minimal.
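
Note that this list does not include bitsandbytes, which transformers requires for quantized (4-bit or 8-bit) model loading. If you plan to enable the 4-bit loading sketched later in this tutorial, install it as well:

!pip install -q bitsandbytes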

    import os
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        HfArgumentParser,
        TrainingArguments,
        pipeline,
        logging,
    )
    from peft import LoraConfig, PeftModel
    from trl import SFTTrainer

    Import the essential modules for our training setup. They include utilities for dataset loading, model/tokenizer, training arguments, logging, LoRA configuration, and the SFTTrainer.

    # The model to train from the Hugging Face hub
    model_name = "NousResearch/llama-2-7b-chat-hf"
    # The instruction dataset to use
    dataset_name = "user/minipython-Alpaca-14k"
    
    
    # Fine-tuned model name
    new_model = "/kaggle/working/llama-2-7b-codeAlpaca"

    We specify the base model from the Hugging Face hub, the instruction dataset, and the new model’s name.

    # QLoRA parameters
    # LoRA attention dimension
    lora_r = 64
    # Alpha parameter for LoRA scaling
    lora_alpha = 16
    # Dropout probability for LoRA layers
    lora_dropout = 0.1

    Define the LoRA parameters for our model. `lora_r` sets the LoRA attention dimension, `lora_alpha` scales LoRA updates, and `lora_dropout` controls dropout probability.
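
As a small illustrative aside (not part of the original notebook), PEFT scales each low-rank update by lora_alpha / lora_r, so these values give a scaling factor of 0.25:

# LoRA applies W + (lora_alpha / lora_r) * (B @ A); with alpha=16 and r=64 the factor is 0.25
scaling = lora_alpha / lora_r
print(f"Effective LoRA scaling factor: {scaling}")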

    # TrainingArguments parameters
    
    
    # Output directory where the model predictions and checkpoints will be stored
    output_dir = "/kaggle/working/llama-2-7b-codeAlpaca"
    # Number of training epochs
    num_train_epochs = 1
    # Enable fp16 training (set to True for mixed precision training)
    fp16 = True
    # Batch size per GPU for training
    per_device_train_batch_size = 8
    # Batch size per GPU for evaluation
    per_device_eval_batch_size = 8
    # Number of update steps to accumulate the gradients for
    gradient_accumulation_steps = 2
    # Enable gradient checkpointing
    gradient_checkpointing = True
    # Maximum gradient norm (gradient clipping)
    max_grad_norm = 0.3
    # Initial learning rate (AdamW optimizer)
    learning_rate = 2e-4
    # Weight decay to apply to all layers except bias/LayerNorm weights
    weight_decay = 0.001
    # Optimizer to use
    optim = "adamw_torch"
    # Learning rate schedule
    lr_scheduler_type = "constant"
    # Group sequences into batches with the same length
    # Saves memory and speeds up training considerably
    group_by_length = True
    # Ratio of steps for a linear warmup
    warmup_ratio = 0.03
    # Save checkpoint every X updates steps
    save_steps = 100
    # Log every X updates steps
    logging_steps = 10

    These parameters configure the training process. They include output paths, number of epochs, precision (fp16), batch sizes, gradient accumulation, and checkpointing. Additional settings like learning rate, optimizer, and scheduling help fine-tune training behavior. Warmup and logging settings control how the model starts training and how we monitor progress.
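
One consequence worth spelling out with a quick calculation (illustrative only): a per-device batch size of 8 combined with 2 gradient accumulation steps means each optimizer update sees 16 examples per GPU:

# Effective batch size per GPU = per-device batch size * gradient accumulation steps
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(f"Effective batch size per GPU: {effective_batch_size}")  # 8 * 2 = 16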

    import torch
    print("PyTorch Version:", torch.__version__)
    print("CUDA Version:", torch.version.cuda)

    Import PyTorch and print both the installed PyTorch version and the corresponding CUDA version.

    !nvidia-smi

    This command shows the GPU information, including driver version, CUDA version, and current GPU usage.
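
If you prefer to query the GPU from Python instead of the shell, a minimal sketch using standard PyTorch calls (assuming a CUDA device is present) looks like this:

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"Free VRAM: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")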

    # SFT parameters
    
    
    # Maximum sequence length to use
    max_seq_length = None
    # Pack multiple short examples in the same input sequence to increase efficiency
    packing = False
    # Load the entire model on the GPU 0
    device_map = {"": 0}

    Define SFT parameters, such as the maximum sequence length, whether to pack multiple examples, and mapping the entire model to GPU 0.

    # Load dataset
    dataset = load_dataset(dataset_name, split="train")
    
    
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
# Load base model in float16
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    
    
    # Prepare model for training
    model.gradient_checkpointing_enable()
    model.enable_input_require_grads()
    

Load the dataset and tokenizer, configure the tokenizer's padding token and padding side, and load the base model in float16. Finally, we enable gradient checkpointing and ensure the model requires input gradients for training.
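
Note that the snippet above loads the base model in float16 rather than in a quantized format. If you want the 4-bit quantized loading that QLoRA is named after, a sketch using transformers' BitsAndBytesConfig (this assumes the bitsandbytes package is installed and is not part of the original notebook) would look roughly like this:

from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit NF4 quantization with float16 compute, the usual QLoRA configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Replaces the manual gradient checkpointing and input-gradient calls above
model = prepare_model_for_kbit_training(model)

The rest of the LoRA configuration and trainer setup below stays the same either way.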

    from peft import get_peft_model

    Import the `get_peft_model` function, which applies parameter-efficient fine-tuning (PEFT) to our base model.

    # Load LoRA configuration
    peft_config = LoraConfig(
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        r=lora_r,
        bias="none",
        task_type="CAUSAL_LM",
    )
    
    
    # Apply LoRA to the model
    model = get_peft_model(model, peft_config)
    # Set training parameters
    training_arguments = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        optim=optim,
        save_steps=save_steps,
        logging_steps=logging_steps,
        learning_rate=learning_rate,
        weight_decay=weight_decay,
        fp16=fp16,
        max_grad_norm=max_grad_norm,
        warmup_ratio=warmup_ratio,
        group_by_length=True,
        lr_scheduler_type=lr_scheduler_type,
    )
    # Set supervised fine-tuning parameters
    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=max_seq_length,
        tokenizer=tokenizer,
        args=training_arguments,
        packing=packing,
    )
    

    Configure and apply LoRA to our model using `LoraConfig` and `get_peft_model`. We then create `TrainingArguments` for model training, specifying epoch counts, batch sizes, and optimization settings. Lastly, we set up the `SFTTrainer`, passing it the model, dataset, tokenizer, and training arguments.
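
A quick optional check after wrapping the model with get_peft_model is to print how many parameters are actually trainable; PEFT provides a helper for this:

# With LoRA, typically well under 1% of the 7B parameters are trainable
model.print_trainable_parameters()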

    # Train model
    trainer.train()
    # Save trained model
    trainer.model.save_pretrained(new_model)
    

    Initiate the supervised fine-tuning process (`trainer.train()`) and then save the trained LoRA model to the specified directory.
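
The trainer saves only the LoRA adapter weights. If you also want the tokenizer files stored alongside them so the checkpoint is self-contained, an optional extra step is:

# Optional: store the tokenizer next to the adapter weights
tokenizer.save_pretrained(new_model)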

    # Run text generation pipeline with the fine-tuned model
    prompt = "How can I write a Python program that calculates the mean, standard deviation, and coefficient of variation of a dataset from a CSV file?"
    pipe = pipeline(task="text-generation", model=trainer.model, tokenizer=tokenizer, max_length=400)
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    print(result[0]['generated_text'])

    Create a text generation pipeline using our fine-tuned model and tokenizer. Then, we provide a prompt, generate text using the pipeline, and print the output.
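
If you want more varied completions than the fixed max_length call above, standard generation arguments such as do_sample, temperature, and max_new_tokens can be passed to the pipeline instead (the values below are illustrative, not from the original post):

pipe = pipeline(
    task="text-generation",
    model=trainer.model,
    tokenizer=tokenizer,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]["generated_text"])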

    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    secret_value_0 = user_secrets.get_secret("HF_TOKEN")

    Access Kaggle Secrets to retrieve a stored Hugging Face token (`HF_TOKEN`). This token is used for authentication with the Hugging Face Hub.
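
The snippet only retrieves the token. To actually authenticate the session with the Hub, for example before pushing the fine-tuned model, one common pattern is to pass it to huggingface_hub.login, sketched below:

from huggingface_hub import login

# Authenticate this session with the Hugging Face Hub using the Kaggle secret
login(token=secret_value_0)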

    # Empty VRAM
    # del model
    # del pipe
    # del trainer
    # del dataset
    del tokenizer
    import gc
    gc.collect()
    gc.collect()
    torch.cuda.empty_cache()
    

This snippet frees GPU memory by deleting references and clearing caches. Here we delete only the tokenizer; the commented-out del lines can be uncommented to release the model, pipeline, trainer, and dataset as well. We then run garbage collection and empty the CUDA cache to reduce VRAM usage.
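
To confirm that the cleanup actually released memory, you can inspect PyTorch's allocated-memory counter afterwards; a minimal check:

# Report how much GPU memory PyTorch still has allocated after cleanup
print(f"Allocated GPU memory: {torch.cuda.memory_allocated() / 1e9:.2f} GB")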

    import torch
    
    
    # Check the number of GPUs available
    num_gpus = torch.cuda.device_count()
    print(f"Number of GPUs available: {num_gpus}")
    
    
    # Check if CUDA device 1 is available
    if num_gpus > 1:
        print("cuda:1 is available.")
    else:
        print("cuda:1 is not available.")

    We import PyTorch and check the number of GPUs detected. Then, we print the count and conditionally report whether the GPU with ID 1 is available.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel
    
    
    # Specify the device ID for your desired GPU (e.g., 0 for the first GPU, 1 for the second GPU)
    device_id = 1  # Change this based on your available GPUs
    device = f"cuda:{device_id}"
    # Load the base model on the specified GPU
    base_model = AutoModelForCausalLM.from_pretrained(
        model_name,
        low_cpu_mem_usage=True,
        return_dict=True,
        torch_dtype=torch.float16,
        device_map="auto",  # Use auto to load on the available device
    )
    # Load the LoRA weights
    lora_model = PeftModel.from_pretrained(base_model, new_model)
    # Move LoRA model to the specified GPU
    lora_model.to(device)
    # Merge the LoRA weights with the base model weights
    model = lora_model.merge_and_unload()
    # Ensure the merged model is on the correct device
    model.to(device)
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    Select a GPU device (device_id 1) and load the base model with specified precision and memory optimizations. Then, load and merge LoRA weights into the base model, ensuring the merged model is moved to the designated GPU. Finally, load the tokenizer and configure it with appropriate padding settings.
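
At this point the merged model behaves like a plain transformers model. If you want to persist or share it, a sketch is shown below; the output path and repository name are placeholders, and pushing reuses the HF_TOKEN retrieved earlier:

# Save the merged model and tokenizer locally (hypothetical output path)
merged_dir = "/kaggle/working/llama-2-7b-codeAlpaca-merged"
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)

# Optionally push to the Hugging Face Hub (repository name is a placeholder)
# model.push_to_hub("your-username/llama-2-7b-codeAlpaca", token=secret_value_0)
# tokenizer.push_to_hub("your-username/llama-2-7b-codeAlpaca", token=secret_value_0)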

In conclusion, by following this tutorial you have fine-tuned the Llama-2 7B Chat model to specialize in Python code generation. Combining QLoRA-style parameter-efficient fine-tuning, gradient checkpointing, and the SFTTrainer demonstrates a practical approach to managing resource constraints while achieving strong performance.


Download the Colab Notebook here. All credit for this research goes to the researchers of this project.
