
    Enhancing Text Embeddings in Small Language Models: A Contrastive Fine-Tuning Approach with MiniCPM

    August 6, 2024

    LLMs excel in natural language understanding but are resource-intensive, limiting their accessibility. Smaller models like MiniCPM offer better scalability but often need targeted optimization to perform. Text embeddings, vector representations that capture semantic information, are essential for tasks like document classification and information retrieval. While LLMs such as GPT-4, LLaMA, and Mistral achieve strong performance due to extensive training, smaller models like Gemma, Phi, and MiniCPM require specific optimizations to close the performance gap and remain efficient.
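To make the idea of a text embedding concrete: one common way to turn a decoder-style model such as MiniCPM into a sentence encoder is to mean-pool its token states (ignoring padding) and compare sentences by cosine similarity. The sketch below is an illustrative NumPy toy, not the paper's exact pooling strategy:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over the sequence, skipping padded positions."""
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # (batch, dim)
    counts = mask.sum(axis=1).clip(min=1e-9)          # avoid divide-by-zero
    return summed / counts

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy batch: two "sentences" of 4 tokens with 3-dim token embeddings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])  # first sentence has one pad token
emb = mean_pool(tokens, mask)
print(cosine_similarity(emb[0], emb[1]))
```

In a real setup, `tokens` would be the hidden states of the language model rather than random vectors.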

Tsinghua University researchers investigated ways to enhance smaller language models by improving their text embeddings. They focused on three models (MiniCPM, Phi-2, and Gemma) and applied contrastive fine-tuning using an NLI dataset. The findings revealed that this method significantly improved text embedding quality across various benchmarks, with MiniCPM showing a notable 56.33% performance gain. This work addresses the relative lack of attention paid to smaller models, aiming to make MiniCPM more effective for resource-limited applications and demonstrating its potential, alongside Gemma and Phi-2, after fine-tuning.

    Text embeddings are low-dimensional vector representations of text that capture semantic meaning, supporting tasks like information retrieval, classification, and similarity matching. Traditional models like SBERT and Sentence T5 aim to provide versatile text encoding, while more recent methods such as Contriever and E5 enhance embeddings through multi-stage training strategies. Contrastive representation learning, involving techniques like triplet loss and InfoNCE, focuses on learning effective representations by contrasting similar and dissimilar data points. Lightweight language models like Phi, Gemma, and MiniCPM address the resource demands of large-scale models by offering more efficient alternatives. Fine-tuning methods like Adapter modules and LoRA enable task-specific adaptation of pre-trained models with reduced computational costs.
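The in-batch InfoNCE objective mentioned above can be sketched in a few lines: each anchor sentence is pulled toward its matching positive, while the other positives in the batch act as negatives. This is an illustrative NumPy implementation, not the authors' training code:

```python
import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch InfoNCE loss: row i of `positives` is the positive for row i
    of `anchors`; every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # diagonal = correct pairs
```

The loss is low when each anchor is most similar to its own positive and high when similarities are scrambled; hard negatives extend this by adding extra, deliberately difficult columns to the logit matrix.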

    The methodology addresses Semantic Textual Similarity (STS) in English by leveraging smaller language models to create an efficient and scalable solution. The approach uses contrastive fine-tuning to enhance text embeddings, training the model to differentiate between similar and dissimilar text pairs, thus producing more accurate and contextually relevant embeddings. Low-rank adaptation (LoRA) is employed during fine-tuning to maintain computational efficiency. The study uses a processed NLI dataset with 275k samples, and experiments are conducted on smaller models, including Gemma, Phi-2, and MiniCPM. The fine-tuning process uses the InfoNCE objective with in-batch and hard negatives to improve embedding quality.
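The LoRA idea used here can be pictured as a frozen pretrained weight plus a trainable product of two thin matrices, so only a small fraction of parameters are updated during fine-tuning. The class below is a minimal NumPy sketch; the rank `r` and scaling `alpha` are arbitrary illustrative values, not those from the study:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W: np.ndarray, r: int = 8, alpha: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                       # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                    # zero-init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen model; gradients flow only into `A` and `B`, which is what keeps the fine-tuning computationally cheap.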

Experiments measure the similarity of embeddings for sentence pairs using cosine similarity, scored against human judgments with Spearman correlation. MiniCPM, Gemma, and Phi-2 are evaluated across nine benchmarks, including STS12-17, STSBenchmark, BIOSSES, and SICK-R. Results show that MiniCPM consistently outperforms the other models, achieving the highest Spearman correlations across all datasets. Fine-tuning with LoRA significantly enhances performance, with MiniCPM showing a 56.33% improvement. Ablation studies reveal the impact of learning rate, prompting, and hard negatives on performance, indicating that MiniCPM benefits greatly from contrastive fine-tuning and hard-negative penalization.
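The evaluation protocol described above can be sketched end to end: compute a cosine similarity for each sentence pair, then take the Spearman rank correlation against gold human scores. This is an illustrative NumPy toy with made-up annotator scores, not the benchmark data:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks (no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical STS-style evaluation: embedding cosines vs. gold human scores.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(5)]
predicted = np.array([cosine(a, b) for a, b in pairs])
gold = np.array([3.2, 1.1, 4.5, 2.0, 0.7])  # made-up annotator scores
score = spearman(predicted, gold)
```

Spearman correlation cares only about rank order, which is why it is the standard STS metric: a model need not reproduce the absolute human scores, only their ordering.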

    The study successfully enhanced MiniCPM’s text embedding capabilities using contrastive fine-tuning on the NLI dataset. The fine-tuning led to a notable 56.33% performance improvement, allowing MiniCPM to outperform other models like Gemma and Phi-2 across nine STS benchmarks. Multiple ablation studies were conducted to explore the impact of prompt tuning, training efficiency, and the incorporation of hard negative penalties. The research improves the robustness and reliability of text embeddings in smaller-scale language models, offering a scalable and resource-efficient alternative to larger models while maintaining high performance in natural language understanding tasks.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


    The post Enhancing Text Embeddings in Small Language Models: A Contrastive Fine-Tuning Approach with MiniCPM appeared first on MarkTechPost.
