    Researchers at Google DeepMind Present Gecko: A Compact and Versatile Embedding Model Powered by the Vast World Knowledge of LLMs

    April 2, 2024

    Building models that understand and process text with human-like accuracy is an ongoing effort in natural language processing. One challenge stands out: efficiently converting large amounts of text into a form that machines can act upon. Text embedding models serve this purpose by transforming text into dense vectors, which lets machines gauge semantic similarity, classify documents, and retrieve information based on content relevance. Until now, however, training such models has relied on large, manually annotated datasets, which are time- and resource-intensive to produce.
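To make the idea of comparing dense vectors concrete, here is a minimal sketch of ranking passages by cosine similarity to a query. The four-dimensional vectors and the document names are made up for illustration; they are not Gecko outputs, and real embedding models produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage names ordered from most to least similar to the query."""
    return sorted(
        passage_vecs,
        key=lambda name: cosine_similarity(query_vec, passage_vecs[name]),
        reverse=True,
    )


# Hypothetical embeddings, invented for this example only.
query = [0.9, 0.1, 0.0, 0.3]
passages = {
    "doc_about_cats": [0.8, 0.2, 0.1, 0.4],
    "doc_about_tax_law": [0.0, 0.9, 0.7, 0.1],
}

ranking = rank_passages(query, passages)
print(ranking)
```

Retrieval with an embedding model follows the same pattern: embed the query and the passages once, then rank the passages by a vector similarity such as this one.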

    Researchers from Google DeepMind introduced Gecko, a text embedding model that distills knowledge from large language models (LLMs). Unlike traditional models that depend on extensive labeled datasets, Gecko begins by using an LLM to generate synthetic paired data. This first step produces a broad range of query-passage pairs that form the basis of a diverse training dataset.

    The team then improves the quality of this synthetic dataset by using the LLM to relabel the passages, so that each query is matched with its most relevant passage. This relabeling step is critical: it filters out less relevant data and surfaces the passages that best answer each query, which models limited to their original labeled datasets often cannot do.

    On the Massive Text Embedding Benchmark (MTEB), Gecko performed strongly, outpacing models with larger embedding sizes. Gecko with 256 embedding dimensions outperformed all entries with 768-dimensional embeddings, and the 768-dimensional version of Gecko scored an average of 66.31. These figures are notable because Gecko competes with models seven times its size and with embeddings five times higher in dimension.
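Embedding size matters in practice because it determines how much memory a vector index needs. The arithmetic below is a rough sketch that assumes each value is stored as a 4-byte float32 and that the index holds one million passages; both numbers are assumptions chosen for illustration, not figures from the Gecko paper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32 values


def index_size_bytes(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors embeddings of the given dimension."""
    return num_vectors * dims * bytes_per_value


NUM_PASSAGES = 1_000_000  # hypothetical corpus size

size_256 = index_size_bytes(NUM_PASSAGES, 256)
size_768 = index_size_bytes(NUM_PASSAGES, 768)

print(f"256 dims: {size_256 / 1e9:.3f} GB")
print(f"768 dims: {size_768 / 1e9:.3f} GB")
print(f"ratio: {size_768 / size_256:.1f}x")
```

Under these assumptions, a 256-dimensional index takes one third of the raw storage of a 768-dimensional one, before any index overhead or compression.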

    Gecko’s main contribution is FRet, a synthetic dataset built with LLMs in two steps. First, LLMs generate a broad spectrum of query-passage pairs that simulate diverse retrieval scenarios. Second, these pairs are refined: the passages are relabeled so that each query aligns with its most relevant passage. In this way, FRet draws on the knowledge within LLMs to produce a diverse and precisely targeted dataset for training text embedding models.
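The two-step process can be sketched as follows. This is an illustrative outline, not the authors' implementation: `build_training_pairs`, `toy_generate_query`, and `toy_score_relevance` are hypothetical names, the two toy functions stand in for LLM calls, and scoring every passage in a tiny corpus is an assumption made to keep the example self-contained.

```python
from typing import Callable, List, Tuple


def build_training_pairs(
    seed_passages: List[str],
    corpus: List[str],
    generate_query: Callable[[str], str],
    score_relevance: Callable[[str, str], float],
) -> List[Tuple[str, str]]:
    """Generate a query per seed passage, then relabel its best passage."""
    pairs = []
    for seed in seed_passages:
        # Step 1: a generator (an LLM in the article) writes a query for the seed.
        query = generate_query(seed)
        # Step 2: a judge (again an LLM in the article) picks the most relevant
        # passage, which may differ from the seed that produced the query.
        best = max(corpus, key=lambda passage: score_relevance(query, passage))
        pairs.append((query, best))
    return pairs


SEED = "Geckos are small lizards found in warm climates."
CORPUS = [
    SEED,
    "Geckos climb walls using tiny hairs on their toes.",
    "Tax law varies by country.",
]


def toy_generate_query(passage: str) -> str:
    # Stand-in for an LLM call: returns a fixed, hand-written query.
    return "how do geckos climb walls"


def toy_score_relevance(query: str, passage: str) -> float:
    # Stand-in for an LLM relevance judgment: number of shared lowercase words.
    query_words = set(query.lower().split())
    return float(len(query_words & set(passage.lower().split())))


pairs = build_training_pairs([SEED], CORPUS, toy_generate_query, toy_score_relevance)
print(pairs)
```

In this toy run the query generated from the seed passage is paired with a different passage, the one about climbing walls, because it scores higher. That is the effect the relabeling step is meant to capture: the passage a query was generated from is not always the best answer to it.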

    In conclusion, Gecko shows that an LLM can be used both to generate and to refine the training data for a text embedding model. This reduces the dependence on large, manually annotated datasets and sets a strong reference point for the efficiency and versatility of compact embedding models. Gecko’s MTEB results, together with its approach to data generation and refinement, underscore the potential of LLMs as a source of training data for embedding models.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Researchers at Google DeepMind Present Gecko: A Compact and Versatile Embedding Model Powered by the Vast World Knowledge of LLMs appeared first on MarkTechPost.
