
    DeepSeek API Introduces Context Caching on Disk: Reducing Input Token Price to 1/10

    August 9, 2024

Large Language Models (LLMs) are advancing rapidly, with increasingly complex architectures. Their high cost has been a major barrier to widespread adoption across industries: businesses and developers hesitate to invest in these models because of the substantial operational expenses. A significant portion of these costs arises from repeatedly processing input data, or “context,” since many user inputs, particularly in applications like customer-support chatbots, reuse similar patterns or prefixes. Traditional LLM serving processes this repetitive data every time, leading to unnecessary computation and higher operational costs, which limits the accessibility and scalability of these models.

Current LLM APIs typically process every user input from scratch, even when that input repeats or is identical to a previous one. This is inefficient in both computational resources and cost. DeepSeek addresses this issue with its Context Caching on Disk technology: frequently used input context is cached on a distributed disk array rather than held in more expensive memory, and the cached content is reused for subsequent inputs that share an identical prefix, bypassing the need for recomputation. This approach reduces service latency and significantly cuts overall usage costs, with potential savings of up to 90% for users.
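The lookup flow described above can be sketched as a toy disk-backed prefix cache. This is an illustrative assumption, not DeepSeek’s actual implementation: a real serving system would cache the model’s computed KV tensors keyed on token prefixes, while here a placeholder “encoded state” string stands in for that computation.

```python
import hashlib
import os
import pickle
import tempfile

class DiskPrefixCache:
    """Toy disk-backed cache keyed by a hash of the input prefix."""

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="ctx_cache_")

    def _path(self, prefix: str) -> str:
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, key + ".pkl")

    def lookup(self, prefix: str):
        path = self._path(prefix)
        if os.path.exists(path):  # cache hit: reuse stored state
            with open(path, "rb") as f:
                return pickle.load(f)
        return None  # cache miss

    def store(self, prefix: str, state) -> None:
        with open(self._path(prefix), "wb") as f:
            pickle.dump(state, f)

def process(prefix: str, query: str, cache: DiskPrefixCache) -> str:
    state = cache.lookup(prefix)
    if state is None:  # miss: "compute" the prefix once, then persist it
        state = f"encoded({prefix})"
        cache.store(prefix, state)
    return f"{state} + answer({query})"

cache = DiskPrefixCache()
system_prompt = "You are a helpful support agent."
process(system_prompt, "Where is my order?", cache)      # miss: computes and stores
process(system_prompt, "How do I get a refund?", cache)  # hit: skips recomputation
```

The key design point mirrored here is that only the shared prefix is cached; the novel suffix of each request is still processed normally.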

DeepSeek’s Context Caching on Disk technology works by first analyzing incoming requests to identify recurring contexts. Frequently used context is stored on a distributed disk array, and when a new request arrives, the system checks the cache for a matching prefix. If a match is found, the cached data is retrieved and reused, avoiding recomputation; the cache is managed dynamically to balance performance and storage efficiency. This approach not only accelerates response times, cutting first-token latency from 13 seconds to just 500 milliseconds in some cases, but also increases throughput, enabling the system to handle a higher volume of requests simultaneously. Costs drop drastically as well: the service charges only $0.014 per million tokens for cache hits, compared with $0.14 for non-cached tokens.
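The savings implied by those two prices can be checked with simple arithmetic. The scenario below is a hypothetical chatbot that resends a 2,000-token system prompt on every request; only the per-million-token prices come from the article.

```python
# Prices quoted in the article: $0.014/M tokens on cache hits, $0.14/M otherwise.
PRICE_HIT_PER_M = 0.014
PRICE_MISS_PER_M = 0.14

def input_cost(hit_tokens: int, miss_tokens: int) -> float:
    """Total input-token cost in dollars for a mix of cached and uncached tokens."""
    return (hit_tokens * PRICE_HIT_PER_M + miss_tokens * PRICE_MISS_PER_M) / 1_000_000

# Hypothetical workload: 1,000 requests, each resending a 2,000-token prefix.
no_cache = input_cost(0, 2_000 * 1_000)      # nothing cached: all tokens full price
with_cache = input_cost(2_000 * 999, 2_000)  # prefix cached after the first request

print(f"without caching: ${no_cache:.2f}")     # $0.28
print(f"with caching:    ${with_cache:.4f}")   # about $0.028
```

Since the hit price is exactly one tenth of the miss price, a workload dominated by cache hits approaches the 90% savings figure cited above.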

    In conclusion, DeepSeek’s Context Caching on Disk represents a significant advancement in the field of LLMs, addressing the critical issue of high operational costs by leveraging disk-based caching. This method reduces computational expenses and enhances system performance, making LLM technology more accessible and scalable. By reducing latency and increasing throughput, this method could democratize access to LLMs and stimulate new applications across various industries.

Check out the details here. All credit for this research goes to the researchers of this project.


    The post DeepSeek API Introduces Context Caching on Disk: Reducing Input Token Price to 1/10 appeared first on MarkTechPost.

