Anthropic introduces prompt caching to reduce latency and costs

Anthropic has introduced a new feature to some of its Claude models that will allow developers to cut down on prompt costs and latency.

Prompt caching allows users to cache frequently used context so that it can be used in future API calls. According to the company, by equipping the model with background knowledge and example outputs from the past, costs can be reduced by up to 90% and latency by up to 85% for long prompts.

There are several use cases where prompt caching would be useful, including being able to keep a summarized version of a codebase for coding assistants to use, providing long-form documents in prompts, and providing detailed instruction sets with several examples of desired outputs.Â

Users could also use it to essentially converse with long-form content like books, papers, documentation, and podcast transcripts.Â According to Anthropicâ€™s testing, chatting with a book with 100,000 tokens cached takes 2.4 seconds, whereas doing the same without information cached takes 11.5 seconds. This equates to a 79% reduction in latency.Â

It costs 25% more to cache an input token compared to the base input token price, but costs 10% less to actually use that cached content. Actual prices vary based on the specific model.

Prompt caching is now available as a public beta on Claude 3.5 Sonnet and Claude 3 Haiku, and Claude 3 Opus will be supported soon.

You may also likeâ€¦

Anthropic adds prompt evaluation feature to Console

Anthropic updates Claude with new features to improve collaboration

The post Anthropic introduces prompt caching to reduce latency and costs appeared first on SD Times.

Source: Read MoreÂ

Sunshine And March Vibes (2025 Wallpapers Edition)

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

How To Fix Largest Contentful Paint Issues With Subpart Analysis

How To Prevent WordPress SQL Injection Attacks

This $4 Steam Deck game includes the most-played classics from my childhood — and it will save you paper

Microsoft shares rare look at radical Windows 11 Start menu designs it explored before settling on the least interesting one of the bunch

NVIDIA’s new GPU driver adds DOOM: The Dark Ages support and improves DLSS in Microsoft Flight Simulator 2024

How to install and use Ollama to run AI LLMs on your Windows 11 PC

Community News: Latest PECL Releases (05.13.2025)

Community News: Latest PECL Releases (05.13.2025)

How We Use Epic Branches. Without Breaking Our Flow.

I think the ergonomics of generators is growing on me.

This $4 Steam Deck game includes the most-played classics from my childhood — and it will save you paper

This $4 Steam Deck game includes the most-played classics from my childhood — and it will save you paper

Microsoft shares rare look at radical Windows 11 Start menu designs it explored before settling on the least interesting one of the bunch

NVIDIA’s new GPU driver adds DOOM: The Dark Ages support and improves DLSS in Microsoft Flight Simulator 2024

Anthropic introduces prompt caching to reduce latency and costs

Nmap 7.96 Launches with Lightning-Fast DNS and 612 Scripts

CVE-2025-3744 – Nomad Sentinel Policy Bypass

CVE-2025-3358 – CVE-2022-36337 Oracle WebLogic Server Cross-Site Scripting

CSS Hover Effects: 40 Engaging Animations To Try

Privacy and security post-Snowden: Pew Research parallels ESET findings

Threat Actor Offers Unauthorized Korean National Police Agency (KNPA) Access for $4000

Google Cloud Is the New Way to the Cloud

Lua – Laravel powered open-source URL shortener

A pattern for composable UI in Flask

Not all Echo devices will get Alexa+ – see if yours made the list

Anthropic introduces prompt caching to reduce latency and costs

Related Posts