Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Stop writing tests: Automate fully with Generative AI

      August 19, 2025

      Opsera’s Codeglide.ai lets developers easily turn legacy APIs into MCP servers

      August 19, 2025

      Black Duck Security GitHub App, NuGet MCP Server preview, and more – Daily News Digest

      August 19, 2025

      10 Ways Node.js Development Boosts AI & Real-Time Data (2025-2026 Edition)

      August 18, 2025

      This new Coros watch has 3 weeks of battery life and tracks way more – even fly fishing

      August 20, 2025

      5 ways automation can speed up your daily workflow – and implementation is easy

      August 20, 2025

      This new C-suite role is more important than ever in the AI era – here’s why

      August 20, 2025

      iPhone users may finally be able to send encrypted texts to Android friends with iOS 26

      August 20, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Creating Dynamic Real-Time Features with Laravel Broadcasting

      August 20, 2025
      Recent

      Creating Dynamic Real-Time Features with Laravel Broadcasting

      August 20, 2025

      Understanding Tailwind CSS Safelist: Keep Your Dynamic Classes Safe!

      August 19, 2025

      Sitecore’s Content SDK: Everything You Need to Know

      August 19, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Why GNOME Replaced Eye of GNOME with Loupe as the Default Image Viewer

      August 19, 2025
      Recent

      Why GNOME Replaced Eye of GNOME with Loupe as the Default Image Viewer

      August 19, 2025

      Microsoft admits it broke “Reset this PC” in Windows 11 23H2 KB5063875, Windows 10 KB5063709

      August 19, 2025

      How to Fix “EA AntiCheat Has Detected an Incompatible Driver” on Windows 11?

      August 19, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Avoiding Metadata Contention in Unity Catalog

    Avoiding Metadata Contention in Unity Catalog

    April 7, 2025

    Metadata contention in Unity Catalog can occur in high-throughput Databricks environments, slowing down user queries and impacting performance across the platform. Our Finops strategy shifts left on performance. However, we have found scenarios where clients are still experiencing query slowdowns intermittently and even on optimized queries. As our client’s lakehouse footprint grows, we are seeing an emerging pattern where stress on Unity Catalog can have a downstream drag on performance across the workspace. In some cases, we have identified metadata contention in Unity Catalog as a contributor to unexpected reductions in response times after controlling for more targeted optimizations.

    How Metadata Contention Can Slow Down User Queries

    When data ingestion and transformation pipelines rely on structural metadata changes, they introduce several stress points across Unity Catalog’s architecture. These are not isolated to the ingestion job—they ripple across the control plane and affect all users.

    • Control Plane Saturation – Control plane saturation, often seen in distributed systems like Databricks, refers to the state when administrative functions (like schema updates, access control enforcement, and lineage tracking) overwhelm their processing capacity. Every structural table modification—especially those via CREATE OR REPLACE TABLE—adds to the metadata transaction load in Unity Catalog. This leads to:
      • Delayed responses from the catalog API
      • Increased latency in permission resolution
      • Slower query planning, even for unrelated queries
    • Metastore Lock Contention – Each table creation or replacement operation requires exclusive locks on the underlying metastore objects. When many jobs concurrently attempt these operations:
      • Other jobs or queries needing read access are queued
      • Delta transaction commits are delayed
      • Pipeline parallelism is reduced
    • Query Plan Invalidation Cascade – CREATE OR REPLACE TABLE invalidates the current logical and physical plan cache for all compute clusters referencing the old version. This leads to:
      • Increased query planning time across clusters
      • Unpredictable performance for dashboards or interactive workloads
      • Reduced cache utilization across Spark executors
    • Schema Propagation Overhead – Structural changes to a table (e.g., column additions, type changes) must propagate to all services relying on schema consistency. This includes:
      • Databricks SQL endpoints
      • Unity Catalog lineage services
      • Compute clusters running long-lived jobs
    • Multi-tenant Cross-Job Interference – Unity Catalog is a shared control plane. When one tenant (or set of jobs) aggressively replaces tables, the metadata operations can delay or block unrelated tenants. This leads to:
      • Slow query startup times for interactive users
      • Cluster spin-up delays due to metadata prefetch slowness
      • Support escalation from unrelated teams

    The CREATE OR REPLACE Reset

    In other blogs, I have said that predictive optimization is the reward for investing in good governance practices with Unity Catalog. One of the key enablers of predictive optimzation is a current, cached logical and physical plan. Every time a table is created, a new logical and physical plan for this and related tables is created. This means that ever time you execute CREATE OR REPLACE TABLE, you are back to step one for performance optimization. The DROP TABLE + CREATE TABLE pattern will have the same net result.

    This is not to say that CREATE OR REPLACE TABLE is inherently an anti-pattern. It only becomes a potential performance issue at scales, think thousands of jobs rather than hundreds. Its also not the only cuplrit. ALTER TABLE with structural changes have a similar effect. CREATE OR REPLACE TABLE is ubiquitous in data ingestion pipelines and it doesn’t start to cause a noticeable issue until is deeply ingrained in your developer’s muscle memory. There are alternatives, though.

    Summary of Alternatives

    There are different techniques you can use that will not invalidate the plan cache.

    • Use CREATE TABLE IF NOT EXISTS +  INSERT OVERWRITE is probably my first choice because there is a straight code migration path.
    CREATE TABLE IF NOT EXISTS catalog.schema.table (
    id INT,
    name STRING
    ) USING DELTA;
    INSERT OVERWRITE catalog.schema.table
    SELECT * FROM staging_table;
    • Both MERGE INTO and  COPY INTO have the metadata advantages of the prior solution and support schema evolution as well as concurrency-safe ingestion.
    MERGE INTO catalog.schema.table t
    USING (SELECT * FROM staging_table) s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;
    COPY INTO catalog.schema.table
    FROM '/mnt/source/'
    FILEFORMAT = PARQUET
    FORMAT_OPTIONS ('mergeSchema' = 'true');
    • Consider whether you need to be persisting the data beyond the life of the job. If not, consider temporary views or tables. This will avoid Unity Catalog entirely as there is no metadata overhead.
    df.createOrReplaceTempView("job_tmp_view")
    • While I prefer Unity Catalog to handle partitioning strategies in the Silver and Gold layer, you can implement a partitioning scheme with your ingestion logic to keep the metadata stable. This is helpful for high-concurrency workloads.
    CREATE TABLE IF NOT EXISTS catalog.schema.import_data (
    id STRING,
    source STRING,
    load_date DATE
    ) PARTITIONED BY (source, load_date);
    INSERT INTO catalog.schema.import_data
    PARTITION (source = 'job_xyz', load_date = current_date())
    SELECT * FROM staging;

    I have summarized the different techniques you can use to minimize plan invalidation. In general, I think INSER OVERWRITE usually works well as a drop-in replacement. You get schema evolution with MERGE INTO and COPY INTO. I am often surprised at how many tables that should be considered temporary are stored. This is just a good exercise to go through with your jobs. Finally, there are occasions when the Partition + INSERT paradigm is preferable to INSERT OVERWRITE, particularly for high-concurrency workloads.

    TechniqueMetadata CostPlan InvalidationConcurrency-SafeSchema EvolutionNotes
    CREATE OR REPLACE TABLEHighYesNoYesUse with caution in production
    INSERT OVERWRITELowNoYesNoFast for full refreshes
    MERGE INTOMediumNoYesYesIdeal for idempotent loads
    COPY INTOLowNoYesYesGreat with Auto Loader
    TEMP VIEW / TEMP TABLENoneNoYesN/ABest for intermittent pipeline stages
    Partition + INSERTLowNoYesNoEfficient for batch-style jobs

    Conclusion

    Tuning the performance characteristics of a platform is more complex than single-application performance tuning. Distributed performance is even more complicated at scale, sice strategies and patterns may start to break down as volume and velocity increase.

    Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential across your enterprise.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleSitecore Search Source Types – Part I
    Next Article Migrating from Eloqua to Salesforce Marketing Cloud: A Step-by-Step Guide

    Related Posts

    Development

    Creating Dynamic Real-Time Features with Laravel Broadcasting

    August 20, 2025
    Repurposing Protein Folding Models for Generation with Latent Diffusion
    Artificial Intelligence

    Repurposing Protein Folding Models for Generation with Latent Diffusion

    August 20, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    You can now generate images with ChatGPT on WhatsApp – here’s how

    News & Updates

    Is James Bond 007 First Light on Xbox?

    News & Updates

    The next leap naming the future before it arrives and it’s called “India’s Human AI – Srinidhi Ranganathan

    Artificial Intelligence

    Stretch Break – take regular breaks

    Linux

    Highlights

    Artificial Intelligence

    How we’re supporting better tropical cyclone prediction with AI

    June 12, 2025

    We’re launching Weather Lab, featuring our experimental cyclone predictions, and we’re partnering with the U.S.…

    CVE-2025-47905 – Varnish Cache HTTP/1 Chunk Boundary CRLF Injection

    May 13, 2025

    T-Mobile settlement payouts begin this month – how much you could get

    April 2, 2025

    MongoDB Atlas 與生成式AI完美結合,永豐銀行數位金融服務再進化

    May 5, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.