
    End-to-End Lineage and External Raw Data Access in Databricks

    March 31, 2025

    Achieving end-to-end lineage in Databricks while allowing external users to access raw data can be a challenging task. In Databricks, leveraging Unity Catalog for end-to-end lineage is a best practice. However, enabling external users to access raw data while maintaining security and lineage integrity requires a well-thought-out architecture. This blog outlines a reference architecture to achieve this balance.

    Key Requirements

    To meet the needs of both internal and external users, the architecture must:

    1. Maintain end-to-end lineage within Databricks using Unity Catalog.
    2. Allow external users to access raw data without compromising governance.
    3. Secure data while maintaining flexibility for different use cases.

    Recommended Architecture

    1. Shared Raw Data Lake (Pre-Bronze)

    The architecture starts with a shared data lake that serves as a landing zone for raw, unprocessed data from various sources. This data lake lives in external cloud storage, such as Amazon S3 or Azure Data Lake Storage, and is independent of Databricks. Access is managed with IAM roles and policies, so Databricks and external users each reach the data through separate, non-overlapping permissions.

    Benefits:

    • External users can access raw data without being granted access to the Databricks Lakehouse.
    • Secure and isolated raw data management.
    • Maintains data availability for non-Databricks consumers.
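To make the IAM side concrete, here is a minimal sketch that assumes the shared lake is an S3 bucket. It builds a bucket policy that gives an external reader role and a Databricks ingestion role their own read-only statements, scoped to the raw prefix. The bucket name, prefix, account ID, and role ARNs are hypothetical placeholders. They are not values from the original architecture.

```python
import json

# Hypothetical placeholders: replace with your own bucket, prefix, and role ARNs.
RAW_BUCKET = "example-shared-raw-lake"
RAW_PREFIX = "raw/"
EXTERNAL_READER_ROLE = "arn:aws:iam::111122223333:role/external-raw-reader"
DATABRICKS_INGEST_ROLE = "arn:aws:iam::111122223333:role/databricks-raw-ingest"


def read_only_statements(sid_prefix: str, principal_arn: str) -> list[dict]:
    """Allow one principal to list and read objects under the raw prefix only."""
    return [
        {
            "Sid": f"{sid_prefix}ListRawPrefix",
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{RAW_BUCKET}",
            # Restrict listing to keys under the raw prefix.
            "Condition": {"StringLike": {"s3:prefix": [f"{RAW_PREFIX}*"]}},
        },
        {
            "Sid": f"{sid_prefix}ReadRawObjects",
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{RAW_BUCKET}/{RAW_PREFIX}*",
        },
    ]


def build_raw_bucket_policy() -> dict:
    """Give each principal its own statements so permissions never overlap."""
    statements = []
    statements += read_only_statements("External", EXTERNAL_READER_ROLE)
    statements += read_only_statements("Databricks", DATABRICKS_INGEST_ROLE)
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_raw_bucket_policy(), indent=2))
```

Printing the result produces a JSON document that you could attach as a bucket policy. Because each principal has its own statements, the permissions stay separate and are easy to audit. In a Unity Catalog setup, the Databricks-side role would typically be the IAM role behind a storage credential, not a role that notebooks reference directly.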

    2. Bronze Layer (Managed by Databricks)

    The bronze layer ingests raw data from the shared data lake into Databricks. Using Delta Live Tables (DLT), the data is processed and stored as managed or external Delta tables. Unity Catalog governs these tables and enforces fine-grained access control to maintain data security and lineage. End-to-end lineage in Databricks begins with the bronze layer and can be maintained through the silver and gold layers by building them with DLT pipelines.

    Governance:

    • Permissions are enforced through Unity Catalog.
    • Data versioning and lineage tracking are maintained within Databricks.
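Unity Catalog captures lineage automatically from the queries and pipelines that run in Databricks, so you never build a lineage graph by hand. The toy model below is only an illustration, and it is not Unity Catalog's API. It uses hypothetical table names and a hypothetical raw path. It shows that when bronze, silver, and gold are all produced inside Databricks, every hop is recorded. A gold table can then be traced back through silver and bronze to the raw landing path it was ingested from.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Toy lineage graph: maps each table to the sources it reads from."""

    upstreams: dict[str, list[str]] = field(default_factory=dict)

    def register(self, table: str, sources: list[str]) -> None:
        # Record one hop: a pipeline step that reads `sources` and writes `table`.
        self.upstreams[table] = list(sources)

    def trace(self, table: str) -> list[str]:
        """Return all upstream nodes of `table`, nearest first (breadth-first)."""
        seen: list[str] = []
        queue = deque(self.upstreams.get(table, []))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue  # already visited via another path
            seen.append(node)
            queue.extend(self.upstreams.get(node, []))
        return seen


if __name__ == "__main__":
    graph = LineageGraph()
    # Hypothetical names: a raw landing path plus three medallion tables.
    raw_path = "s3://example-shared-raw-lake/raw/orders/"
    graph.register("main.bronze.orders", [raw_path])
    graph.register("main.silver.orders_clean", ["main.bronze.orders"])
    graph.register("main.gold.daily_revenue", ["main.silver.orders_clean"])
    print(graph.trace("main.gold.daily_revenue"))
    # ['main.silver.orders_clean', 'main.bronze.orders', 's3://example-shared-raw-lake/raw/orders/']
```

Suppose one step ran outside Databricks, for example an external job that rewrites bronze files directly. That hop would be missing from the graph and the trace would stop short. The architecture avoids this gap by keeping every step after the raw lake inside Databricks.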

    3. Silver and Gold Layers (Processed Data)

    Subsequent processing transforms bronze data into refined (silver) and aggregated (gold) tables. These layers are managed exclusively within Databricks. This keeps lineage continuous and lets the layers use Delta Lake's optimization features.

    Access:

    • Internal users access data through Unity Catalog with appropriate permissions.
    • External users do not have direct access to these curated layers, preserving data quality.
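On the internal side, permissions are ordinary Unity Catalog grants. The sketch below generates the SQL statements for a hypothetical catalog named `main` with `bronze`, `silver`, and `gold` schemas, and for two hypothetical groups, `data_engineers` and `analysts`. It grants USE CATALOG, USE SCHEMA, and SELECT at the schema level. It deliberately emits nothing for external users.

```python
# Hypothetical catalog and group names; adjust to your workspace.
CATALOG = "main"
INTERNAL_READERS = {
    # group name -> schemas (medallion layers) the group may read
    "data_engineers": ["bronze", "silver", "gold"],
    "analysts": ["silver", "gold"],
}


def grant_statements(catalog: str, readers: dict[str, list[str]]) -> list[str]:
    """Build Unity Catalog GRANT statements giving each group read access to its layers."""
    statements: list[str] = []
    for group, schemas in readers.items():
        # Backticks quote the principal name; assumes it contains no backticks.
        principal = f"`{group}`"
        statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal};")
        for schema in schemas:
            target = f"{catalog}.{schema}"
            statements.append(f"GRANT USE SCHEMA ON SCHEMA {target} TO {principal};")
            # SELECT granted on a schema applies to the tables within it.
            statements.append(f"GRANT SELECT ON SCHEMA {target} TO {principal};")
    return statements


if __name__ == "__main__":
    # External users get no grants here: they read only from the shared raw lake.
    for stmt in grant_statements(CATALOG, INTERNAL_READERS):
        print(stmt)
```

Granting SELECT on a schema, not on individual tables, means new tables created by the pipeline inherit read access. The mapping of groups to layers also stays in one place that is easy to review. The split between the two groups here is an illustrative choice, not part of the original architecture.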

    Access Patterns

    • External Users: Access raw data from the shared data lake through configured IAM policies. No direct access to Databricks-managed bronze tables.
    • Internal Users: Access the full data pipeline from bronze to gold within Databricks, leveraging Unity Catalog for secure and controlled access.
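The two access patterns live in different systems: bucket policies cover the raw lake, and Unity Catalog covers the lakehouse. Checking them together can catch drift. The minimal sketch below assumes you have already collected grants into hypothetical `Grant` records, for example from your bucket policy and from `SHOW GRANTS` output. It flags any grant that breaks the external/internal split described above.

```python
from typing import NamedTuple

# The access patterns above, encoded as the zones each user type may read.
ALLOWED_ZONES = {
    "external": {"raw_lake"},
    "internal": {"bronze", "silver", "gold"},
}
KNOWN_ZONES = {"raw_lake", "bronze", "silver", "gold"}


class Grant(NamedTuple):
    principal: str
    user_type: str  # "external" or "internal"
    zone: str       # "raw_lake", "bronze", "silver", or "gold"


def find_violations(grants: list[Grant]) -> list[Grant]:
    """Return the grants that give a user type access to a zone it should not reach."""
    violations: list[Grant] = []
    for grant in grants:
        if grant.zone not in KNOWN_ZONES:
            raise ValueError(f"unknown zone: {grant.zone!r}")
        allowed = ALLOWED_ZONES.get(grant.user_type)
        if allowed is None:
            raise ValueError(f"unknown user type: {grant.user_type!r}")
        if grant.zone not in allowed:
            violations.append(grant)
    return violations


if __name__ == "__main__":
    # Hypothetical inventory of grants.
    inventory = [
        Grant("partner-a-reader", "external", "raw_lake"),
        Grant("partner-a-reader", "external", "bronze"),  # breaks the pattern
        Grant("analysts", "internal", "gold"),
    ]
    for bad in find_violations(inventory):
        print(f"violation: {bad.principal} ({bad.user_type}) -> {bad.zone}")
```

The Databricks ingestion role that reads the raw lake is a service identity, not an internal user. You would likely exclude it from this check or model it as its own user type.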

    Why This Architecture Works

    • Security: Separates the raw landing zone from the Databricks-managed bronze layer, reducing exposure.
    • Governance: Unity Catalog maintains strict access control and lineage.
    • Performance: Internal data processing benefits from Delta Lake optimizations, while raw data remains easily accessible for external systems.

    End-to-end lineage in Databricks

    This reference architecture offers a balanced approach to handling raw data access while maintaining governance and lineage within Databricks. By isolating raw data in a shared lake and managing processed data within Databricks, organizations can effectively support both internal analytics and external data sharing.

    Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential across your enterprise.
