
    Salesforce to Databricks: A Deep Dive into Integration Strategies

    July 15, 2025

    Supplementing Salesforce with Databricks as an enterprise Lakehouse solution brings advantages for various personas across an organization. Customer experience data is highly valued for driving personalized customer journeys that leverage company-wide applications beyond Salesforce. From enhanced customer satisfaction to tailored engagements and offerings that drive business renewals and expansions, the advantages are hard to miss. Databricks maps data from a variety of enterprise apps, including those used by Sales, Marketing and Finance. Consequently, layering Databricks Generative AI and predictive ML capabilities on top of that data provides easily accessible best-fit recommendations that help eliminate challenges and highlight success areas within your company’s customer base.

    In this post, I walk through the different methods for making Salesforce data accessible from within Databricks. While accessing Databricks data from Salesforce is also possible, it is not the topic of this post and may be tackled in a later one. I have focused on the built-in capabilities of both Salesforce and Databricks and have therefore excluded third-party data integration platforms. There are three main ways to achieve this integration:

    1. Databricks Lakeflow Ingestion from Salesforce
    2. Databricks Query Federation from Salesforce Data Cloud
    3. Databricks File Sharing from Salesforce Data Cloud

    Which approach is best depends on your use case. The decision is driven by several factors, such as the acceptable latency in accessing the latest Salesforce data, the complexity of the data transformations needed, and the volume of Salesforce data of interest. It may well be that more than one method is implemented to cater for different requirements.
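
    One way to make these trade-offs concrete is a small filter over the three methods. The sketch below is illustrative only: the function name `candidate_methods` and its three boolean inputs are assumptions of this example, and the rules are a simplified reading of the comparison matrix in this post, not an official decision procedure from Salesforce or Databricks.

```python
from typing import List

LAKEFLOW = "Lakeflow Ingestion"
FEDERATION = "Query Federation"
FILE_SHARING = "File Sharing"


def candidate_methods(needs_live_data: bool,
                      high_volume: bool,
                      needs_streaming_pipeline: bool) -> List[str]:
    """Return the methods that survive a few coarse requirement filters.

    Hypothetical helper: the rules are a simplification of the matrix.
    """
    candidates = [LAKEFLOW, FEDERATION, FILE_SHARING]

    if needs_live_data:
        # Lakeflow copies data on a schedule, so it cannot serve live queries.
        candidates = [m for m in candidates if m != LAKEFLOW]

    if high_volume:
        # Query Federation is listed as unsuitable for high data volumes.
        candidates = [m for m in candidates if m != FEDERATION]

    if needs_streaming_pipeline:
        # Only Lakeflow Ingestion feeds Databricks streaming tables.
        candidates = [m for m in candidates if m == LAKEFLOW]

    return candidates


if __name__ == "__main__":
    # Live data over a large data set leaves only the file-based zero-copy option.
    print(candidate_methods(needs_live_data=True,
                            high_volume=True,
                            needs_streaming_pipeline=False))
```

    An empty result (for example, live data plus streaming pipelines) signals conflicting requirements, which is the situation where implementing more than one method becomes the practical answer.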

    While the first method copies the raw Salesforce data over to Databricks, methods 2 and 3 offer no-copy alternatives that leverage Salesforce Data Cloud itself as the raw data layer. The no-copy alternatives are attractive because they rely on Salesforce’s native capability of managing its own data lake, eliminating the overhead of redoing that effort in Databricks. However, depending on the use case, this approach has limitations. The matrix below shows how each method compares on the key criteria for integration.

    | Method | Lakeflow Ingestion | Salesforce Data Cloud Query Federation | Salesforce Data Cloud File Sharing |
    |---|---|---|---|
    | Type | Data Ingestion | Zero-Copy | Zero-Copy |
    | Supports Salesforce Data Cloud as a Source? | ✔︎ Yes | ✔︎ Yes | ✔︎ Yes |
    | Incremental Data Refreshes | ✔︎ Automated processing into Databricks based on SF standard timestamp fields. Formula fields always require a full refresh of the formulas. | ✔︎ Automated in SF Data Cloud (requires custom handling if copying to Databricks) | ✔︎ Automated in SF Data Cloud (requires custom handling if copying to Databricks) |
    | Processing of Soft Deletes | ✔︎ Yes. Supported incrementally. | ✔︎ Automated in SF Data Cloud (requires custom handling if copying to Databricks) | ✔︎ Automated in SF Data Cloud (requires custom handling if copying to Databricks) |
    | Processing of Hard Deletes | ✘ Requires a full refresh | ✔︎ Automated in SF Data Cloud (requires custom handling if copying to Databricks) | ✔︎ Automated in SF Data Cloud (requires custom handling if copying to Databricks) |
    | Query Response Time | ✔︎ Best, as data is queried from a local copy and processed within Databricks | ⚠ Slower, as query response is dependent on SF Data Cloud and data has to travel across networks | ⚠ Slower, as data travels across networks |
    | Supports Real-Time Querying? | ✘ No. The pipeline runs on a schedule to copy data, for example, hourly, daily, etc. | ✔︎ Yes. Live query execution on SF Data Cloud. (Data Cloud DLO is refreshed from Salesforce modules either in batches, streaming (every 3 min), or in real-time.) | ✔︎ Yes. Live data sourced from SF Data Cloud. (Data Cloud DLO is refreshed from Salesforce modules either in batches, streaming (every 3 min), or in real-time.) |
    | Supports Databricks Streaming Pipelines? | ✔︎ Yes, with Declarative Pipelines into Streaming tables (DLT) (runs as micro-batch jobs) | ✘ No | ✘ No |
    | Suitable for High Data Volume? | ✔︎ Yes. SF Bulk API is called for high data volumes such as initial loads, and SF REST API is used for lower data volumes such as limited data volume incremental loads. | ✘ No. Reliant on JDBC Query Pushdown limitations and SF performance. | ⚠ Moderate. This method is more suitable than Query Federation when it comes to zero-copy with high volumes of data. |
    | Supports Data Transformation | ⚠ No direct transformation. Ingests SF objects as is. Transformation happens downstream in the Declarative Pipeline. | ✔︎ Yes. DBRX pushes queries over to Salesforce using JDBC protocol. | ✔︎ Yes. Transformations execute on Databricks compute. |
    | Protocol | SF REST API and Bulk API over HTTPS | JDBC over HTTPS | Salesforce Data Cloud DaaS APIs over HTTPS (file-based access) |
    | Scalability | Up to 250 objects per pipeline. Multiple pipelines are allowed. | Depending on SF Data Cloud performance when running transformation with multiple objects | Up to 250 Data Cloud objects may be included in a data share. Up to 10 data shares. |
    | Salesforce Prerequisites | API-enabled Salesforce user with access to desired objects | Salesforce Data Cloud must be available. Data Cloud DMOs mapped to DLOs with Streams or other methods for Data Lake population. Enable JDBC API access to Data Cloud. | Salesforce Data Cloud must be available. Data Cloud DMOs mapped to DLOs with Streams or other methods for Data Lake population. Data share target is created in SF with shared objects. |
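
    The delete-handling rows for the ingestion method are easier to see with a toy example. The following is a minimal sketch in plain Python, not Lakeflow's actual implementation: it assumes a timestamp cursor modeled on Salesforce's `SystemModstamp` field and a soft-delete flag modeled on `IsDeleted`, and the `Row` class and both function names are hypothetical. It shows why a timestamp-driven incremental load picks up updates and soft deletes but cannot see hard deletes, which only disappear after a full refresh.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a Salesforce record."""
    id: str
    name: str
    system_modstamp: datetime  # modeled on Salesforce's SystemModstamp
    is_deleted: bool = False   # modeled on Salesforce's IsDeleted


def apply_incremental_batch(local_copy: Dict[str, Row],
                            source_rows: Iterable[Row],
                            watermark: datetime) -> datetime:
    """Merge rows changed after `watermark`; return the new watermark."""
    new_watermark = watermark
    for row in source_rows:
        if row.system_modstamp <= watermark:
            continue  # unchanged since the last run
        if row.is_deleted:
            # Soft delete: the row still exists in the source, flagged as deleted.
            local_copy.pop(row.id, None)
        else:
            local_copy[row.id] = row
        new_watermark = max(new_watermark, row.system_modstamp)
    return new_watermark


def full_refresh(source_rows: Iterable[Row]) -> Dict[str, Row]:
    """Rebuild the local copy from scratch, dropping anything not in the source."""
    return {row.id: row for row in source_rows if not row.is_deleted}


if __name__ == "__main__":
    t0, t1 = datetime(2025, 7, 1), datetime(2025, 7, 2)
    local = {r.id: r for r in (Row("A", "Acme", t0),
                               Row("B", "Beta", t0),
                               Row("C", "Gamma", t0))}
    # Source after changes: A updated, B soft-deleted, C hard-deleted (absent).
    source = [Row("A", "Acme Corp", t1), Row("B", "Beta", t1, is_deleted=True)]
    apply_incremental_batch(local, source, watermark=t0)
    print(sorted(local))                 # C is still present: a stale row
    print(sorted(full_refresh(source)))  # only a full refresh drops C
```

    The same reasoning applies to the zero-copy methods only if their data is copied into Databricks, which is why the matrix flags custom handling in that case.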

    If you’re looking for guidance on leveraging Databricks with Salesforce, reach out to Perficient for a discussion with Salesforce and Databricks specialists.
