
In our “Engineering Energizers” Q&A series, we shine a spotlight on the innovative engineering minds at Salesforce. Today, we feature Sapna Vasant Pandit, Director of Software Engineering, who leads the team behind Data Spaces and Data Governance — a key component of Data Cloud that enables secure, structured data access at an enterprise level.
Discover how Sapna’s team navigated the architectural challenges of creating a horizontal security layer, ensured top-tier performance for large enterprises, and maintained trust by preventing regressions during a critical rollout.
What is your team’s mission?
The mission is to build robust security and governance capabilities into Data Cloud, starting with Data Spaces and expanding toward comprehensive ABAC (Attribute-Based Access Control) policy-based Data Governance. Data Spaces allows customers to enforce data segregation by brand, geography, or department. For example, a global automotive company might want to prevent its European marketing team from accessing data managed by its North American division within a single organization.
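The automotive example above comes down to partitioning records by data space and scoping every caller to one of them. As a minimal illustration (the names `Record` and `visible_records` are hypothetical, not Data Cloud APIs), the segregation rule might look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    data_space: str   # e.g. "EU_Marketing" or "NA_Marketing"
    payload: dict

def visible_records(records, user_data_space):
    """Return only the records that belong to the caller's data space."""
    return [r for r in records if r.data_space == user_data_space]

records = [
    Record("EU_Marketing", {"lead": "Anna"}),
    Record("NA_Marketing", {"lead": "Bob"}),
]
# A user scoped to EU_Marketing never sees the North American record.
print([r.payload for r in visible_records(records, "EU_Marketing")])
```

The real enforcement happens across storage, query, and API layers within one org, but the invariant is the same: a caller's data-space scope filters everything it reads.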
Since the general availability of Data Spaces, the focus has shifted to more granular security. This involves defining policies at the object, field, and record levels to restrict access based on user roles using the ABAC model. To support this, the team introduced a tagging system on Data Cloud objects, including pre-defined taxonomies such as PII, PHI, and Finance. These tags propagate through transformation processes like Segmentation and Calculated Insights, enabling consistent policy enforcement across the data lifecycle based on user attributes.
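The two moving parts described here, tag propagation through transformations and attribute-based masking at read time, can be sketched roughly as follows. This is an illustrative model only; `SOURCE_TAGS`, `propagate_tags`, and the clearance attributes are hypothetical names, not the product's actual schema:

```python
# Tags attached to source object fields (pre-defined taxonomies).
SOURCE_TAGS = {"email": {"PII"}, "diagnosis": {"PHI"}, "revenue": {"Finance"}}

def propagate_tags(derived_fields, source_tags):
    """A derived field inherits the union of tags of the source fields it reads."""
    return {out: set().union(*(source_tags.get(src, set()) for src in srcs))
            for out, srcs in derived_fields.items()}

def enforce(row, tags, clearances):
    """Deny by default: mask any field whose tags exceed the user's clearances."""
    return {f: v if tags.get(f, set()) <= clearances else "***MASKED***"
            for f, v in row.items()}

# A segment field derived from a contact's email inherits the PII tag,
# so a user with only Finance clearance sees it masked.
segment_tags = propagate_tags({"contact_email": ["email"]}, SOURCE_TAGS)
print(enforce({"contact_email": "anna@example.com"}, segment_tags, {"Finance"}))
```

Propagating tags through the transformation graph is what keeps the policy consistent: the derived segment is as protected as the raw field it was computed from.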
The team is currently driving the beta rollout of data governance on structured data and developing features to extend security controls to unstructured content, such as documents, PDFs, and Slack messages. These efforts support the broader goal of helping customers implement scalable, reliable, and policy-driven data security within Salesforce.

What was the most significant technical challenge that your team faced?
The most complex challenge was architectural: Data Spaces and Data Governance are horizontal security features, spanning every layer of Data Cloud and interacting with virtually every other feature. Unlike building a vertical microservice, the team had to ensure these new security constructs did not disrupt existing customer workflows and remained compatible with all other features, which made the rollout particularly complex and risky.
Given the horizontal nature of these initiatives, the team needed to be highly versatile, capable of navigating various technologies, languages, and system boundaries. The work involved multiple programming languages and databases, building a new ABAC policy engine, introducing new libraries for Data Cloud's query layer to translate governed queries for different compute engines, and deeply integrating with Salesforce's platform access control constructs. Extensive coordination across engineering teams was essential, and siloed development was not an option.
Another significant challenge we overcame was standardizing access control for all types of data retrieval from Data Cloud's storage layers. Whether a human is making a Connect API call, clicking a button in the UI, or an agent is running a RAG (retrieval-augmented generation) process, all retrieval requests go through a unified, stringent security layer.
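The pattern described here is a single enforcement chokepoint in front of storage. A minimal sketch of that idea (the `guarded_fetch` function and principal structure are illustrative assumptions, not the actual Data Cloud code) might be:

```python
class AccessDenied(Exception):
    pass

def guarded_fetch(principal, data_space, fetch_fn):
    """Single chokepoint: authorize the caller before any storage read."""
    if data_space not in principal.get("allowed_spaces", set()):
        raise AccessDenied(f"{principal['name']} may not read {data_space}")
    return fetch_fn(data_space)

storage = {"EU_Marketing": ["lead-1", "lead-2"]}

# The same guard serves a UI click, an API call, and an agent's RAG retrieval.
analyst = {"name": "analyst", "allowed_spaces": {"EU_Marketing"}}
print(guarded_fetch(analyst, "EU_Marketing", storage.get))
```

Routing every path through one guard, rather than re-implementing checks per surface, is what makes the layer "unified": there is no retrieval route that bypasses it.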

Example of what each user will see once the table is secured with ABAC policies.
What ongoing research and development efforts are aimed at improving data security capabilities?
The team continues to evolve Data Spaces as the foundation of a more comprehensive security model. One major area of active development is enabling cross-data-space access for approved use cases. Currently, users operate within a single Data Space, but the goal is to allow AI agents or human users to retrieve data across multiple spaces while adhering to governance rules, ensuring flexibility without compromising security boundaries.
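One plausible way to model "cross-space access for approved use cases" is deny-by-default reads outside the caller's home space, unlocked only by explicit grants. The sketch below is a hypothetical simplification (the grant table and function names are invented for illustration):

```python
# Explicitly approved (home_space, target_space) pairs.
CROSS_SPACE_GRANTS = {
    ("Service", "EU_Marketing"),   # e.g. a service agent may read EU marketing data
    ("Service", "NA_Sales"),
}

def accessible_spaces(home_space, all_spaces, grants=CROSS_SPACE_GRANTS):
    """Deny by default: cross-space reads require an explicit grant."""
    return {s for s in all_spaces
            if s == home_space or (home_space, s) in grants}

spaces = {"Service", "EU_Marketing", "NA_Sales", "Finance"}
print(sorted(accessible_spaces("Service", spaces)))
```

The governance rules stay intact because the grant, not the caller, defines the boundary: "Finance" remains invisible until someone approves that pair.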
Another immediate focus is on unstructured data governance. With increasing amounts of documents, PDFs, and content from external systems like Slack or GDrive being ingested into Data Cloud, new AI-assisted features are being developed to detect and tag sensitive data, such as credit card numbers or patient medical history, in freeform text. This task is significantly more challenging than masking a database column.
Generative AI is being integrated into the detection process to classify sensitive content and automatically apply redaction or masking policies. This work is still in the early stages but is crucial for supporting regulated industries like healthcare and finance. Data Spaces Governance is seen as an evolving system that must adapt to new data types, regulations, and trust boundaries.
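To see why freeform text is harder than masking a column, consider even the simplest case: credit card numbers in prose. A pattern match alone over-fires, so a checksum filter is needed on top. This sketch uses the standard Luhn check (it is a generic illustration, not the team's detector, which the interview says is AI-assisted):

```python
import re

def luhn_valid(digits: str) -> bool:
    """Luhn checksum rules out most random 16-digit numbers."""
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:       # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0

def redact_card_numbers(text: str) -> str:
    """Redact 16-digit candidates only when they pass the Luhn check."""
    def repl(m):
        digits = re.sub(r"[ -]", "", m.group())
        return "[REDACTED-CC]" if luhn_valid(digits) else m.group()
    return re.sub(r"\b(?:\d[ -]?){15}\d\b", repl, text)

print(redact_card_numbers("Card on file: 4111 1111 1111 1111, ref 1234 5678 9012 3456"))
```

Even this toy version shows the gap: a database column is either a card number or it isn't, while free text requires candidate detection, validation, and a redaction policy, which is exactly where the generative classification mentioned above comes in.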
Sapna shares the tools her team is exploring to boost their productivity, such as AI code generators.
What strategies did your team employ to ensure that enhancements in one area of Data Spaces didn’t compromise others?
One of the primary risks with a horizontal feature like Data Spaces or Data Governance is regression, especially when introducing object-, field-, and record-level policies on top of existing logic. To mitigate this risk, several strategic steps were implemented.
First, a shared testing framework was created, with each participating team responsible for writing and owning tests for their specific functionality. This ensured comprehensive test coverage across microservices, user interface layers, and Data Cloud APIs, without assuming automatic protection from other systems.
Second, significant investment was made in automated performance testing. Each new governance rule, particularly those involving large metadata sets or complex policy evaluations, was rigorously validated across various org sizes. These tests were repeatable runs against synthetic orgs designed to simulate the most challenging scenarios.
Finally, key components of the user interface were re-architected to address rendering bottlenecks that only became apparent at large scale. The layout was refactored to improve response time, and aggressive caching and in-depth query optimization were implemented so customers would not feel the overhead of the underlying policy stack.
The key takeaway is that integrating layered security into an existing platform requires comprehensive testing and shared responsibility.
Sapna explains why engineers should join Salesforce.
How do you manage challenges related to scalability in Data Governance?
Scalability remains an ongoing engineering challenge, especially as we support orgs with 10,000+ objects, including DLOs (data lake objects), DMOs (data model objects), and CIOs (calculated insight objects), and all combinations of metadata and data interactions. As Data Cloud continues to grow, we anticipate orgs with 100,000 objects spread across 100+ data spaces in the near future.
To address these challenges, significant work has been done. Caching strategies have been enhanced at multiple layers (platform, database, and microservices) to minimize database hits. Data fetching from Iceberg, the storage layer, was optimized, and multi-threaded operations were revised to increase the number of parallel threads in certain services, speeding up processing while maintaining healthy CPU and memory utilization.
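The caching goal, fewer database hits, can be illustrated with a single read-through layer. This is a deliberately simplified sketch (real deployments layer this across platform, database, and microservices, and add invalidation):

```python
class ReadThroughCache:
    """Minimal read-through cache: only misses reach the backing store."""

    def __init__(self, loader):
        self.loader = loader          # called only on a cache miss
        self.entries = {}
        self.backend_calls = 0        # proxy for "database hits"

    def get(self, key):
        if key not in self.entries:
            self.backend_calls += 1
            self.entries[key] = self.loader(key)
        return self.entries[key]

# Hypothetical metadata loader standing in for a database round trip.
cache = ReadThroughCache(lambda obj: f"metadata for {obj}")
for _ in range(3):
    cache.get("DMO_Customer")         # only the first call reaches the backend
print(cache.backend_calls)
```

With 100,000 objects in an org, repeated policy evaluations over the same metadata make this hit ratio the difference between a usable UI and a timeout.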
The user interface for governed objects was also redesigned to address rendering bottlenecks. Certain layout decisions were causing unnecessary load, especially when layered policies needed to be evaluated before displaying results.
Ultimately, scalability is treated as a primary constraint, not just a performance enhancement, and this assumption is revisited with every change in usage patterns.
Learn more
- Stay connected — join our Talent Community!
- Check out our Technology and Product teams to learn how you can get involved.