
Scaling AI-Driven Conversations from 10K to 100K While Maintaining Real-Time Consistency

Ashima Kochar
May 04 - 5 min read

by Ashima Kochar and Deepak Mali.

In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Ashima Kochar, Lead Software Engineer in Service Cloud, whose team developed the Conversation Storage Service (CSS), a near-core platform that persists digital engagement conversations.

Explore how the team evolved CSS beyond transactional storage limits to support AI-driven conversational workloads reaching 100,000 concurrent interactions under uneven traffic, while preserving real-time visibility and consistency in a streaming-first architecture where Kafka lag and growing payloads introduce latency.

What is your team’s mission in building the Conversation Storage Service (CSS), and how does it support AI-driven customer engagement in Service Cloud?

CSS is not just a storage system — it is the source of truth for conversational context, which directly powers real-time AI systems such as sentiment analysis, agent assist, and supervisor insights. It provides a scalable, reliable foundation for persisting and retrieving interactions across digital channels with near real-time availability.

The team builds this foundation to enable AI-driven engagement within Service Cloud. The platform persists every interaction across digital channels while ensuring that data remains almost immediately available to downstream systems.

As organic data growth and the adoption of Agentforce and CCaaS increase, workloads must support up to 50,000 concurrent conversations, with a target of 100,000 at peak throughput. These interactions involve larger payloads, longer threads, and higher expectations for real-time access.

This scale is foundational to the Unified Agentic Communication Platform (UACP), enabling unified conversations across channels and agents with consistent, ordered context. CSS focuses on high-throughput ingestion and low-latency retrieval to deliver accurate data to Data 360, Core, and AI pipelines, ensuring both human and AI agents operate on a complete, up-to-date view of interactions.

What constraints emerged as CSS scaled to support high-volume conversational workloads across distributed systems?

CSS hit limits beyond 10,000 concurrent conversations because the Postgres-based system struggled with bursty traffic from high-activity tenants. These surges created hotspots and degraded write performance across the platform.

To address this, the team moved to a horizontally scaled NoSQL database and, as ingestion rates increased, began buffering and batching events in the application layer at the tenant level. They also introduced Kafka with conversation-level partitioning to distribute load more evenly, smoothing spikes before they reached storage.
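
To make the partitioning idea concrete, here is a minimal sketch of a Kafka producer that keys every event by its conversation ID, so all events for one conversation land on the same partition in order, while distinct conversations spread across the topic. The topic name, batching values, and IDs are illustrative assumptions, not CSS's actual configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ConversationEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Producer-side batching smooths bursty tenants: records are
        // grouped per partition before being sent.
        props.put("linger.ms", "20");
        props.put("batch.size", "65536");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String conversationId = "conv-42"; // hypothetical ID
            String event = "{\"seq\": 7, \"body\": \"...\"}";
            // Keying by conversation ID preserves per-conversation order
            // (one partition per key) while spreading load across partitions.
            producer.send(new ProducerRecord<>("conversation-events", conversationId, event));
        }
    }
}
```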

This shift came with a tradeoff: asynchronous processing introduced delays between writes and reads. These read-after-write gaps could have impacted agent workflows, so the team introduced VegaCache to serve recent writes directly while persistence catches up.
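
A minimal sketch of that read-after-write pattern, with a plain in-memory map standing in for VegaCache (whose real API is internal): writes are recorded in the cache before being handed to the asynchronous persistence path, and reads overlay cached recent events on the possibly stale persisted view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class RecentWriteCache {
    // conversationId -> (sequence number -> event payload)
    private final Map<String, NavigableMap<Long, String>> recent = new ConcurrentHashMap<>();

    // Write path: cache the event immediately, then hand it to the async
    // pipeline (Kafka -> storage), which may lag behind.
    public void onWrite(String conversationId, long seq, String event) {
        recent.computeIfAbsent(conversationId, k -> new ConcurrentSkipListMap<>())
              .put(seq, event);
        // publishToKafka(conversationId, seq, event); // hypothetical async call
    }

    // Read path: start from the (possibly stale) persisted view and overlay
    // anything newer still sitting in the cache, ordered by sequence number.
    public List<String> read(String conversationId, NavigableMap<Long, String> persisted) {
        NavigableMap<Long, String> merged = new TreeMap<>(persisted);
        merged.putAll(recent.getOrDefault(conversationId, new TreeMap<>()));
        return new ArrayList<>(merged.values());
    }
}
```

A production version would also evict cache entries once persistence confirms the write; that bookkeeping is omitted here.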

These changes allowed CSS to scale throughput effectively. The system now delivers faster read-after-write access and powers real-time insights for AI-driven conversational workloads.

What scale pressures shaped the CSS architecture as it moved from 10K to 50K and toward 100K concurrent conversations?

The conversation platform underwent an architectural evolution to meet the UACP north star and CSS goals. Each scale inflection point exposed specific limitations and drove subsequent design decisions.

At roughly 10,000 conversations, a transactional system worked well. By the time the load reached 25,000, uneven traffic became the primary issue, as certain tenants generated disproportionate load. Moving to distributed storage improved scalability, but it did not eliminate the imbalance.

The team introduced Kafka at 30,000 to stabilize ingestion through buffering and batching. While this improved throughput, it also introduced lag and delayed data visibility. By 50,000, these delays impacted real-time workflows. To solve this, the team added VegaCache to restore read-after-write consistency by serving recent data directly from memory.

As CSS approaches 100,000, the system is evolving toward curated Kafka. This shift turns the conversation stream into an ordered source of truth and reduces reliance on multiple data sources.
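
One way to picture a curated stream is a long-retention topic, keyed by conversation ID, that consumers can replay in order. The sketch below creates such a topic with Kafka's AdminClient; the partition count, replication factor, and retention settings are illustrative assumptions, not CSS's actual configuration.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CuratedTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // A single curated topic acts as the ordered record of
            // conversation events: events keyed by conversation ID replay
            // in order, and indefinite retention keeps the log replayable.
            NewTopic curated = new NewTopic("conversation-curated", 64, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "-1",         // keep the log indefinitely
                            "min.insync.replicas", "2")); // durable acknowledged writes
            admin.createTopics(List.of(curated)).all().get();
        }
    }
}
```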

What complexities emerged as AI-driven conversational workloads increased in volume, payload size, and real-time expectations?

AI-driven conversational workloads significantly increased both the volume and complexity of data. Modern interactions include longer dialogues, larger payloads such as conversational email, and new formats like voice transcripts and AI-generated responses. Both human agents and automated systems increasingly rely on near real-time access to the latest conversation state, making low-latency, consistent retrieval critical.

To address this, CSS introduced data efficiency and access optimizations. Compression helps manage growing payload sizes, while pagination ensures large conversations are fetched in manageable segments. VegaCache provides immediate visibility into recent updates, masking delays introduced by asynchronous streaming.
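
A brief sketch of both optimizations, with all names illustrative: gzip-compressing a large payload before persistence, and cursor-style pagination that returns a long conversation one bounded page at a time rather than whole.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.zip.GZIPOutputStream;

public class PayloadUtils {
    // Compress a large payload (e.g., an email body or voice transcript)
    // before writing it to storage.
    static byte[] compress(String payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Cursor-based pagination: the caller passes the last sequence number
    // it has seen and gets back at most one page of newer events, so a
    // long conversation never has to be loaded in full.
    static List<String> fetchPage(NavigableMap<Long, String> conversation,
                                  long afterSeq, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String event : conversation.tailMap(afterSeq, false).values()) {
            page.add(event);
            if (page.size() == pageSize) break;
        }
        return page;
    }
}
```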

Reaching 100,000 concurrent conversations is a platform-wide effort across the modernized Service Cloud platform (SCRT2) components. CSS enables high-throughput ingestion, Ajna stabilizes flow, ZOS scales storage, and VegaCache maintains real-time visibility, together ensuring predictable scale without compromising latency or consistency.

What limitations surfaced in the CSS streaming pipeline when ingesting real-time conversation events at scale?

Kafka provided stability but revealed scaling constraints. Consumer lag surfaced as the primary challenge when ingestion exceeded 30,000 events per minute. Heavy loads forced events to queue, delaying persistence and creating gaps in data visibility. These gaps could disrupt agent workflows and AI systems that depend on immediate access to data.

The introduction of VegaCache allows the system to serve recent writes immediately and mask streaming delays. Additional improvements, such as conversation-level partitioning, batching, and back-pressure controls (sketched below), reduce hotspots and steady the flow of data. These updates help the pipeline absorb traffic spikes while keeping AI workloads consistent.
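
A hedged sketch of those controls using standard Kafka consumer APIs: each poll is bounded, events are written in batches, and the consumer pauses its partitions when the store falls behind. The overload signal and storage call are placeholders, not CSS internals.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BackpressureConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "css-persister");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");
        // Bound each poll so a burst cannot overwhelm the storage writer.
        props.put("max.poll.records", "500");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("conversation-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                if (!records.isEmpty()) {
                    // Batch the whole page into one storage write to cut
                    // per-row overhead (writeBatchToStore is hypothetical).
                    // writeBatchToStore(records);
                    consumer.commitSync(); // commit only after a successful write
                }
                if (storeIsOverloaded()) {
                    // Back-pressure: stop fetching until storage drains,
                    // without leaving the consumer group.
                    consumer.pause(consumer.assignment());
                } else {
                    consumer.resume(consumer.paused());
                }
            }
        }
    }

    // Placeholder signal; in practice this might track write latency or queue depth.
    static boolean storeIsOverloaded() { return false; }
}
```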

As part of UACP, this streaming pipeline serves as the backbone for Unified Communication Services (UCP), enabling messages from multiple channels to converge into a single conversational flow while maintaining consistent ingestion at scale.

What integration constraints did you face ensuring CSS data could reliably power downstream systems such as Data 360, Core reporting, and AI pipelines?

CSS serves various downstream systems that use different data formats. This diversity creates difficulties for schema mapping and data transformation. Defining data lake objects and data model objects for every interaction type makes integration slow and heavy.

The team is building a metadata-driven integration layer to solve this. The layer simplifies adding new data types and removes manual schema work. CSS also supports bulk exports through S3 for large-scale analytics. This effectively positions CSS as a data distribution layer rather than just a storage system, serving operational, analytical, and AI workloads simultaneously.
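
To make the metadata-driven idea concrete, here is a small sketch in which per-type field mappings live in a registry and one generic transform applies them, so onboarding a new interaction type means adding a registry entry rather than writing new mapping code. All type and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataMapper {
    // Registry of metadata: interaction type -> (source field -> downstream field).
    private static final Map<String, Map<String, String>> REGISTRY = Map.of(
            "chat",  Map.of("body", "MessageText", "sender", "ActorId"),
            "email", Map.of("subject", "Subject", "body", "MessageText"));

    // One generic transform replaces hand-written per-type schema mappers.
    static Map<String, Object> toDownstream(String type, Map<String, Object> event) {
        Map<String, String> mapping = REGISTRY.getOrDefault(type, Map.of());
        Map<String, Object> out = new LinkedHashMap<>();
        mapping.forEach((src, dst) -> {
            if (event.containsKey(src)) out.put(dst, event.get(src));
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> chat = Map.of("body", "Hi there", "sender", "agent-9");
        // Prints the mapped event, e.g. {MessageText=Hi there, ActorId=agent-9}
        System.out.println(toDownstream("chat", chat));
    }
}
```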
