
How Einstein Copilot Sharpens Large Language Model Outputs and Redefines AI Data Testing

Armita Peymandoust
Jun 21 - 6 min read

In our “Engineering Energizers” Q&A series, we explore the paths of engineering leaders with significant accomplishments in their fields. Today, we spotlight Armita Peymandoust, Senior Vice President of Software Engineering at Salesforce, who spearheads the development of Einstein Copilot. Einstein Copilot is a conversational AI assistant for CRM that combines data, metadata, prompts, and workflows to analyze information and execute tasks, helping users work more efficiently and complete more of their tasks.

Explore how Armita’s team tackles the complexities of large language models (LLMs) by implementing guardrails that help keep AI outputs accurate and reliable, protects customer data privacy by testing features on synthetic and public datasets, and much more!

What are the primary technical challenges in developing Einstein Copilot?

The development of Einstein Copilot at Salesforce faces two primary technical challenges, both crucial to the product’s success.

The first challenge is managing and using LLMs effectively. Einstein Copilot relies on LLMs to perform complex tasks by generating accurate, contextually appropriate responses. However, the technology is relatively new, and the team is still learning how best to implement and control it. A significant part of this challenge is preventing the model from “hallucinating,” that is, producing plausible but incorrect or irrelevant outputs. To address this, the team builds strict guardrails and supplies the model with specific, contextual grounding, which narrows the scope of its responses and helps keep its outputs reliable.
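To make the combination of grounding and guardrails more concrete, here is a minimal Python sketch under stated assumptions. The prompt asks the model to answer only from supplied records and to cite them by ID, and a post-generation check rejects answers that cite nothing or cite IDs that were never provided. `GroundingRecord`, `build_grounded_prompt`, `passes_guardrail`, and the `NOT_FOUND` convention are hypothetical illustrations, not Einstein Copilot’s actual implementation, and the LLM call itself is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical sentinel the model is told to return when the records lack an answer.
NOT_FOUND = "NOT_FOUND"


@dataclass
class GroundingRecord:
    """A hypothetical CRM record used to ground the prompt."""
    record_id: str
    fields: dict[str, str]


def build_grounded_prompt(question: str, records: list[GroundingRecord]) -> str:
    """Embed only the supplied records and instruct the model to stay within them."""
    context_lines = []
    for r in records:
        # Sort fields so the same record always renders the same way.
        field_text = "; ".join(f"{k}={v}" for k, v in sorted(r.fields.items()))
        context_lines.append(f"[{r.record_id}] {field_text}")
    context = "\n".join(context_lines)
    return (
        "Answer using ONLY the records below. "
        "Cite the record IDs you used in square brackets. "
        f"If the answer is not in the records, reply exactly: {NOT_FOUND}\n\n"
        f"Records:\n{context}\n\nQuestion: {question}"
    )


def passes_guardrail(answer: str, records: list[GroundingRecord]) -> bool:
    """Accept an explicit NOT_FOUND, or an answer whose citations are all known IDs."""
    if answer.strip() == NOT_FOUND:
        return True
    known_ids = {r.record_id for r in records}
    cited_ids = set(re.findall(r"\[([^\]]+)\]", answer))
    # Reject uncited answers and answers citing records that were never supplied.
    return bool(cited_ids) and cited_ids <= known_ids
```

The check is deliberately strict: any bracketed text that is not a known record ID causes rejection, trading some false rejections for fewer ungrounded answers reaching the user.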

The second major challenge is obtaining appropriate, realistic datasets for testing Einstein Copilot’s AI features. Because Salesforce policy prohibits using customer data for development, the team must test and refine its features in other ways. It relies primarily on synthetic data, generated to mimic the real-world scenarios Einstein Copilot might encounter. The team also uses public datasets available for research to further validate features and compare the performance of different model configurations. In some cases, it enters pilot agreements with customers that grant access to real data in a controlled and ethical manner. Together, these sources help the team fine-tune Einstein Copilot’s features while ensuring that the data used respects customer privacy and data ownership.
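As an illustration of the synthetic-data approach, the following minimal sketch generates fictional CRM opportunity records from a seeded random generator, so a test suite can replay exactly the same dataset on every run. The record shape, the stage and industry lists, and the stage weights are all hypothetical assumptions chosen for the example; the source does not describe how Salesforce actually generates its synthetic data.

```python
import random
from dataclasses import dataclass

# Hypothetical value lists; weights skew toward early stages to resemble a pipeline.
STAGES = ["Prospecting", "Qualification", "Negotiation", "Closed Won", "Closed Lost"]
STAGE_WEIGHTS = [0.30, 0.25, 0.20, 0.15, 0.10]
INDUSTRIES = ["Retail", "Healthcare", "Manufacturing", "Finance"]


@dataclass
class SyntheticOpportunity:
    """A fictional opportunity record; no field is derived from customer data."""
    opp_id: str
    account_name: str
    industry: str
    stage: str
    amount: int


def generate_opportunities(n: int, seed: int = 0) -> list[SyntheticOpportunity]:
    """Generate n fictional opportunities, reproducibly for a given seed."""
    rng = random.Random(seed)  # private, seeded generator for reproducible test runs
    opps = []
    for i in range(n):
        opps.append(
            SyntheticOpportunity(
                opp_id=f"SYN-{i:05d}",
                account_name=f"Example Account {rng.randint(1, 999)}",
                industry=rng.choice(INDUSTRIES),
                stage=rng.choices(STAGES, weights=STAGE_WEIGHTS)[0],
                amount=rng.randrange(1_000, 500_000, 500),  # multiples of 500
            )
        )
    return opps
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps the dataset stable even if other code also draws random numbers.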

Armita discusses Salesforce’s engineering culture.

What are the challenges and typical adoption process for integrating Einstein Copilot into existing customer workflows?

Integrating Einstein Copilot into existing customer workflows is challenging and requires careful change management. It also tends to lengthen the feedback loop, because customers need time to test and adapt to new features before fully integrating them.

For instance, when Salesforce releases a new Einstein Copilot feature, initial use is typically limited to admins and power users. This gradual adoption helps build trust in the AI solutions and lets them integrate into established systems without disruption.

This methodical approach to innovation underscores Salesforce’s commitment to delivering dependable AI solutions that meet customer needs and fit into their operational realities.

A diagram depicting the testing of generative copilots and agents at scale.

How does the collaboration between your engineering and product teams contribute to the development of effective and relevant Einstein Copilot features?

It’s really about the synergy between our engineering and product teams. Our engineering team is deeply involved in exploring what’s technically feasible with the latest AI advancements. They essentially set the stage by showcasing the art of the possible.

On the other side, our product team steps in with a strong understanding of the challenges and needs our customers face. This combination allows us to pinpoint which business problems can be effectively solved using new technologies. It’s a dynamic interplay where both teams bring something vital to the table, ensuring that the solutions we develop are cutting-edge, directly relevant, and highly beneficial to our users.

We also take a very measured approach to deciding which features to push forward. This involves a careful assessment of the costs associated with developing and maintaining these features versus the value they deliver to our customers and to Salesforce as a business. It’s all about making strategic choices that maximize impact while optimizing resource use.

This collaborative process keeps us agile and responsive, enabling us to adapt quickly in a fast-changing market. It ensures that the AI features we develop are both innovative and closely aligned with what our customers need to succeed.

Armita discusses a new project in development, in collaboration with Salesforce’s AI Research team.

What is the iterative process your team follows when developing new features for Einstein Copilot?

The development of new features for Einstein Copilot at Salesforce follows a meticulous iterative process. Initially, the team defines the problem they aim to solve and constructs a specific prompt for it. This prompt engineering is crucial as it shapes the subsequent development and testing phases.

Once a feature is built, it undergoes rigorous testing to evaluate the quality and accuracy of the generated outputs. This testing involves using labeling tools, human labelers, and state-of-the-art evaluation metrics. After the release of these features, the team heavily relies on customer feedback, which is gathered both qualitatively and quantitatively within the product. This feedback is integral to the iterative cycle, enabling the team to refine and enhance the features continuously.
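The source does not name the evaluation metrics the team uses. As a simple, well-known stand-in, the sketch below scores generated outputs against reference answers written by human labelers using token-level F1 (the word-overlap measure popularized by extractive question-answering benchmarks). The function names and the plain whitespace tokenization are assumptions made for illustration; production evaluation would typically add normalization and richer metrics.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty strings match perfectly; one empty string matches nothing.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: list[tuple[str, str]]) -> float:
    """Mean token F1 over (generated_output, human_reference) pairs."""
    if not pairs:
        raise ValueError("need at least one (generated, reference) pair")
    return sum(token_f1(p, r) for p, r in pairs) / len(pairs)
```

For example, `token_f1("deal closed", "the deal closed today")` has precision 1.0 and recall 0.5, giving an F1 of 2/3.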

This structured approach ensures that each feature not only meets the initial design specifications but also evolves based on direct user input, aligning closely with customer needs and expectations.

Armita explains why engineers should join Salesforce.

Diving deeper, how does your team handle customer feedback and integrate it into Einstein Copilot’s development process?

The team takes a meticulous approach to monitoring and evaluating every feature it develops, especially given how new generative AI technology is. Because these applications are so novel, it can be hard to anticipate customer preferences and needs accurately. To address this, the team builds comprehensive instrumentation into the features to capture detailed data on customer interactions. This includes tracking actions such as generating, editing, and accepting outputs, which provides a rich source of both quantitative and qualitative feedback.
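A minimal sketch of how such interaction events might be turned into quantitative feedback, assuming each logged event carries a generation ID, a feature name, and one of three actions (generated, edited, accepted). The event schema, the `feature_metrics` helper, and the feature name used in the usage note are hypothetical; the source does not describe Salesforce’s telemetry format.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    GENERATED = "generated"
    EDITED = "edited"
    ACCEPTED = "accepted"


@dataclass(frozen=True)
class InteractionEvent:
    """One hypothetical telemetry event emitted by an instrumented feature."""
    generation_id: str
    feature: str
    action: Action


def feature_metrics(events: list[InteractionEvent], feature: str) -> dict[str, float]:
    """Compute per-feature edit and acceptance rates over distinct generations."""
    # Group the set of actions observed for each generation of this feature.
    actions_by_gen: dict[str, set[Action]] = {}
    for e in events:
        if e.feature == feature:
            actions_by_gen.setdefault(e.generation_id, set()).add(e.action)
    # Only count generations whose GENERATED event was actually logged.
    generated = [acts for acts in actions_by_gen.values() if Action.GENERATED in acts]
    n = len(generated)
    if n == 0:
        return {"generations": 0, "edit_rate": 0.0,
                "acceptance_rate": 0.0, "accepted_unedited_rate": 0.0}
    return {
        "generations": n,
        "edit_rate": sum(Action.EDITED in a for a in generated) / n,
        "acceptance_rate": sum(Action.ACCEPTED in a for a in generated) / n,
        # Accepted as-is is a stronger quality signal than accepted after edits.
        "accepted_unedited_rate": sum(
            Action.ACCEPTED in a and Action.EDITED not in a for a in generated
        ) / n,
    }
```

For a hypothetical `"email_draft"` feature with three generations, where one is accepted as-is, one is edited and then accepted, and one is ignored, the sketch reports an edit rate of 1/3, an acceptance rate of 2/3, and an unedited acceptance rate of 1/3.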

This feedback is invaluable: it helps assess the effectiveness of each feature and informs the adjustments needed to improve functionality and user satisfaction. By continuously analyzing how customers interact with the features, the team can make data-driven decisions to refine the AI solutions so that they align more closely with user expectations and improve the overall experience. This feedback loop is integral to the iterative development process, enabling the team to adapt quickly to user input and evolving market needs.

Learn More
  • Hungry for more AI stories? Learn how the new Einstein Copilot for Tableau is building the future of AI-driven analytics in this blog.
  • Stay connected — join our Talent Community!
  • Check out our Technology and Product teams to learn how you can get involved.
