Optimizing Postgres for AI Workloads: How Heroku Integrated pgvector for Vector Search

Terry Watts
Feb 24 - 6 min read

In our Engineering Energizers Q&A series, we shine a spotlight on engineering leaders who tackle complex technical challenges to drive innovation. Today, we feature Terry Watts, Director of Software Engineering at Salesforce, leading the Heroku Data Services team. This team is dedicated to enhancing Heroku’s managed data services, enabling developers to build and scale applications without the burden of infrastructure management.

Discover how Terry’s team has successfully adapted database technologies for multi-tenancy, optimized performance for AI-driven workloads, and ensured robust security and compliance within Heroku’s managed environment.

What is your team’s mission?

Heroku Data Services manages Heroku’s Postgres offering, ensuring developers can build applications without the hassle of database infrastructure. The team’s mission is to provide a scalable, high-performance, and secure managed database service, seamlessly integrating new capabilities like pgvector.

Postgres is an open-source relational database renowned for its extensibility, strong query performance, and ability to handle large datasets, making it ideal for AI-driven applications. pgvector, a Postgres extension, enables vector search, allowing developers to store and retrieve high-dimensional embeddings for AI tasks such as recommendation engines, image recognition, and natural language processing. By integrating pgvector into Heroku’s managed Postgres environment, the team significantly expanded the platform’s AI capabilities while maintaining its simplicity and reliability.
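
To make the concept concrete, here is a minimal sketch (not Heroku-specific) of storing and querying embeddings with pgvector from Python. The table, column names, and the psycopg2 driver are illustrative assumptions, not details from this post.

```python
# Sketch only: hypothetical table/column names, psycopg2 assumed installed,
# and the "vector" extension already enabled on the database.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])  # Heroku exposes this config var
cur = conn.cursor()

# A toy 3-dimensional vector column; real embeddings typically have hundreds of dimensions.
cur.execute(
    "CREATE TABLE IF NOT EXISTS documents ("
    "id serial PRIMARY KEY, body text, embedding vector(3))"
)

# Store an embedding alongside the text it represents.
cur.execute(
    "INSERT INTO documents (body, embedding) VALUES (%s, %s)",
    ("hello world", "[0.1,0.2,0.3]"),
)
conn.commit()

# Retrieve the closest matches by cosine distance (the <=> operator).
cur.execute(
    "SELECT body FROM documents ORDER BY embedding <=> %s LIMIT 5",
    ("[0.1,0.2,0.25]",),
)
print(cur.fetchall())

cur.close()
conn.close()
```

Here `<=>` computes cosine distance; pgvector also provides `<->` (Euclidean distance) and `<#>` (negative inner product) for other similarity measures.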

The integration was a collaborative effort between two teams: the Data Experience team, which focused on developer experience, onboarding, and usability improvements to ensure a seamless experience, and the Data Foundation team, which handled backend automation, observability, and multi-tenant scaling. Through this coordinated effort, the team successfully integrated pgvector, providing developers with powerful AI-driven vector search in a fully managed Postgres environment.

Terry shares why engineers should join Salesforce.

What were the biggest technical challenges your team faced while integrating pgvector?

Integrating pgvector into Heroku Postgres posed several technical challenges. The primary one was adapting the extension for Heroku’s multi-tenant architecture, since it was originally designed with single-tenant deployments in mind. Multi-tenancy requires careful workload balancing to prevent any one tenant from overloading shared resources. With Heroku’s infrastructure supporting over 300,000 active databases, the team implemented resource isolation policies to ensure fair compute distribution across tenants and rolled out the integration incrementally, allowing engineers to monitor performance and make adjustments before expanding access.
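
Heroku’s actual isolation mechanisms are not spelled out in this post, but as a rough illustration of the general idea, Postgres exposes per-role limits that can cap how much of a shared instance any one tenant’s queries consume. The `tenant_app` role and the values below are hypothetical.

```python
# Illustration of the general idea only; Heroku's actual resource isolation policies
# are not public. The "tenant_app" role and the limits below are hypothetical.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
conn.autocommit = True
with conn.cursor() as cur:
    # Cap how long any single query issued by this tenant role may run.
    cur.execute("ALTER ROLE tenant_app SET statement_timeout = '30s'")
    # Bound per-operation memory so one tenant's sort or index build can't starve others.
    cur.execute("ALTER ROLE tenant_app SET work_mem = '64MB'")
    # Limit how many concurrent connections the role may hold.
    cur.execute("ALTER ROLE tenant_app CONNECTION LIMIT 20")
conn.close()
```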

Ensuring compatibility with Heroku’s managed Postgres environment was another significant challenge. Since Postgres extensions often rely on version-specific features, pgvector had to be tested across different releases. The team developed an automated testing framework to validate extension behavior, identifying and resolving conflicts before deployment. Continuous pre-production testing helped fine-tune the integration and minimize production risks.
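
The team’s real framework is not shown in the post, but a hedged sketch of version-matrix testing might look like a parametrized smoke test that verifies the extension installs and behaves consistently on each supported Postgres release; the DSNs and versions below are placeholders.

```python
# Placeholder DSNs pointing at test databases running different Postgres majors;
# this is a sketch of a compatibility smoke test, not the team's actual framework.
import psycopg2
import pytest

DSNS = {
    "pg15": "postgres://test@localhost:5415/postgres",
    "pg16": "postgres://test@localhost:5416/postgres",
}

@pytest.mark.parametrize("dsn", list(DSNS.values()), ids=list(DSNS.keys()))
def test_pgvector_round_trip(dsn):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    cur = conn.cursor()

    # The extension must be packaged for this Postgres release before it can be created.
    cur.execute("SELECT 1 FROM pg_available_extensions WHERE name = 'vector'")
    assert cur.fetchone() is not None
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

    # Basic behavioral check: nearest-neighbor ordering is sane on this version.
    cur.execute("CREATE TEMP TABLE t (v vector(2))")
    cur.execute("INSERT INTO t VALUES ('[0,0]'), ('[1,1]')")
    cur.execute("SELECT v FROM t ORDER BY v <-> '[0.1,0.1]' LIMIT 1")
    assert cur.fetchone()[0] == "[0,0]"
    conn.close()
```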

Creating a seamless developer experience was also a top priority. To avoid manual setup, the team streamlined provisioning so developers can spin up a database with pgvector enabled using a single CLI command. This frictionless experience made it easier for AI-focused developers to integrate vector search into their applications without extensive reconfiguration or external dependencies.
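
The exact CLI invocation is not reproduced here; conceptually, once a Heroku Postgres database is attached to an app (Heroku exposes its connection string as the DATABASE_URL config var), enabling the extension idempotently from application code might look like the following sketch.

```python
# Sketch, not Heroku's tooling: a one-time, idempotent enablement of pgvector using
# the DATABASE_URL config var that Heroku attaches to the app.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
conn.autocommit = True
with conn.cursor() as cur:
    # No-op if the extension is already enabled, so it is safe to run on every deploy.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.close()
```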

Illustration depicting how users can ask questions directly to PDF files using RAG with pgvector and OpenAI.
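
The retrieval-augmented generation (RAG) pattern the illustration refers to can be sketched as follows; the `pdf_chunks` table, embedding model, and OpenAI client usage are assumptions for illustration and are not taken from the post.

```python
# Sketch of the retrieval-augmented flow the illustration describes. Assumes PDF pages
# were already chunked and embedded into a hypothetical "pdf_chunks" table; the model
# names and OpenAI client usage are illustrative assumptions.
import os
import psycopg2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
conn = psycopg2.connect(os.environ["DATABASE_URL"])

def answer(question: str) -> str:
    # Embed the question with the same model used for the stored chunks.
    emb = client.embeddings.create(model="text-embedding-3-small", input=question)
    qvec = "[" + ",".join(str(x) for x in emb.data[0].embedding) + "]"

    # Pull the most relevant chunks by cosine distance.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM pdf_chunks ORDER BY embedding <=> %s LIMIT 4",
            (qvec,),
        )
        context = "\n\n".join(row[0] for row in cur.fetchall())

    # Ask the model to answer using only the retrieved context.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```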

How did your team ensure pgvector met Salesforce’s security and trust standards?

Since pgvector is an open-source extension rather than a native Postgres feature, it underwent rigorous security evaluations before being approved for Heroku’s managed Postgres. Salesforce enforces strict trust and security standards, requiring all extensions to be thoroughly vetted for vulnerabilities. The team worked with security engineers to conduct threat modeling, identifying risks such as privilege escalation, unauthorized access, and denial-of-service attacks to ensure pgvector could be safely integrated without compromising customer data.

Penetration testing was another critical step, simulating real-world attack scenarios to assess pgvector’s resilience. Any identified vulnerabilities were addressed before production deployment. To meet Salesforce’s Trust & Security framework, pgvector had to pass extensive testing before being approved as a supported Postgres extension, ensuring compliance with Heroku’s security policies. Additional safeguards prevent unauthorized activation: customers must explicitly enable pgvector before use.

After passing security reviews, pgvector was added to Heroku’s allowlist. Post-deployment, the team continues to monitor security metrics and conduct periodic reviews to ensure ongoing compliance, allowing developers to use pgvector confidently without risking data integrity or system reliability.

Terry shares some of the key traits that make engineers successful at Salesforce.

What were the major performance challenges involved in rolling out pgvector?

Optimizing pgvector’s performance was crucial, as vector search is computationally intensive and can create bottlenecks in Postgres. To prevent performance degradation, the team conducted large-scale performance benchmarking in both single-tenant and multi-tenant environments. They analyzed query execution times, memory overhead, and index performance under high-throughput workloads to ensure pgvector could efficiently support AI-driven applications at scale. This testing helped identify inefficiencies and refine pgvector’s behavior before deployment.
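
Heroku’s benchmarking harness is not described in detail, but a toy sketch of the kind of latency measurement involved might look like this; the table name, vector dimension, and iteration counts are hypothetical.

```python
# Toy latency measurement, not the team's benchmarking harness; table name, vector
# dimension, and iteration counts are hypothetical.
import os
import random
import time
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
cur = conn.cursor()

def random_vec(dim: int = 1536) -> str:
    return "[" + ",".join(f"{random.random():.6f}" for _ in range(dim)) + "]"

latencies = []
for _ in range(200):
    start = time.perf_counter()
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <=> %s LIMIT 10",
        (random_vec(),),
    )
    cur.fetchall()
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50={latencies[len(latencies) // 2] * 1000:.1f}ms  "
      f"p99={latencies[int(len(latencies) * 0.99)] * 1000:.1f}ms")

cur.close()
conn.close()
```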

Indexing was a key area of focus. Since pgvector relies on approximate nearest neighbor (ANN) search for fast retrieval, indexing directly impacts query performance. Engineers experimented with various indexing strategies and fine-tuned their recommendations to balance speed and accuracy. Caching was another critical aspect: efficient caching mechanisms minimized redundant computations and significantly reduced query latency.
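
For reference, pgvector ships two ANN index types whose parameters express that speed/accuracy trade-off; the values below are generic illustrative starting points, not Heroku’s tuned recommendations.

```python
# Illustrative parameters only, not Heroku's tuned recommendations. The two index
# types are alternatives; a real deployment would typically pick one per column.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
conn.autocommit = True
cur = conn.cursor()

# IVFFlat partitions vectors into "lists"; builds are fast, recall depends on probes.
cur.execute(
    "CREATE INDEX IF NOT EXISTS documents_embedding_ivfflat "
    "ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)"
)
# Probing more lists at query time raises recall at the cost of latency.
cur.execute("SET ivfflat.probes = 10")

# HNSW (pgvector >= 0.5.0) builds a graph index: better speed/recall trade-offs,
# slower builds and higher memory use.
cur.execute(
    "CREATE INDEX IF NOT EXISTS documents_embedding_hnsw "
    "ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)"
)
# Larger ef_search scans more of the graph per query, improving recall.
cur.execute("SET hnsw.ef_search = 40")

cur.close()
conn.close()
```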

Scaling pgvector in multi-tenant environments added complexity, as workload balancing was essential to prevent vector search operations from negatively impacting other customers. Engineers developed performance and regression tests to ensure that resource allocation dynamically adjusted based on query demand. This ensured that pgvector could scale efficiently while maintaining stable performance across Heroku’s managed database fleet.

How did customer demand and community feedback influence the decision to prioritize pgvector?

pgvector was not originally part of Heroku’s roadmap, but strong customer demand drove its prioritization. Developers frequently requested native vector search support in Postgres, aiming to eliminate the need for external vector databases. pgvector quickly became one of the most upvoted feature requests in Heroku’s public roadmap, with developers emphasizing the need for a built-in vector search capability over third-party solutions.

Beyond formal requests, the team observed widespread advocacy in the developer community. Several blog posts and technical discussions highlighted the need for native embedding storage and vector search in Heroku’s Postgres offering, reinforcing the importance of making pgvector available.

To validate the demand, a small group of engineers conducted a feasibility study to assess pgvector’s impact. Once internal testing confirmed that the extension could be integrated securely and efficiently, it was officially prioritized. The initial release targeted performance-tier customers, providing early adopters with access before expanding to multi-tenant and Essential Tier users. This staged rollout ensured stability while allowing a broader set of developers to benefit from pgvector’s capabilities.

Terry explores the power of curiosity and how that helps engineering teams.

How does pgvector fit into Heroku’s larger AI strategy, and what’s next for vector-based AI workloads?

pgvector is part of Heroku’s broader effort to simplify AI workloads within its ecosystem. By integrating vector search into Postgres, Heroku aims to enable developers to build AI-powered applications without relying on external services. In parallel, Heroku is piloting a managed AI inference service, allowing developers to deploy pre-trained models directly on the platform. pgvector complements this by serving as a scalable data retrieval layer for AI-driven applications.

Future improvements will focus on optimizing indexing strategies, reducing query latency, and enhancing multi-tenant performance. As AI adoption grows, Heroku remains committed to evolving its database offerings to support the next generation of machine learning applications, ensuring that developers can build and scale AI-powered solutions seamlessly.
