The Journey of Building a Scalable API


APIs are an essential tool to allow partners, developers, and applications to consume, communicate, or build on top of the various capabilities your microservices provide.

Building a high-quality API that scales and performs as the business ecosystem grows is not easy. It requires thought and planning at every step, from choosing an execution environment to deciding which API technology to use.

So how did we do it? In this blog post, I will share my experience of building the API for Activity Platform at Salesforce as a guide to writing an API for your own needs. Activity Platform is a big data event processing engine that ingests and analyzes over 100 million customer interactions every day to automatically capture data and to generate insights, recommendations, and feeds. Activity Platform provides APIs to serve these to our clients.

Choosing an Execution Environment

Depending on your requirements, an execution environment could be bare metal, a virtual machine (VM), or an application container. We chose application containers, as these can run on a physical machine or in a VM, and a single operating system instance can support multiple containers, each running within its own separate execution environment. In a nutshell, containers are lightweight, portable, fast, and easy to deploy and scale, so they are a natural fit for microservices.

A note on container orchestration

If you decide to go with containers, as we did, container orchestration will help you automate the deployment, management, scaling, and networking of your containers. There are many container orchestration tools to consider, including Kubernetes, Apache Mesos, DC/OS (with Marathon), Amazon EKS, and Google Kubernetes Engine (GKE).

We use Nomad clusters from HashiCorp. Nomad is simple and lightweight, and it can orchestrate applications of any type, not just containers. It integrates seamlessly with Consul for service discovery and with Vault for secrets management. You can easily describe the resources a task needs to run, such as memory, network, and CPU, and specify the number of instances you need to scale your service horizontally.

Choosing an API Technology

To build our API, we chose GraphQL. If you haven't heard of it, GraphQL is a popular alternative to options such as REST, SOAP, Apache Thrift, OpenAPI/Swagger, or gRPC.

Why did we choose GraphQL?

We wanted to build an API that could serve a variety of clients, from web apps to mobile apps. It needed to be efficient, powerful, and flexible.

GraphQL was the best fit for our needs for a few reasons:

1) GraphQL is database agnostic and can serve data from wherever it lives within your business domain. This means that underneath a single query you can combine Cassandra, Elasticsearch, or existing APIs from other modules.

2) It allows clients to request exactly what they need, avoiding overfetching and underfetching. If an API returns more than a client needs, there is a performance hit; if it returns less, the client must make multiple network calls, which slows rendering. GraphQL avoids both outcomes.

3) While most APIs are versioned, a GraphQL API can be versionless. Because it returns only the data that is explicitly requested, new capabilities can be added through new types, and new fields on existing types, without creating a breaking change.

4) GraphQL uses a strong type system in which all types are defined in a schema written in the GraphQL Schema Definition Language (SDL). The schema serves as the contract between client and server, leaving no ambiguity about request and response structure.

5) GraphQL supports introspection, so the schema definition can easily be shared or downloaded using tools such as GraphiQL, GraphQL Playground, or CLI tools.
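To make points 2 and 3 concrete, here is a toy Kotlin sketch, not how graphql-java actually executes queries: a resolver produces a full record, and only the fields the client selected are copied into the response. The names `insightRecord` and `project`, and the extra `department` field, are hypothetical. Because an older client never selects a newly added field, adding one does not change what that client receives.

```kotlin
// Toy illustration only: a record as a resolver might produce it,
// including a field (department) that older clients never ask for.
val insightRecord: Map<String, Any?> = mapOf(
    "organizationId" to "org-1",
    "userId" to "u-42",
    "emailAddress" to "test1@gmail.com",
    "title" to "VP of Sales",
    "department" to "Sales" // hypothetical field added in a later release
)

// Copy only the selected fields, in the order the client requested them.
// Unknown field names are rejected, loosely mirroring schema validation.
fun project(record: Map<String, Any?>, selection: List<String>): Map<String, Any?> {
    val result = LinkedHashMap<String, Any?>()
    for (field in selection) {
        require(field in record) { "Unknown field: $field" }
        result[field] = record[field]
    }
    return result
}
```

For example, `project(insightRecord, listOf("userId", "title"))` yields only `userId` and `title`, regardless of how many other fields the record carries.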

GraphQL in Action

We used GraphQL in our Classification Insight API. Classification Insight offers information about a user and helps meeting participants know the titles and roles of other people present at the meeting. For this API, we used Kotlin and graphql-java, a Java implementation of GraphQL.

Step 1: Define your schema (e.g. schema.graphqls). Every GraphQL service defines a set of types. The most basic components of a GraphQL schema are object types, which represent a kind of object you can fetch from your service. The Query type defines the entry points for every GraphQL query.

In the schema below, I have defined a query, getClassificationInsightsByUser, which can later be called by posting this query to your running API (e.g. localhost:8080/api) as the "query" field of a JSON request body:
{ getClassificationInsightsByUser(emailAddresses: ["test1@gmail.com", "test2@gmail.com"]) { userId, title } }

schema.graphqls

# object type to describe what you can fetch
type ClassificationInsightByUser {
  organizationId: ID!
  userId: String!
  emailAddress: String!
  title: String!
}
# Query type to define all your queries
type Query {
  getClassificationInsightsByUser(
    emailAddresses: [String!]!
  ): [ClassificationInsightByUser]
}

schema {
  query: Query
}

Step 2: Implement a DataFetcher (also known as a resolver) to resolve the field getClassificationInsightsByUser. A resolver is a function, provided by the developer, that resolves a field of a type defined in the schema and returns its value from a configured source such as a database, another API, or a cache.

In this example, our Query type provides a field called getClassificationInsightsByUser, which accepts the argument emailAddresses. The resolver for this field would typically query a database and then construct and return a list of ClassificationInsightByUser objects.
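Conceptually, wiring resolvers is a lookup from a field name to the function that resolves it. The toy sketch below models that idea in plain Kotlin before the graphql-java version that follows; it assumes an in-memory list stands in for the database, and `Resolver`, `sampleRows`, `queryResolvers`, and `resolveQueryField` are hypothetical names. graphql-java's real `DataFetcher` and `RuntimeWiring` types do considerably more (type checking, nested fields, async execution).

```kotlin
// A resolver receives the field's arguments and returns the field's value.
typealias Resolver = (args: Map<String, Any?>) -> Any?

// Hypothetical in-memory rows standing in for a real data source.
val sampleRows = listOf(
    mapOf("userId" to "u-1", "emailAddress" to "test1@gmail.com", "title" to "Account Executive"),
    mapOf("userId" to "u-2", "emailAddress" to "test2@gmail.com", "title" to "Sales Engineer"),
    mapOf("userId" to "u-3", "emailAddress" to "other@example.com", "title" to "CFO")
)

// One resolver per Query field, keyed by the field name used in the schema.
val queryResolvers: Map<String, Resolver> = mapOf(
    "getClassificationInsightsByUser" to { args: Map<String, Any?> ->
        @Suppress("UNCHECKED_CAST")
        val emails = args["emailAddresses"] as List<String>
        sampleRows.filter { it.getValue("emailAddress") in emails }
    }
)

// Look up and invoke the resolver for a top-level Query field.
fun resolveQueryField(field: String, args: Map<String, Any?>): Any? {
    val resolver = queryResolvers[field] ?: error("No resolver for Query.$field")
    return resolver(args)
}
```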

import graphql.schema.DataFetcher
import graphql.schema.DataFetchingEnvironment

// Assuming you already have your data class
// (e.g. ClassificationInsightByUser) defined to hold the data

// Name of the argument declared in schema.graphqls
const val EMAIL_ADDRESSES = "emailAddresses"

// Write your DataFetcher class
class ClassificationInsightByUserDataFetcher :
  DataFetcher<List<ClassificationInsightByUser>?> {

  // Override DataFetcher's get function
  override fun get(env: DataFetchingEnvironment):
    List<ClassificationInsightByUser>? {
    // Get the argument passed in the submitted query
    val emailAddresses = env.getArgument<List<String>>(EMAIL_ADDRESSES)
    // Write logic to get data from another API or
    // from your business layer by calling your controller/service.
    // Here, we just return static data to keep it simple.
    return EntityData.getClassificationInsightByUser(emailAddresses)
  }
}
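The DataFetcher above delegates to `EntityData.getClassificationInsightByUser`, which is not shown. For local testing, a minimal in-memory stand-in could look like the sketch below. The sample users, IDs, and titles are made up, the signature is an assumption, and matching email addresses case-insensitively while preserving the requested order is a design choice of this sketch, not necessarily what the production code does. The data class mirrors the `ClassificationInsightByUser` type in schema.graphqls.

```kotlin
// Mirrors the ClassificationInsightByUser object type in schema.graphqls.
data class ClassificationInsightByUser(
    val organizationId: String,
    val userId: String,
    val emailAddress: String,
    val title: String
)

// Hypothetical in-memory stand-in for the EntityData helper.
object EntityData {
    private val insights = listOf(
        ClassificationInsightByUser("org-1", "u-1", "test1@gmail.com", "Account Executive"),
        ClassificationInsightByUser("org-1", "u-2", "test2@gmail.com", "Sales Engineer"),
        ClassificationInsightByUser("org-1", "u-3", "test3@gmail.com", "CFO")
    )

    // Return insights for the requested addresses, matching emails
    // case-insensitively and keeping the order the client asked for.
    // Unknown addresses are skipped; a null argument yields an empty list.
    fun getClassificationInsightByUser(emailAddresses: List<String>?): List<ClassificationInsightByUser> {
        if (emailAddresses == null) return emptyList()
        val byEmail = insights.associateBy { it.emailAddress.lowercase() }
        return emailAddresses.mapNotNull { byEmail[it.lowercase()] }
    }
}
```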

Step 3: Initialize the GraphQLSchema and GraphQL objects (using graphql-java) to execute queries.

import graphql.GraphQL
import graphql.schema.idl.RuntimeWiring
import graphql.schema.idl.SchemaGenerator
import graphql.schema.idl.SchemaParser
import graphql.schema.idl.TypeDefinitionRegistry

// Load all your schema files as strings using your own utility function
val schema = getResourceFileAsString("schema.graphqls")

// Create the type registry from all your schema files
val schemaParser = SchemaParser()
val typeDefinitionRegistry = TypeDefinitionRegistry()
typeDefinitionRegistry.merge(schemaParser.parse(schema))

// Runtime wiring, where you wire your Query field to its resolver
val runtimeWiring = RuntimeWiring.newRuntimeWiring()
  .type("Query") { builder ->
    builder.dataFetcher(
      "getClassificationInsightsByUser", ClassificationInsightByUserDataFetcher()
    )
  }
  .build()

// Create the executable GraphQL schema
val schemaGenerator = SchemaGenerator()
val graphQLSchema = schemaGenerator
  .makeExecutableSchema(typeDefinitionRegistry, runtimeWiring)

// Create the GraphQL object used to execute queries
val graphQL = GraphQL.newGraphQL(graphQLSchema).build()

Step 4: Write a servlet (MyAppServlet) to handle incoming requests.

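import graphql.ExecutionInput
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse
import org.json.JSONObject

// Inside MyAppServlet (a subclass of HttpServlet). Assumes graphQL from Step 3
// and mapper (a JSON object mapper, e.g. Jackson's ObjectMapper) are accessible here.
override fun doPost(req: HttpServletRequest, resp: HttpServletResponse) {
  // Read the JSON request body, e.g. {"query": "{ ... }"}
  val payloadString = req.reader.readText()
  val jsonRequest = JSONObject(payloadString)
  val executionInput = ExecutionInput.newExecutionInput()
    .query(jsonRequest.getString("query"))
    .build()
  // Execute your query using graphQL.
  // It will call your resolvers to get the data
  // and only return the data that was requested.
  val executionResult = graphQL.execute(executionInput)

  // Send the response
  resp.contentType = "application/json"
  resp.characterEncoding = "UTF-8"
  resp.writer.println(mapper.writeValueAsString(executionResult.toSpecification()))
  resp.writer.close()
}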
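On the client side, the servlet reads the query text from the "query" field of a JSON body, so the query's inner double quotes must be escaped before it is posted. Below is a dependency-free sketch of building that body; `escapeJson` and `graphQLRequestBody` are hypothetical helpers, and in practice a JSON library such as Jackson would normally handle the escaping.

```kotlin
// Escape the characters JSON requires to be escaped inside a string literal.
fun escapeJson(s: String): String = buildString {
    for (c in s) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c == '\n' -> append("\\n")
            c == '\r' -> append("\\r")
            c == '\t' -> append("\\t")
            c < ' ' -> append("\\u%04x".format(c.code)) // other control characters
            else -> append(c)
        }
    }
}

// Wrap a GraphQL query in the {"query": "..."} envelope the servlet reads.
fun graphQLRequestBody(query: String): String =
    "{\"query\": \"${escapeJson(query)}\"}"
```

The resulting string can be POSTed to localhost:8080/api with a `Content-Type: application/json` header.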

Step 5: Embed the web server (Jetty, in this case) in your application.

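// Jetty 9.x/10.x package names (javax.servlet based)
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.ServerConnector
import org.eclipse.jetty.servlet.ServletContextHandler
import org.eclipse.jetty.servlet.ServletHolder

// The server
val server = Server()

// HTTP connector; use HTTPS in production
val http = ServerConnector(server)
http.host = "localhost"
http.port = 8080
http.idleTimeout = 30000
// Register the connector, otherwise the server will not listen on port 8080
server.addConnector(http)

// Set up the handler that routes /api to the servlet
val servletContextHandler = ServletContextHandler()
servletContextHandler.contextPath = "/"
servletContextHandler.addServlet(ServletHolder(MyAppServlet()), "/api")
server.handler = servletContextHandler

// Start the Jetty server and wait for incoming requests
server.start()
server.join()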

Step 6: Build and start your application. Use your CI/CD tool to create, publish, and deploy your Docker images to your cluster.

Ensuring Your APIs are Secure

At Salesforce, security is our top priority. Our APIs are accessible only to registered users, and users can access only the data they have permissions for. For your own access control needs, you may want to explore OAuth 2.0 (for example, the JWT bearer grant type) combined with role-based access control, as well as Open Policy Agent (OPA).

As a best practice, place your authentication middleware in front of the GraphQL layer, and keep a single source of truth for authorization in the business logic layer so that permission checks are not scattered across multiple places. In addition to authentication and authorization, consider rate limiting, data masking, and payload scanning while designing your API.
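A plain-Kotlin sketch of that ordering follows. It assumes a bearer token has already been mapped to a user through a simple lookup table; a real system would validate an OAuth 2.0 JWT at that point and might delegate the policy decision to OPA. `User`, `Authorizer`, `InsightService`, `handleRequest`, and the `INSIGHTS_READER` role are hypothetical names. Authentication happens before any query runs, and the only permission check lives in `Authorizer`, which the business layer calls.

```kotlin
// Hypothetical authenticated principal.
data class User(val id: String, val organizationId: String, val roles: Set<String>)

class UnauthenticatedException(message: String) : RuntimeException(message)
class ForbiddenException(message: String) : RuntimeException(message)

// Single source of truth for authorization decisions.
object Authorizer {
    fun requireCanReadInsights(user: User, organizationId: String) {
        val allowed = user.organizationId == organizationId && "INSIGHTS_READER" in user.roles
        if (!allowed) throw ForbiddenException("User ${user.id} may not read insights for $organizationId")
    }
}

// Business layer: every data access goes through Authorizer first.
class InsightService {
    fun titlesFor(user: User, organizationId: String, emails: List<String>): List<String> {
        Authorizer.requireCanReadInsights(user, organizationId)
        return emails.map { "title-of-$it" } // placeholder for a real lookup
    }
}

// Middleware: authenticate before the request reaches the GraphQL layer.
fun handleRequest(
    bearerToken: String?,
    tokenStore: Map<String, User>,  // stand-in for real token validation
    execute: (User) -> List<String> // stand-in for executing the GraphQL query
): List<String> {
    val token = bearerToken ?: throw UnauthenticatedException("Missing bearer token")
    val user = tokenStore[token] ?: throw UnauthenticatedException("Invalid token")
    return execute(user)
}
```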

Conclusion

We have demonstrated how to build a scalable, efficient, and secure API. We used application containers to scale, used GraphQL and embedded Jetty to keep the API efficient and lightweight, and prioritized its security. We will discuss other aspects of API development, such as security and deployment, in more detail in upcoming posts.

Acknowledgement

Thanks to Alex Oscherov for keeping me honest about our systems and architecture and to Laura Lindeman for her review and feedback on improving this blog post. Also, I’d like to take the opportunity to mention it has been a wonderful learning experience working with the talented folks on the Activity Platform and Infra teams.

Please reach out to me with any questions. I would love to hear your thoughts on the topic. If you’re interested in solving challenges in the framework of software components built to ingest and process large volumes of streaming data from multiple sources, we’re hiring.
