If you know anything about Salesforce, you probably think of it as a “customer company” — a popular Customer Relationship Management system, or a platform for business apps delivered in the cloud. That’s all true — our mission as a company is to make our customers successful with their customers. And we do it damn well.
Technical readers will realize, however, that there’s no magic in running a cloud service: it’s just a lot of hard engineering work. Under the covers, “the cloud” isn’t delivered with pixie dust and jargon; it’s delivered by understanding and tackling some of the most challenging distributed systems problems in computer science.
At Salesforce, the center of that challenge is data.
We have every kind of data you can imagine: relational data, file data, streaming data, high-scale data, search index data, analytical data, operational metrics and logs, queues, caches, map/reduce jobs, machine learning pipelines … you name it. And our customers expect all of that to be as reliable and predictable as the tides, day after day, at incredible scale. It’s a tall order, to put it lightly.
To give you a taste of how our world-class data engineering teams do it, we’ve collected a few pieces of writing from here on Medium (and around the web):
- Our resident mad scientist, Pat Helland, has been writing about databases since the 70s and shows no signs of slowing down. His latest article in ACM Queue is “Standing on Distributed Shoulders of Giants”. Don’t miss his previous installments either, like “Immutability Changes Everything” (ha ha, get it? Hmm.)
- The breadth of our scaling challenges has necessitated a range of solutions, from sharding our infrastructure to ramping up our monitoring systems a little bit at a time. It has also led us beyond traditional relational databases and into contributing to large, horizontally scalable systems like Apache HBase and Argus. We’ve even come full circle: we created (and open-sourced) Apache Phoenix, a relational engine on top of HBase (there’s a small sketch of it in action after this list).
- Data doesn’t stand still, which is why the emerging world of event-stream-based architectures is taking center stage at Salesforce, and why we invest in systems like Apache Kafka to keep those streams flowing with low latency and high availability (a producer sketch also follows this list).
- Designing distributed systems to be not only scalable but correct isn’t a walk in the park. But thanks to engineer Diego Ongaro, co-creator of the Raft consensus algorithm, it can be a walk down the runway.
- The engineers at Heroku are no strangers to data. Whether it’s Event Driven Data Synchronization, using Redis, or leading the way on running Postgres at scale, they’re on the cutting edge.
- As our Chief Strategic Officer Adam Bosworth told the New York Times, “We’re headed into one of those historic discontinuities where society changes.” So we’ve also invested heavily in data companies that are helping us pursue the next generation of data intelligence and scale: companies like SalesforceIQ, MinHash, Tempo, MetaMind, and PredictionIO.
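To make the Phoenix point concrete, here’s a minimal sketch of what “SQL over HBase” looks like from the client side. The ZooKeeper address, table name, columns, and values are hypothetical, invented for illustration; the shape of the code, though, is just standard JDBC plus SQL, with Phoenix’s UPSERT in place of INSERT.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

public class PhoenixSketch {
    public static void main(String[] args) throws Exception {
        // Phoenix exposes HBase through a standard JDBC driver; the URL points at the
        // ZooKeeper quorum used by the HBase cluster (hypothetical local address here).
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {

            // DDL is plain SQL; Phoenix maps the table onto HBase under the hood.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS server_metrics ("
                        + " host VARCHAR NOT NULL,"
                        + " ts   TIMESTAMP NOT NULL,"
                        + " cpu  DOUBLE"
                        + " CONSTRAINT pk PRIMARY KEY (host, ts))");
            }

            // Phoenix uses UPSERT rather than INSERT, mirroring HBase's put semantics.
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO server_metrics (host, ts, cpu) VALUES (?, ?, ?)")) {
                ps.setString(1, "app-01");
                ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
                ps.setDouble(3, 0.42);
                ps.executeUpdate();
            }
            conn.commit(); // Phoenix connections default to autocommit off

            // Ordinary SQL queries (aggregates, joins, etc.) run over HBase-resident data.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT host, AVG(cpu) FROM server_metrics GROUP BY host")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " avg cpu = " + rs.getDouble(2));
                }
            }
        }
    }
}
```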
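And here’s a minimal sketch of publishing an event to a Kafka stream using the standard Java producer client. The broker address, topic name, key, and payload are placeholders; `acks=all`, which waits for all in-sync replicas to acknowledge each record, is one common way to trade a little latency for durability when the stream has to stay highly available.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventStreamSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address; in practice this points at the Kafka cluster.
        props.put("bootstrap.servers", "localhost:9092");
        // Wait for all in-sync replicas before considering a record written.
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is an immutable event appended to a partitioned, replicated log.
            // Keying by record ID keeps all events for one record ordered on one partition.
            ProducerRecord<String, String> event = new ProducerRecord<>(
                    "record-change-events",                               // hypothetical topic
                    "record-001",                                         // key
                    "{\"field\":\"status\",\"new\":\"Closed Won\"}");     // value
            producer.send(event, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("wrote to %s-%d @ offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        } // try-with-resources flushes and closes the producer
    }
}
```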
This week, we’ll be attending SIGMOD ’16 (which is across the street from our office, nudge nudge). You going? Come chat with us about the central role that data plays at Salesforce, and the exciting things coming next.