As you may have read in the initial posts in this channel, Salesforce works with a lot of Open Source software. That has many benefits, but to get the most out of it, you need to be thoughtful about how you use it and contribute to it. If you’re not careful, the goals of the company and those of any given open source project can end up at odds.
I work on projects related to Apache HBase (read our recent article about that here). Here are a few informal rules we follow on these projects (HBase, Hadoop, ZooKeeper, etc.) to keep things moving forward:
1. No Forking!
We do not fork any of the projects that we use. By “fork”, I mean making a departure from the Open Source repository significant enough to prevent us from contributing patches back to the project, or from applying new Open Source updates to our repository. We do (almost) all work against the open source branches directly.
Why not fork? While it might seem like a harmless thing to do, and might even appear to increase your velocity at first, the problem is that divergence from the open source build basically means you’re “getting off the train” of ongoing development on the project. Most teams who fork an Open Source project think that it’ll give them more freedom, but the problems it causes are generally much bigger than they expect.
So for that reason, we’ve made it a rule that we won’t fork.
2. Keep Internal Copies
Sometimes you must fork. 😉
OK, well, not really fork. But we do keep internal copies of the relevant repositories (HBase and all dependent projects: Hadoop, ZooKeeper, etc). We maintain a minimal set of patches in these local copies of the repositories. Mostly these are changes to the way the software is built, such as defining where to pull dependencies from — for example, we want to build HBase against our own build of Hadoop.
We attach internal version numbers to the builds we do locally (such as “0.98.4-sfdc-2.2.1”), to indicate the exact version of what we’re running in production. (“sfdc” stands for “salesforce dot com”, in case that wasn’t clear.)
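To make the version scheme concrete, here is a minimal Python sketch that splits such a string into its parts. The reading of the format as "upstream version, vendor tag, internal build version" is my assumption based on the single example above, and the names `InternalVersion` and `parse_internal_version` are hypothetical, not part of any real tooling.

```python
import re
from typing import NamedTuple


class InternalVersion(NamedTuple):
    upstream: str  # the Open Source release the build is based on, e.g. "0.98.4"
    vendor: str    # the tag marking an internal build, e.g. "sfdc"
    internal: str  # the internal build version, e.g. "2.2.1"


# Assumed layout: <dotted upstream version>-<vendor tag>-<dotted internal version>
_VERSION_PATTERN = re.compile(
    r"^(?P<upstream>\d+(?:\.\d+)*)-(?P<vendor>[a-z]+)-(?P<internal>\d+(?:\.\d+)*)$"
)


def parse_internal_version(version: str) -> InternalVersion:
    """Split an internal build version into upstream, vendor, and internal parts."""
    match = _VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"not an internal build version: {version!r}")
    return InternalVersion(match["upstream"], match["vendor"], match["internal"])


if __name__ == "__main__":
    print(parse_internal_version("0.98.4-sfdc-2.2.1"))
```

Keeping the upstream version as the prefix means anyone looking at a production build can tell at a glance which Open Source release it sits on top of.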
3. Stay Agile
Sometimes we have a patch that we’ve written and need to apply internally right away. We always contribute these patches to the upstream project immediately, but realistically it takes some time for them to make it into a published release of the project. Sometimes you can’t wait that long, and those are the cases where having a local copy comes in handy.
Everything we run in production is built automatically (via Jenkins jobs) from the source against these internal repositories. This allows us to be agile in case of emergencies.
4. Contribute!
It’s great to use Open Source software. But it’s even better to contribute to it! This might be implied by what I’ve said above about my teams, but let’s be explicit about it: we don’t add any functionality to open source software that we don’t also contribute to the community.
Most of our work is done directly in the open source project (using Apache’s Jira, git repos, and mailing lists). When we do have to make internal patches, they are eventually cleaned up and contributed back to open source, so that we can follow along the release train of minor versions (0.98.4, 0.98.5, etc.).
5. Update Repositories Manually
Updates to our internal repositories are made per our internal schedule. We do not track the open source branches automatically. At our own pace, when we are ready, we move to a new upstream version, which most of the time allows us to remove some of the one-off patches we had applied locally.
For example, we stayed at HBase 0.98.4 for a while with some patches on top, and recently moved to 0.98.7, to which we had contributed all of the patches. That was great: it emptied out our local patch folder and brought us back to using the vanilla Open Source release!
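The bookkeeping behind a version bump can be sketched in a few lines of Python. This is only an illustration under my own assumptions: that each local patch is recorded together with the upstream release that first includes it (or `None` if it has not shipped yet), and that versions are plain dotted numbers. The patch names and the helper `patches_to_keep` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "0.98.7" into a tuple that compares numerically."""
    return tuple(int(part) for part in version.split("."))


def patches_to_keep(
    local_patches: Dict[str, Optional[str]], new_upstream: str
) -> List[str]:
    """Return the local patches still needed after moving to new_upstream.

    local_patches maps a patch name to the upstream release that first
    contains it, or None if it has not been released upstream yet.
    """
    target = version_key(new_upstream)
    return sorted(
        name
        for name, fixed_in in local_patches.items()
        if fixed_in is None or version_key(fixed_in) > target
    )


if __name__ == "__main__":
    # Hypothetical patch folder, for illustration only.
    patches = {"patch-a": "0.98.5", "patch-b": "0.98.7", "patch-c": None}
    print(patches_to_keep(patches, "0.98.6"))
    print(patches_to_keep(patches, "0.98.7"))
```

When every local patch has already shipped in the release you move to, the function returns an empty list, which is the "empty patch folder" situation described above.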
6. Choose Carefully
As you can imagine, since we manually choose when to bump up to new versions, that also means we spend a lot of time reviewing the open source releases. This is to make sure they are fully stable, and suitable for us to use in our own production environments.
In fact, we’ve spent so much time on this that committers who work at Salesforce have been named “Release Manager” for two different major HBase release series: HBase 0.94 (myself) and HBase 0.98 (Andrew Purtell).
Conclusion
This is a fairly simple and understandable model, but it lets us avoid forking, so we can track along with the public open source releases, remain agile, and be in full control of exactly what we deploy, at our own pace. Done this way, open source and corporate goals do not have to be at odds.
To learn more, check out this video of my keynote at HBaseCon, where I expand more on what it means to fork (or not).
Got opinions about whether or not to fork Open Source projects you use? Tell us about it in the comments.