Kunal Nawale is a Software Architect at Salesforce who designs, architects, and builds Big Data Systems. Kunal did the design and architecture of this open source project. Colbert Guan did the implementation and testing.
The entire HBase table is split into regions; each region has a start rowkey and a stop rowkey. The information about each region is stored in HBase in the data structure called RegionInfo. The RegionInfo has the following format:
On startup, the OpenTSDB read daemons connect to the HBase master and retrieve the region info map for each region server. This map is stored in a local cache and refreshed often. At query execution time, the query is mapped to regions, the regions are mapped to region servers, and those region servers are then queried for the metric data.
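To make the query-to-region-to-server mapping concrete, here is a minimal Python sketch of the lookup. It is not the code used by OpenTSDB or by our tool. The `Region` class, the `find_regions` helper, the region names, the server names, and the three-byte keys are all hypothetical and chosen only for illustration. The sketch assumes the cached regions are contiguous, sorted by start rowkey, and that an empty key marks the beginning or end of the table.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical, simplified stand-in for one cached region entry."""
    name: str
    start_key: bytes  # inclusive; b"" means "start of table"
    stop_key: bytes   # exclusive; b"" means "end of table"
    server: str       # region server currently hosting the region


# Hypothetical cached region map, sorted by start_key.
REGIONS = [
    Region("r1", b"", b"\x00\x00\x10", "rs1.example.com"),
    Region("r2", b"\x00\x00\x10", b"\x00\x00\x20", "rs2.example.com"),
    Region("r3", b"\x00\x00\x20", b"", "rs3.example.com"),
]


def find_regions(regions, scan_start, scan_stop):
    """Return the regions that overlap the scan range [scan_start, scan_stop).

    An empty scan_stop means "scan to the end of the table".
    """
    starts = [r.start_key for r in regions]
    # Last region whose start_key <= scan_start contains the scan's first row.
    first = max(bisect_right(starts, scan_start) - 1, 0)
    hits = []
    for region in regions[first:]:
        # Stop once a region begins at or after the end of the scan range.
        if scan_stop and region.start_key >= scan_stop:
            break
        hits.append(region)
    return hits


def servers_for(regions, scan_start, scan_stop):
    """Map a scan range to the set of region servers that must be queried."""
    return {r.server for r in find_regions(regions, scan_start, scan_stop)}
```

For example, `servers_for(REGIONS, b"\x00\x00\x15", b"\x00\x00\x16")` touches only the region `r2`, so only its region server needs to be queried, while a scan that crosses region boundaries returns every server hosting an overlapping region.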
So, let’s return to our original problem: how to determine region/region-server on demand without a painstaking log search. This was a question we asked ourselves repeatedly. After looking around, we could not find any such tool in the open source community that we could use. Therefore, we decided to build a tool that would help ease our problem and hopefully yours, too. We are very happy to announce that we have built this tool and are open sourcing it. TSDB HBase Region Finder is now available here. This tool includes both a web server and a command line version.
Here is an example of how to use the web server version:
The CLI version operates similarly:
$ bin/cli envoy.server.uptime
If you would like us to add any features, or have any suggestions on the tool, please let us know via GitHub. (Or better yet, send us a Pull Request!)
The Salesforce Infrastructure organization has many such exciting problems to solve. Some of these problems present very difficult scale challenges that are unique and available at very few companies in the world. If such challenges excite you, please reach out to us via our careers page.