What is a Sandbox?
A Sandbox is an isolated environment where you can safely experiment and test changes that you intend to apply to your live environment. The idea is to learn about the implications of these intended changes in advance, in order to decide on how to best conduct them, without actually causing any harm to the live environment in the process.
How is This Usually Addressed?
Most companies meet the sandbox requirement by copying the live environment onto a completely separate environment. Testing and experimenting then take place on the copied environment, with no impact on the live one. This approach has a few drawbacks:
- Copying the data from one environment to another, along with the entire configuration, is a very time-consuming process.
- While you work in your sandbox environment, your live environment is constantly updated with new data and configuration changes. This means that over time, the sandbox environment no longer truly represents the production environment.
- In most cases, the data is only partially copied from the live environment to the sandbox. This data deficiency in the sandbox makes it impossible to truly appreciate the full implications of your sandbox actions.
How is Sandbox for Datorama Different?
Rather than having two separate environments, our solution is based on two separate modes, a sandbox mode and a production mode, within the same environment, where changes made in sandbox mode have zero impact on production mode. Having both modes in the same environment eliminates the need to copy data from one environment to another, so the data is never partial and setup is instantaneous. Furthermore, the two modes never drift apart over time; they are perfectly aligned, 100% of the time.
We achieved this by making the distinction between sandbox and production at the entity level rather than the environment level. Every entity, be it a dashboard, report, page, data stream, etc., is identified as either a production entity or a sandbox entity.
From there, we just make sure that the right entities are accessible in the relevant mode. In production mode, only the production entities are accessible, since all of the sandbox entities are filtered out. This means that anything you do within the sandbox has no impact on production whatsoever. In sandbox mode, you have access to both production and sandbox entities. This is because, in addition to adding new sandbox entities, you may also want to edit existing production ones to see how they interact.
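The filtering rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Entity` class, the `is_sandbox` flag, and the `visible_entities` function are hypothetical names, not Datorama's actual model, and the sketch ignores the case where a sandbox entity replaces an edited production one.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PRODUCTION = "production"
    SANDBOX = "sandbox"


@dataclass(frozen=True)
class Entity:
    entity_id: int
    name: str
    is_sandbox: bool  # hypothetical flag: True for sandbox entities


def visible_entities(entities, mode):
    """Return the entities a user can access in the given mode."""
    if mode is Mode.PRODUCTION:
        # Production mode: sandbox entities are filtered out.
        return [e for e in entities if not e.is_sandbox]
    # Sandbox mode: both production and sandbox entities are accessible.
    return list(entities)


entities = [
    Entity(1, "Sales dashboard", is_sandbox=False),
    Entity(2, "Weekly report", is_sandbox=False),
    Entity(3, "Experimental dashboard", is_sandbox=True),
]

production_ids = [e.entity_id for e in visible_entities(entities, Mode.PRODUCTION)]
sandbox_ids = [e.entity_id for e in visible_entities(entities, Mode.SANDBOX)]
```

Here `production_ids` contains only the two production entities, while `sandbox_ids` contains all three.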
The Data Stream Entity: Loading & Querying Sandbox Data
To explain data querying in sandbox mode, we will examine one of the main entities a customer has in their workspace: the Data Stream.
A data stream is the configuration entity that is responsible for the data ETL process. It is responsible for pushing/pulling data from the different data providers, transforming the data according to the user definition, and loading it to the Datorama data warehouse.
To differentiate between the data of different streams, the Data Stream ID is stored in every row of data.
When the user edits a Data Stream in sandbox mode, what actually happens behind the scenes is that a new data stream is created with a different ID, and the relevant data is re-loaded to the data warehouse with this new ID; all of this is hidden from the user.
Both sets of data (production and sandbox) reside next to each other in the database. Knowing the “replace data” relation between the two entities, we can rewrite the DB query to filter the desired data in or out according to the user’s sandbox/production mode.
For example, in sandbox mode, the query will filter out:
- The data of any data stream that was deleted in the sandbox
- The data of an original data stream that was edited in the sandbox (i.e., filter out the production stream and include the sandbox stream that replaces it)
In production mode, the query will filter out the data of all sandbox data streams.
Example of Querying Data in Production Mode Vs Sandbox Mode
- The Data Stream table holds all of the stream configurations.
- The Sandbox Entity table describes the relationship between each sandbox entity and its “sibling” production stream. In this example, the query will replace the production stream data with its sibling sandbox stream data.
- DW is the data warehouse where both sets of data (production & sandbox) reside.
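To make the example concrete, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and sample rows (`data_stream`, `sandbox_entity`, `dw`, `clicks`) are assumptions chosen for illustration; the actual Datorama schema and queries are not shown in this article. Stream 2 is a production stream that was edited in the sandbox, and stream 3 is its sandbox sibling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_stream (id INTEGER PRIMARY KEY, name TEXT, is_sandbox INTEGER);
CREATE TABLE sandbox_entity (production_id INTEGER, sandbox_id INTEGER);
CREATE TABLE dw (stream_id INTEGER, clicks INTEGER);

INSERT INTO data_stream VALUES
    (1, 'Search ads', 0),
    (2, 'Social ads', 0),
    (3, 'Social ads (edited in sandbox)', 1);
-- Stream 3 replaces stream 2 in sandbox mode.
INSERT INTO sandbox_entity VALUES (2, 3);
-- Every warehouse row carries the ID of the stream that loaded it.
INSERT INTO dw VALUES (1, 100), (2, 200), (3, 250);
""")

# Production mode: drop every row that belongs to a sandbox stream.
PRODUCTION_QUERY = """
SELECT SUM(dw.clicks)
FROM dw
JOIN data_stream ds ON ds.id = dw.stream_id
WHERE ds.is_sandbox = 0
"""

# Sandbox mode: drop the production streams that have a sandbox sibling,
# so the sibling's rows take their place.
SANDBOX_QUERY = """
SELECT SUM(dw.clicks)
FROM dw
WHERE dw.stream_id NOT IN (
    SELECT production_id FROM sandbox_entity WHERE production_id IS NOT NULL
)
"""

production_total = conn.execute(PRODUCTION_QUERY).fetchone()[0]
sandbox_total = conn.execute(SANDBOX_QUERY).fetchone()[0]
```

With this sample data, the production query sums streams 1 and 2, while the sandbox query sums streams 1 and 3. Both queries run against the same warehouse table; only the filter changes.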
Sandbox Entity Management
There are three main operations a user can perform on entities when working in their virtual sandbox:
- Adding an entity that does not yet exist in production
- Editing an existing production entity
- Deleting an existing production entity
To allow managing sandbox entities, we use a new DB Table and a common infrastructure that allows registering sandbox operations.
For example, when a production entity is edited in sandbox mode, an additional record is added to this table; the record states that Entity ‘123’ should be replaced with Entity ‘456’ in sandbox mode.
Another example is when a new entity that does not exist in production is added in the sandbox.
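The three operations could be registered along the following lines. This is a hedged sketch, not the actual infrastructure: `SandboxRecord`, `SandboxRegistry`, and their fields are hypothetical, and an in-memory list stands in for the DB table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Operation(Enum):
    ADD = "add"
    EDIT = "edit"
    DELETE = "delete"


@dataclass(frozen=True)
class SandboxRecord:
    operation: Operation
    production_id: Optional[int]  # None when the entity exists only in the sandbox
    sandbox_id: Optional[int]     # None when a production entity is deleted


class SandboxRegistry:
    """In-memory stand-in for the table that stores sandbox operations."""

    def __init__(self):
        self.records = []

    def register_add(self, sandbox_id):
        self.records.append(SandboxRecord(Operation.ADD, None, sandbox_id))

    def register_edit(self, production_id, sandbox_id):
        self.records.append(SandboxRecord(Operation.EDIT, production_id, sandbox_id))

    def register_delete(self, production_id):
        self.records.append(SandboxRecord(Operation.DELETE, production_id, None))

    def resolve(self, production_ids):
        """Return the entity IDs visible in sandbox mode."""
        # Edited and deleted production entities are hidden...
        hidden = {r.production_id for r in self.records if r.production_id is not None}
        # ...while added entities and edited copies are shown.
        shown = {r.sandbox_id for r in self.records if r.sandbox_id is not None}
        return (set(production_ids) - hidden) | shown


registry = SandboxRegistry()
registry.register_edit(123, 456)   # Entity 123 is replaced with Entity 456
registry.register_add(789)         # new entity that exists only in the sandbox
registry.register_delete(200)      # production entity deleted in the sandbox
sandbox_view = registry.resolve({100, 123, 200})
```

In this sketch, production itself is never modified: the registry only records what the sandbox view should hide or show.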
Switching Between Sandbox & Production Views
When a user chooses to switch from the production view to the sandbox view, what actually happens behind the scenes is that the client invokes a server API that registers this action and responds with a unique sandbox token.
This sandbox token is saved in the user’s browser session. From this point on, all requests sent from the client are intercepted and the sandbox token is added to each request.
The sandbox token, which indicates the mode, is propagated through the method call stack and allows the server to decide how to create or update entity configurations and which data to retrieve from storage.
Conclusion
The approach Datorama took with the Virtual Sandbox created huge benefits for Datorama customers, allowing them to create, in one click and with no setup time, a sandbox environment that includes all of their production data. Moreover, this sandbox environment is continually updated with the latest production changes.
From the engineering side, this approach eliminated the need to transfer huge amounts of data from the production environment to the sandbox environment and dramatically reduced the processing and storage needed.