Databricks Workspace replication

Problem we found

A multinational pharmaceutical company wanted to use new Microsoft Azure and Databricks capabilities to improve performance, reduce cost, optimize its operations, and develop new products.

Customer Vision

  • Develop an automated way of replicating all the workspace assets into the new Azure and Databricks platforms
  • Make the process repeatable in order to replicate 250+ workspaces with minimal business downtime

Technical Pain

  • No automated tooling existed for replicating workspaces
  • Tooling had to be comprehensive, recoverable, and auditable
  • Tooling needed to handle workspaces containing assets and configurations that had not been identified beforehand
  • Tooling had to account for users/groups being different or missing in the new workspace 
  • Tooling had to run significantly faster than manual replication to keep business downtime minimal
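
The "recoverable" and "auditable" requirements above can be illustrated with a minimal Python sketch. It is not the tool built for this project: the function name replicate_assets, the copy_asset callback, and the JSON-lines journal file are all hypothetical, and the actual copy step is left to the caller. The idea is that every attempt is appended to a journal, and a rerun skips whatever the journal already marks as done.

```python
import json
from pathlib import Path


def replicate_assets(assets, copy_asset, journal_path):
    """Copy each asset at most once, journaling every attempt.

    assets       -- iterable of asset identifiers (hypothetical strings)
    copy_asset   -- caller-supplied function that copies one asset
    journal_path -- append-only JSON-lines file used as the audit trail
    Returns the list of assets that failed during this run.
    """
    journal_path = Path(journal_path)

    # Recoverability: collect assets a previous run already copied.
    done = set()
    if journal_path.exists():
        for line in journal_path.read_text().splitlines():
            entry = json.loads(line)
            if entry["status"] == "ok":
                done.add(entry["asset"])

    failed = []
    with journal_path.open("a") as journal:
        for asset in assets:
            if asset in done:
                continue  # already replicated in an earlier run
            try:
                copy_asset(asset)
                status, error = "ok", None
            except Exception as exc:  # record the failure and keep going
                status, error = "failed", str(exc)
                failed.append(asset)
            # Auditability: one journal line per attempt, flushed immediately.
            record = {"asset": asset, "status": status, "error": error}
            journal.write(json.dumps(record) + "\n")
            journal.flush()
    return failed
```

Running the function a second time with the same journal retries only the assets that failed, so an interrupted run can resume without repeating completed work.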

Solution we implemented

  • Developed a Python-based automated tool to replicate Databricks workspaces and address the technical challenges above
  • Migrated 50+ Databricks workspaces from legacy Azure to the latest Azure platform 
  • Documented the process and enabled the customer’s in-house team to replicate a further 150+ workspaces
  • Enabled new Databricks features such as Unity Catalog and Serverless
  • Enabled optimizations on Databricks to reduce infrastructure costs
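
As an illustration of the users/groups challenge listed under Technical Pain, the following Python sketch shows one way access entries could be remapped when principals differ or are missing in the new workspace. It is an assumption-laden example, not the project's implementation: remap_principals, its parameters, and the sample principal and permission strings are hypothetical.

```python
def remap_principals(acl, principal_map, target_principals, fallback=None):
    """Rewrite access entries for the target workspace.

    acl               -- list of (principal, permission) pairs from the source
    principal_map     -- dict mapping source principal names to target names
    target_principals -- set of principals that exist in the target workspace
    fallback          -- optional principal that receives orphaned permissions
    Returns (remapped_acl, unresolved_source_principals).
    """
    remapped, unresolved = [], []
    for principal, permission in acl:
        # Principals without an explicit mapping keep their name.
        target = principal_map.get(principal, principal)
        if target in target_principals:
            remapped.append((target, permission))
            continue
        # The principal is missing in the target: report it for review.
        unresolved.append(principal)
        if fallback is not None:
            remapped.append((fallback, permission))
    return remapped, unresolved


# Usage with hypothetical data.
acl = [
    ("alice@old.example", "CAN_MANAGE"),
    ("data-eng", "CAN_RUN"),
    ("bob@old.example", "CAN_VIEW"),
]
mapping = {"alice@old.example": "alice@new.example"}
existing = {"alice@new.example", "data-eng"}
new_acl, missing = remap_principals(acl, mapping, existing)
```

Returning the unresolved principals separately keeps the gap visible, so missing users or groups can be created or reassigned rather than silently dropped.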

Positive Outcomes

  • Automated replication process for 50+ workspaces
  • Enabled the customer team to replicate 150+ additional workspaces using the automation tool
  • Future-proof architecture that can adopt new Azure and Databricks features
  • Ability to create new use cases using Unity Catalog, Serverless, and other Databricks features
  • Completed the project ahead of schedule and under budget