Modernizing Healthcare Governance and Security with Zero Downtime

Author: Wavicle Data Solutions


Healthcare companies are struggling to manage and secure growing volumes of data, including information from medical devices, prescriptions, insurance claims, and sensitive PHI and PII, while staying compliant with strict regulations and delivering optimal patient care.

 

Many still rely on spreadsheets and manual ticketing systems, which are difficult to maintain and prone to errors. These systems are manageable when data volumes are smaller, but as data grows and schemas change, complexity increases, and maintenance costs rise. To control costs, teams often split data across different systems. This creates a fragmented environment with no centralized security or data integration.

 

Without fine-grained access controls, clinicians may not get the information they need, which affects patient care. The absence of data lineage makes it hard to trace how data has changed over time. This creates blind spots in auditing and regulatory reporting. Proving data integrity becomes time-consuming and expensive, making it difficult to meet HIPAA compliance requirements.

 

Modern healthcare demands governance and security that move as fast as the data they protect. That means:

 

  • Centralized Policy Control: Author policies and apply them everywhere, across EHRs, billing systems, device telemetry, and research datasets.
  • Fine-Grained Access: Enforce least-privilege access, in line with HIPAA’s minimum necessary standard, at table, column, and row levels.
  • Real-Time Lineage and Audit Trails: Capture every read, write, and transformation automatically, removing the burden of manual tracing.

To bring these capabilities to life in a highly regulated environment, Wavicle Data Solutions, a data and AI consulting company, helped a healthcare company improve data governance and security by migrating to Databricks Unity Catalog on Microsoft Azure.

 

The problem

The healthcare company previously used Apache Hive Metastore, managing 2 TB of critical data spread across 80 schemas and more than 50,000 tables. Multiple versions of Hive tables were being used in different processes, adding to the complexity of the environment. This made governance, security, and audit-readiness difficult to maintain in a decentralized environment.

 

The solution

Databricks was chosen as the platform for its unified metadata management. Unity Catalog provided fine-grained access controls and row-level security that could be aligned to clinical roles. Its built-in audit trails and cross-cloud policy synchronization enabled consistent governance across cloud environments. The customer’s data environment was built using Databricks Delta Lake, Azure Data Factory, and PySpark. Here is the step-by-step approach Wavicle took to migrate the healthcare company from Apache Hive Metastore to Databricks Unity Catalog.

 

Step 1: The Wavicle team used Databricks’ UCX Utility Tool to assess and capture the metadata of the current environment. This generated a migration plan and mapped out which schemas, data assets, and tables needed to move. UCX also provided a few dashboards to track the progress of each schema, user group identities, and permissions to be migrated.

 

The Wavicle team found that these dashboards were not sufficient to create a detailed migration plan specific to the healthcare company’s environment and needs, or to track the status of the migration. They built additional dashboards on the UCX metadata to view the customer environment holistically, identify duplicate tables across schemas, narrow down the objects to be migrated, and track object dependencies. This additional insight helped create a detailed migration plan and prioritize the migration tasks.
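
As an illustration of the duplicate-table check, here is a minimal Python sketch. It assumes the assessment metadata has been exported as rows of (schema, table, storage location). The row layout, the sample names, and the `find_duplicate_tables` helper are hypothetical and are not part of UCX.

```python
from collections import defaultdict

# Hypothetical inventory rows: (schema, table, storage_location).
# In practice these would be read from the assessment metadata.
INVENTORY = [
    ("claims_raw", "member", "abfss://data@examplestore.dfs.core.windows.net/claims_raw/member"),
    ("claims_curated", "member", "abfss://data@examplestore.dfs.core.windows.net/claims_curated/member"),
    ("pharmacy", "prescription", "abfss://data@examplestore.dfs.core.windows.net/pharmacy/prescription"),
]


def find_duplicate_tables(inventory):
    """Return {table_name: sorted schemas} for names found in more than one schema."""
    schemas_by_table = defaultdict(set)
    for schema, table, _location in inventory:
        schemas_by_table[table.lower()].add(schema.lower())
    return {
        table: sorted(schemas)
        for table, schemas in schemas_by_table.items()
        if len(schemas) > 1
    }


duplicates = find_duplicate_tables(INVENTORY)
print(duplicates)
```

A name match only flags candidates. Whether two tables are true duplicates still needs a check of their storage locations and contents.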

 

Knowledge of the customer environment and its architecture helped the Wavicle team build custom migration solutions and, together with UCX, develop an optimized zero-downtime migration plan.

 

Step 2: A metadata-first approach was adopted to maintain continuity. Instead of copying data immediately, existing Delta Lake locations were registered as external tables in Unity Catalog. This enabled a fast and seamless transition, completing the migration of over 50,000 tables in under two months.
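
The metadata-first idea can be sketched as generating one registration statement per table. This is a simplified illustration, not the team's actual tooling. The catalog, schema, table, and storage path below are hypothetical, and the sketch assumes the storage path is already covered by a Unity Catalog external location.

```python
def quote_ident(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def external_table_ddl(catalog, schema, table, location):
    """Build DDL that registers an existing Delta location as an external table."""
    full_name = ".".join(quote_ident(part) for part in (catalog, schema, table))
    safe_location = location.replace("'", "\\'")
    return f"CREATE TABLE IF NOT EXISTS {full_name} USING DELTA LOCATION '{safe_location}'"


# Hypothetical catalog, schema, table, and storage path.
ddl = external_table_ddl(
    "healthcare",
    "claims",
    "member",
    "abfss://data@examplestore.dfs.core.windows.net/claims/member",
)
print(ddl)
```

In a Databricks notebook, each generated statement could be passed to `spark.sql(...)`. Because only metadata is written, the underlying Delta files stay where they are.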

 

Step 3: Governance was embedded directly into pipelines rather than layered on top. Key Databricks capabilities included:

  • Automated policy enforcement for new tables and columns
  • Use of Databricks Genie to auto-generate table and column descriptions
  • Unity Catalog’s built-in data quality features to ensure:
    • Data quality standards were met
    • Security checks ran before data was surfaced to clinicians or analysts
    • ETL/ELT pipeline access and modifications were tracked with live visibility
    • Metadata and schema updates were automated

This approach not only reduced complexity and improved governance but also accelerated data delivery.

 

Step 4: To unify a previously fragmented environment stemming from client-specific siloed repositories, diverse metadata structures, and inconsistent access controls, Wavicle redesigned the catalog structure to align with the customer’s access patterns while maintaining centralized control. Fine-grained permissions were implemented at the user, role, and object level, aligned with HIPAA and healthcare regulatory standards.
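
Role-aligned permissions of this kind can be expressed as generated `GRANT` statements. The sketch below is illustrative only. The group names, table names, and the `ROLE_ACCESS` mapping are hypothetical, and real deployments would likely add row filters and column masks on top.

```python
# Hypothetical role-to-table mapping; group and object names are illustrative.
ROLE_ACCESS = {
    "clinicians": ["healthcare.clinical.encounters"],
    "billing_analysts": ["healthcare.claims.member", "healthcare.claims.invoice"],
}


def grant_statements(role_access):
    """Build GRANT statements giving each group read access to its tables only."""
    statements = []
    for group in sorted(role_access):
        tables = sorted(role_access[group])
        catalogs = sorted({t.split(".")[0] for t in tables})
        schemas = sorted({".".join(t.split(".")[:2]) for t in tables})
        # A group needs USE CATALOG and USE SCHEMA before SELECT takes effect.
        for catalog in catalogs:
            statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`")
        for schema in schemas:
            statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO `{group}`")
        for table in tables:
            statements.append(f"GRANT SELECT ON TABLE {table} TO `{group}`")
    return statements


grants = grant_statements(ROLE_ACCESS)
for statement in grants:
    print(statement)
```

Keeping the mapping in one place and generating the statements from it means access is reviewed as data rather than as scattered manual grants.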

 

Step 5: To ensure all tables were migrated, the Wavicle team explored Databricks’ Deep Clone capability. While migrating the external tables, Deep Clone would create a precise snapshot of the latest version of the legacy Hive data, but without the Delta log or version history. That became a challenge because the healthcare company maintained multiple versions of the same tables in their Hive Metastore, which they used for data lineage, schema evolution, regulatory, and auditing purposes. They expected all versions of the tables to be migrated to their new environment on Databricks.

 

The Wavicle team understood the customer requirements and realized that the Deep Clone capability alone would not suffice. They developed a custom solution on top of UCX to facilitate the migration of all versions of the tables. The Wavicle team further automated the entire migration process (schemas, tables, properties, and permissions) using Databricks APIs and Python scripts. This increased speed and reduced manual errors.
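
The article does not describe how the custom solution works. Purely as an illustration of the problem, one conceivable way to carry several versions across is to clone each retained version into its own target table. The sketch below is an assumption, not Wavicle's method. The table names and the `_v<N>` suffix convention are hypothetical.

```python
def clone_statements(source, target, versions):
    """Build one DEEP CLONE statement per retained source version.

    Each version lands in its own target table, suffixed with the version number.
    """
    return [
        f"CREATE TABLE IF NOT EXISTS {target}_v{int(version)} "
        f"DEEP CLONE {source} VERSION AS OF {int(version)}"
        for version in sorted(set(versions))
    ]


# Hypothetical names; version numbers would come from DESCRIBE HISTORY on the source.
clone_sql = clone_statements(
    "hive_metastore.claims.member",
    "healthcare.claims.member",
    [0, 3, 7],
)
for statement in clone_sql:
    print(statement)
```

This trades storage for simplicity: every version becomes a full copy, so it would only suit cases where a small number of versions must be retained.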

 

Step 6: Given the customer’s requirement for zero downtime, Wavicle implemented automated data replication pipelines. Real-time sync between the legacy source and new target environments ensured continuous data availability and SLA compliance.

 

Step 7: By using the UCX Tool along with the custom solution, the Wavicle team ensured data accuracy and full audit-readiness from Day One. The team referred to the custom-built dashboards regularly to track the progress of the migration and see which tables had been migrated.
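
A simple form of the accuracy check behind such dashboards is a row-count reconciliation between source and target. The sketch below assumes the counts have already been collected into dictionaries keyed by table name. The sample tables and counts are hypothetical.

```python
def reconcile_counts(source_counts, target_counts):
    """Compare per-table row counts and report missing or mismatched tables."""
    missing = sorted(set(source_counts) - set(target_counts))
    mismatched = {
        table: (source_counts[table], target_counts[table])
        for table in sorted(set(source_counts) & set(target_counts))
        if source_counts[table] != target_counts[table]
    }
    return {"missing": missing, "mismatched": mismatched}


# Hypothetical row counts keyed by schema.table.
source = {"claims.member": 1200, "claims.invoice": 540, "pharmacy.prescription": 88}
target = {"claims.member": 1200, "claims.invoice": 539}

report = reconcile_counts(source, target)
print(report)
```

Row counts catch missing and truncated tables cheaply. Stronger guarantees would need column-level checksums or sampled comparisons.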

 

 

Figure: Databricks UCX migration workflow (Wavicle)

 

The result

With Wavicle’s tailored approach and Databricks Unity Catalog, the healthcare company not only modernized its data governance but also laid the foundation for long-term innovation and operational excellence by building a healthcare catalog. Clinicians are now able to access the medical information they need to deliver optimal patient care.

 

HIPAA compliance requirements are being met with data lineage and data integrity. This migration positions the company as a leader in healthcare data management, ready to tackle future challenges and deliver superior value to its clients.

 

The customized solution built on top of UCX is packaged by the Wavicle team as a value-add service and is being leveraged in customer security and governance modernization projects. For additional information, schedule a strategy call with our Wavicle experts to learn how we can strengthen your data governance with Databricks Unity Catalog.

 
