Cloud Data Platform
A digital platform engineering, cloud, and data project for a global healthcare technology company, delivering self-service capability across all layers of the platform.
Objectives
- Implement a data platform using Microsoft's cloud-scale analytics, backed by detailed business and data discovery, to deliver a scalable and secure platform.
- Grow the digital healthcare ecosystem by providing real-time data to downstream systems, with 100% partner integration and a multi-tenant solution.
- Democratize data by providing data as a product to end customers, enabling them to become more data-driven.
- Pull data from 100+ devices and partners and process it in real time.
Challenges
- A fragmented data environment with a disparate and obsolete tech stack.
- A focus on operational reporting, with only ad hoc data governance.
- No common framework due to lack of standardization and integration.
- Longer time to market.
- Lack of security compliance.
- Required compliance with healthcare standards such as FHIR, OMOP, and HL7.
Solution
Implementation of the Cloud Adoption Framework recommended by Microsoft.
Decentralization based on a core data mesh tenet, domain-oriented data ownership. This also helped implement the data-as-a-product concept.
Implementation of modern data engineering architectural concepts such as Unified Data Highway, Transient Data Landing Zones, Relay Semantics, Health Lakehouse, microservices to publish data, Kafka connectors, and Gold/Silver/Bronze data models.
Implementation of Azure Purview to discover and classify data along with its metadata, building a unified data map.
Use of infrastructure as code, with infrastructure managed through configuration files, to build a more configurable and scalable framework.
Implementation of core DataOps best practices using Azure Data Factory.
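The layered Gold/Silver/Bronze data models mentioned above can be illustrated with a minimal sketch. This is not the project's implementation: it uses plain Python instead of Spark, and the field names (device_id, ts, heart_rate), the source label, and the sample events are all hypothetical, chosen only to show how each layer refines the one before it.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_events, source):
    """Bronze: keep every record exactly as received, adding only ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {"source": source, "ingested_at": ingested_at, "payload": event}
        for event in raw_events
    ]


def to_silver(bronze_rows):
    """Silver: validate records, normalise types, and drop duplicates."""
    seen = set()
    silver = []
    for row in bronze_rows:
        payload = row["payload"]
        key = (payload.get("device_id"), payload.get("ts"))
        if None in key or key in seen:
            continue  # missing identifier or duplicate reading
        try:
            value = float(payload["heart_rate"])
        except (KeyError, TypeError, ValueError):
            continue  # unusable measurement; it remains available in bronze
        seen.add(key)
        silver.append({"device_id": key[0], "ts": key[1], "heart_rate": value})
    return silver


def to_gold(silver_rows):
    """Gold: aggregate cleaned readings into a per-device summary for consumers."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in silver_rows:
        totals[row["device_id"]][0] += row["heart_rate"]
        totals[row["device_id"]][1] += 1
    return {
        device: {"readings": count, "avg_heart_rate": total / count}
        for device, (total, count) in totals.items()
    }


# Hypothetical sample events (assumed shape, not taken from the project).
SAMPLE_EVENTS = [
    {"device_id": "dev-1", "ts": "2024-01-01T00:00:00Z", "heart_rate": "60"},
    {"device_id": "dev-1", "ts": "2024-01-01T00:00:00Z", "heart_rate": "60"},  # duplicate
    {"device_id": "dev-1", "ts": "2024-01-01T00:01:00Z", "heart_rate": 80},
    {"device_id": "dev-2", "ts": "2024-01-01T00:00:00Z", "heart_rate": "n/a"},  # invalid
    {"device_id": "dev-2", "ts": "2024-01-01T00:02:00Z", "heart_rate": 70.0},
]

bronze = to_bronze(SAMPLE_EVENTS, source="partner-feed")
silver = to_silver(bronze)
gold = to_gold(silver)
```

In this sketch the bronze layer retains all five events, the silver layer keeps the three valid and unique readings, and the gold layer exposes one summary per device. Keeping the raw records in bronze means rejected rows can be reprocessed later if validation rules change.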
Technology
Azure Data Factory, Azure Databricks, Azure Data Lake, Synapse, Python, Spark, Lakehouse, Azure Purview
Message Hub, Azure Data Lake Storage Gen2, Azure Blob Storage, Event Hubs
Azure DevOps, Profisee
Outcome
Established data as a product using a data mesh.
99% on-time data availability.
100% cloud-based environment.
Improved competitive advantage by enabling ease of deployment and collaboration across partners.
Real-time data processing to support critical research.
Unified data taxonomy for compliance with healthcare industry standards.