Serverless ETL (AWS Glue + PySpark), Airflow orchestration, FastAPI tooling, CI/CD (Azure DevOps/GitLab), and data governance.
Overview
At PwC (May 2023 – Oct 2024), I delivered reusable SQL/Python data models for SAP data, built serverless ETL pipelines, and migrated an internal Handelsregister tool to FastAPI with PostgreSQL and Angular to eliminate reliance on external sites.

Reusable Data Models & Automated Deployment Pipelines
- Developed reusable SQL/Python data models to integrate SAP-based client data, with embedded data quality checks for accuracy and consistency. This reduced onboarding time for new clients by 30% and gave downstream BI and analytics teams clean, structured data.
- Implemented CI/CD pipelines in Azure DevOps and GitLab using Infrastructure as Code (IaC) practices (Terraform, HashiCorp Vault), which improved release automation and reduced deployment errors by 25%.
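
To illustrate the kind of data quality check embedded in the data models, here is a minimal sketch in plain Python. It is not the production code: the function name `check_rows`, the `QualityReport` class, and the sample rows are hypothetical, and the SAP-style field names (`BUKRS`, `BELNR`, `WRBTR`) are used only as examples.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Collects rule violations found while checking a batch of rows."""
    total_rows: int = 0
    errors: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # A batch passes only if no rule was violated.
        return not self.errors


def check_rows(rows, required, key_fields):
    """Run a null check on required fields and a duplicate check on the key."""
    report = QualityReport(total_rows=len(rows))
    seen_keys = set()
    for index, row in enumerate(rows):
        # Completeness: every required field must be present and non-empty.
        for name in required:
            if row.get(name) in (None, ""):
                report.errors.append(f"row {index}: missing {name}")
        # Uniqueness: the business key must not repeat within the batch.
        key = tuple(row.get(name) for name in key_fields)
        if key in seen_keys:
            report.errors.append(f"row {index}: duplicate key {key}")
        seen_keys.add(key)
    return report


# Hypothetical sample batch (illustrative values only).
rows = [
    {"BUKRS": "1000", "BELNR": "0001", "WRBTR": 120.5},
    {"BUKRS": "1000", "BELNR": "0001", "WRBTR": 99.0},  # repeats the key above
    {"BUKRS": "2000", "BELNR": "0002", "WRBTR": None},  # amount is missing
]
report = check_rows(
    rows,
    required=["BUKRS", "BELNR", "WRBTR"],
    key_fields=["BUKRS", "BELNR"],
)
```

A check like this could run as a gate before a batch is loaded, so that `report.passed` decides whether the data moves on to the downstream models.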

Serverless ETL & Data Governance in AWS
- Built serverless ETL pipelines with AWS Glue (PySpark) and Apache Airflow, applied data governance practices (schema validation, role-based access, and lineage tracking), and automated schema evolution, partitioning, and performance tuning. This improved query efficiency in Athena/Redshift by 40% and enabled seamless delivery of curated datasets to APIs and dashboards.
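
As a rough illustration of schema validation and partitioning, the sketch below checks a record against an expected schema and derives a Hive-style partition prefix (`year=.../month=...`), the layout that lets engines such as Athena skip partitions a query does not need. It uses plain Python rather than Glue or PySpark so that it stays self-contained; the schema, column names, bucket path, and helper names are all assumptions made for the example.

```python
from datetime import date

# Hypothetical target schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"company_id": str, "amount": float, "posting_date": date}


def validate_schema(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in schema.items():
        if name not in record:
            problems.append(f"missing column: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    # Columns outside the schema are reported rather than silently dropped.
    for name in record:
        if name not in schema:
            problems.append(f"unexpected column: {name}")
    return problems


def partition_path(prefix, record):
    """Build a Hive-style partition prefix from the record's posting date."""
    posted = record["posting_date"]
    return f"{prefix}/year={posted.year}/month={posted.month:02d}/"


# Hypothetical records (illustrative values only).
good = {"company_id": "C-001", "amount": 250.0, "posting_date": date(2024, 3, 15)}
bad = {"company_id": "C-002", "amount": "250", "posting_date": date(2024, 3, 16)}

good_problems = validate_schema(good)
bad_problems = validate_schema(bad)
target = partition_path("s3://example-bucket/curated", good)
```

Here only `good` would be written, under the prefix held in `target`; `bad` would be rejected because its amount arrived as a string.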

Enhanced Web Tool for German Handelsregister Data
- Migrated and scaled an internal web tool from Flask to FastAPI with PostgreSQL and Angular, deployed on Kubernetes, so that PwC consultants can extract German company registry (Handelsregister) details directly without relying on external websites. This improved performance, security, and data accessibility for large-scale registry extracts.
- Created Power BI dashboards for internal PoCs, enabling leadership to track data pipeline efficiency and support decision-making based on downstream KPI visibility.