Aditya Agarwal ⚡
Data Engineer & Cloud Architect
I build scalable data pipelines and cloud infrastructure that process 500GB+ daily, powering AI-driven health and educational products. Currently pursuing an MS in Computer Science at Indiana University.
Technical Arsenal
Comprehensive expertise across cloud platforms, big data processing, and the modern development stack
Cloud Platforms
Extensive experience with enterprise cloud solutions and serverless architectures
- AWS S3
- EC2
- EMR
- Glue
- Athena
- Lambda
- Kinesis
- Azure ADF
- Databricks
- Synapse
Big Data & ETL
Scalable data processing pipelines and real-time streaming solutions
- Apache Spark
- Flink
- PySpark
- Hadoop
- Kafka
- Airflow
- dbt
Databases
Modern data warehouses and NoSQL solutions for diverse use cases
- Snowflake
- Redshift
- PostgreSQL
- MongoDB
- DynamoDB
Programming
Multi-language proficiency for data engineering and software development
- Python
- SQL
- Scala
- Java
- C++
Visualization
Interactive dashboards and business intelligence solutions
- Power BI
- Tableau
- Looker
Professional Journey
Software Developer
Configuring AWS cloud platforms for AI-based health/educational products, designing data pipelines, developing backend APIs with Python/MySQL.
Research Assistant
Leading a graph database benchmarking project, integrating the Stack Overflow graph into Neo4j, GraphFrames, TigerGraph, and PuppyGraph.
Data Engineer II
Migrated 70TB of health claims data to AWS/Azure, built real-time Kafka streaming (500GB/day), and reduced manual testing by 30%.
Data Engineer I
Built dbt analytics layers, tuned SQL queries (80% performance improvement), and created 15+ dashboards.
Featured Projects
Real-Time Change Data Capture Pipeline
Built a CDC pipeline using Debezium and Kafka to stream MySQL data, automated ETL with Airflow, and integrated it with Google BigQuery.
Real-time data synchronization with sub-second latency
ETL Pipeline for JSON Data Processing
Processed large JSON files with PySpark, flattened nested fields, created normalized tables, reduced execution time by 60%.
60% reduction in execution time with optimized data processing
HealthEdge Data Migration
Migrated 70TB of US Health Claims data to cloud, improved data availability by 80%, built real-time streaming processing 500GB/day.
80% improvement in data availability • 500GB/day real-time processing
Ready to Build Something Amazing? ⚡
Currently seeking full-time Data Engineering opportunities. Let's discuss how I can help scale your data infrastructure and unlock insights from your data.
📞 +1 (930) 333 2884