

Big Data Developer with 3+ years of experience designing, building, and optimizing cloud-based data pipelines with Apache Spark (Scala, PySpark, Spark SQL) on AWS and Azure. Strong expertise in data lake and lakehouse architectures built on Amazon S3, Delta Lake, and Parquet, with hands-on experience in ETL pipeline development, performance optimization, and workflow orchestration with Apache Airflow. Proven ability to process large-scale structured and semi-structured datasets.
Technologies: Apache Spark, Scala, PySpark, Spark SQL, Hadoop (HDFS), Hive, Sqoop, AWS (S3, EC2, EMR), MySQL, and Airflow.
AWS: S3, EC2, EMR
Azure: Azure Databricks, Azure Blob Storage
Apache Spark (Scala, PySpark, Spark SQL)
Hadoop (HDFS), Hive, Sqoop
Python, SQL
Apache Airflow, AWS Lambda, GitHub
Parquet, Avro, ORC, JSON, CSV