Core Technologies: Apache Spark (Core, Structured Streaming), PySpark, Databricks (AWS/Azure), Advanced SQL

DevOps: CI/CD, Jenkins, Git/GitHub/Bitbucket

Programming: Python, SQL

Cloud Infrastructure: AWS (S3, EMR, EC2, IAM, CloudWatch), Databricks Runtime Cluster Management (preferred)

Streaming Integration: Apache Kafka, Snowflake integration (preferred), Airflow (preferred)

Preferred Qualification
Experience in financial services, payouts, or enterprise data platforms.
Hands-on experience in Delta Lake optimization and incremental processing strategies.
Experience with Snowflake data warehousing.
Strong understanding of distributed computing principles.

Key Responsibilities
Design and develop scalable batch and near-real-time ETL/ELT pipelines using Snowflake (AWS) and Apache Spark (PySpark, Spark SQL, Structured Streaming).
Build structured streaming pipelines using Kafka and Spark Structured Streaming.
Design dimensional data models (Fact/Dimension, SCD Type 2).
Orchestrate pipelines using Databricks Workflows / Apache Airflow.
Integrate CI/CD pipelines using Jenkins, Git, Bitbucket/GitHub for automated deployment across DEV/UAT/PROD.

Keyskills: Git, Lead Software, Spark, Cluster Management, Programming, Data warehousing, Financial services, SQL, Python