Job Description:
• Design, build, and maintain scalable data pipelines using PySpark and Python
• Develop and optimize complex SQL queries for large datasets
• Implement and manage ETL/ELT processes ensuring data quality and reliability
• Collaborate with business and product teams to translate data requirements into solutions
• Build and maintain data warehouse solutions
• Handle large-scale data processing using Hadoop/Big Data technologies
• Perform performance tuning and optimization of data workflows
Required Skills:
• Strong hands-on experience with PySpark and Python
• Advanced proficiency in SQL
• Solid experience in ETL processes and data warehousing
• Familiarity with the Hadoop ecosystem and Big Data technologies
• Experience working with large datasets in distributed environments
• Good communication skills and business understanding
Good to Have:
• Experience with Apache Airflow
• Exposure to cloud platforms (AWS, GCP, Azure)
• Knowledge of data lakes and modern data architectures
• Experience with streaming tools (Kafka, Spark Streaming)
Disclaimer: This job posting has been aggregated from an external source. Role details, content, and availability are subject to change. Applicants are advised to confirm the latest information directly on the company website before applying.
Key skills: Performance tuning, SQL queries, GCP, Data processing, Data quality, Apache, Big data, Business understanding, Data warehousing, Python
Indium is in the business of enhancing software quality throughout the Software Development Lifecycle. Since 1999, Indium has been offering software testing solutions to a wide spectrum of organizations, such as Independent Software Vendors (product companies), software services companies and IT ...