This role is for one of our clients.
Industry: Technology, Information and Media
Seniority level: Mid-Senior level
Min Experience: 5 years
Location: Remote (India)
Job Type: Full-time
Key Responsibilities
Design, build, and optimize scalable ETL/ELT pipelines using PySpark and Databricks.
Develop and maintain data ingestion frameworks for batch and streaming data processing.
Write efficient and optimized SQL queries for data transformation and performance tuning.
Implement data solutions on Azure Cloud (Azure Data Lake, Azure Data Factory, Azure Synapse, etc.).
Collaborate with data architects, analysts, and business teams to understand data requirements.
Ensure data quality, governance, and security best practices are followed.
Monitor pipeline performance and troubleshoot data workflow issues.
Work with CI/CD and version control processes for data engineering deployments.
Participate in design discussions and recommend improvements in data architecture.
Required Skills
Strong hands-on experience with PySpark and distributed data processing.
Advanced SQL development and query optimization skills.
Experience working with Databricks (notebooks, jobs, workflows, Delta Lake).
Good understanding of Azure cloud services related to data engineering.
Experience in building scalable data pipelines and data lakes.
Strong understanding of data modeling and ETL concepts.
Familiarity with Git and Agile development methodologies.
Skills
PySpark
SQL
Databricks
Microsoft Azure