Job Overview:
We are looking for an experienced Senior Data Engineer to design, build, and optimize scalable, high-performance data platforms using AWS cloud services and Python. The ideal candidate will play a key role in architecting end-to-end data pipelines, driving automation, ensuring data quality, and enabling analytics and AI workloads across the organization.
This role requires deep technical expertise in AWS data services and modern data architecture, along with a passion for delivering reliable, high-quality data solutions at scale.
Key Responsibilities
- Architect and implement scalable, fault-tolerant data pipelines using AWS Glue, Lambda, EMR, Step Functions, and Redshift
- Build and optimize data lakes and data warehouses on Amazon S3, Redshift, and Athena
- Develop Python-based ETL/ELT frameworks and reusable data transformation modules (see the illustrative sketch after this list)
- Integrate multiple data sources (RDBMS, APIs, Kafka/Kinesis, SaaS systems) into unified data models
- Lead efforts in data modeling, schema design, and partitioning strategies for performance and cost optimization
- Drive data quality, observability, and lineage using the AWS Glue Data Catalog, Glue Data Quality, or third-party tools
- Define and enforce data governance, security, and compliance best practices (IAM policies, encryption, access control)
- Collaborate with cross-functional teams (Data Science, Analytics, Product, DevOps) to support analytical and ML workloads
- Implement CI/CD pipelines for data workflows using AWS CodePipeline, GitHub Actions, or Cloud Build
- Provide technical leadership, code reviews, and mentoring to junior engineers
- Monitor data infrastructure performance, troubleshoot issues, and lead capacity planning
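To give candidates a concrete feel for the day-to-day work, here is a minimal sketch of the kind of batch transformation described above: a small PySpark job that reads raw JSON from an S3 landing zone, cleanses it, and writes date-partitioned Parquet to a curated zone for Athena or Redshift Spectrum. The bucket, dataset, and column names are all hypothetical placeholders, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Read raw JSON events from the landing zone (bucket and prefix are hypothetical)
raw = spark.read.json("s3://example-landing-zone/orders/")

# Basic cleansing: drop rows missing required keys, normalize the event
# timestamp, and derive a date column to partition on
clean = (
    raw.dropna(subset=["order_id", "event_time"])
       .withColumn("event_time", F.to_timestamp("event_time"))
       .withColumn("event_date", F.to_date("event_time"))
)

# Write date-partitioned Parquet to the curated zone, where Athena or
# Redshift Spectrum can query it via the Glue Data Catalog
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-curated-zone/orders/"))
```

Partitioning on event_date is what lets downstream Athena queries prune data and keep scan costs low, which is the kind of cost-aware design this role calls for.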
Required Skills & Qualifications
- Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field
- 5–10 years of hands-on experience in data engineering or data platform development
- Expert-level proficiency in Python (pandas, PySpark, boto3, SQLAlchemy); a short boto3 example follows this list
- Advanced experience with AWS data services, including:
  - AWS Glue, Lambda, EMR, Step Functions, DynamoDB, Redshift (EDW), Athena, S3, Kinesis, and Amazon QuickSight
  - IAM, CloudWatch, and CloudFormation/Terraform (for infrastructure automation)
- Strong experience in SQL, data modeling, and performance tuning
- Proven ability to design and deploy data lakes, data warehouses, and streaming solutions
- Solid understanding of ETL best practices, partitioning, error handling, and data validation
- Hands-on experience with version control (Git) and CI/CD for data pipelines
- Knowledge of containerization (Docker/Kubernetes) and DevOps concepts
- Excellent analytical, debugging, and communication skills
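As one concrete touchpoint for the Python and boto3 proficiency listed above, the sketch below submits an Athena query from Python and polls until it finishes. The database name, query, and results bucket are hypothetical, and error handling and result pagination are deliberately elided.

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; the database and output bucket are hypothetical placeholders
qid = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS orders FROM orders GROUP BY event_date",
    QueryExecutionContext={"Database": "curated"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print each result row; the first row Athena returns is the column header
if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```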
Preferred Skills
- Experience with Apache Spark or PySpark on AWS EMR or Glue
- Familiarity with Airflow, dbt, or Dagster for workflow orchestration (a minimal Airflow sketch follows this list)
- Exposure to real-time data streaming (Kafka, Kinesis Data Streams, or Firehose)
- Knowledge of Lake Formation, Glue Studio, or DataBrew
- Experience integrating with machine learning and analytics platforms (SageMaker, QuickSight)
- Certification: AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect
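For orientation on the orchestration tools named above, here is a minimal Airflow (2.4+) sketch of a daily extract-then-load DAG. The DAG id, task names, and task bodies are hypothetical placeholders; in practice the callables would stage data to S3 and COPY it into Redshift.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders():
    # Placeholder: pull from a source system and stage files in S3
    print("extracting...")


def load_warehouse():
    # Placeholder: COPY the staged files into Redshift
    print("loading...")


with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # 'schedule' replaces 'schedule_interval' in Airflow 2.4+
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)

    # load runs only after extract succeeds
    extract >> load
```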
Soft Skills
- Strong ownership mindset with focus on reliability and automation
- Ability to mentor and guide data engineering teams
- Effective communication with both technical and non-technical stakeholders