Josys

Lead AI Data Engineer

Location: Bengaluru, Karnataka, India

About the Role:

We’re looking for an experienced AI Data Engineer (4-8 years) to join our data team. In this role, you’ll build and maintain our data infrastructure on AWS, enabling analytics and AI teams to extract actionable insights. You’ll design and manage end-to-end data pipelines that deliver high-quality, reliable, real-time data, and contribute to ML/GenAI workflows and model deployment pipelines.

What You'll Do:

  • Design and build scalable data pipelines and transformations using Spark / PySpark / Scala (an illustrative sketch follows this list).

  • Manage and optimize Airflow DAGs for complex data workflows.

  • Clean, transform, and prepare data for analytics, AI, and ML use cases.

  • Use Python for automation, data processing, and internal tooling.

  • Work with AWS services (S3, Redshift, EMR, Glue, Athena) to maintain robust data infrastructure.

  • Collaborate with Analytics and AI teams to design pipelines for ML/GenAI projects.

  • Contribute to Node.js (TypeScript) backend development for data services.

  • Automate deployments using CI/CD pipelines (GitHub Actions).

  • Monitor, troubleshoot, and ensure data quality, consistency, and reliability across systems.

  • Build and maintain data warehouses/lakes and handle real-time streaming data using Kafka or similar technologies.
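
For a flavour of the day-to-day work, here is a minimal PySpark sketch of the first bullet: reading raw events from S3, cleaning them, and writing curated Parquet back for downstream analytics. All bucket paths and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("clean_events").getOrCreate()

    # Read raw JSON events from S3 (bucket path is illustrative).
    raw = spark.read.json("s3://example-raw-bucket/events/")

    # Basic cleaning: dedupe, normalize the timestamp, drop incomplete rows.
    clean = (
        raw.dropDuplicates(["event_id"])
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .withColumn("event_date", F.to_date("event_ts"))
        .filter(F.col("user_id").isNotNull())
    )

    # Write partitioned Parquet for downstream Athena / Redshift Spectrum queries.
    (
        clean.write.mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-bucket/events/")
    )

    spark.stop()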

What You'll Need:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

  • 4-8 years of hands-on experience in data engineering.

  • Strong expertise in Spark / Scala for large-scale data processing.

  • Proficient in Airflow for managing and optimizing complex DAGs.

  • Advanced Python skills for data manipulation, automation, and tool development.

  • Proven experience with AWS cloud services (S3, Redshift, EMR, Glue, Athena, IAM, EC2).

  • Solid understanding of ETL/ELT, data preparation, and analytics workflows.

  • Familiarity with Node.js and TypeScript for backend data services.

  • Experience with automated CI/CD (GitHub Actions).

  • Familiarity with CDC tools such as Debezium (see the consumer sketch after this list).

  • Strong SQL and a solid grasp of data warehousing and streaming technologies (Kafka, Flink, Kinesis).

  • Excellent communication skills.
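
As a concrete illustration of the CDC requirement, below is a minimal sketch of consuming Debezium change events from Kafka using the kafka-python package. The topic, server, and table names are hypothetical, and a production pipeline would handle schemas, offsets, and error cases more carefully.

    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "dbserver1.inventory.customers",   # hypothetical Debezium topic
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
    )

    for message in consumer:
        event = message.value
        if event is None:            # tombstone record emitted after a delete
            continue
        payload = event.get("payload", {})
        op = payload.get("op")       # 'c' = create, 'u' = update, 'd' = delete
        if op in ("c", "u", "r"):    # 'r' = snapshot read
            print("upsert:", payload.get("after"))
        elif op == "d":
            print("delete:", payload.get("before"))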

Bonus Points:

  • Experience with data lake technologies (Delta Lake, Apache Iceberg).

  • Knowledge of ML/GenAI model deployment pipelines.

  • Familiarity with data governance, quality frameworks, and statistics.

  • Experience with infrastructure as code (Terraform).

  • Familiarity with containers (Docker, Kubernetes).

  • Experience with monitoring and logging tools such as Prometheus and Grafana (a brief instrumentation sketch follows this list).
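
To make the monitoring bullet concrete, here is a minimal sketch using the Python prometheus_client library to expose pipeline metrics that Grafana could chart from a Prometheus scrape. The metric names and the simulated batch loop are purely illustrative.

    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    ROWS_PROCESSED = Counter(
        "pipeline_rows_processed_total", "Rows processed by the pipeline"
    )
    BATCH_SECONDS = Histogram(
        "pipeline_batch_duration_seconds", "Wall-clock time per batch"
    )

    start_http_server(8000)  # metrics exposed at http://localhost:8000/metrics

    while True:
        with BATCH_SECONDS.time():
            time.sleep(random.uniform(0.1, 0.5))  # stand-in for real batch work
        ROWS_PROCESSED.inc(100)                   # e.g. 100 rows per batch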
