The Data Engineer will work closely with data analysts, data scientists, and DevOps teams to design, build, and maintain scalable data warehouse solutions that support business intelligence and analytics. This role requires expertise in data warehousing, ETL/ELT pipelines, cloud data platforms, and performance optimization.
The ideal candidate will be proactive in ensuring data quality, reliability, security, and continuous improvement of data workflows.
Responsibilities:
- Design, develop, and maintain robust data warehouse architectures and data models (e.g., star schema, snowflake schema) that support analytical and operational needs
- Build, maintain, and optimize scalable ETL (Extract, Transform, Load) or ELT processes and pipelines to integrate data from diverse sources into the data warehouse
- Implement data validation, quality checks, and security protocols to ensure data accuracy, consistency, integrity, and compliance with regulations (e.g., GDPR, HIPAA)
- Monitor system performance, troubleshoot issues, and implement optimization strategies (e.g., query tuning, indexing, partitioning) for efficient data storage and retrieval in the data warehouse
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements, provide technical guidance, and deliver high-quality data solutions and reports
- Collaborate with platform or DevOps teams to provision and maintain data infrastructure components, compute resources, networking, and access controls using infrastructure-as-code tools
- Ensure proper monitoring, logging, and alerting are in place for data pipelines, compute resources, and storage systems
- Automate data workflows and implement monitoring and alerting systems to ensure pipeline reliability and freshness
- Create and maintain comprehensive documentation for data flows, processes, system architecture, and schemas to facilitate knowledge sharing
Technical Qualifications:
- Proficiency in programming languages commonly used in data engineering, especially Python, SQL, and PySpark
- Strong experience with relational databases (e.g., PostgreSQL, SQL Server) and SQL for complex querying and data manipulation
- Proven experience with cloud-based data warehousing solutions such as Amazon Redshift, Snowflake, or Google BigQuery
- Experience with ETL and workflow management tools like Apache Airflow, Azure Data Factory, AWS Glue, Databricks Workflows, or Talend
- Experience with data infrastructure setup and configuration, including cloud storage, compute clusters, networking basics, and access/security controls
- Experience with big data technologies and frameworks such as Hadoop, Spark, or Kafka is beneficial
- Experience integrating data pipelines with machine learning (ML) workflows and supporting data science teams
- Knowledge of data visualization or business intelligence (BI) tools like Tableau, Power BI, or Looker
- Familiarity with major cloud platforms (AWS, Azure, GCP) and their data services
- Familiarity with NoSQL databases (e.g., MongoDB, Cassandra)
Personal Skills:
- Strong problem-solving skills and ability to quickly debug and resolve issues
- Effective communication skills with the ability to collaborate across teams
- Adaptability and willingness to learn new technologies in a fast-evolving environment
- Ability to work under pressure and troubleshoot critical production issues
- Team player with a proactive approach to improving processes and automation
Education and Work Experience:
- Bachelor's degree in computer science, IT, or a related field
- Minimum of 3 years of relevant experience
- Certifications in AWS, Azure, or Databricks are a plus
About Creative Capsule
We're always looking for creative, talented and passionate people to join the Creative Capsule team.