About the team & the platform you’ll own
Data Engineering is part of the Data Team. The team builds, maintains, and continuously improves the company’s data infrastructure, covering the Data Lake, CDC (Change Data Capture), compute clusters, and our workflow orchestrator (Apache Airflow), along with other analytics workloads that support decision-making across the business.
Today, we ingest and process tens of terabytes of data, with an aggregate workload of 1,000+ runtime hours per day across our pipelines and jobs. This role is for someone who gets excited about operating data systems at scale with reliability, performance, cost efficiency, and clean engineering practices.
About the role
We’re looking for a Senior Data Engineer to design and build robust, scalable data systems with a reliability-first mindset and strong software engineering fundamentals. You’ll lead architecture and technical decisions for pipelines (batch + streaming where applicable), mentor other engineers, and raise engineering standards across the team.
This is a hands-on role: you’ll ship, operate, and improve production systems, not just design them.
What you’ll do
Build and operate scalable data systems
- Design, build, and maintain scalable data pipelines (batch and streaming) that are reliable, observable, and cost-effective.
- Own end-to-end pipeline architecture: ingestion → processing → storage → serving, including data modeling and performance considerations.
- Improve and extend our core infrastructure: data lake, CDC pipelines, compute cluster workloads, and Airflow orchestration.
- Work deeply with distributed processing and data lake concepts, including performance tuning and stability at scale.
Engineering excellence & production readiness
- Develop in Python or another big-data language (e.g., Scala or Go), writing clean, modular, testable code.
- Apply strong software engineering practices: design patterns, trade-offs, DRY principles, dependency management, code reviews, and CI/CD.
- Raise the bar on documentation: architecture diagrams, data contracts, operational playbooks, runbooks, and decision records.
Reliability, observability, and incident ownership
- Define and operate system observability:
  - establish metrics/dashboards (latency, throughput, failure rate, resource usage, SLA/SLO adherence)
  - implement alerting and runbooks
- Lead root-cause analysis for complex incidents and recurring failures; implement permanent fixes (not just patches).
- Partner cross-functionally with analytics, product, platform, and DevOps teams to align data solutions with business needs.
Leadership & mentoring
- Mentor and level up other engineers through pairing, reviews, technical guidance, and best-practice evangelism.
- Lead technical discussions, drive alignment, and make pragmatic decisions with clear trade-offs.
The “extra mile” mindset we value
We value engineers who don’t stop at “it works.” You’ll thrive here if you naturally:
- Stay with hard problems until the real root cause is found (not just symptoms).
- Use a “detective” approach: form hypotheses, validate with evidence, and iterate quickly.
- Go beyond your immediate area to unblock solutions, including:
  - reading internal tooling or framework code when needed (and occasionally digging into upstream/open-source source code to understand behavior)
  - collaborating across teams to trace system boundaries and ownership
  - building reproducible test cases, simulations, or load tests to validate fixes and performance changes
  - creating small tools/scripts to diagnose production issues or prevent regressions
What we’re looking for (must-have)
1) Technical competencies
- 3+ years of experience in data engineering (or equivalent experience building production-grade data systems).
- Strong coding ability in Python, plus experience in Scala and/or Go (or strong ability to ramp quickly).
- Strong grasp of system design, design patterns, and engineering trade-offs.
- Experience designing robust pipelines end-to-end (batch + ideally streaming).
- Solid SQL skills and strong understanding of data modeling:
  - OLTP vs OLAP, star schema, partitioning strategies, and how modeling impacts performance and usability
- Hands-on experience with distributed processing and big data systems (e.g., Spark, EMR, data lake architectures).
- Strong operational mindset: observability, reliability, and performance optimization.
2) Behavioral & leadership competencies
- Demonstrated ability to lead technical discussions and drive decisions.
- Strong ownership: you take problems from unclear to solved, and you close loops.
- Comfortable mentoring junior engineers and raising team standards.
- Clear communication, especially around constraints, risks, and trade-offs.
- Strong documentation habits and a reliability-first mindset.
Nice-to-haves / advantages
- Active involvement in open source, technical blogging/writing, hackathons, or building meaningful side projects.
- Experience with streaming ecosystems (e.g., Kafka-style patterns), data contracts, schema evolution, or event-driven architectures.
- Experience implementing data quality frameworks (tests, anomaly detection, freshness checks).
- Cost optimization experience in cloud data platforms (compute/storage trade-offs).
What success looks like in the first 3–6 months
- You’ve improved the reliability and observability of key pipelines (clear metrics, alerts, fewer incidents).
- You’ve delivered at least one meaningful pipeline or architecture improvement that scales better and is easier to operate.
- You’ve led one or more root-cause deep dives and implemented fixes that prevent recurrence.
- You’ve strengthened team execution through mentoring, reviews, and better engineering practices.