We are looking for a senior-level data engineer to design, build, and operate highly scalable ingestion and CDC (change data capture) pipelines on our Azure Fabric Lakehouse platform. In addition to delivering production-grade pipelines, you will help mature our engineering discipline by turning “notebook-style” work into reusable, test-driven Python libraries and CI/CD-driven deployment artefacts.
You will be part of the Common Data Intelligence Hub, partnering with data architects, analytics engineers, and solution designers to ship robust, governed data products that serve the enterprise‑wide analytics ecosystem.
- Your team owns ingestion & CDC engineering end-to-end (design, build, operate, observability, reliability, reusable components).
- You contribute to platform standards (contracts, layer semantics, readiness criteria) and reference implementations.
- You do not primarily own cloud infrastructure provisioning (e.g., enterprise networking, core IaC foundations), but you collaborate with the platform team by defining requirements, reviewing changes, and maintaining deployable code for pipelines and jobs.
Platform data engineering & delivery
- Design and develop ingestion pipelines using Azure Fabric services (notebooks/jobs/workflows).
- Implement and operate CDC patterns (inserts, updates, deletes), including late-arriving data and reprocessing strategies (see the merge sketch after this list).
- Structure and maintain bronze and silver Delta Lake datasets (schema enforcement, deduplication, performance tuning).
- Build “transformation-ready” datasets and interfaces (stable schemas, contracts, metadata expectations) for analytics engineers and downstream modeling.
- Ingest data in a batch-first approach (raw landing, replayability, idempotent batch processing), and help evolve patterns toward true streaming where future use cases require it.
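To make the CDC expectation concrete, here is a minimal PySpark/Delta sketch of an idempotent merge: deduplicate the batch to the newest change per key, then apply deletes, updates, and inserts. The paths, table names, and columns (`customer_id`, `event_ts`, `op`) are hypothetical, not taken from our platform.

```python
from pyspark.sql import SparkSession, functions as F, Window
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical bronze landing of raw change records.
changes = spark.read.format("delta").load(
    "abfss://bronze@examplestorageacct.dfs.core.windows.net/customer_changes"
)

# Keep only the latest change per business key so late-arriving or
# duplicated records cannot overwrite newer state (idempotent reruns).
latest = (
    changes
    .withColumn("rn", F.row_number().over(
        Window.partitionBy("customer_id").orderBy(F.col("event_ts").desc())))
    .filter("rn = 1")
    .drop("rn")
)

target = DeltaTable.forPath(
    spark, "abfss://silver@examplestorageacct.dfs.core.windows.net/customers"
)
(
    target.alias("t")
    .merge(latest.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'D'")
    .whenMatchedUpdateAll(condition="s.op != 'D' AND s.event_ts > t.event_ts")
    .whenNotMatchedInsertAll(condition="s.op != 'D'")
    .execute()
)
```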
Software engineering for data frameworks
- Develop and maintain Python-based ingestion/CDC components as production-grade software (modules/packages, versioning, releases).
- Apply engineering best practices: code reviews, unit/integration tests, static analysis, formatting/linting, type hints, and clear documentation.
- Establish and improve CI/CD pipelines for data engineering code and pipeline assets (build, test, security checks, deploy, rollback patterns).
- Drive reuse via shared libraries, templates, and reference implementations; reduce “one-off notebook” solutions (a minimal library-style example follows this list).
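As a small illustration of the “notebook to library” move, the sketch below extracts one transformation into a typed, documented function that can live in a versioned package; the function name and parameters are illustrative.

```python
from pyspark.sql import DataFrame, functions as F, Window


def latest_per_key(df: DataFrame, key: str, order_col: str) -> DataFrame:
    """Return the most recent row per business key.

    A pure function of its inputs, so it can be exercised in unit tests
    with a local SparkSession instead of inside a Fabric notebook.
    """
    w = Window.partitionBy(key).orderBy(F.col(order_col).desc())
    return (
        df.withColumn("_rn", F.row_number().over(w))
          .filter(F.col("_rn") == 1)
          .drop("_rn")
    )
```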
Operations, reliability & observability
- Implement logging, metrics, tracing, and data pipeline observability (run-time KPIs, SLAs, alerting, incident readiness); a structured-logging sketch follows this list.
- Troubleshoot distributed processing and production issues end-to-end.
- Work with solution designers on event-based triggers and orchestration workflows; contribute to operational standards.
- Implement operational and security hygiene: secure secret handling, least-privilege access patterns, and support for auditability (e.g., logs/metadata/lineage expectations).
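As an example of the observability expectation, here is a minimal sketch of structured JSON logging with run-level context, using only the Python standard library; the field names and run-ID convention are assumptions, not a platform standard.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line for machine-readable ingestion."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Attributes attached via `extra=` below; illustrative names.
            "run_id": getattr(record, "run_id", None),
            "rows_written": getattr(record, "rows_written", None),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("ingestion")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

run_id = str(uuid.uuid4())  # correlate all log lines of one pipeline run
logger.info("batch completed", extra={"run_id": run_id, "rows_written": 10_000})
```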
Collaboration & leadership
- Mentor other engineers and promote consistent engineering practices across teams.
- Contribute to the Data Engineering Community of Practice and help define standards, patterns, and guardrails.
- Contribute to architectural discussions (layer semantics, readiness criteria, contracts, and governance).
- Work with architects and governance stakeholders to ensure datasets meet governance requirements (cataloging, ownership, documentation, access patterns, compliance constraints) before promotion to higher layers.
Qualifications
- 3–5 years of hands-on experience building data pipelines with Azure Fabric in production.
- Strong knowledge of Delta Lake patterns (CDC, schema evolution, deduplication, partitioning, performance optimization).
- Advanced Python engineering skills: building maintainable projects (packaging, dependency management, testing, tooling).
- Solid SQL skills (complex transformations, debugging, performance tuning).
- Proven experience with CI/CD and Git-based workflows (merge requests, branching strategies, automated testing, environment promotion).
- Ability to diagnose and resolve issues in distributed systems (Spark execution, cluster/runtime behavior, data correctness).
- Good understanding of data modeling principles and how they influence ingestion and performance.
- Practical experience applying data governance and security controls in a Lakehouse environment (permissions/access patterns, secure secret handling, audit needs; Unity Catalog is a plus).
- Proactive, reliable, and able to work independently within agile teams.
- Strong communication skills in English (spoken and written).
Technical Core Skills
- Azure Fabric (Notebooks, Spark, Delta Lake)
- Azure Functions & Durable Functions (orchestration, long-running workflows)
- SQL (analysis + performance tuning)
- PySpark and Python (production-grade)
- ADLS Gen2 (lake storage design, folder/partition strategy, access controls, lifecycle/retention); a partitioned-write sketch follows this list.
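To illustrate the storage-layout skills above, here is a minimal sketch of a date-partitioned Delta write to ADLS Gen2; the storage account, container names, and partition column are hypothetical, and an ingest-date partition is one common layout choice rather than a platform standard.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical raw landing zone in ADLS Gen2.
df = spark.read.format("json").load(
    "abfss://landing@examplestorageacct.dfs.core.windows.net/orders/"
)

(
    df.withColumn("ingest_date", F.current_date())
      .write.format("delta")
      .mode("append")
      .partitionBy("ingest_date")  # keeps replays and retention manageable
      .save("abfss://bronze@examplestorageacct.dfs.core.windows.net/orders/")
)
```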
Software engineering toolchain
- Git + code review workflows
- CI/CD pipelines (e.g., GitLab CI, Azure DevOps)
- Testing: unit/integration tests, test data strategies (see the test sketch after this list)
- Code quality: linting/formatting, static analysis, type hints
- Packaging & dependency management (e.g., Poetry/pip-tools/conda — whichever you standardize on)
Governance, security & orchestration
- Secure secret handling and service authentication patterns (Key Vault or equivalent)
- Event Grid / Azure Functions / event-driven orchestration (a trigger sketch follows this list)
- Observability (structured logging, metrics, alerting; Log Analytics or equivalent)
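To illustrate event-driven orchestration, here is a minimal sketch of a blob-triggered Azure Function using the Python v2 programming model; the container path and connection setting name are hypothetical.

```python
import logging

import azure.functions as func

app = func.FunctionApp()


@app.blob_trigger(arg_name="blob",
                  path="landing/orders/{name}",      # hypothetical container path
                  connection="LandingStorage")        # hypothetical app setting
def on_new_file(blob: func.InputStream):
    # In practice this would enqueue work or start an ingestion job run.
    logging.info("New file landed: %s (%d bytes)", blob.name, blob.length)
```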
Additional Information
- Please note that remote work is available only within Hungary due to European taxation regulations.