Thunes

Machine Learning Ops Engineer

Thunes Singapore, Central, Singapore Today

About Thunes

Thunes is the Smart Superhighway for money movement around the world. Thunes’ proprietary Direct Global Network allows Members to make payments in real-time in over 130 countries and more than 80 currencies.

Thunes’ Network connects directly to over 7 billion mobile wallets and bank accounts worldwide, via more than 350 different payment methods, such as GCash, M-Pesa, Airtel, MTN, Orange, JazzCash, Easypaisa, AliPay, WeChat Pay and many more.

Members of Thunes’ Direct Global Network include gig economy giants like Uber and Deliveroo, super-apps like Grab and WeChat, MTOs, fintechs, PSPs and banks. Thunes’ Direct Global Network differentiates itself through its worldwide reach, in-house Smart Treasury Management Platform and Fortress Compliance Infrastructure, ensuring Members of the Network receive unrivalled speed, control, visibility, protection and cost efficiencies when making real-time payments globally.

Headquartered in Singapore, Thunes has offices in 12 locations, including Barcelona, Beijing, Dubai, London, Manila, Nairobi, Paris, Riyadh, San Francisco, Sao Paulo and Shanghai. For more information, visit: https://www.thunes.com/

Context of the role

We are looking for a highly driven, process-obsessed, and technically excellent engineer who is excited about bridging the gap between Data Science, AI Engineering, and Production Infrastructure.

You will need to combine a startup mindset with the discipline of a platform architect, ensuring that our "Golden Path" to production is automated, secure, and cost-efficient. The MLOps function is responsible for the infrastructure that bridges our core working systems with our AI tech stack. We architect solutions, automated pipelines, and monitoring stacks to ensure our Data Scientists and AI Engineers can ship fast without breaking things.

Key Responsibilities

  • Architect and orchestrate a seamless multi-cloud environment. Manage the AI tech stack and systems alongside the enterprise data infrastructure using Terraform
  • Design and maintain robust DataOps pipelines implementing Medallion Architecture (Bronze / Silver / Gold). Use Airflow to orchestrate DAGs and ensure data quality / lineage before it reaches the models
  • Ensure excellence in the MLOps lifecycle by implementing the "4 C's": CI (Automated linting/testing in GitLab), CD (Safe rollout strategies), CT (Automated retraining triggers), and CM (Continuous Monitoring of drift / latency)
  • Champion Finance operations (cost and efficiency) for ML and LLM systems. Implement approaches to prevent redundant API calls and script automated "Kill Switches" for runaway GPU instances or token spikes
  • Secure the platform by architecting services that let our team access resources securely across environments, and by managing IAM Identity Center for least-privilege access
  • Participate in the evaluation of observability tools to trace token usage, error rates per user, and other measures

Professional Experience and Qualifications

  • 5+ years of technical experience, with a proven track record of shipping ML pipelines in production
  • Multi-Cloud Fluency: Deep expertise in architecting solutions on major cloud platforms (e.g. AWS, GCP). Strong operational grasp of cloud services (e.g. Security, Networking, Storage, AI)
  • Experience in LLM Observability & Cost Optimisation: Experience setting up stacks with self-hosted tools (e.g. Langfuse, LangSmith, Phoenix). Ability to implement caching strategies (e.g. Redis / Memcached)
  • Certifications: Google Professional Machine Learning Engineer or AWS Certified Machine Learning - Specialty / DevOps Engineer - Professional certification
  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Expert in Infrastructure as Code (IaC): Mastery of IaC (e.g. Terraform, OpenTofu). Experience writing modular, reusable code for multi-environment setups (Dev / Staging / Prod)
  • Proficient in DataOps: Proven implementation of Medallion Architecture on a Data Lakehouse. Proficiency with Apache Airflow (writing custom operators), with data quality tools like dbt tests, and with data governance tools (e.g. OpenMetadata)
  • Mastery of CI/CD & Automation: Advanced configuration of GitLab CI (e.g. Runners, Secrets Management). Experience with CML (Continuous Machine Learning) is a plus
  • Proficient in Containerisation: Mastery of Docker, Kubernetes and orchestration (e.g. VM, K8s)
  • Passionate about cost management and efficiency: You view efficiency as a dual mandate, optimising financial costs while maximising system performance

Sound like you? Apply now!
