Cloudinary

Staff Backend Engineer – AI Algorithm Platform

Cloudinary Israel
Cloudinary empowers companies to deliver exceptional digital experiences by managing the entire media lifecycle at scale. Within Cloudinary’s R&D, the Research Group leads the development of cutting-edge algorithms for media understanding, generation, and optimization. 

We are seeking an experienced Staff Backend Engineer to lead the engineering efforts behind our homegrown platform for serving and operating production-grade AI models and AI-based algorithms.

This is a mission-critical role for someone passionate about building highly scalable, GPU-aware, cloud-native systems that act as the connective tissue between algorithm research and product innovation. You will play a pivotal part in redesigning and evolving the platform, while supporting both research and application teams across the organization and contributing to MLOps initiatives.

Key Responsibilities

Platform Ownership

  • Own the architecture, stability, scalability, and performance of the system.
  • Design and implement platform features that support both synchronous low-latency and asynchronous compute-heavy algorithm execution.
  • Enhance GPU management, scheduling, and resource allocation for optimal performance and cost-efficiency.
  • Ensure robust Kubernetes-based deployment and observability for a highly dynamic system.

Cross-Team Collaboration

  • Act as the technical bridge between Research and Application teams by translating requirements into scalable system designs.
  • Collaborate closely with algorithm developers to streamline model deployment processes.
  • Partner with backend engineers (primarily working in Ruby and Go) to integrate the Research Group's algorithms into Cloudinary services.

Engineering Excellence

  • Advocate for high standards in code quality, observability, testing, and security.
  • Guide integration efforts for teams consuming the platform's APIs.
  • Provide mentorship, support, and best practices to other engineers interacting with the platform.
  • Take part in general R&D efforts that support the broader production environment.

Platform Extension and MLOps

  • Contribute to the evolution of our platform to support a wider range of algorithmic workloads and model types.
  • Help shape tooling and infrastructure for model versioning, rollout, monitoring, and testing.
  • Collaborate with DevOps and Infrastructure teams to maintain operational excellence, system observability, and robust infrastructure support.

Your Qualifications

  • 8+ years of experience in software engineering, with 3+ years working on infrastructure/platforms involving ML/AI, GPU, or data-heavy systems.
  • Proficiency in Python and familiarity with backend languages such as Ruby and/or Go.
  • Strong understanding of Kubernetes internals and experience running GPU workloads in production environments.
  • In-depth knowledge of AWS services.
  • Experience architecting systems that support both real-time and asynchronous processing pipelines.
  • Familiarity with the ML lifecycle and MLOps practices, including CI/CD for models, monitoring, and rollback strategies.
Bonus Qualifications

  • Experience working in research-driven environments or alongside data scientists, algorithm researchers, and ML engineers.
  • Contributions to open-source projects related to model serving, Kubernetes operators, or ML platforms.
  • Experience supporting systems with diverse user groups across engineering and research disciplines.
Why Join Us?

  • Opportunity to build and scale a one-of-a-kind platform powering state-of-the-art media algorithms.
  • Collaborate with world-class research, engineering, and product teams.
  • Have a direct impact on product experiences used by millions of developers and end-users.
  • Be part of a culture that values creativity, autonomy, and continuous improvement.
  • #LI-SL1