Is the Lead Site Reliability Engineer role at Kmstechnology1 remote?

The Lead Site Reliability Engineer role at Kmstechnology1 is an on-site position located in Guadalajara.

How do I apply for the Lead Site Reliability Engineer position at Kmstechnology1?

You can apply for the Lead Site Reliability Engineer position at Kmstechnology1 directly through HireHere. Click the "Apply" button on the job listing to be taken to the application page.

Lead Site Reliability Engineer

Kmstechnology1, Guadalajara, 2h ago

Python, Java, Azure, Kubernetes, Terraform

We are seeking a Lead Site Reliability Engineer to spearhead the reliability, scalability, and performance of our AI-powered property intelligence platform. Operating at the intersection of Geospatial AI and Insurance Technology, you will be responsible for a mission-critical Azure ecosystem supporting high-throughput Java microservices.

As a Lead, you will bridge the gap between complex AI model inference and enterprise-grade stability. You will own the "Production Excellence" mandate, mentoring a team of engineers and collaborating with Senior Delivery Directors to ensure our global infrastructure stays ahead of our rapid growth.

Key Responsibilities

Strategic Infrastructure & Azure Leadership

Cloud Architecture: Lead the design of highly available, multi-region architectures on Azure, utilizing AKS (Azure Kubernetes Service), Azure Functions, and Service Bus.
IaC Governance: Establish and enforce standards for Infrastructure as Code using Terraform or Bicep, ensuring 100% automated provisioning across all environments.
Java Performance Engineering: Partner with Backend squads to optimize JVM performance, garbage collection tuning, and memory management for high-concurrency insurance processing.

Reliability & AI Operations (AIOps)

Error Budgeting: Define, negotiate, and manage SLIs, SLOs, and SLAs with Product Stakeholders, balancing the velocity of AI feature releases with system stability.
Advanced Observability: Architect end-to-end monitoring and distributed tracing using Azure Monitor, Application Insights, and ELK/Grafana.
Incident Commander: Act as the ultimate escalation point for high-priority incidents, leading complex Root Cause Analysis (RCA) and driving long-term remediation tasks.

Security & Industry Compliance

Data Sovereignty: Ensure the platform adheres to insurance-specific data residency requirements and security frameworks (SOC 2, HIPAA, or ISO 27001).
Automated Governance: Implement Azure Policy and automated security scanning within CI/CD pipelines to ensure a "Secure by Design" infrastructure.

Qualifications

Technical Leadership:

7+ years in SRE, DevOps, or Cloud Engineering, with at least 2 years in a Lead or Principal capacity.
Azure Mastery: Expert-level knowledge of the Azure Well-Architected Framework, specifically around networking (VNet/ExpressRoute) and Compute.
Java Ecosystem: Deep proficiency in the Java/Spring Boot stack from an operational perspective (JVM profiling, thread dump analysis).
Container Orchestration: Mastery of Kubernetes (AKS), including ingress controllers, service mesh (Istio), and cluster security.

Professional Competencies:

Strategic Mindset: Ability to translate technical debt and reliability risks into a data-driven business case for leadership.
Automation Advocate: Proven track record of eliminating "Toil" through Python, Go, or Java-based automation tooling.
Mentorship: Passion for leveling up the engineering organization through workshops, documentation, and pair programming.
AI-First Integration: Experience leveraging AI for predictive scaling and automated log summarization to reduce Mean Time to Recovery (MTTR).

Additional Information

Perks you enjoy at KMS Mexico

Mexican law benefits
15 days of PTO in year zero; from the first year onward, 3 days per year.
5 days of bereavement leave for the death of an immediate family member (negotiable).
Major Medical Expenses Insurance with coverage for immediate dependents (spouse and children).
Annual performance bonus (≈10% of annualized salary).
Annual salary adjustment.
Employee Referral Bonus.
Paid certifications and courses.
Coursera License.
5% Savings Fund.
5% Grocery Vouchers.
