About Grupo QuintoAndar
We are Grupo QuintoAndar, the largest real estate ecosystem in Latin America. Guided by a shared purpose of helping people love where they live, we have a diversified portfolio of brands and solutions across different countries in Latin America, covering all phases of the housing journey. We also have a Technology Hub in Portugal. We develop technology and innovation to transform and enhance the overall living experience.
With the support of a world-class team of investors and advisors, including Kaszek, Qualcomm, General Atlantic, and SoftBank, Grupo QuintoAndar is currently valued at over USD 5.1 billion and continues to grow year over year.
Here, you will work with top professionals in the market, in an environment that breathes innovation, collaboration, and high performance. To learn more about our story, visit: https://grupoquintoandar.com/pt/.
Location & Remote Work
Our technology team operates under a "remote-first" model, which means we work from home and can live anywhere in Brazil. We also offer the option of working from our São Paulo offices or partner coworking spaces, up to twice a week.
Hiring Process Stages
Our hiring process is designed to assess your experience and to let you meet our teams and explore career opportunities. It consists of the following stages:
People Interview
Tech Screening | Live Coding
Tech Interview 1 | System Design
Tech Interview 2 | Debugging Interview
About the Team
As a Site Reliability Engineer focused on Observability, you will help build and maintain our cloud infrastructure while enabling teams to better understand and operate their systems.
You’ll work closely with product engineering teams to ensure services are observable, scalable, secure, and resilient.
Your activities will include provisioning cloud infrastructure, evolving our observability stack (metrics, logs, and traces), defining and maintaining SLIs/SLOs, automating workflows, improving CI/CD pipelines, identifying and correcting performance issues, and developing tools that enhance the daily experience of our engineers.
We are strong adopters of OpenTelemetry and are continuously evolving our instrumentation strategy across all services.
Picture an observability platform that we own: no Datadog training wheels, no Dynatrace magic carpets. It’s built on OpenTelemetry from collector to UI, so every metric, log, and trace (and any future shiny signal) is ours to shape. Your job is to keep that beast humming:
- Provision and tune the cloud plumbing that powers the platform.
- Grow QuintoAndar Observability: All telemetry, for All services, in All environments, All the time, available to All engineers.
- Define and guard the SLIs/SLOs that tell us when reality drifts from “supposed to.”
- Automate anything that moves twice (workflows, dashboards, data retrieval, you name it).
- Hunt down performance gremlins with the rest of the engineering team before they nibble at production.
- Build brag-worthy tools that make every engineer’s day 10% less painful.
TL;DR: You’ll be the custodian of a home-grown, company-wide observability stack, wiring it, scaling it, and making sure it never blinks. If that sounds fun, bring your cape.
Some real use cases we've worked on:
- Observability & Incident Analysis
  - Evolved our observability platform using Prometheus, Thanos, Grafana, Loki, Tempo, and Faro on top of a full OpenTelemetry stack to provide deep visibility across our systems;
  - Built data pipelines to analyze incident metrics, helping us reduce MTTR and understand patterns across environments;
  - Worked alongside engineering teams to define and monitor (and fight for) SLIs/SLOs for key APIs, improving reliability and customer experience;
  - Led workshops and internal sessions on instrumentation and observability best practices using OpenTelemetry and our observability tools;
  - Improved our internal observability infrastructure to reduce costs, latency, and downtime.
- Platform Engineering & Developer Experience
  - Created custom Kubernetes operators in Golang to automate the infrastructure lifecycle and reduce manual interventions;
  - Built our internal CLI (QLI) to help developers manage resources, debug environments, and access observability data more easily;
  - Migrated our continuous delivery platform, which supports over 300 production deployments per day, to GitOps without disrupting workflows.
- Security & Infrastructure
  - Designed a centralized solution for services to connect to databases using temporary credentials, improving our security posture;
  - Segmented AWS accounts to provide better cost visibility, access control, and separation of concerns across teams;
  - Developed tools that enhance both security and observability without creating friction for engineers;
  - Ensured security best practices across our open-source tools.
- Collaboration & Production Readiness
  - Partnered with developers to investigate complex production issues using logs, metrics, and distributed tracing;
  - Supported teams in onboarding to our Kubernetes environment, ensuring applications are properly monitored and alerting is in place from day one.
Requirements
You will:
- Provision and maintain our cloud infrastructure;
- Identify and fix performance and reliability issues;
- Operate and evolve our Kubernetes clusters;
- Build tools that improve engineering workflows and visibility;
- Support and expand our observability platform with metrics, logs, traces, and profiling.
What we are looking for:
- Solid experience with observability practices and tooling (metrics, logs, and traces);
- Hands-on experience with OpenTelemetry infrastructure and instrumentation;
- Familiarity with monitoring tools like Prometheus, Grafana, Loki, or similar;
- Ability to define and maintain SLIs/SLOs aligned with product and engineering goals;
- Experience with container orchestration platforms (Kubernetes, ECS);
- Understanding of CI/CD workflows and delivery automation;
- Proficiency in at least one programming language (we primarily use Python and Golang);
- Knowledge of infrastructure as code tools (Terraform, Crossplane, and/or Pulumi).
You will stand out if you have:
- Knowledge of microservice architecture and distributed systems;
- Additional experience with GitOps, Kafka, CDNs, Gateway APIs, or similar tools;
- Knowledge of JVM-based programming languages.
Important
- Our hiring process starts with the application! If you truly want to be part of our team, please complete this step of the process. We analyze all candidates individually and provide feedback to all applicants.
- All communication will be conducted via email, so please keep an eye out for our messages and add the domain @quintoandar.com.br to your safe senders list so our emails don't end up in spam.
Benefits
- Competitive salary
- Profit sharing
- Meal allowance
- Health insurance
- Dental plan
- Life insurance
- Childcare subsidy and Atypical Parenthood subsidy
- Wellhub
- Home office allowance
- Employee assistance program (mental health, social, legal, and financial support)
- Extended parental leave
- Day off on your birthday, Mother’s Day, and Father’s Day
- Benefits Club (discounts on everyday services)
- Discounts at educational institutions
- Reading kit for children – PlayKids
Diversity & Inclusion at Grupo QuintoAndar
We value diversity and want everyone to feel welcome here, regardless of their age, gender identity, sexual orientation, race, color, ethnicity, origin, disability, religion, or any other characteristic. All our job openings are open to all individuals!
You'll notice some diversity questions in the application form. For affirmative action roles, this information may be used to verify that you belong to the target audience for the opportunity and, in such cases, may be used as an elimination criterion. For non-affirmative action roles, this data will be used anonymously, exclusively to monitor and improve our inclusion practices in the hiring process, and will have no impact on your application.
Privacy and Data Protection
Grupo QuintoAndar operates in compliance with privacy and data protection laws, including, but not limited to, the Brazilian General Personal Data Protection Law (LGPD, Law No. 13,709/2018), and ensures the security of your data. To learn more, please access our Privacy Notice for Candidates. For questions or to exercise your rights as a data subject, please contact us through our Service Channel.
Similar Jobs
Site Reliability Engineer - Observability
N26
Site Reliability Engineer - Observability
N26
Site Reliability Engineer Pl. – (Cloud/Observability)
Banco PAN
Senior Software Engineer - Observability and Reliability
Sigma Computing
Senior Site Reliability Engineer
Formlabs
More jobs at Grupo QuintoAndar
Grupo QuintoAndar | City Growth Manager - Porto Alegre/Curitiba
Grupo QuintoAndar
Grupo QuintoAndar | Pessoa coordenadora de Inside Sales
Grupo QuintoAndar
Grupo QuintoAndar | Senior Data Scientist
Grupo QuintoAndar
Grupo QuintoAndar | Tech Lead Manager (Data Science)
Grupo QuintoAndar
Grupo QuintoAndar | Staff Software Engineer
Grupo QuintoAndar