HighLevel

Lead Engineer - Platform Performance & Reliability

HighLevel · Delhi · Engineering
About HighLevel:
HighLevel is an AI-powered, all-in-one white-label sales & marketing platform that empowers agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth. We are proud to support a global and growing community of over 2 million businesses, made up of agencies, consultants, and businesses of all sizes and industries. HighLevel empowers users with all the tools needed to capture and nurture new leads and convert them into repeat customers. As of mid-2025, HighLevel processes over 4 billion API hits and handles more than 2.5 billion message events every day. Our platform manages over 470 terabytes of data distributed across five databases, operates a network of over 250 microservices, and supports over 1 million hostnames.

Our People
With over 1,500 team members across 15+ countries, we operate in a global, remote-first environment. We are building more than software; we are building a global community rooted in creativity, collaboration, and impact. We take pride in cultivating a culture where innovation thrives, ideas are celebrated, and people come first, no matter where they call home.

Our Impact
As of mid-2025, our platform powers over 1.5 billion messages, helps generate over 200 million leads, and facilitates over 20 million conversations for the more than 2 million businesses we serve each month. Behind those numbers are real people growing their companies, connecting with customers, and making their mark, and we get to help make that happen.

About the Team:
This team is responsible for the speed, stability, and operational health of HighLevel. We partner with the Backend, SRE, Infrastructure, Messaging, CRM, and Automations teams to keep the platform running smoothly under unpredictable workloads and global traffic patterns.
Our mandate is to detect issues early, eliminate bottlenecks at the root, and build infrastructure that stays reliable years into the future.

About the Role:
We’re hiring a Senior Backend Engineer who is passionate about performance optimization, distributed-systems behavior, runtime efficiency, and platform-level correctness. You won’t just tune endpoints; you’ll design systems and patterns that prevent slowness before it begins.
This is a hands-on engineering role with broad influence across services, architecture decisions, and operational standards. You will work with high-traffic systems, complex microservices, and global workloads where milliseconds matter.

Responsibilities:

  • Improve performance and reduce latency:
    • Plan the architecture for new features built around custom objects
    • Diagnose and remove bottlenecks in backend services, APIs, and message flows
    • Profile Node.js services (CPU, heap, event loop) and rewrite hot paths for efficiency

  • Strengthen platform reliability:
    • Improve resilience via batching, caching, pooling, concurrency controls, and backpressure
    • Harden services against cascading failures and dependency slowness
    • Establish rate-limiting, queueing, and circuit-breaker patterns that scale under load
    • Optimize database queries, indexing strategies, denormalization, and read/write paths

  • Collaborate on infrastructure & operations:
    • Work with SRE and Infra teams on autoscaling, capacity planning, quotas, and workload efficiency
    • Contribute to runtime configuration improvements (GKE, Node.js, Redis, Pub/Sub, Firestore, ClickHouse)
    • Participate in performance incident reviews and drive actionable root-cause fixes

  • Build long-lasting performance frameworks:
    • Define best practices for high-performance microservices, distributed patterns, and observability
    • Create reusable performance tooling, dashboards, and profiling workflows
    • Mentor engineers on writing scalable code, interpreting metrics, and designing reliable services

Requirements:

  • 5+ years of backend engineering experience focused on large-scale systems
  • Strong experience with Node.js internals (event loop, memory model, async behavior)
  • Proficiency in diagnosing performance issues using CPU/heap profilers, tracing, and metrics
  • Solid understanding of microservices, distributed systems, high-throughput APIs, caching strategies, queuing/backpressure patterns, and rate limiting/load balancing
  • Experience with MongoDB, Postgres/MySQL, Firestore, Redis, ClickHouse, or similar databases
  • Familiarity with Kubernetes, GCP/AWS, and observability tooling (Grafana, Prometheus, OpenTelemetry)
  • Strong communication skills and the ability to explain bottlenecks clearly without finger-pointing
  • A mindset that treats every millisecond, query, and allocation as something to optimize thoughtfully

Nice to Have:

  • Experience with high-traffic Node.js services serving millions of requests/hour
  • Familiarity with distributed tracing (OTel), tail sampling, RED/USE metrics, and SLO-driven engineering
  • Experience preventing and mitigating cascading failures at scale
  • Background in runtime optimization, capacity planning, or SRE-style operational excellence
  • Contributions to internal performance frameworks, profiling tools, or reliability toolkits

EEO Statement:

The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
