This role is for one of Weekday's clients
Min Experience: 3 years
Location: Bengaluru
Job Type: Full-time
We are looking for a highly driven Technical Lead to work across a multi-product SaaS platform, owning system reliability, scalability, and technical execution. This is a horizontal leadership role spanning multiple products and core systems, ensuring platforms remain fast, secure, and resilient under scale and peak traffic conditions.
This is a hands-on technical leadership role focused on architecture, reliability, and execution, not people management.
Key Responsibilities
1. System Reliability & Performance (Primary Ownership)
- Own and improve reliability metrics across products, including uptime, SLAs, and latency (P95).
- Monitor and reduce application errors, bug leakage, and system failures.
- Ensure correctness of distributed systems involving synchronous and asynchronous workflows.
- Optimize queue processing, worker throughput, and caching layers (e.g., Redis).
- Prepare systems for high-traffic events and peak load scenarios.
- Lead root cause analysis and drive permanent, systemic fixes.
- Act as the technical owner for incident resolution and long-term prevention.
2. Architecture & Scalability
- Collaborate with senior technical stakeholders to evolve platform architecture.
- Improve API design, data models, and system boundaries.
- Design scalable distributed system patterns such as idempotent workflows, retries, batching, and fan-out orchestration.
- Build and scale asynchronous pipelines for high-volume workloads.
- Plan capacity for traffic spikes and introduce resilience patterns like circuit breakers and fail-safes.
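As an illustration of two of the patterns named above (idempotent workflows and retries), a minimal Python sketch might look like the following. The names `process_once`, `retry_with_backoff`, and the in-memory `_results` store are hypothetical and chosen only for this example; a real system would more likely keep idempotency keys in Redis or a database and retry only on errors known to be transient.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Hypothetical in-memory idempotency store, for illustration only.
_results: Dict[str, object] = {}


def process_once(idempotency_key: str, handler: Callable[[], T]) -> T:
    """Run handler at most once per key; repeat calls return the stored result."""
    if idempotency_key in _results:
        return _results[idempotency_key]  # type: ignore[return-value]
    result = handler()
    _results[idempotency_key] = result
    return result


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

Combining the two, for example `retry_with_backoff(lambda: process_once(key, handler))`, means a retried job does not repeat a side effect that already succeeded under the same key.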
3. Hands-On Engineering Leadership
- Lead and review technical designs across teams and products.
- Unblock engineers on complex architectural or performance challenges.
- Own and drive cross-product refactors and technical debt reduction.
- Enforce clean code standards, testing practices, and observability-first development.
- Mentor engineers on debugging, system design, and performance optimization.
4. Observability & Monitoring
- Define and maintain SLIs and SLOs across critical systems.
- Build dashboards, alerts, and monitoring using logs, metrics, and traces.
- Ensure issues are detected proactively before impacting users.
- Work closely with platform teams to instrument distributed workflows end-to-end.
5. Security & Compliance
- Ensure secure coding practices and adherence to compliance requirements (e.g., SOC 2).
- Enforce proper secrets management, access controls, and audit logging.
- Maintain data integrity, API security, and permission correctness across systems.
6. Cross-Functional Collaboration
- Partner with Product teams to translate requirements into technically sound solutions.
- Work with Support and Customer Success teams to deeply understand production issues.
- Collaborate with Core Systems and Infrastructure teams to improve platform stability.
- Align with QA teams to define testing strategies, including load, integration, and failure testing.
Requirements
Must Have
- 3–4+ years of backend engineering experience (Python preferred).
- Strong understanding of distributed systems and backend architecture.
- Deep experience with SQL databases, data modeling, and query optimization.
- Hands-on expertise with Redis, queues, async jobs, retries, and background processing.
- Strong debugging skills across application and infrastructure layers.
- Proven ability to lead technical decisions across multiple teams.
- Experience improving system reliability and performance at scale.
- Excellent communication and collaboration skills.
Nice to Have
- Experience with observability tools such as Datadog, Sentry, or Elasticsearch.
- Exposure to CRM integrations or large enterprise systems.
- Prior ownership of reliability for multi-product SaaS platforms.
- Familiarity with secure coding practices and compliance frameworks.
What Success Looks Like
0–3 Months
- Gain a deep understanding of platform architecture and core systems.
- Deliver quick reliability and performance improvements.
- Become a go-to technical problem solver across teams.
4–6 Months
- Establish clear SLIs and SLOs for key systems.
- Introduce architectural guardrails and reduce operational noise.
- Significantly lower error rates and production issues.
7–12 Months
- Achieve high availability (99.9%+) across core platforms.
- Ensure predictable and resilient async pipelines.
- Improve performance under peak traffic conditions.
- Enable faster engineering velocity through cleaner, more stable systems.
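To make the 99.9% availability target concrete, it can be translated into an allowed-downtime budget. The sketch below assumes a 30-day measurement window, which this posting does not specify, and uses a hypothetical helper name, `downtime_budget_minutes`. Under that assumption, 99.9% leaves roughly 43 minutes of downtime per window.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Minutes of downtime allowed in the period at the given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Assumed 30-day window; the posting does not state how availability is measured.
print(round(downtime_budget_minutes(99.9), 1))
```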
Skills
- Backend Engineering
- Distributed Systems
- System Reliability
- Relational Databases
- Platform Scalability
About Weekday AI
At Weekday (backed by YC; also Product Hunt #1 product of the day), we are building the next frontier in hiring. We have built the largest database of white-collar talent in India and have built outreach tools on top of it to generate highest response ...