What you'll do
The Incident Manager is responsible for orchestrating the response to high-impact incidents, ensuring timely resolution, minimizing service disruption, and maintaining communication across stakeholders. The role demands exceptional crisis management skills, technical acumen, and a strong ability to coordinate cross-functional teams under pressure.
• Lead end-to-end resolution of major incidents by orchestrating the combined technical efforts of responding teams.
• Assume ownership of major incidents and drive coordination efforts to ensure quick resolution of service-impacting events.
• Identify and remove blockers, escalate appropriately, and maintain the momentum of troubleshooting efforts.
• Ensure adherence to established incident management processes and protocols.
• Contribute to the improvement of incident response runbooks and documentation.
• Own internal and external communications during major incidents.
• Translate technical details into business-impact language (scope, severity, risk, ETA, confidence level).
• Maintain clear and continuous communication with stakeholders during incidents, providing timely updates.
• Ensure safe execution of mitigations, rollbacks, feature flags, and failovers.
• Lead post-incident review meetings with stakeholders to confirm event details and assign problem investigators.
• Track and report on incident metrics, identifying patterns and areas for systemic improvement.
• Support Change Managers and/or Problem Managers as required in carrying out their responsibilities.
What you've done
• Bachelor’s or master’s degree and/or equivalent experience relevant to the functional area.
• 10+ years of experience in incident management, IT operations, or a similar role.
• Experience managing critical incidents in a 24/7 production environment.
• Experience with ServiceNow ITSM and on-call incident coordination via PagerDuty / Zenduty (or comparable ITSM/on-call tools).
Knowledge, Skills, Abilities & Behaviours
• Understanding of a broad range of technical concepts across a CI/CD environment.
• Background in cloud-based systems and DevOps practices preferred.
• Ability to use AI tools to synthesize communications, reports, and troubleshooting leads.
• Certification in ITIL, incident management, or related frameworks preferred.
• Experience in SaaS or technology product companies preferred.
• Strong leadership and decision-making skills under pressure.
• Excellent verbal and written communication skills for both technical and non-technical audiences.
• Deep understanding of IT service management principles and practices.
• Ability to manage multiple priorities and deadlines in high-stakes situations.
• Strong analytical skills to drive root cause analysis and trend identification.
• Familiarity with modern monitoring and incident management tools.
• Demonstrated ability to build consensus across diverse teams.
• Effective at maintaining calm and focus during critical situations.
• Knowledge of cloud infrastructure (e.g., AWS, Azure) and application architecture.
• Proven track record of improving incident management processes.
• Attention to detail in documentation and follow-through.
• Adept at facilitating collaboration across remote and global teams.
• Proactive in identifying operational risks and implementing preventive measures.
• Committed to continuous learning and process improvement.
• Ethical, dependable, and resilient in challenging scenarios.
Perks