We are looking for a proactive, process-driven Operations Center Manager to lead our command center team. In this role, you will be the "pulse" of our IT environment, ensuring high availability and seamless performance of our critical infrastructure. You aren't just watching screens; you are an Incident Commander and a Process Architect. We need someone who lives and breathes the ITIL framework to bring structure to chaos, and who has deep experience with infrastructure monitoring to detect issues before they impact the business.
Key Responsibilities
1. Operations Center Leadership
Manage the day-to-day activities of the Network Operations Center (NOC), ensuring 24/7 coverage and rapid response to alerts.
Lead, mentor, and train a team of System Administrators and L1/L2 Support Engineers.
Manage shift schedules, handovers, and on-call rotations to ensure zero coverage gaps.
2. Infrastructure Monitoring & Tooling
Oversee the health of the entire IT estate: Servers (Windows/Linux), Networks (LAN/WAN), Cloud (AWS/Azure), and Virtualization (VMware/Hyper-V).
Tool Ownership: Administer and tune monitoring platforms (e.g., SolarWinds, Nagios, Datadog, Zabbix, LogicMonitor, Elastic, Splunk).
Refine alert thresholds to reduce "alert fatigue" and ensure the team focuses on actionable signals.
Design and maintain real-time dashboards for leadership, visualizing uptime, latency, and system health.
Ensure patching schedules are executed on time and compliant with security policies.
Qualifications
Required Experience
12+ years of experience in IT Operations, Infrastructure Support, or NOC environments.
2+ years of experience in a leadership or team lead role.
Deep understanding of the ITIL framework (certification is highly preferred).
Hands-on experience with monitoring tools: proficiency in configuring and managing tools such as LogicMonitor, Elastic, SolarWinds, PRTG, Nagios, Datadog, or New Relic.
Solid technical background in server administration (Windows/Linux) and core networking concepts (DNS, TCP/IP, firewalls).
Soft Skills
Crisis Management: Ability to stay calm and decisive during high-pressure outages.
Communication: Ability to translate complex technical issues into clear business updates for executives.
Analytical Thinking: A data-driven approach to identifying trends and inefficiencies.
Preferred (Bonus Points)
ITIL v3 or v4 Foundation/Intermediate Certification.
Experience with ITSM tools such as ServiceNow, Jira Service Management, or BMC Remedy.
Basic scripting skills (PowerShell, Bash, or Python) for automation.
Experience in a hybrid cloud environment (on-premises + Azure/AWS).