Arista Networks

Customer Reliability Engineer (CRE)

Arista Networks Bengaluru Today
engineering

Who You’ll Work With

Arista's Network Detection and Response (NDR) platform is a mission-critical security tool for our customers. Its reliability is paramount. We are hiring a mid-level Customer Reliability Engineer (CRE) to join our team. This role is critical to the evolution of our customer-facing infrastructure and operational posture.

What You’ll Do:

This is not a traditional operations role. You will inherit a set of critical, manual, and hands-on operational responsibilities essential to our customers' success. We need you to help with the effort to systematically dismantle this operational burden through automation, tooling, and systems. You will have a collaborative team of excellent engineers to work with.

In the short term, the work is manual deployments, reactive troubleshooting, and on-call escalations. But we need you to help us build a system where programmatic solutions have replaced human intervention. You must have the pragmatism to manage the current reality and the systematic impatience and technical skill to build its replacement.

Success in this role requires a dual mindset. You must be a skilled incident leader who can stabilize a crisis and a deliberate systems architect who can prevent the next one. You will work closely with our internal tools, platform, and product engineering teams to channel your direct operational knowledge into durable, long-term solutions.

Your First Year and Beyond

Your work will follow a deliberate trajectory from reactive execution to proactive design.

Phase 1: Stabilize and Map - You will embed with the team, taking on the existing operational workload alongside the other customer SRE team members covering the USA and India time zones. This includes customer deployments, upgrades, and incident response. You will occasionally be expected to go on-site for our air-gapped customers to assist with on-prem deployments. Your initial goal is to achieve stability while mapping the landscape of our operational toil.

Phase 2: Automate and Influence - Armed with your map of toil, you will begin to automate. You will write code, build tooling, and deploy declarative infrastructure to eliminate the most critical operational burdens. For larger projects, you will act as a primary stakeholder, providing clear requirements to our internal tooling and platform teams and ensuring their solutions meet the operational need. Your success will be measured by a demonstrable reduction in overall support effort: fewer pages, fewer support escalations, and fewer manual tasks.

Qualifications

  • DevOps and SRE Proficiency - You must have a strong background of 3+ years in Site Reliability Engineering or a closely related DevOps function. You also have a strong command of Linux systems administration and an understanding of networking fundamentals (TCP/IP, DNS, routing).

  • Customer-Facing Experience - You must have experience working directly with external customers to solve difficult technical problems. Your communication must be clear, empathetic, and precise. You are comfortable developing and executing strategies for updating systems in isolated environments where traditional internet-based tools are unavailable.

  • Cloud Infrastructure Expertise - You should have production experience with AWS (VPC, EC2, IAM, S3) and a proven track record of using Terraform and CI/CD pipelines to automate the delivery of infrastructure and software updates to remote or secure environments.

  • Monitoring and Observability - You will be responsible for both building and using our observability stack. This requires hands-on experience instrumenting applications and managing the telemetry pipelines for metrics, logs, and traces. A core part of the role is then applying this data to debug complex production incidents, understand system behavior, and define SLOs.

  • Automation and Software Development - You must be proficient in writing code to automate operational tasks. Expertise in a high-level language like Python or Go is required, as are strong shell scripting skills (e.g., Bash). We have a diverse tech stack including Python, Scala, C, C++, Haskell, Rust, PureScript, etc., which requires experience monitoring and debugging a complex system using system tools, command-line utilities, and network debugging tools, and filtering complex logs.

  • Operational Ownership - You must take pride in being the on-call, directly responsible person for mission-critical systems. You prioritize root-cause analysis and durable, system-level fixes, and you proactively collaborate with Product teams to build the automated solutions that resolve operational challenges permanently.

Additional Information

Arista stands out as an engineering-centric company. Our leadership, including founders and engineering managers, are all engineers who understand sound software engineering principles and the importance of doing things right.
We hire globally into our diverse team. At Arista, engineers have complete ownership of their projects. Our management structure is flat and streamlined, and software engineering is led by those who understand it best. We prioritize the development and utilization of test automation tools.
Our engineers have access to every part of the company, providing opportunities to work across various domains. Arista is headquartered in Santa Clara, California, with development offices in Australia, Canada, India, Ireland, and the US. We consider all our R&D centers equal in stature.
Join us to shape the future of networking and be part of a culture that values invention, quality, respect, and fun.

About the Company

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus, and routing environments. Arista is a well-established and profitable company with over $8 billion in revenue. Arista’s award-winning platforms, ranging in Ethernet speeds up to 800 gigabits per second, redefine scalability, agility, and resilience. Arista is a founding member of the Ultra Ethernet Consortium. We have shipped over 20 million cloud networking ports worldwide with CloudVision and EOS, an advanced network operating system. Arista is committed to open standards, and its products are available worldwide directly and through partners.

At Arista, we value the diversity of thought and perspectives each employee brings. We believe fostering an inclusive environment where individuals from various backgrounds and experiences feel welcome is essential for driving creativity and innovation.

Our commitment to excellence has earned us several prestigious awards, such as the Great Place to Work Survey for Best Engineering Team and Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest quality and performance standards in everything we do.
