Manager, Site Reliability Engineering

Job Title: Manager, Site Reliability Engineering
Contract Type: Permanent
Location: Bangalore Urban
Industry:
Reference: JR-122008
Contact Name: Lalitha Gangadhara
Job Published: November 25, 2021 11:57

Job Description

Equinix is the world’s digital infrastructure company, operating 210 data centers across the globe and providing interconnections to all the key clouds and networks. Businesses need one place to simplify and bring together fragmented, complex infrastructure that spans private and public cloud environments. Our global platform allows customers to place infrastructure wherever they need it and connect it to everything they need to succeed.

At Equinix, we help the world’s digital leaders scale with agility, speed the launch of digital services, deliver world-class experiences, and transform people’s lives. Our culture is based on collaboration and the growth and development of our teams. 

We hire hardworking people who thrive on solving challenging problems and give them opportunities to hone new skills and try new approaches as we grow our product portfolio with new software and network architecture solutions. We embrace diversity in thought and contribution and are committed to providing an equitable work environment that is foundational to our core values as a company and is vital to our success.

The Role:

We are looking for a people-first leader who can drive operational results. This role manages a team of software support professionals (Site Reliability Engineers) and provides 24x7 global support for the Digital Integration Platform, which drives $6B+ in revenue and comprises 800+ services connecting all the business systems.

Responsibilities:

  • Manage the day-to-day operations that ensure optimal performance, quality, and availability of the Digital Integration Platform
  • Provide 24x7 global support for the Digital Integration Platform following ITIL process guidelines, including KPI reporting, stakeholder management, and financial management
  • Ensure that integration services are highly available, reliable, and performant through world-class monitoring, alerting, and self-healing capabilities built by applying software engineering practices
  • Act as an escalation point to drive resolution of all Incidents, Problem Tickets, Change Management and Service Requests, and issues for Digital Integration Platform services
  • Serve as the primary subject matter expert for Equinix software integrations, preventing (proactive) as well as troubleshooting and mitigating (reactive) service availability and performance issues
  • Multitask and deliver in a fast-paced, rapidly evolving technology landscape, and participate in on-call escalation for incident resolution
  • Collaborate with Service Engineering organizations to build and automate tooling, and implement best practices to observe and manage the Integration services in production and consistently meet SLAs
  • Manage stakeholder communications and publish the weekly/monthly operational report(s)
  • Implement SRE principles and practices across the organization to improve performance and efficiency
  • Build knowledge base systems that help achieve the self-service vision of identifying and resolving incidents across the team
  • Build a high-trust and fun-loving team that keeps the customer at the center of everything

Requirements:

  • A strong sense of ownership and a passion for Operations & Support.
  • Bachelor’s/Master’s degree in Computer Science or Technology, or equivalent field experience.
  • 10+ years of experience managing and leading a team of software professionals responsible for the operational support of software platforms/systems.
  • 5+ years of hands-on experience in software code troubleshooting, HTTP protocols, networking, DNS, etc.
  • 2+ years of operational understanding of virtualization and the containerization ecosystem, including Docker & Kubernetes.
  • Experience with API development platforms like TIBCO and API management platforms like Apigee or MuleSoft is preferable.
  • Experience optimizing a high-traffic website for robustness and scale-out architectures.
  • Experience with technologies like Cassandra, Kafka, Oracle Database & JMS is a plus.
  • Expertise in troubleshooting large-scale distributed systems and the tools necessary to do so, such as tcpdump.
  • Experience in providing global support coverage is a plus.
  • Experience with AWS Cloud & networking concepts is a plus.
