Site Reliability Engineer

Job Title: Site Reliability Engineer
Contract Type: Permanent
Location: Selangor
Reference: 743999776213617
Contact Name: Muneem Meah
Job Published: October 31, 2021 23:20

Job Description

Experian is the world’s leading global information services company. During life’s big moments — from buying a home or a car to sending a child to college to growing a business by connecting with new customers — we empower consumers and our clients to manage their data with confidence. We help individuals to take financial control and access financial services, businesses to make smarter decisions and thrive, lenders to lend more responsibly, and organizations to prevent identity fraud and crime.

We have 17,800 people operating across 44 countries, and every day we’re investing in new technologies, talented people and innovation to help all our clients maximize every opportunity. We are listed on the London Stock Exchange (EXPN) and are a constituent of the FTSE 100 Index.

Experian® Decision Analytics (DA) integrates predictive data and analytics into business decisions, providing greater insight into decision performance and helping companies keep pace with changing business priorities. We do this by applying expert consulting, analytical tools, software and systems to convert data into valuable business decisions.

Our expertise spans a variety of industries, and we provide software to some of the world’s largest finance, telecoms and other blue-chip companies. The crown jewel in our software suite is PowerCurve, which provides best-in-class decisioning across the whole customer life-cycle, from customer acquisition to in-life and collections, as well as in fraud detection and identity resolution systems. PowerCurve can run on hosted and cloud platforms.

Decision Analytics has just laid the foundations to move from a collection of individual products towards a consolidated cloud-based platform delivering industry-leading capabilities across data, analytics and decisioning, embedding the latest technologies such as machine learning, Big Data and AI. We have good momentum: our core product has trebled its revenue over the last 3 years, and if we were a stand-alone business we would probably qualify as a unicorn.

As part of the next phase in our growth, we are looking to expand our Site Reliability Engineering team to offer round-the-globe cover. As an organisation we believe in the Site Reliability Engineering model: everything should be automated, and software should run software. We have established a platform using cutting-edge technology, such as Kubernetes, containers, pipelines and monitoring. The candidate will be a forward-looking engineer with an understanding of how SRE will enable operations in the future. You will have broad operations and automation interests, will not shy away from the operational aspects of the job, and will understand that the best way to build reliability is to break things often. You will have colleagues in other locations to learn from and share knowledge with, and we are committed to growing our staff.

The ideal candidate will have either experience of operations, a passion for automation and an interest in software development, or experience of software development, a passion for automation and an interest in operational excellence. They will be excited by the prospect of learning cutting-edge technologies such as OpenShift and Kubernetes, unafraid of dealing with new technologies, and happy working with people from around the globe. If you have incident management skills and can act rationally and calmly during a crisis, that would be an added bonus. You will be expected to work some weekends and to cover some on-call duties. This is the beginning of a growing team and we are looking for individuals to grow with it.

Job Responsibilities:

Primary Accountabilities:

  • Uptime of Experian One – Experian’s Cloud SaaS offering for Decision Analytics.

Significant Demands:

  • Monitoring and alerting for our platform
  • Responding to incidents and restoring service
  • Over time, gaining a good enough understanding of the systems to efficiently triage issues and find owners for problem resolution
  • An ability to identify an issue or a manual process and ensure that it never occurs again
  • Incident management; able to co-ordinate others and be co-ordinated during service disruptions with a focus on restoring availability
  • Ability to write complex queries using various tools
  • Ability to identify a high-level root cause from symptoms, e.g. network, application, compute or storage
  • Understanding of Kubernetes, Infrastructure as Code and high-availability principles
  • Excellent communication skills in English with colleagues across the globe.

Working Practices and Relationships:

  • Strong relationships with other members of the SRE team, primarily based in Kuala Lumpur but also in London, Arizona and Sofia
  • Working relationships with colleagues in other departments and with third parties who support backing applications
  • Collaborative relationships with developers, security and architects to influence them to build resilient, maintainable solutions
  • Proficiency in one programming or scripting language and willingness to apply software development best practices to an operational role



Experience of some of the items below:

  • Direct experience of supporting complex, highly scaled systems in production
  • Linux knowledge, with experience of troubleshooting issues and anticipating them in advance
  • Networking, troubleshooting and monitoring
  • Cloud Native application designs for high performance, scalability and resilience
  • Incident management and co-ordination, blameless post-incident reviews (PIRs)
  • Kubernetes, OpenShift, Splunk, Dynatrace, ThousandEyes, ServiceNow, Jira, Jenkins, Python
  • Java, Cassandra, Redis, Rundeck, MongoDB, Apigee, Okta, PostgreSQL, AWS, Azure, GCP
  • Infrastructure as Code, GitOps

Key Behaviours:

  • Excellent communication skills. Written and verbal fluency in English is required
  • Highly organised, with good attention to detail
  • #CustomerObsessed
  • Working across boundaries: geographic, team, language and cultural
  • Curious and willing and able to learn new technologies and practices
  • Cloud aware: you understand how cloud technologies differ from other technical approaches and are able to explain the differences to others
  • Lives and breathes availability and operational excellence in technology

Is this you?

  • You strive to remove repetitive tasks from your daily existence
  • You are a keen follower of technology trends
  • You believe that software is to be used, not admired
  • You solve for the future as well as the immediate
  • You empower others to deliver
  • You develop trust, make conflict constructive, create commitment, and drive accountability and results
  • You are articulate, clear, concise, and you can tailor your approach to the audience
  • You can manage stakeholders at all levels and influence decision making

Additional Information

Experian doesn't just encourage inclusion -- we celebrate it. It's what we call The Power of YOU. We are building a culture where everyone is comfortable bringing their whole self to work. A place where we not only respect our differences and values, but celebrate them in a positive and supportive environment.
