Job title: Site Reliability Engineer (Intermediate)
Job type: Full-Time
Emp type: Full-time
Industry: Consulting and Professional Services
Functional Expertise: Engineering
Skills: SRE
Job published: 2025-01-28
Job ID: 41937

Job Description

KEY RESPONSIBILITIES:

 

Your efforts will directly enable the wider engineering organisation to innovate faster and safely while ensuring key performance metrics are met or exceeded. Participate in troubleshooting and resolving issues as they arise, and develop monitoring and alerting tools to prevent them from recurring. Engage with both internal engineering teams and external third parties as required to resolve and prevent issues. Team members' skills and experience are typically aligned to the most appropriate ecosystem:

    • Azure and majority Windows
    • AWS and majority Linux

You will:

  • Ensure CLIENT’s multiple systems are operating at peak efficiency, performance and uptime.
  • Assist in providing root cause analysis of complex faults in a large distributed system, and work with multiple teams to see the issue through to resolution and improvements.
  • Participate in ongoing technology refresh initiatives or special projects as required.
  • Use best-of-breed tooling to support operational stability and minimise customer disruption.
  • Assist in creating metric collection and visualisation tools that support capacity planning and troubleshooting, and take pre-emptive action in support of overall system stability.
  • Contribute to monthly reporting on platform cost, capacity, incidents and performance.
  • Work with the team to deploy new releases of CLIENT’s SaaS applications to production and other environments with minimal to no impact on customers, and refine and enhance the tools used to achieve this.
  • Identify and automate tasks wherever possible to maintain or increase our high server-to-engineer ratio.
  • Participate in the on-call roster to ensure uptime targets are met or exceeded and platform-owned services are operating effectively.
  • Conduct performance and reliability tests to establish limits, bottlenecks or single points of failure and resolve them.
  • Work flexible hours from time to time to complete tasks that would otherwise disrupt the customer experience.
  • Keep up to date with the cutting edge of modern web operations, and continually strive to push the CLIENT operations practice forward.
  • Provide day-to-day support to the engineering teams and internal processes to enable and upskill staff throughout engineering.
  • Strive for the best costs across CLIENT’s cloud infrastructure.
  • Participate in an Agile team and take shared responsibility for analysing work to be carried out, estimating effort required and identifying risks associated with carrying out the work.
  • Strive to understand the behaviour of CLIENT systems in their entirety, from development processes, to manufacturing, to the day-to-day operation of the application.
  • Actively participate in technical solution reviews with development teams and internal stakeholders such as architecture and security.

 

Work Setup: Hybrid, 2-3 days a week onsite

 

Work Locations: Metro Manila (the office location has yet to be decided, but the choices are BGC, Makati and Ortigas)

 

Work Schedule: Start time anywhere between 6AM and 8AM (New Zealand time)
