DBS - Lead Site Reliability Engineer, Global Financial Markets Technology

Location: Singapore
Business sector: IT Support
Job reference: 1007945
Published: 3 days ago
Business Function

Group Technology enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group Tech, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.

Job Summary

As an SRE Lead, you will lead the management and strategic evolution of a critical suite of in-house developed applications: MAGIC, VANA, KQMS, and GRID. These applications are pivotal in supporting the entire Global Financial Markets (GFM) business Back Office and other key functions. You will guide the team in leveraging and advancing various cutting-edge technologies such as OpenShift, Huawei FusionCompute, AWS, Java, Python, SQL, and more, all while aligning with the overarching GFM strategy and SRE principles. This role offers the opportunity to lead a team in a dynamic environment, driving reliability, efficiency, and operational excellence for essential financial market platforms through a Site Reliability Engineering approach.

Responsibilities
  • SRE Leadership & Strategy:
    • Lead, mentor, and develop a high-performing SRE team responsible for the reliability, support, and enhancement of MAGIC, VANA, KQMS, and GRID.
    • Define and drive strategic SRE initiatives, focusing on continuous improvement, automation, and toil reduction for both end-users and the support team.
    • Establish and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for the supported applications to ensure optimal performance and reliability.
    • Champion a collaborative and integrated approach across all sub-teams within the application support group, promoting SRE best practices.
  • Operational Excellence & Reliability Engineering:
    • Oversee the provision of high-quality service and support to business users leveraging MAGIC, VANA, KQMS, and GRID, with an emphasis on system reliability and performance.
    • Ensure robust day-to-day production support for users of MAGIC, VANA, KQMS, and GRID across Back Office functions, focusing on incident response and root cause analysis.
    • Manage and optimize on-call rotations for the team outside office hours (infrequent, and mostly during the 6 am to 10 am period) to address escalations from the Level 1 team, and lead weekend implementations of new product features with a focus on stability.
  • Release Management & Communication:
    • Guide the team in supporting the rollout of new versions and enhancements of MAGIC, VANA, KQMS, and GRID, strictly adhering to release management best practices and ensuring system stability.
 
Requirements
  • Technical & Domain Expertise:
    • Demonstrated strong technical and troubleshooting skills for complex enterprise applications, with a proven ability to lead an SRE team in these areas.
    • Comprehensive understanding of market data, including how to price deals for different products and sensitivities.
    • In-depth knowledge of credit risk modules, settlement risk, and market risk.
    • Knowledge of Murex modules (e-tradepad, simulation, Limits, Livebook, MXML, Datamart, etc.) is a significant advantage but not required.
    • Hands-on experience with technologies such as OpenShift, Huawei FusionCompute, AWS, Java, Python, SQL, and more.
    • ITSM or ITIL certification is highly desirable.
  • Soft Skills:
    • Exceptional communication, interpersonal, and leadership skills, with the ability to influence stakeholders and motivate an SRE team towards common reliability goals.