SA, SRE Observability Platform Engineer, SRE & Governance, Group Technology

Undisclosed

Singapore-DBS Asia Hub, Central

Job Description

 

Job Objective

DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and for making continuous improvements that increase the platform’s efficiency and resiliency. The SRE engineer will also develop automation to remove toil and increase the team’s productivity.

Roles and Responsibilities:

  1. Develop monitoring and onboarding guidelines for various applications using the observability platform stack, ensuring accurate monitoring and data collection.
  2. Implement enterprise observability standards, best practices, operations, and processes in AppDynamics and other observability tools.
  3. Automate routine tasks and reporting processes using APIs and scripting, reducing manual effort and improving efficiency in AppDynamics and other observability tools.
  4. Identify and resolve performance issues through detailed analysis of transaction traces, application logs, and system metrics.
  5. Collaborate with stakeholders to define performance metrics and monitoring requirements aligned with business goals.
  6. Contribute to internal knowledge bases, create documentation, and share insights with the team to promote a culture of learning and collaboration.
  7. Design and implement monitoring solutions to track application performance, identify bottlenecks, support capacity planning, and optimise system efficiency.
  8. Develop custom dashboards and reports to provide actionable insights and drive decision-making processes.
  9. Collaborate with development and operations teams to integrate the observability platform stack with CI/CD pipelines and other DevOps tools.
  10. Configure and fine-tune alerts to proactively detect and address performance issues before they impact end-users.
  11. Continuously review and enhance monitoring processes and methodologies to improve efficiency and effectiveness.
  12. Work with application teams to develop long-term monitoring strategies that align with business goals and technology roadmaps.
  13. Create data retention policies and role-based access controls (RBAC) to manage user permissions.
  14. Perform application maintenance, including patching and upgrading controller and agent versions, and ensure that no components run past their end-of-support/end-of-life (EOS/EOL) dates.

Deliverables:

  1. Ensure on-time delivery of tasks and projects.
  2. Ensure continuous uptime of applications and services.
  3. Ensure there are no security or audit issues.

Job Dimensions:

  1. Comply with bank standards when tracking and following up on assigned projects.
  2. Cover all areas in application and infrastructure operations of the platform.

Education and Relevant Experience:

  1. You should be a university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
  2. Strong communication skills and the ability to explain protocols and processes to the team and management.
  3. A passion for learning and using new technologies from the open-source community.
  4. A passion for coding.

Functional / Technical Competencies:

  1. Minimum of 10 years of IT work experience.
  2. Working knowledge of AppDynamics, the ELK Stack, Grafana, and OpenTelemetry (OTel).
  3. In-depth experience in Unix/Linux shell and Python scripting, with a focus on quality, scalability, and extensibility.
  4. Experience in quickly triaging and troubleshooting application problems in monitoring tools using techniques such as Transaction Snapshots, Diagnostic Sessions, and Data Collectors.
  5. Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
  6. Knowledge of Confluent Kafka, Prometheus, and other APM tools (Dynatrace, Datadog, New Relic, Splunk) is a plus.
  7. Knowledge of AI/ML capabilities for automating root cause analysis (RCA) and shortening MTTR when issues arise.
  8. Good understanding of network routing, load balancing, and networking protocols, including a basic knowledge of TCP/IP and an understanding of HTTP and DNS.
  9. Ability to contribute to discussions on design and strategy.
  10. Good problem diagnosis and creative problem-solving skills.
  11. Experience with automation and CI/CD tools such as Jenkins and Ansible.


Job Requirements


Company Benefits

Comprehensive Medical Benefits

We provide a variety of medical coverage for our employees.

Low-Interest Loans

We offer low-interest loans to our employees!

FlexiBenefits

We provide FlexiBenefits for our employees to ensure their work-life balance!


Additional Info

Experience Level

0 - 10 Years of Experience

Job Specialisation


Company Profile


DBS Bank

OUR ROOTS AS A DIFFERENT KIND OF BANK

Born with a mission unlike any other, we were founded with a unique purpose to help develop a young nation. Since then, we've grown alongside Singapore to become Best Bank in the World. Be it our successes or war stories, the untold moments or widely-celebrated ones, or the bonds we've built within and beyond our bank: here, we aim to preserve these memories that capture who we...