Site Reliability Engineering With AWS for Zero-Trust Platform Leader
A multinational technology company that provides integrated zero-trust cloud platforms wanted to upgrade its AWS infrastructure and make Site Reliability Engineering (SRE) the central team for round-the-clock support.
The challenges included inadequate visibility into metrics and logs across its microservices stack, caused by a lack of uniform standards for monitoring, alerting, and ticketing integrations.
The client engaged Xoriant to upgrade its AWS infrastructure, improve observability, add the necessary integrations, and optimize security and compliance.
Deployed an SRE operations team to manage monitoring, task automation, incidents, and minor changes.
Assessed the environment and automated daily tasks, reducing manual effort by up to 40%.
Automated Gateway deployment using AWS CloudFormation and the AWS SDK, so that the CT Gateway can be deployed in the desired region.
Built a unified dashboard for observability. Implemented Prometheus, Grafana, and Alertmanager to improve observability of the EKS clusters and key services and to help meet SLOs.
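The Gateway deployment automation described above can be illustrated with a minimal Python sketch. It assumes boto3 as the AWS SDK and a CloudFormation template already uploaded to S3; the case study does not say which SDK language, template, or parameters were used, so `deploy_gateway`, `build_stack_request`, the stack name, and the template URL are all hypothetical.

```python
from typing import Any, Dict, Optional


def build_stack_request(
    stack_name: str, template_url: str, parameters: Dict[str, str]
) -> Dict[str, Any]:
    """Build the keyword arguments for CloudFormation's CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ],
        # Acknowledges that the template may create named IAM resources;
        # drop this entry if the template does not (assumption).
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy_gateway(
    region: str,
    stack_name: str,
    template_url: str,
    parameters: Dict[str, str],
    client: Optional[Any] = None,
) -> str:
    """Create the stack in the given region and wait until it is complete."""
    if client is None:
        # Imported lazily so the helper above can be used without boto3.
        import boto3

        client = boto3.client("cloudformation", region_name=region)

    request = build_stack_request(stack_name, template_url, parameters)
    response = client.create_stack(**request)

    # Block until CloudFormation reports CREATE_COMPLETE (raises on failure).
    client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

Passing the region as an argument is what lets a single template serve every target region: the same call can be repeated with a different `region` value, and the optional `client` argument makes the function easy to exercise with a stub in tests.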
Reduced alert volume by 30% by removing false positives.
Enhanced environment visibility and insights, enabling data-driven business decisions and optimized cloud spend.
Increased customer satisfaction by reducing turnaround time (TAT) from hours or days to minutes.
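The case study does not describe how false positives were removed from the alert stream. One widely used approach for SLO-based alerting is a multi-window burn-rate check, which pages only when the error budget is being consumed quickly over both a long and a short window. The Python sketch below illustrates that idea under assumed values: the function names, the 99.9% SLO, and the 14.4 threshold (roughly 2% of a 30-day budget spent in one hour) are illustrative and not taken from the engagement.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A burn rate of 1.0 means the budget would be used up exactly at the
    end of the SLO period; higher values exhaust it sooner.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,
) -> bool:
    """Page only if both windows exceed the burn-rate threshold.

    The long window (for example 1 hour) shows the problem is significant;
    the short window (for example 5 minutes) shows it is still happening.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )
```

With a 99.9% SLO, a 2% error ratio in both windows pages, while a 2% error ratio over the last hour that has already dropped to 0.05% in the last five minutes does not. Suppressing alerts for incidents that have already recovered is one way this kind of rule cuts noise.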