Client Background
- A multinational technology company that provides integrated zero-trust cloud platforms wanted to upgrade its AWS infrastructure and make Site Reliability Engineering (SRE) the central team for round-the-clock support.
- Challenges included inadequate visibility into metrics and logs across its microservices stack, caused by a lack of uniform standards for monitoring, alerting, and ticketing integrations.
- The client engaged Xoriant to upgrade its AWS infrastructure, improve observability, add the necessary integrations, and optimize security and compliance.
Xoriant Solution
- Deployed an SRE operations team to manage monitoring, task automation, incidents, and minor changes.
- Assessed the environment and automated daily tasks, reducing manual effort by up to 40%.
- Automated CT Gateway deployment to the desired region using AWS CloudFormation and the AWS SDK.
- Built a unified dashboard for observability, and implemented Prometheus, Grafana, and Alertmanager to improve observability of the EKS clusters and key services and help meet SLOs.
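
The regional gateway deployment described above could be scripted roughly as follows. This is a minimal sketch, not the project's actual automation: the function names, stack name, and template parameters are hypothetical, and it assumes a CloudFormation client object such as the one returned by the AWS SDK for Python (boto3).

```python
from typing import Any, Dict


def build_stack_request(stack_name: str, template_body: str,
                        parameters: Dict[str, str]) -> Dict[str, Any]:
    """Build the keyword arguments for a CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # CloudFormation expects parameters as a list of key/value records.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ],
        # Assumption: the template creates named IAM resources.
        # Drop this entry if it does not.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy_gateway(cfn_client: Any, stack_name: str, template_body: str,
                   parameters: Dict[str, str]) -> str:
    """Create the stack, block until creation completes, return the stack ID."""
    request = build_stack_request(stack_name, template_body, parameters)
    response = cfn_client.create_stack(**request)
    # The waiter polls the stack until it reaches CREATE_COMPLETE
    # and raises an error if creation fails.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

With boto3, the client for a target region would be created with `boto3.client("cloudformation", region_name=region)` and passed in as `cfn_client`. Accepting the client as an argument keeps the region choice outside the function and lets the logic be exercised with a stub.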
Business Benefits
- 30% reduction in alerts by removing false positives.
- Enhanced environment visibility and insights, enabling data-driven business decisions and optimized cloud spend.
- Increased customer satisfaction by reducing turnaround time (TAT) from hours or days to minutes.
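
The case study does not say how false-positive alerts were removed. One common technique, similar in spirit to the `for` clause of a Prometheus alerting rule, is to fire only after a condition has held for several consecutive evaluations, so that brief spikes are ignored. The sketch below illustrates that idea under those assumptions; the function name, threshold, and sample values are hypothetical.

```python
from typing import Iterable, List


def firing_points(samples: Iterable[float], threshold: float, hold: int) -> List[int]:
    """Return the sample indices at which an alert would start firing.

    An alert fires once the value has exceeded `threshold` for `hold`
    consecutive evaluations. Any sample at or below the threshold
    resets the streak.
    """
    fired = []
    streak = 0
    for index, value in enumerate(samples):
        if value > threshold:
            streak += 1
            if streak == hold:
                fired.append(index)
        else:
            streak = 0
    return fired


# Hypothetical CPU utilisation samples: one brief spike, then a sustained breach.
cpu = [0.20, 0.95, 0.30, 0.92, 0.96, 0.97, 0.40]
immediate = firing_points(cpu, threshold=0.9, hold=1)  # fires on every breach
debounced = firing_points(cpu, threshold=0.9, hold=3)  # ignores the brief spike
```

With these samples, the immediate rule fires twice (once for the transient spike) while the debounced rule fires once, for the sustained breach only.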