As a member of the Links 'R' Us Site Reliability Engineering (SRE) team,
I need to be able to monitor the health of all Links 'R' Us services,
so as to detect and address issues that cause degraded service performance.
The acceptance criteria for this user story are as follows:
- All Links 'R' Us services should periodically submit health- and performance-related metrics to a centralized metrics collection system.
- A monitoring dashboard is created for each service.
- A high-level monitoring dashboard tracks the overall system health.
- Metric-based alerts are defined and linked to a paging service. Each alert comes with its own playbook, which lists the steps that the on-call member of the SRE team needs to perform.