SLIs and SLOs: how to wrap your head around them and actually use them to calculate availability

service_health{status="healthy",server="858b678dd8-f2vc6", namespace="dev",service="my-service"} 1
service_health{status="unhealthy",server="858b678dd8-f2vc6", namespace="dev",service="my-service"} 0
service_http_request_endpoint_bucket{route="/API/MYENDPOINT",server="858b678dd8-f2vc6",namespace="dev",service="my-service",le="100"} 3
1) service_health{status="healthy"} > 0
2) histogram_quantile(0.95, sum by(namespace, service, le) (rate(service_http_request_endpoint_bucket{route="/API/MYENDPOINT",service="my-service"}[5m]))) < 100
  1. If the service health metric with the status="healthy" label is greater than 0, the service is available
  2. If the 95th percentile latency of the /API/MYENDPOINT endpoint over the last 5m is under 100ms, the service is available
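
To make the two conditions concrete, here is a minimal Python sketch that turns one evaluation of each expression into a 0/1 conformity value. The `Snapshot` type, the `sli_conformity` function and its field names are hypothetical and only serve the illustration; the SLI names are the ones used in the conformity series, and the bucket bounds are assumed to be in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One evaluation of the two expressions (hypothetical container)."""
    healthy_gauge: float    # value of service_health{status="healthy"}
    p95_latency_ms: float   # result of the histogram_quantile(0.95, ...) expression


def sli_conformity(snapshot: Snapshot, latency_threshold_ms: float = 100.0) -> dict:
    """Return 1 for each SLI whose condition holds, 0 otherwise."""
    return {
        # Condition 1: the healthy gauge is greater than 0.
        "health": int(snapshot.healthy_gauge > 0),
        # Condition 2: the 95th percentile latency is under the threshold.
        "latency:myendpoint:95:100:ms": int(
            snapshot.p95_latency_ms < latency_threshold_ms
        ),
    }


if __name__ == "__main__":
    print(sli_conformity(Snapshot(healthy_gauge=1, p95_latency_ms=80.0)))
```

A healthy instance answering in 80ms conforms on both SLIs, while an unhealthy instance answering in 120ms conforms on neither.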
service:sli:conformity:status{name="health", namespace="dev", service="my-service"} 1
service:sli:conformity:status{name="latency:myendpoint:95:100:ms", namespace="dev", service="my-service"} 1
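
One way to get from these conformity series to an availability figure is sketched below in Python. It rests on two assumptions that are not stated above: the service counts as available at a given evaluation only if every SLI conforms, and availability is the fraction of evaluations where that holds. The `availability` function is hypothetical.

```python
def availability(samples: list) -> float:
    """Fraction of evaluations in which every SLI conformed.

    `samples` holds one dict per evaluation, mapping SLI name to 0 or 1,
    mirroring the service:sli:conformity:status series.
    """
    if not samples:
        raise ValueError("need at least one evaluation")
    # Assumption: an evaluation is "good" only when all SLIs report 1.
    good = sum(1 for s in samples if all(v == 1 for v in s.values()))
    return good / len(samples)


if __name__ == "__main__":
    window = [
        {"health": 1, "latency:myendpoint:95:100:ms": 1},
        {"health": 1, "latency:myendpoint:95:100:ms": 0},  # latency SLI missed
        {"health": 1, "latency:myendpoint:95:100:ms": 1},
        {"health": 1, "latency:myendpoint:95:100:ms": 1},
    ]
    print(f"{availability(window):.2%}")
```

The resulting ratio can then be compared against an SLO target such as 99.9%.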
