18.3.3 Working with application liveness and readiness
As we saw in chapter 15, the Actuator’s health endpoint reports on the health of an application. But that status largely reflects the health of the external dependencies that the application relies on, such as a database or message broker. Even if an application is perfectly healthy with regard to its database connection, that doesn’t necessarily mean that it’s ready to handle requests or that it is even healthy enough to remain running in its current state.
Kubernetes supports the notion of liveness and readiness probes: indicators of an application’s health that help Kubernetes determine whether traffic should be sent to the application, or if the application should be restarted to resolve some issue. Spring Boot supports liveness and readiness probes by exposing them as subsets of the Actuator health endpoint known as health groups.
Liveness is an indicator of whether an application is healthy enough to continue running without being restarted. If an application indicates that its liveness indicator is down, then the Kubernetes runtime can react by killing the container that the application is running in and restarting it.
Readiness, on the other hand, tells Kubernetes whether the application is ready to handle traffic. During startup, for instance, an application may need to perform some initialization before it can start handling requests. During this time, the application’s readiness may show that it’s down. The application is still alive, however, so Kubernetes won’t restart it; it simply honors the readiness indicator by not sending requests to the application. Once the application has completed initialization, it can set its readiness state to indicate that it is up, and Kubernetes will be able to route traffic to it.
ENABLING LIVENESS AND READINESS PROBES
To enable liveness and readiness probes in your Spring Boot application, you must set management.health.probes.enabled to true. In an application.yml file, that will look like this:
management:
  health:
    probes:
      enabled: true
Once the probes are enabled, a request to the Actuator health endpoint will look something like this (assuming that the application is perfectly healthy):
{
  "status": "UP",
  "groups": [
    "liveness",
    "readiness"
  ]
}
On its own, the base health endpoint doesn’t tell us much about the liveness or readiness of an application. But a request to /actuator/health/liveness or /actuator/health/readiness will provide the liveness and readiness state of the application. In either case, an up status will look like this:
{
  "status": "UP"
}
On the other hand, if either readiness or liveness is down, then the result will look like this:
{
  "status": "DOWN"
}
In the case of a down readiness status, Kubernetes will not direct traffic to the application. If the liveness endpoint indicates a down status, then Kubernetes will attempt to remedy the situation by killing the application’s container and restarting it.
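It helps to know that a Kubernetes HTTP probe doesn't read the JSON body at all. It looks only at the HTTP status code of the response and treats anything from 200 up to (but not including) 400 as a success. By default, the Actuator health endpoint answers with 200 when the status is UP and 503 when it is DOWN. The following plain-Java sketch illustrates that rule; the class and method names (ProbeOutcome, probeSucceeds) are made up for illustration and aren't part of Kubernetes or Spring Boot.

```java
class ProbeOutcome {

    // Kubernetes counts an HTTP probe as successful for any status code
    // from 200 up to (but not including) 400. Hypothetical helper name.
    static boolean probeSucceeds(int httpStatusCode) {
        return httpStatusCode >= 200 && httpStatusCode < 400;
    }

    public static void main(String[] args) {
        // Actuator's default mapping: UP is served with 200, DOWN with 503.
        System.out.println("UP   (200) -> success? " + probeSucceeds(200));
        System.out.println("DOWN (503) -> success? " + probeSucceeds(503));
    }
}
```

This is why no extra response parsing needs to be configured on the Kubernetes side: the status code alone carries the up-or-down signal.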
CONFIGURING LIVENESS AND READINESS PROBES IN THE DEPLOYMENT
With the Actuator producing liveness and readiness status on these two endpoints, all we need to do now is tell Kubernetes about them in the deployment manifest. The tail end of the following deployment manifest shows the configuration necessary to let Kubernetes know how to check on liveness and readiness:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: taco-cloud-deploy
  labels:
    app: taco-cloud
spec:
  replicas: 3
  selector:
    matchLabels:
      app: taco-cloud
  template:
    metadata:
      labels:
        app: taco-cloud
    spec:
      containers:
      - name: taco-cloud-container
        image: tacocloud/tacocloud:latest
        livenessProbe:
          initialDelaySeconds: 2
          periodSeconds: 5
          httpGet:
            path: /actuator/health/liveness
            port: 8080
        readinessProbe:
          initialDelaySeconds: 2
          periodSeconds: 5
          httpGet:
            path: /actuator/health/readiness
            port: 8080
This tells Kubernetes, for each probe, to make a GET request to the given path on port 8080 to get the liveness or readiness status. As configured here, the first request should happen 2 seconds after the application pod is running and every 5 seconds thereafter.
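To make the timing concrete, here's a small sketch that lists the idealized probe times implied by initialDelaySeconds and periodSeconds. It's plain Java, the names ProbeSchedule and probeTimes are hypothetical, and real probe timing in a cluster is only approximate, so treat the numbers as a rough guide rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

class ProbeSchedule {

    // Returns the idealized times, in seconds after the container starts,
    // at which the first `count` probes would be sent.
    static List<Integer> probeTimes(int initialDelaySeconds, int periodSeconds, int count) {
        List<Integer> times = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            times.add(initialDelaySeconds + i * periodSeconds);
        }
        return times;
    }

    public static void main(String[] args) {
        // Values from the manifest above: initialDelaySeconds 2, periodSeconds 5.
        System.out.println(probeTimes(2, 5, 4)); // prints [2, 7, 12, 17]
    }
}
```

If your application takes longer than a couple of seconds to start, a larger initialDelaySeconds value gives it room to come up before the first liveness check arrives.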
MANAGING LIVENESS AND READINESS
How do the liveness and readiness statuses get set? Internally, Spring itself or some library that the application depends on can set the statuses by publishing an availability change event. But that ability isn’t limited to Spring and its libraries; you can also write code in your application that publishes these events.
For example, suppose that you want to delay the readiness of your application until some initialization has taken place. Early on in the application lifecycle, perhaps in an ApplicationRunner or CommandLineRunner bean, you can publish a readiness state to refuse traffic like this:
@Bean
public ApplicationRunner disableReadiness(ApplicationContext context) {
  return args -> {
    // Tell the readiness probe that the application isn't ready for traffic yet.
    AvailabilityChangeEvent.publish(context, ReadinessState.REFUSING_TRAFFIC);
  };
}
Here, the @Bean method is given an instance of the Spring application context as a parameter. This is necessary because the static publish() method needs it to publish the event. Once initialization is complete, the application’s readiness state can be updated to accept traffic in a similar way, as shown next:
AvailabilityChangeEvent.publish(context, ReadinessState.ACCEPTING_TRAFFIC);
Liveness status can be updated in very much the same way. The key difference is that instead of publishing ReadinessState.ACCEPTING_TRAFFIC or ReadinessState.REFUSING_TRAFFIC, you’ll publish LivenessState.CORRECT or LivenessState.BROKEN. For example, if in your application code you detect an unrecoverable fatal error, your application can request that it be killed and restarted by publishing LivenessState.BROKEN like this:
AvailabilityChangeEvent.publish(context, LivenessState.BROKEN);
Shortly after this event is published, the liveness endpoint will indicate that the application is down, and Kubernetes will take action by restarting the application. This gives you very little time to publish a LivenessState.CORRECT event. But if you determine that, in fact, the application is healthy after all, then you can undo the broken event by publishing a new event like this:
AvailabilityChangeEvent.publish(context, LivenessState.CORRECT);
As long as Kubernetes hasn’t hit your liveness endpoint since you set the status to broken, your application can chalk this up as a close call and keep serving requests.
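The close call can be pictured with a tiny simulation. The sketch below doesn't use Spring at all: the Liveness enum and the publish() and probe() methods are hypothetical stand-ins for LivenessState, AvailabilityChangeEvent.publish(), and a Kubernetes probe request, assumed here only to show why the order of events matters.

```java
import java.util.concurrent.atomic.AtomicReference;

class CloseCallDemo {

    // Local stand-in for Spring Boot's LivenessState, defined here so the
    // sketch runs without Spring on the classpath.
    enum Liveness { CORRECT, BROKEN }

    private final AtomicReference<Liveness> state =
        new AtomicReference<>(Liveness.CORRECT);

    // Stand-in for publishing an availability change event.
    void publish(Liveness newState) {
        state.set(newState);
    }

    // Stand-in for a liveness probe: it sees only the state at the moment it runs.
    boolean probe() {
        return state.get() == Liveness.CORRECT;
    }

    public static void main(String[] args) {
        CloseCallDemo app = new CloseCallDemo();

        // Broken and then corrected before any probe arrives: a close call.
        app.publish(Liveness.BROKEN);
        app.publish(Liveness.CORRECT);
        System.out.println("probe after close call: " + app.probe()); // true

        // Broken and still broken when the probe arrives: the probe fails.
        app.publish(Liveness.BROKEN);
        System.out.println("probe while broken: " + app.probe()); // false
    }
}
```

The probe never learns about states that came and went between two checks, which is what makes the recovery window so short with a 5-second period.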
