AWS EKS, Kubernetes Health Check, Liveness Probe & Readiness Probe

Tags: kubernetes, K8S, health check, Liveness, Readiness, Probe, Beginner, EKS

Date: 2022/09/22

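Kubernetes is built on the assumption that containers can crash or be restarted for any number of reasons.

Liveness and readiness probes let Kubernetes check on the running application, identify the pods that should receive traffic, and restart a container when necessary.

This section works through the concepts defined on the liveness and readiness probes documentation page and observes pods in different states.

The following is a high-level description of how the probes work.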

• Liveness probe
  ◦ Lets Kubernetes determine whether a pod is alive or dead.
  ◦ A pod can end up in a dead state for a variety of reasons.
  ◦ When the liveness probe does not pass, Kubernetes kills the container and starts it again.
• Readiness probe
  ◦ Lets Kubernetes determine whether a pod is ready to receive traffic.
  ◦ If the readiness probe fails, no traffic is sent to that pod.

Whether a killed container is restarted is governed by the pod's restartPolicy.

Configure Liveness probe

Create a new directory for the health check examples.

mkdir -p ~/environment/healthchecks

Run the following code block to create liveness-app.yaml.

In this manifest, the livenessProbe field defines how the kubelet decides whether the container is healthy.

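• periodSeconds: how often the kubelet probes the container.
  ◦ In the example below, the container is probed every 5 seconds.
• initialDelaySeconds: how long the kubelet waits before running the first probe.
  ◦ Here, the kubelet waits 5 seconds after the container starts before the first health check.
• For the probe to pass, the handler at /health must return a success code.
  ◦ For an HTTP probe, any status code from 200 up to (but not including) 400 counts as success, so 200 OK marks the container as healthy.
  ◦ If the probe keeps failing (failureThreshold, 3 consecutive failures by default), the kubelet kills the container and restarts it.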

cat <<EoF > ~/environment/healthchecks/liveness-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EoF
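
To get a feel for how the timing fields combine, here is a minimal shell sketch. It assumes the Kubernetes default of failureThreshold: 3, which the manifest above does not set explicitly. The function name is hypothetical, and the script only does arithmetic; it does not talk to a cluster.

```shell
#!/bin/sh
# Hypothetical helper: rough estimate of how long an already running container
# can stay unhealthy before the kubelet restarts it.
# Assumption: failureThreshold is 3 when the manifest omits it.
# The estimate ignores timeoutSeconds and the exact phase of the probe cycle.
seconds_until_restart() {
  period_seconds=$1
  failure_threshold=${2:-3}
  # The kubelet needs failure_threshold consecutive failed probes,
  # and it runs one probe every period_seconds.
  echo $((period_seconds * failure_threshold))
}

# Values from liveness-app.yaml: periodSeconds=5, default failureThreshold=3
seconds_until_restart 5   # prints: 15
```

With the values from liveness-app.yaml, a container that stops answering /health should be restarted after roughly 15 seconds.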

Now create the pod from this manifest and verify that it is running.

kubectl apply -f ~/environment/healthchecks/liveness-app.yaml

kubectl get pod liveness-app

NAME           READY   STATUS    RESTARTS   AGE
liveness-app   1/1     Running   0          24s

The value to watch here is RESTARTS.
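
If you want to read that value from a script rather than by eye, the column can be pulled out with awk. This is a small sketch that runs against the sample output shown above; the function name restarts_of is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the RESTARTS value for one pod, reading
# `kubectl get pod` style output on stdin.
restarts_of() {
  pod_name=$1
  awk -v pod="$pod_name" '
    # Header row: remember which column is RESTARTS.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "RESTARTS") col = i; next }
    # Data row for the requested pod: print that column.
    $1 == pod { print $col }
  '
}

# Sample output copied from above. On a live cluster you would pipe
# `kubectl get pod liveness-app` into the function instead.
sample='NAME           READY   STATUS    RESTARTS   AGE
liveness-app   1/1     Running   0          24s'

printf '%s\n' "$sample" | restarts_of liveness-app   # prints: 0
```

The same number is also available without text parsing through `kubectl get pod liveness-app -o jsonpath='{.status.containerStatuses[0].restartCount}'`.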

The Events section of kubectl describe shows the pod's history, including any probe failures and container restarts.

kubectl describe pod liveness-app

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  10m   default-scheduler  Successfully assigned default/liveness-app to ip-192-168-144-242.ap-northeast-2.compute.internal
  Normal  Pulling    10m   kubelet            Pulling image "brentley/ecsdemo-nodejs"
  Normal  Pulled     10m   kubelet            Successfully pulled image "brentley/ecsdemo-nodejs" in 2.186801094s
  Normal  Created    10m   kubelet            Created container liveness
  Normal  Started    10m   kubelet            Started container liveness

The Events section sits at the very bottom of the describe output, and that is where the history can be reviewed.

Introduce a Failure

Next, run the following command to send the SIGUSR1 signal to the nodejs application.

The command runs kill inside the container and delivers the signal to the application process (PID 1).

kubectl exec -it liveness-app -- /bin/kill -s SIGUSR1 1

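The application then stops passing its health check and the container is restarted. This happens quickly, so the Events history from kubectl describe pod liveness-app is the best place to see what happened in detail.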

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  14m                 default-scheduler  Successfully assigned default/liveness-app to ip-192-168-144-242.ap-northeast-2.compute.internal
  Normal   Pulled     13m                 kubelet            Successfully pulled image "brentley/ecsdemo-nodejs" in 2.186801094s
  Warning  Unhealthy  91s (x3 over 101s)  kubelet            Liveness probe failed: Get "http://192.168.136.153:3000/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    91s                 kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Pulling    61s (x2 over 14m)   kubelet            Pulling image "brentley/ecsdemo-nodejs"
  Normal   Created    59s (x2 over 13m)   kubelet            Created container liveness
  Normal   Started    59s (x2 over 13m)   kubelet            Started container liveness
  Normal   Pulled     59s                 kubelet            Successfully pulled image "brentley/ecsdemo-nodejs" in 2.181360453s
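
When the event list gets long, it can help to filter for the probe-related entries. The sketch below uses a shortened copy of the events above as sample input (messages abbreviated); the function name probe_events is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keeps only the events whose Reason column is
# Unhealthy or Killing, i.e. the ones produced by a failing liveness probe.
probe_events() {
  awk '$2 == "Unhealthy" || $2 == "Killing"'
}

# Shortened copy of the Events section shown above (messages abbreviated).
sample_events='Normal   Scheduled  14m                 default-scheduler  Successfully assigned default/liveness-app
Warning  Unhealthy  91s (x3 over 101s)  kubelet            Liveness probe failed
Normal   Killing    91s                 kubelet            Container liveness failed liveness probe, will be restarted
Normal   Started    59s (x2 over 13m)   kubelet            Started container liveness'

# Prints the Unhealthy and Killing lines only.
printf '%s\n' "$sample_events" | probe_events
```

On a live cluster the same filter could be fed from `kubectl describe pod liveness-app`.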

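Once the nodejs application stops responding, requests to /health no longer return a success result.

The kubelet then restarts the container.

The pod status returns to Running, with RESTARTS now at 1: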

NAME           READY   STATUS    RESTARTS   AGE
liveness-app   1/1     Running   1          14m

Challenge

How can you check the status of the container health checks?

kubectl logs liveness-app

kubectl logs can also retrieve the logs of the previous container instance with the --previous flag, which is useful when the container has crashed.

kubectl logs liveness-app --previous

Configure Readiness probe

Run the following code block to create a manifest that defines a readinessProbe.

The readinessProbe here shows how a Linux command can be configured as a health check.

The container does the following:

It creates an empty file at /tmp/healthy,

then sleeps for 24 hours (86400 seconds), which keeps the pod alive.

The readinessProbe runs cat /tmp/healthy. While the file exists, the command exits with status 0 and the container is considered ready.

cat <<EoF > ~/environment/healthchecks/readiness-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readiness-deployment
  template:
    metadata:
      labels:
        app: readiness-deployment
    spec:
      containers:
      - name: readiness-deployment
        image: alpine
        command: ["sh", "-c", "touch /tmp/healthy && sleep 86400"]
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 3
EoF
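
The pass/fail logic of this exec probe can be reproduced locally without a cluster. The sketch below is an illustration only: it uses a scratch directory in place of the container's /tmp, and the function name probe is hypothetical.

```shell
#!/bin/sh
# Local stand-in for the container filesystem; inside the pod the path is /tmp/healthy.
workdir=$(mktemp -d)
marker="$workdir/healthy"

# Mirrors the exec probe: exit status 0 from cat means ready, anything else means not ready.
probe() {
  if cat "$marker" >/dev/null 2>&1; then
    echo "ready"
  else
    echo "not ready"
  fi
}

touch "$marker"   # what the container command does at startup
probe             # prints: ready

rm "$marker"      # simulates deleting the file inside the pod
probe             # prints: not ready

touch "$marker"   # recreating the file makes the probe pass again
probe             # prints: ready

rm -rf "$workdir"
```

The kubelet only looks at the exit status of the command, so any command that exits with 0 on success could serve as an exec probe in the same way.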

Now test the readiness probe.

kubectl apply -f ~/environment/healthchecks/readiness-deployment.yaml

Right after applying the manifest, listing the pods shows the three replicas in the ContainerCreating state.

kubectl get pods -l app=readiness-deployment

NAME                                    READY   STATUS              RESTARTS   AGE
readiness-deployment-548975dcc5-7gl5z   0/1     ContainerCreating   0          4s
readiness-deployment-548975dcc5-8b7vt   0/1     ContainerCreating   0          4s
readiness-deployment-548975dcc5-dvdf7   0/1     ContainerCreating   0          4s

The kubelet does not run the readiness probe until 5 seconds after the container starts (initialDelaySeconds: 5).

NAME                                    READY   STATUS    RESTARTS   AGE
readiness-deployment-548975dcc5-7gl5z   1/1     Running   0          55s
readiness-deployment-548975dcc5-8b7vt   1/1     Running   0          55s
readiness-deployment-548975dcc5-dvdf7   1/1     Running   0          55s

Once the probe succeeds, each pod reports READY 1/1.

Run the following command to confirm that all three replicas are available.

kubectl describe deployment readiness-deployment | grep Replicas:

Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable

Introduce a Failure

Pick one of the three pods and run the following command to delete its /tmp/healthy file.

Once the file is gone, the cat command fails, so the readiness probe fails.

kubectl exec -it <YOUR-READINESS-POD-NAME> -- rm /tmp/healthy

For example: kubectl exec -it readiness-deployment-548975dcc5-8b7vt -- rm /tmp/healthy

List the pods again to confirm that the file was deleted and that the readiness probe now reports the failure.

kubectl get pods -l app=readiness-deployment

NAME                                    READY   STATUS    RESTARTS   AGE
readiness-deployment-548975dcc5-7gl5z   1/1     Running   0          5m45s
readiness-deployment-548975dcc5-8b7vt   0/1     Running   0          5m45s
readiness-deployment-548975dcc5-dvdf7   1/1     Running   0          5m45s

Traffic is not routed to the second pod. Its readiness probe has failed, so the READY column shows 0/1 (not ready).

Run the following command to see how many replicas are available to receive traffic.

kubectl describe deployment readiness-deployment | grep Replicas:

Replicas:               3 desired | 3 updated | 3 total | 2 available | 1 unavailable
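
For scripting, the available count can be extracted from that line. This is a minimal sketch that parses the sample line shown above; the function name available_replicas is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of available replicas from the
# "Replicas:" line of `kubectl describe deployment`, read on stdin.
available_replicas() {
  awk -F'|' '/Replicas:/ {
    for (i = 1; i <= NF; i++)
      # Matches "2 available" but not "1 unavailable".
      if ($i ~ /[0-9]+ available/) { split($i, parts, " "); print parts[1] }
  }'
}

# Sample line copied from above.
sample_line='Replicas:               3 desired | 3 updated | 3 total | 2 available | 1 unavailable'

printf '%s\n' "$sample_line" | available_replicas   # prints: 2
```

If you prefer not to parse text, `kubectl get deployment readiness-deployment -o jsonpath='{.status.availableReplicas}'` reads the same number from the API object.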

When a pod's readiness probe fails, the pod is removed from the endpoints of any Service that selects it.

Challenge

How can you restore the pod that was reported as unavailable?

The readiness probe decides based on whether /tmp/healthy exists, so recreate the file.

kubectl exec -it <YOUR-READINESS-POD-NAME> -- touch /tmp/healthy

Then check the pod status.

kubectl get pods -l app=readiness-deployment

NAME                                    READY   STATUS    RESTARTS   AGE
readiness-deployment-548975dcc5-7gl5z   1/1     Running   0          10m
readiness-deployment-548975dcc5-8b7vt   1/1     Running   0          10m
readiness-deployment-548975dcc5-dvdf7   1/1     Running   0          10m

kubectl describe deployment readiness-deployment | grep Replicas:

Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable

Cleanup

The liveness probe example used an HTTP request, and the readiness probe example used a Linux command to perform the health check.

A TCP request can be used for the same purpose. See the Kubernetes documentation for details.
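
As a rough sketch of the TCP variant, the script below writes a manifest that swaps httpGet for tcpSocket, following the same heredoc style used earlier. The pod name tcp-liveness-app and the output path are hypothetical, and the manifest is only written to a scratch directory, not applied to the cluster.

```shell
#!/bin/sh
# Scratch directory for the hypothetical manifest; adjust the path as needed.
outdir=$(mktemp -d)

cat <<EoF > "$outdir/tcp-liveness-app.yaml"
apiVersion: v1
kind: Pod
metadata:
  name: tcp-liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      # The probe passes when the kubelet can open a TCP connection to this port.
      tcpSocket:
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EoF

# Print where the manifest was written.
echo "$outdir/tcp-liveness-app.yaml"
```

A TCP probe only checks that the port accepts connections, so it is a weaker signal than an HTTP check against /health.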

kubectl delete -f ~/environment/healthchecks/liveness-app.yaml
kubectl delete -f ~/environment/healthchecks/readiness-deployment.yaml
