Kubernetes Pod Lifecycle

Original: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

Pod phase

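A Pod's status field is a PodStatus object, which has a phase field.
The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. The phase field is not intended to be a comprehensive rollup of Container or Pod state, nor is it intended to be a comprehensive state machine.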

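The number and meanings of Pod phase values are strictly defined. Do not assume that a Pod can have any phase value other than those listed in this document.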

Here are all the possible values for phase:

Value	Description
`Pending`	The Pod has been accepted by the Kubernetes system, but one or more of the Container images has not been created. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while.
`Running`	The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting. (In other words, the Pod stays Running as long as not every Container has exited.)
`Succeeded`	All Containers in the Pod have terminated in success, and will not be restarted (for example, a Job has finished its work).
`Failed`	All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system.
`Unknown`	For some reason the state of the Pod could not be obtained, typically due to an error in communicating with the host of the Pod (between the node and the master, so the kubelet cannot report status correctly).

Pod conditions

A Pod has a PodStatus, which has an array of PodConditions through which the Pod has or has not passed. Each element of the PodCondition array has six possible fields:

The lastProbeTime field provides a timestamp for when the Pod condition was last probed.
The lastTransitionTime field provides a timestamp for when the Pod last transitioned from one status to another.
The message field is a human-readable message indicating details about the transition.
The reason field is a unique, one-word, CamelCase reason for the condition’s last transition.
The status field is a string, with possible values “True”, “False”, and “Unknown”.
The type field is a string with the following possible values:
- PodScheduled: the Pod has been scheduled to a node;
- Ready: the Pod is able to serve requests and should be added to the load balancing pools of all matching Services;
- Initialized: all init containers have completed successfully;
- Unschedulable: the scheduler cannot schedule the Pod right now, for example due to lack of resources or other constraints;
- ContainersReady: all containers in the Pod are ready.

Container probes

A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:

ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
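The success criteria of the three handler types can be sketched as small Python functions. This is only an illustration of the rules stated above, not the kubelet's implementation; the function names and parameters are hypothetical.

```python
import socket


def exec_probe_succeeded(exit_code: int) -> bool:
    """ExecAction: success means the command exited with status code 0."""
    return exit_code == 0


def tcp_probe_succeeded(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCPSocketAction: success means a TCP connection to the port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and unreachable hosts all count as failure.
        return False


def http_probe_succeeded(status_code: int) -> bool:
    """HTTPGetAction: success means 200 <= status code < 400."""
    return 200 <= status_code < 400
```

Note that redirects (3xx) count as success under the HTTP rule, while any 4xx or 5xx response counts as failure.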

Each probe has one of three results:

Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.

The kubelet can optionally perform and react to two kinds of probes on running Containers:

livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.

When should you use liveness or readiness probes?
If the process in your Container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod’s restartPolicy.
If you’d like your Container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
If you’d like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe. In this case, the readiness probe might be the same as the liveness probe, but the existence of the readiness probe in the spec means that the Pod will start without receiving any traffic and only start receiving traffic after the probe starts succeeding. If your Container needs to work on loading large data, configuration files, or migrations during startup, specify a readiness probe.
If you want your Container to be able to take itself down for maintenance, you can specify a readiness probe that checks an endpoint specific to readiness that is different from the liveness probe.
Note that if you just want to be able to drain requests when the Pod is deleted, you do not necessarily need a readiness probe; on deletion, the Pod automatically puts itself into an unready state regardless of whether the readiness probe exists. The Pod remains in the unready state while it waits for the Containers in the Pod to stop.
For more information about how to set up a liveness or readiness probe, see Configure Liveness and Readiness Probes.

Pod and Container status
For detailed information about Pod Container status, see PodStatus and ContainerStatus. Note that the information reported as Pod status depends on the current ContainerState.

Container States
Once a Pod is assigned to a node by the scheduler, the kubelet starts creating containers using the container runtime. There are three possible container states: Waiting, Running, and Terminated. To check the state of a container, you can use kubectl describe pod [POD_NAME]. The state is displayed for each container within that Pod.
Waiting: The default state of a container. If a container is not in either the Running or the Terminated state, it is in the Waiting state. A container in the Waiting state still runs its required operations, like pulling images, applying Secrets, etc. Along with this state, a message and reason about the state are displayed to provide more information.
...
State: Waiting
Reason: ErrImagePull
...
Running: Indicates that the container is executing without issues. Once a container enters the Running state, the postStart hook (if any) is executed. This state also displays the time when the container entered the Running state.
...
State: Running
Started: Wed, 30 Jan 2019 16:46:38 +0530
...

Terminated: Indicates that the container completed its execution and has stopped running. A container enters this state when it has successfully completed execution or when it has failed for some reason. In either case, a reason and exit code are displayed, as well as the container’s start and finish time. Before a container enters the Terminated state, the preStop hook (if any) is executed.

...
      State:          Terminated
        Reason:       Completed
        Exit Code:    0
        Started:      Wed, 30 Jan 2019 11:45:26 +0530
        Finished:     Wed, 30 Jan 2019 11:45:26 +0530
    ...

Pod readiness gate

FEATURE STATE: Kubernetes v1.14 stable
To make Pod readiness extensible by allowing extra feedback or signals to be injected into PodStatus, Kubernetes 1.11 introduced a feature named Pod ready++. You can use the readinessGates field in the PodSpec to specify additional conditions to be evaluated for Pod readiness. If Kubernetes cannot find such a condition in the status.conditions field of a Pod, the status of the condition defaults to “False”. Below is an example:

kind: Pod
...
spec:
  readinessGates:
    - conditionType: "www.example.com/feature-1"
status:
  conditions:
    - type: Ready  # this is a builtin PodCondition
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
    - type: "www.example.com/feature-1"   # an extra PodCondition
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
  containerStatuses:
    - containerID: docker://abcd...
      ready: true
...

The new Pod conditions must comply with Kubernetes label key format. Since the kubectl patch command still doesn’t support patching object status, the new Pod conditions have to be injected through the PATCH action using one of the KubeClient libraries.
With the introduction of new Pod conditions, a Pod is evaluated to be ready only when both the following statements are true:

All containers in the Pod are ready.
All conditions specified in ReadinessGates are “True”.
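The readiness rule above can be sketched in Python. This is a minimal illustration rather than the kubelet's actual code: `pod_is_ready` and its argument shapes are hypothetical, with conditions modeled as the list of entries found under status.conditions. A gate whose condition is missing is treated as “False”, as described earlier.

```python
from typing import Dict, List


def pod_is_ready(
    containers_ready: List[bool],
    readiness_gates: List[str],
    conditions: List[Dict[str, str]],
) -> bool:
    """Return True only if every container is ready and every gate is "True"."""
    # Index the reported conditions by type for quick lookup.
    status_by_type = {c["type"]: c["status"] for c in conditions}
    all_containers_ready = all(containers_ready)
    # A gate with no matching condition defaults to "False".
    all_gates_true = all(
        status_by_type.get(gate, "False") == "True" for gate in readiness_gates
    )
    return all_containers_ready and all_gates_true
```

Applied to the example manifest above, the only container is ready but the www.example.com/feature-1 condition is “False”, so the function returns False and the Pod's Ready condition stays “False”.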

To facilitate this change to Pod readiness evaluation, a new Pod condition ContainersReady is introduced to capture the old Pod Ready condition.
In K8s 1.11, as an alpha feature, the “Pod Ready++” feature has to be explicitly enabled by setting the PodReadinessGates feature gate to true.
In K8s 1.12, the feature is enabled by default.

Restart policy

A PodSpec has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always. restartPolicy applies to all Containers in the Pod. restartPolicy only refers to restarts of the Containers by the kubelet on the same node. Exited Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …). The delay is capped at five minutes and is reset after ten minutes of successful execution. As discussed in the Pods document, once bound to a node, a Pod will never be rebound to another node.
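As a rough illustration of the back-off schedule, the Python sketch below computes the delay before each successive restart, assuming the delay simply doubles from 10 seconds until it reaches the five-minute cap. The function name and parameters are hypothetical, and the reset after ten minutes of successful execution is not modeled.

```python
from typing import List


def restart_backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> List[int]:
    """Return the delay in seconds before each of the first `restarts` restarts."""
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(delay)
        # Double the delay for the next restart, but never exceed the cap.
        delay = min(delay * 2, cap)
    return delays


print(restart_backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```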

Pod lifetime

In general, Pods do not disappear until someone destroys them. This might be a human or a controller. The only exception to this rule is that Pods with a phase of Succeeded or Failed for more than some duration (determined by terminated-pod-gc-threshold in the master) will expire and be automatically destroyed.
Three types of controllers are available:

Use a Job for Pods that are expected to terminate, for example, batch computations. Jobs are appropriate only for Pods with restartPolicy equal to OnFailure or Never.
Use a ReplicationController, ReplicaSet, or Deployment for Pods that are not expected to terminate, for example, web servers. ReplicationControllers are appropriate only for Pods with a restartPolicy of Always.
Use a DaemonSet for Pods that need to run one per machine, because they provide a machine-specific system service.

All three types of controllers contain a PodTemplate. It is recommended to create the appropriate controller and let it create Pods, rather than directly create Pods yourself. That is because Pods alone are not resilient to machine failures, but controllers are.
If a node dies or is disconnected from the rest of the cluster, Kubernetes applies a policy for setting the phase of all Pods on the lost node to Failed.

Examples

Advanced liveness probe example

Liveness probes are executed by the kubelet, so all requests are made in the kubelet network namespace.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - args:
    - /server
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        # when "host" is not defined, "PodIP" will be used
        # host: my-host
        # when "scheme" is not defined, "HTTP" scheme will be used. Only "HTTP" and "HTTPS" are allowed
        # scheme: HTTPS
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15
      timeoutSeconds: 1
    name: liveness

Example states

Pod is running and has one Container. Container exits with success.
- Log completion event.
- If restartPolicy is:
  - Always: Restart Container; Pod phase stays Running.
  - OnFailure: Pod phase becomes Succeeded.
  - Never: Pod phase becomes Succeeded.
Pod is running and has one Container. Container exits with failure.
- Log failure event.
- If restartPolicy is:
  - Always: Restart Container; Pod phase stays Running.
  - OnFailure: Restart Container; Pod phase stays Running.
  - Never: Pod phase becomes Failed.
Pod is running and has two Containers. Container 1 exits with failure.
- Log failure event.
- If restartPolicy is:
  - Always: Restart Container; Pod phase stays Running.
  - OnFailure: Restart Container; Pod phase stays Running.
  - Never: Do not restart Container; Pod phase stays Running.
- If Container 1 is not running, and Container 2 exits:
  - Log failure event.
  - If restartPolicy is:
    - Always: Restart Container; Pod phase stays Running.
    - OnFailure: Restart Container; Pod phase stays Running.
    - Never: Pod phase becomes Failed.
Pod is running and has one Container. Container runs out of memory.
- Container terminates in failure.
- Log OOM event.
- If restartPolicy is:
  - Always: Restart Container; Pod phase stays Running.
  - OnFailure: Restart Container; Pod phase stays Running.
  - Never: Log failure event; Pod phase becomes Failed.
Pod is running, and a disk dies.
- Kill all Containers.
- Log appropriate event.
- Pod phase becomes Failed.
- If running under a controller, Pod is recreated elsewhere.
Pod is running, and its node is segmented out.
- Node controller waits for timeout.
- Node controller sets Pod phase to Failed.
- If running under a controller, Pod is recreated elsewhere.
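The single-Container scenarios above reduce to a small decision table. The Python sketch below encodes it; `phase_after_exit` is a hypothetical helper for illustration only and covers just the case of a running Pod with one Container.

```python
def phase_after_exit(restart_policy: str, exit_code: int) -> str:
    """Return the Pod phase after its only Container exits."""
    succeeded = exit_code == 0
    if restart_policy == "Always":
        # The Container is restarted regardless of its exit code.
        return "Running"
    if restart_policy == "OnFailure":
        # Restart only on failure; a clean exit ends the Pod.
        return "Succeeded" if succeeded else "Running"
    if restart_policy == "Never":
        # No restart: the exit code decides the final phase.
        return "Succeeded" if succeeded else "Failed"
    raise ValueError(f"unknown restartPolicy: {restart_policy!r}")
```

For example, `phase_after_exit("OnFailure", 1)` returns "Running" because the Container is restarted, while `phase_after_exit("Never", 1)` returns "Failed".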

What’s next

Get hands-on experience attaching handlers to Container lifecycle events.
Get hands-on experience configuring liveness and readiness probes.
Learn more about Container lifecycle hooks.
