Kubernetes Beginner Series, Advanced Pod Topics: Resource Limits, Scheduling Constraints, Restart Policies, and Health Checks

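An earlier article, "k8s入门系列之概念深入篇" (Kubernetes Beginner Series: Concepts in Depth), covered the basic concepts and common operations for pods. This post goes a step further. Most of what follows is aimed at production environments; a test environment usually does not need this much configuration.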

1. Basic management

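Create    kubectl create -f xxx.yaml
Query     kubectl get pod yourPodName    kubectl describe pod yourPodName
Delete    kubectl delete pod yourPodName    kubectl delete -f xxx.yaml
Update    kubectl replace -f /path/to/yourNewYaml.yml
Note: if the pod is managed by a Deployment or ReplicaSet, delete that Deployment or ReplicaSet first; otherwise the controller recreates the pod as soon as it is deleted.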
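
For reference, the commands above operate on a manifest file. The following is a minimal sketch of what such a file might contain; the pod name demo-pod and the container name web are placeholders chosen for illustration, not names from the original setup. Note that pod names must be lowercase, so a real name would look like demo-pod rather than yourPodName.

```yaml
# Hypothetical minimal manifest, e.g. saved as xxx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod          # placeholder; lowercase letters, digits, '-' and '.' only
spec:
  containers:
  - name: web             # placeholder container name
    image: nginx:1.12     # same image used in the later examples
    ports:
    - containerPort: 80
```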

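2. Resource limits
A cluster usually hosts many applications. To keep any one of them from consuming too many resources and destabilizing the cluster, we generally restrict the resources available to an application's pods. This is done with the requests and limits fields under resources in the container spec.
For example: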

    spec:
      containers:
      - image: xxxx
        imagePullPolicy: Always
        name: auth
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: "4"
            memory: 2Gi
          requests:
            cpu: 250m
            memory: 250Mi

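The resources field carries the restriction. requests is the amount of resources reserved for the container, which the scheduler uses to pick a node with enough free capacity; limits is the maximum the container is allowed to use. Roughly speaking, they are the guaranteed baseline and the ceiling.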

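3. Scheduling constraints
When needed, a pod can be placed on nodes that carry a particular label (for example a role) by setting spec.nodeSelector, or pinned to one specific node by setting spec.nodeName.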

spec:
  #nodeName: 10.1.14.25
  nodeSelector:
    role: dev

Labels are added to a node with kubectl label, for example: kubectl label nodes <node-name> role=dev

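4. Restart policy
The restart policy determines what happens after a container in the Pod terminates. What is called a Pod "restart" here is really the container being recreated, so data written inside the previous container is lost. If data must persist, store it on a volume.
A Pod supports three restart policies:
Always (the default; the container is always restarted after it terminates);
OnFailure (the container is restarted only when it exits abnormally, that is, with a non-zero exit code);
Never (the container is never restarted).
Which policy to choose depends on the needs of the production workload.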

restartPolicy: Always
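
To show where the field sits in a manifest, here is a minimal sketch using OnFailure. The pod name, container name, and the deliberately failing command are made up for illustration. restartPolicy is set once at the Pod spec level and applies to every container in the Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo          # hypothetical name
spec:
  restartPolicy: OnFailure    # Pod-level field, a sibling of "containers"
  containers:
  - name: job                 # hypothetical container name
    image: busybox
    # Sleeps 10s and then exits with a non-zero code,
    # so the kubelet restarts the container under OnFailure.
    command: ["sh", "-c", "sleep 10; exit 1"]
```

With this manifest, the restart count reported by kubectl describe should keep growing, with an increasing back-off delay between attempts. Changing the command to end in exit 0 would leave the container in a completed state instead.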

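5. Health checks
Kubernetes provides a probe mechanism, including the following two probes:
livenessProbe: if the check fails, the container is killed and then handled according to restartPolicy.
readinessProbe: if the check fails, k8s removes the pod from the service endpoints, so it stops receiving traffic.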

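A probe supports the following three kinds of check:
(1) httpGet: sends an HTTP request. Any status code from 200 up to, but not including, 400 (for example 200, 301, or 302) counts as success.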

apiVersion: v1
kind: Pod
metadata:
  name: testpod
  labels:
    web: nginx
spec:
  #nodeName: 10.1.14.25
  nodeSelector:
    role: dev
  containers:
  - name: testpod
    image: nginx:1.12
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
      initialDelaySeconds: 30
      periodSeconds: 10

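After running nginx -s stop inside the container, kubectl describe shows a restart record for the pod. The live log output can also be followed with kubectl logs -f.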

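(2) exec: runs the specified command inside the container and decides by its exit status (0 means success). This is typically implemented with a script written for a specific need.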
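
A minimal sketch of an exec probe follows, modeled on the common marker-file pattern. The pod name, container name, file path /tmp/healthy, and timing values are assumptions chosen for illustration. The container creates the marker file, removes it after 30 seconds, and keeps running, so the probe succeeds at first and then starts failing.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-demo       # hypothetical name
spec:
  containers:
  - name: app                 # hypothetical container name
    image: busybox
    # Create a marker file, delete it after 30s, then stay alive for 10 minutes.
    command: ["sh", "-c", "touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600"]
    livenessProbe:
      exec:
        # cat exits 0 while the file exists (healthy), non-zero once it is gone (failed).
        command: ["cat", "/tmp/healthy"]
      initialDelaySeconds: 5
      periodSeconds: 5
```

Once the file is removed and the probe has failed the configured number of times in a row (3 by default), the container is killed and handled according to restartPolicy.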

(3) tcpSocket: attempts to open a TCP connection to the given port. The check succeeds if the three-way handshake completes.
Usage:

[root@master01 test]# cat test_pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  labels:
    web: nginx
spec:
  #nodeName: 10.1.14.25
  nodeSelector:
    role: dev
  containers:
  - name: testpod
    image: nginx:1.12
    ports:
    - containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1

After running nginx -s stop, the pod's container is brought back up, as the restart count in the describe output shows:

    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 04 Dec 2017 01:21:24 -0500
      Finished:     Tue, 04 Dec 2017 01:32:35 -0500
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     4
      memory:  2G
    Requests:
      cpu:        250m
      memory:     250Mi
    Liveness:     tcp-socket :80 delay=30s timeout=1s period=10s #success=1 #failure=3
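
The examples in this section all use livenessProbe. A readinessProbe is declared the same way and accepts the same three check types. The following is a minimal sketch under that assumption; the pod name, container name, and timing values are illustrative only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo        # hypothetical name
  labels:
    web: nginx                # a Service selecting this label would route to the pod
spec:
  containers:
  - name: web                 # hypothetical container name
    image: nginx:1.12
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /index.html
        port: 80
      initialDelaySeconds: 5  # wait 5s after the container starts before the first check
      periodSeconds: 10       # then check every 10s
```

Unlike a failed liveness check, a failed readiness check does not restart the container. The pod is only marked not ready and taken out of the service endpoints until the check passes again.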

6. Troubleshooting with logs

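Troubleshooting mostly comes down to the familiar kubectl describe, kubectl logs, and kubectl exec, which are the commands used most often in production. Use the events shown by describe, the log output, or a shell inside the container to read the application's own error logs, and narrow down the problem with the pod from there.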

References:

Health checks for Pods in Kubernetes: http://www.cnblogs.com/cocowool/p/kubernetes_container_probe.html
Kubernetes in Practice, Part 37: Pod health checks: http://blog.itpub.net/28624388/viewspace-2154412/
