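Related posts: "kubernetes/k8s CRI Analysis: Container Runtime Interface Analysis"

"kubernetes/k8s CRI Analysis: How kubelet Creates a Pod"

The earlier posts first introduced CRI and then analyzed the CRI-related kubelet source code in four parts: the kubelet's CRI-related startup parameters, the CRI-related interfaces and structs, CRI initialization, and how the kubelet calls CRI to create a pod. If you have not read them yet, see the posts listed above.

Here again is the CRI architecture diagram from the earlier posts.

This post analyzes how the kubelet calls CRI to delete a pod.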

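CRI-related source code analysis in kubelet

The analysis of the kubelet's CRI source code covers the following parts:

(1) kubelet CRI-related startup parameters;

(2) kubelet CRI-related interfaces/structs;

(3) kubelet CRI initialization;

(4) how kubelet calls CRI to create a pod;

(5) how kubelet calls CRI to delete a pod.

The previous two posts covered the first four parts; this post covers the fifth.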

Based on tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

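5. How kubelet calls CRI to delete a pod

kubelet CRI pod deletion call flow

The following uses the kubelet dockershim pod deletion call flow as an example.

The kubelet calls dockershim to stop containers; dockershim in turn calls Docker to stop the containers and calls CNI to tear down the pod network.

Figure 1: kubelet dockershim pod deletion call flow

dockershim is the CRI shim built into the kubelet. The pod deletion call flow of the remote CRI shims is essentially the same as that of dockershim: they call a different container engine to operate on containers, but it is still the CRI shim that calls CNI to tear down the pod network.

The detailed source code analysis follows.

Start with the kubeGenericRuntimeManager.KillPod method; the logic that calls CRI to delete a pod is triggered from here.

As the code shows, the kubelet deletes a pod as follows:

(1) first stop all containers belonging to the pod;

(2) then stop the pod sandbox container.

Note: this only stops the containers; removing them is done by the kubelet's GC.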

// pkg/kubelet/kuberuntime/kuberuntime_manager.go
// KillPod kills all the containers of a pod. Pod may be nil, running pod must not be.
// gracePeriodOverride if specified allows the caller to override the pod default grace period.
// only hard kill paths are allowed to specify a gracePeriodOverride in the kubelet in order to not corrupt user data.
// it is useful when doing SIGKILL for hard eviction scenarios, or max grace period during soft eviction scenarios.
func (m *kubeGenericRuntimeManager) KillPod(pod *v1.Pod, runningPod kubecontainer.Pod, gracePeriodOverride *int64) error {
	err := m.killPodWithSyncResult(pod, runningPod, gracePeriodOverride)
	return err.Error()
}

// killPodWithSyncResult kills a runningPod and returns SyncResult.
// Note: The pod passed in could be *nil* when kubelet restarted.
func (m *kubeGenericRuntimeManager) killPodWithSyncResult(pod *v1.Pod, runningPod kubecontainer.Pod, gracePeriodOverride *int64) (result kubecontainer.PodSyncResult) {
	killContainerResults := m.killContainersWithSyncResult(pod, runningPod, gracePeriodOverride)
	for _, containerResult := range killContainerResults {
		result.AddSyncResult(containerResult)
	}

	// stop sandbox, the sandbox will be removed in GarbageCollect
	killSandboxResult := kubecontainer.NewSyncResult(kubecontainer.KillPodSandbox, runningPod.ID)
	result.AddSyncResult(killSandboxResult)
	// Stop all sandboxes belongs to same pod
	for _, podSandbox := range runningPod.Sandboxes {
		if err := m.runtimeService.StopPodSandbox(podSandbox.ID.ID); err != nil {
			killSandboxResult.Fail(kubecontainer.ErrKillPodSandbox, err.Error())
			klog.Errorf("Failed to stop sandbox %q", podSandbox.ID)
		}
	}

	return
}
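
The ordering in KillPod can be modelled without any Kubernetes dependencies. The sketch below is only an illustration: fakeRuntime and killPod are hypothetical names, and the containers are stopped sequentially here, whereas the kubelet stops them concurrently. It shows that sandboxes are stopped after the containers, that a failed stop does not short-circuit the remaining stops, and that nothing is removed.

```go
package main

import "fmt"

// fakeRuntime is a hypothetical stand-in for the CRI runtime service;
// it only records the order in which stop calls arrive.
type fakeRuntime struct {
	calls []string
}

func (f *fakeRuntime) StopContainer(id string) error {
	f.calls = append(f.calls, "StopContainer:"+id)
	return nil
}

func (f *fakeRuntime) StopPodSandbox(id string) error {
	f.calls = append(f.calls, "StopPodSandbox:"+id)
	return nil
}

// killPod mirrors the ordering only: containers first, then sandboxes.
// Errors are collected rather than returned early, and nothing is removed;
// removal is left to a separate garbage-collection pass.
func killPod(rt *fakeRuntime, containers, sandboxes []string) []error {
	var errs []error
	for _, c := range containers {
		if err := rt.StopContainer(c); err != nil {
			errs = append(errs, err)
		}
	}
	for _, s := range sandboxes {
		if err := rt.StopPodSandbox(s); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	rt := &fakeRuntime{}
	errs := killPod(rt, []string{"app", "sidecar"}, []string{"sandbox-0"})
	fmt.Println(rt.calls, len(errs))
}
```

Running it prints the recorded calls with the sandbox stop last, followed by an error count of 0.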

5.1 m.killContainersWithSyncResult

m.killContainersWithSyncResult stops all containers belonging to the pod.

Main logic: start one goroutine per container, each calling m.killContainer to stop its container.

// pkg/kubelet/kuberuntime/kuberuntime_container.go
// killContainersWithSyncResult kills all pod's containers with sync results.
func (m *kubeGenericRuntimeManager) killContainersWithSyncResult(pod *v1.Pod, runningPod kubecontainer.Pod, gracePeriodOverride *int64) (syncResults []*kubecontainer.SyncResult) {
	containerResults := make(chan *kubecontainer.SyncResult, len(runningPod.Containers))
	wg := sync.WaitGroup{}

	wg.Add(len(runningPod.Containers))
	for _, container := range runningPod.Containers {
		go func(container *kubecontainer.Container) {
			defer utilruntime.HandleCrash()
			defer wg.Done()

			killContainerResult := kubecontainer.NewSyncResult(kubecontainer.KillContainer, container.Name)
			if err := m.killContainer(pod, container.ID, container.Name, "", gracePeriodOverride); err != nil {
				killContainerResult.Fail(kubecontainer.ErrKillContainer, err.Error())
			}
			containerResults <- killContainerResult
		}(container)
	}
	wg.Wait()
	close(containerResults)

	for containerResult := range containerResults {
		syncResults = append(syncResults, containerResult)
	}
	return
}
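
The fan-out pattern used above (one goroutine per container, a channel buffered to the number of workers, wait, close, drain) can be tried in isolation. The following is a minimal sketch with hypothetical names (result, stopAll); it is not the kubelet's code, and the stop function is supplied by the caller.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// result is a hypothetical, trimmed-down analogue of kubecontainer.SyncResult.
type result struct {
	Name string
	Err  error
}

// stopAll runs stop once per name in its own goroutine and gathers every result.
func stopAll(names []string, stop func(string) error) []result {
	// Buffer sized to the number of workers so that no send ever blocks.
	results := make(chan result, len(names))
	var wg sync.WaitGroup
	wg.Add(len(names))
	for _, name := range names {
		go func(name string) {
			defer wg.Done()
			results <- result{Name: name, Err: stop(name)}
		}(name)
	}
	wg.Wait()
	close(results) // safe: every sender has finished

	var out []result
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	out := stopAll([]string{"app", "sidecar", "init"}, func(name string) error {
		if name == "sidecar" {
			return fmt.Errorf("stop %s: timed out", name)
		}
		return nil
	})
	// Completion order is nondeterministic, so sort before printing.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	for _, r := range out {
		fmt.Println(r.Name, r.Err)
	}
}
```

Because the channel can hold every result, the workers never block on the send, which is what makes it safe to call wg.Wait() before anything reads from the channel.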

5.1.1 m.killContainer

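The m.killContainer method mainly calls m.runtimeService.StopContainer.

runtimeService is a RemoteRuntimeService. It implements the RuntimeService interface (the CRI shim client side of the container runtime interface) and holds the client used to communicate with the CRI shim's container runtime server. Calling m.runtimeService.StopContainer is therefore effectively a call to the StopContainer method on the CRI shim server, which stops the container.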

// pkg/kubelet/kuberuntime/kuberuntime_container.go
// killContainer kills a container through the following steps:
// * Run the pre-stop lifecycle hooks (if applicable).
// * Stop the container.
func (m *kubeGenericRuntimeManager) killContainer(pod *v1.Pod, containerID kubecontainer.ContainerID, containerName string, message string, gracePeriodOverride *int64) error {
	...

	klog.V(2).Infof("Killing container %q with %d second grace period", containerID.String(), gracePeriod)

	err := m.runtimeService.StopContainer(containerID.ID, gracePeriod)
	if err != nil {
		klog.Errorf("Container %q termination failed with gracePeriod %d: %v", containerID.String(), gracePeriod, err)
	} else {
		klog.V(3).Infof("Container %q exited normally", containerID.String())
	}

	m.containerRefManager.ClearRef(containerID)

	return err
}

m.runtimeService.StopContainer

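m.runtimeService.StopContainer calls r.runtimeClient.StopContainer; that is, it uses the CRI shim client to ask the CRI shim server to stop the container.

At this point the CRI-related calls inside the kubelet have all been covered. Next comes the analysis of stopping the container inside the CRI shim, using dockershim (the CRI shim built into the kubelet) as the example.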

// pkg/kubelet/remote/remote_runtime.go
// StopContainer stops a running container with a grace period (i.e., timeout).
func (r *RemoteRuntimeService) StopContainer(containerID string, timeout int64) error {
	// Use timeout + default timeout (2 minutes) as timeout to leave extra time
	// for SIGKILL container and request latency.
	t := r.timeout + time.Duration(timeout)*time.Second
	ctx, cancel := getContextWithTimeout(t)
	defer cancel()

	r.logReduction.ClearID(containerID)
	_, err := r.runtimeClient.StopContainer(ctx, &runtimeapi.StopContainerRequest{
		ContainerId: containerID,
		Timeout:     timeout,
	})
	if err != nil {
		klog.Errorf("StopContainer %q from runtime service failed: %v", containerID, err)
		return err
	}

	return nil
}

5.1.2 r.runtimeClient.StopContainer

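Using dockershim as the example, the analysis now moves into the CRI shim to see how the container is stopped.

The kubelet's earlier call to r.runtimeClient.StopContainer arrives at the StopContainer method in dockershim.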

// pkg/kubelet/dockershim/docker_container.go
// StopContainer stops a running container with a grace period (i.e., timeout).
func (ds *dockerService) StopContainer(_ context.Context, r *runtimeapi.StopContainerRequest) (*runtimeapi.StopContainerResponse, error) {
	err := ds.client.StopContainer(r.ContainerId, time.Duration(r.Timeout)*time.Second)
	if err != nil {
		return nil, err
	}
	return &runtimeapi.StopContainerResponse{}, nil
}

ds.client.StopContainer

This mainly calls d.client.ContainerStop.

// pkg/kubelet/dockershim/libdocker/kube_docker_client.go
// Stopping an already stopped container will not cause an error in dockerapi.
func (d *kubeDockerClient) StopContainer(id string, timeout time.Duration) error {
	ctx, cancel := d.getCustomTimeoutContext(timeout)
	defer cancel()
	err := d.client.ContainerStop(ctx, id, &timeout)
	if ctxErr := contextError(ctx); ctxErr != nil {
		return ctxErr
	}
	return err
}

d.client.ContainerStop

It builds the request parameters and sends an HTTP request to the Docker daemon's stop URL for the container.

// vendor/github.com/docker/docker/client/container_stop.go
// ContainerStop stops a container. In case the container fails to stop
// gracefully within a time frame specified by the timeout argument,
// it is forcefully terminated (killed).
//
// If the timeout is nil, the container's StopTimeout value is used, if set,
// otherwise the engine default. A negative timeout value can be specified,
// meaning no timeout, i.e. no forceful termination is performed.
func (cli *Client) ContainerStop(ctx context.Context, containerID string, timeout *time.Duration) error {
	query := url.Values{}
	if timeout != nil {
		query.Set("t", timetypes.DurationToSecondsString(*timeout))
	}
	resp, err := cli.post(ctx, "/containers/"+containerID+"/stop", query, nil, nil)
	ensureReaderClosed(resp)
	return err
}

5.2 m.runtimeService.StopPodSandbox

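The runtimeService in m.runtimeService.StopPodSandbox is a RemoteRuntimeService. It implements the RuntimeService interface (the CRI shim client side of the container runtime interface) and holds the client used to communicate with the CRI shim's container runtime server. Calling m.runtimeService.StopPodSandbox is therefore effectively a call to the StopPodSandbox method on the CRI shim server, which stops the pod sandbox.

At this point the CRI-related calls inside the kubelet have all been covered. Next comes the analysis of stopping the pod sandbox inside the CRI shim, using dockershim (the CRI shim built into the kubelet) as the example.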

// pkg/kubelet/remote/remote_runtime.go
// StopPodSandbox stops the sandbox. If there are any running containers in the
// sandbox, they should be forced to termination.
func (r *RemoteRuntimeService) StopPodSandbox(podSandBoxID string) error {
	ctx, cancel := getContextWithTimeout(r.timeout)
	defer cancel()

	_, err := r.runtimeClient.StopPodSandbox(ctx, &runtimeapi.StopPodSandboxRequest{
		PodSandboxId: podSandBoxID,
	})
	if err != nil {
		klog.Errorf("StopPodSandbox %q from runtime service failed: %v", podSandBoxID, err)
		return err
	}

	return nil
}

5.2.1 r.runtimeClient.StopPodSandbox

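Using dockershim as the example, the analysis now moves into the CRI shim to see how the pod sandbox is stopped.

The kubelet's earlier call to r.runtimeClient.StopPodSandbox arrives at the StopPodSandbox method in dockershim.

Stopping the pod sandbox involves two main steps:

(1) call ds.network.TearDownPod to tear down the pod network;

(2) call ds.client.StopContainer to stop the pod sandbox container.

Note that stopping the pod sandbox only counts as successful when both steps succeed, and that there is no requirement on the order in which the two steps succeed.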

// pkg/kubelet/dockershim/docker_sandbox.go
// StopPodSandbox stops the sandbox. If there are any running containers in the
// sandbox, they should be force terminated.
// TODO: This function blocks sandbox teardown on networking teardown. Is it
// better to cut our losses assuming an out of band GC routine will cleanup
// after us?
func (ds *dockerService) StopPodSandbox(ctx context.Context, r *runtimeapi.StopPodSandboxRequest) (*runtimeapi.StopPodSandboxResponse, error) {
	var namespace, name string
	var hostNetwork bool

	podSandboxID := r.PodSandboxId
	resp := &runtimeapi.StopPodSandboxResponse{}

	// Try to retrieve minimal sandbox information from docker daemon or sandbox checkpoint.
	inspectResult, metadata, statusErr := ds.getPodSandboxDetails(podSandboxID)
	if statusErr == nil {
		namespace = metadata.Namespace
		name = metadata.Name
		hostNetwork = (networkNamespaceMode(inspectResult) == runtimeapi.NamespaceMode_NODE)
	} else {
		checkpoint := NewPodSandboxCheckpoint("", "", &CheckpointData{})
		checkpointErr := ds.checkpointManager.GetCheckpoint(podSandboxID, checkpoint)

		// Proceed if both sandbox container and checkpoint could not be found. This means that following
		// actions will only have sandbox ID and not have pod namespace and name information.
		// Return error if encounter any unexpected error.
		if checkpointErr != nil {
			if checkpointErr != errors.ErrCheckpointNotFound {
				err := ds.checkpointManager.RemoveCheckpoint(podSandboxID)
				if err != nil {
					klog.Errorf("Failed to delete corrupt checkpoint for sandbox %q: %v", podSandboxID, err)
				}
			}
			if libdocker.IsContainerNotFoundError(statusErr) {
				klog.Warningf("Both sandbox container and checkpoint for id %q could not be found. "+
					"Proceed without further sandbox information.", podSandboxID)
			} else {
				return nil, utilerrors.NewAggregate([]error{
					fmt.Errorf("failed to get checkpoint for sandbox %q: %v", podSandboxID, checkpointErr),
					fmt.Errorf("failed to get sandbox status: %v", statusErr)})
			}
		} else {
			_, name, namespace, _, hostNetwork = checkpoint.GetData()
		}
	}

	// WARNING: The following operations made the following assumption:
	// 1. kubelet will retry on any error returned by StopPodSandbox.
	// 2. tearing down network and stopping sandbox container can succeed in any sequence.
	// This depends on the implementation detail of network plugin and proper error handling.
	// For kubenet, if tearing down network failed and sandbox container is stopped, kubelet
	// will retry. On retry, kubenet will not be able to retrieve network namespace of the sandbox
	// since it is stopped. With empty network namespace, CNI bridge plugin will conduct best
	// effort clean up and will not return error.
	errList := []error{}
	ready, ok := ds.getNetworkReady(podSandboxID)
	if !hostNetwork && (ready || !ok) {
		// Only tear down the pod network if we haven't done so already
		cID := kubecontainer.BuildContainerID(runtimeName, podSandboxID)
		err := ds.network.TearDownPod(namespace, name, cID)
		if err == nil {
			ds.setNetworkReady(podSandboxID, false)
		} else {
			errList = append(errList, err)
		}
	}
	if err := ds.client.StopContainer(podSandboxID, defaultSandboxGracePeriod); err != nil {
		// Do not return error if the container does not exist
		if !libdocker.IsContainerNotFoundError(err) {
			klog.Errorf("Failed to stop sandbox %q: %v", podSandboxID, err)
			errList = append(errList, err)
		} else {
			// remove the checkpoint for any sandbox that is not found in the runtime
			ds.checkpointManager.RemoveCheckpoint(podSandboxID)
		}
	}

	if len(errList) == 0 {
		return resp, nil
	}

	// TODO: Stop all running containers in the sandbox.
	return nil, utilerrors.NewAggregate(errList)
}

ds.client.StopContainer

This is the same ds.client.StopContainer method (pkg/kubelet/dockershim/libdocker/kube_docker_client.go) already listed in 5.1.2, here called with the sandbox ID and defaultSandboxGracePeriod. It mainly calls d.client.ContainerStop.

d.client.ContainerStop

Also listed in 5.1.2 (vendor/github.com/docker/docker/client/container_stop.go): it builds the request parameters and sends an HTTP POST to the Docker daemon's /containers/<id>/stop URL, this time to stop the pod sandbox container.

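Summary

CRI architecture diagram

Beneath CRI there are two types of container runtime implementation:

(1) dockershim, built into the kubelet, which provides support for the Docker container engine and for CNI network plugins (including kubenet). The dockershim code lives inside the kubelet and is invoked by it; dockershim starts its own server to act as the CRI shim and exposes a gRPC server to the kubelet;

(2) external container runtimes, which provide support for container engines such as rkt and containerd.

Analysis of how kubelet calls CRI to delete a pod

The kubelet deletes a pod as follows:

(1) first stop all containers belonging to the pod;

(2) then stop the pod sandbox container (which includes tearing down the pod network).

Note: this only stops the containers; removing them is done by the kubelet's GC.

kubelet CRI pod deletion call flow

The following uses the kubelet dockershim pod deletion call flow as an example.

The kubelet calls dockershim to stop containers; dockershim in turn calls Docker to stop the containers and calls CNI to tear down the pod network.

Figure 1: kubelet dockershim pod deletion call flow

dockershim is the CRI shim built into the kubelet. The pod deletion call flow of the remote CRI shims is essentially the same as that of dockershim: they call a different container engine to operate on containers, but it is still the CRI shim that calls CNI to tear down the pod network.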

