Table of Contents
- Setting up Prometheus and Grafana to monitor Longhorn
- Integrating Longhorn metrics into the Rancher monitoring system
- Longhorn monitoring metrics
- Support for kubelet volume metrics
- Longhorn alert rule example

Setting up Prometheus and Grafana to monitor Longhorn
Overview

Longhorn natively exposes metrics in Prometheus text format on the REST endpoint http://LONGHORN_MANAGER_IP:PORT/metrics. See Longhorn's metrics for a description of all available metrics. You can use any collecting tool, such as Prometheus, Graphite, or Telegraf, to scrape these metrics, then visualize the collected data with a tool such as Grafana.
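To make the exposition format concrete, here is a minimal sketch that parses a couple of lines in the same Prometheus text format the endpoint serves. The sample values are hypothetical, and this is a simplified parser for illustration, not the official Prometheus client library:

```python
# Minimal sketch: parse Prometheus text-format samples like those served at
# http://LONGHORN_MANAGER_IP:PORT/metrics. Sample values are hypothetical.
import re

SAMPLE = """\
longhorn_volume_capacity_bytes{node="worker-2",volume="testvol"} 6.442450944e+09
longhorn_volume_actual_size_bytes{node="worker-2",volume="testvol"} 1.1917312e+08
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'            # optional {label="value",...}
    r'\s+(?P<value>\S+)$'                    # sample value
)

def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and # HELP / # TYPE comment lines
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group('labels'):
            for pair in m.group('labels').split(','):
                k, v = pair.split('=', 1)
                labels[k] = v.strip('"')
        samples.append((m.group('name'), labels, float(m.group('value'))))
    return samples

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

A real scraper also handles escaped quotes and commas inside label values; Prometheus itself does all of this for you, so this sketch is only to show what the scraped data looks like.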
This document presents an example setup for monitoring Longhorn. The monitoring system uses Prometheus for collecting data and alerting, and Grafana for visualizing/dashboarding the collected data. From a high-level overview, the monitoring system contains:
- The Prometheus server, which scrapes and stores time-series data from the Longhorn metrics endpoint. Prometheus is also responsible for generating alerts based on the configured rules and the collected data. The Prometheus server then sends the alerts to Alertmanager.
- Alertmanager, which manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.
- Grafana, which queries the Prometheus server for data and draws dashboards for visualization.

The following figure describes the detailed architecture of the monitoring system.
There are 2 components in the above figure not yet mentioned:
- The Longhorn backend service is the service pointing to the set of Longhorn manager pods. Longhorn's metrics are exposed in the Longhorn manager pods at the endpoint http://LONGHORN_MANAGER_IP:PORT/metrics.
- The Prometheus Operator makes running Prometheus on top of Kubernetes very easy. The operator watches 3 custom resources: ServiceMonitor, Prometheus, and Alertmanager. When users create those custom resources, the Prometheus Operator deploys and manages the Prometheus server and Alertmanager with the user-specified configuration.

Installation
Follow these instructions to install all components into the monitoring namespace. To install them into a different namespace, change the field namespace: OTHER_NAMESPACE accordingly.
Create the monitoring namespace
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```

Install the Prometheus Operator
Deploy the Prometheus Operator together with the ClusterRole, ClusterRoleBinding, and ServiceAccount it requires.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
- apiGroups:
  - apiextensions.k8s.io
  resourceNames:
  - alertmanagers.monitoring.coreos.com
  - podmonitors.monitoring.coreos.com
  - prometheuses.monitoring.coreos.com
  - prometheusrules.monitoring.coreos.com
  - servicemonitors.monitoring.coreos.com
  - thanosrulers.monitoring.coreos.com
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - alertmanagers/finalizers
  - prometheuses
  - prometheuses/finalizers
  - thanosrulers
  - thanosrulers/finalizers
  - servicemonitors
  - podmonitors
  - prometheusrules
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - services/finalizers
  - endpoints
  verbs:
  - get
  - create
  - update
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/name: prometheus-operator
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/name: prometheus-operator
        app.kubernetes.io/version: v0.38.3
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=jimmidyson/configmap-reload:v0.3.0
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.38.3
        image: quay.io/prometheus-operator/prometheus-operator:v0.38.3
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          allowPrivilegeEscalation: false
      nodeSelector:
        beta.kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: prometheus-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    targetPort: http
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
```

Install the Longhorn ServiceMonitor
The Longhorn ServiceMonitor has a label selector app: longhorn-manager to select the Longhorn backend service. Later on, the Prometheus CRD can include the Longhorn ServiceMonitor so that the Prometheus server can discover all Longhorn manager pods and their endpoints.
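The matchLabels mechanics behind this selection can be sketched in a few lines. The pod names below are hypothetical, and in a real cluster the operator performs this matching for you; this is only to illustrate why the app: longhorn-manager label picks exactly the manager pods:

```python
# Sketch: how a matchLabels selector (app: longhorn-manager) picks pods.
def matches(selector, labels):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())

selector = {"app": "longhorn-manager"}
pods = [
    {"name": "longhorn-manager-xyz", "labels": {"app": "longhorn-manager"}},  # hypothetical
    {"name": "longhorn-ui-abc", "labels": {"app": "longhorn-ui"}},            # hypothetical
]
selected = [p["name"] for p in pods if matches(selector, p["labels"])]
print(selected)  # ['longhorn-manager-xyz']
```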
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: monitoring
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```

Install and configure Prometheus Alertmanager
Create a highly available Alertmanager deployment with 3 instances:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: longhorn
  namespace: monitoring
spec:
  replicas: 3
```

The Alertmanager instances will not start unless a valid configuration is provided. See here for more explanation of Alertmanager configuration. The following code gives an example configuration:
```yaml
global:
  resolve_timeout: 5m
route:
  group_by: [alertname]
  receiver: email_and_slack
receivers:
- name: email_and_slack
  email_configs:
  - to: <the email address to send notifications to>
    from: <the sender address>
    smarthost: <the smtp host through which emails are sent>
    # SMTP authentication information.
    auth_username: <the username>
    auth_identity: <the identity>
    auth_password: <the password>
    headers:
      subject: 'Longhorn-Alert'
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
  slack_configs:
  - api_url: <the slack webhook url>
    channel: <the channel or user to send notifications to>
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
```

Save the above Alertmanager config in a file called alertmanager.yaml and create a secret from it using kubectl.
Alertmanager instances require the Secret resource naming to follow the format alertmanager-{ALERTMANAGER_NAME}. In the previous step, the name of the Alertmanager is longhorn, so the secret name must be alertmanager-longhorn.
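To make the naming convention and the base64-encoded payload concrete, the sketch below builds the same Secret object that the kubectl command in the next step produces. This is an illustration only; in practice you would just run kubectl:

```python
# Sketch: the Secret that `kubectl create secret generic alertmanager-longhorn
# --from-file=alertmanager.yaml -n monitoring` creates. Secret data is
# base64-encoded by the Kubernetes API.
import base64

def alertmanager_secret(alertmanager_name, config_text, namespace="monitoring"):
    # The Prometheus Operator looks for a Secret named alertmanager-{name}.
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"alertmanager-{alertmanager_name}",
            "namespace": namespace,
        },
        "data": {
            "alertmanager.yaml": base64.b64encode(config_text.encode()).decode(),
        },
    }

secret = alertmanager_secret("longhorn", "global:\n  resolve_timeout: 5m\n")
print(secret["metadata"]["name"])  # alertmanager-longhorn
```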
```shell
kubectl create secret generic alertmanager-longhorn --from-file=alertmanager.yaml -n monitoring
```

To be able to view the web UI of the Alertmanager, expose it through a Service. A simple way to do this is to use a Service of type NodePort:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-longhorn
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: longhorn
```

After creating the above service, you can access the web UI of Alertmanager via a node's IP and the port 30903.
Use the above NodePort service for quick verification only, because it doesn't communicate over a TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose the web UI of Alertmanager over a TLS connection.
Install and configure the Prometheus server
Create a PrometheusRule custom resource that defines the alert conditions.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: longhorn
    role: alert-rules
  name: prometheus-longhorn-rules
  namespace: monitoring
spec:
  groups:
  - name: longhorn.rules
    rules:
    - alert: LonghornVolumeUsageCritical
      annotations:
        description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is at {{$value}}% used for more than 5 minutes.
        summary: Longhorn volume capacity is over 90% used.
      expr: 100 * (longhorn_volume_usage_bytes / longhorn_volume_capacity_bytes) > 90
      for: 5m
      labels:
        issue: Longhorn volume {{$labels.volume}} usage on {{$labels.node}} is critical.
        severity: critical
```

See https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules for more information on how to define alert rules.
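The expr above is plain arithmetic over two gauges: fire when usage exceeds 90% of capacity. The same condition can be sketched in Python (the byte values below are hypothetical samples, not real cluster data):

```python
# Sketch of the PromQL condition
#   100 * (longhorn_volume_usage_bytes / longhorn_volume_capacity_bytes) > 90
# evaluated over hypothetical sample values.
def volume_usage_percent(usage_bytes, capacity_bytes):
    return 100.0 * usage_bytes / capacity_bytes

def usage_critical(usage_bytes, capacity_bytes, threshold=90.0):
    # Prometheus additionally requires this to hold for 5m before alerting.
    return volume_usage_percent(usage_bytes, capacity_bytes) > threshold

print(usage_critical(5.9e9, 6.4e9))  # ~92% used -> True
print(usage_critical(1.2e8, 6.4e9))  # ~2% used -> False
```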
If RBAC authorization is activated, create a ClusterRole and ClusterRoleBinding for the Prometheus pods:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: monitoring
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
```

Create the Prometheus custom resource. Notice that we select the Longhorn ServiceMonitor and Longhorn rules in the spec.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-longhorn
      port: web
  serviceMonitorSelector:
    matchLabels:
      name: longhorn-prometheus-servicemonitor
  ruleSelector:
    matchLabels:
      prometheus: longhorn
      role: alert-rules
```

To be able to view the web UI of the Prometheus server, expose it through a Service. A simple way to do this is to use a Service of type NodePort:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30904
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
```

After creating the above service, you can access the web UI of the Prometheus server via a node's IP and the port 30904.
At this point, you should be able to see all Longhorn manager targets as well as the Longhorn rules in the targets and rules sections of the Prometheus server UI.
Use the above NodePort service for quick verification only, because it doesn't communicate over a TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose the web UI of the Prometheus server over a TLS connection.
Install Grafana
Create the Grafana datasource config:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  prometheus.yaml: |-
    {
        "apiVersion": 1,
        "datasources": [
            {
                "access": "proxy",
                "editable": true,
                "name": "prometheus",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://prometheus:9090",
                "version": 1
            }
        ]
    }
```

Create the Grafana deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:7.1.5
        ports:
        - name: grafana
          containerPort: 3000
        resources:
          limits:
            memory: "500Mi"
            cpu: "300m"
          requests:
            memory: "500Mi"
            cpu: "200m"
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
      volumes:
      - name: grafana-storage
        emptyDir: {}
      - name: grafana-datasources
        configMap:
          defaultMode: 420
          name: grafana-datasources
```

Expose Grafana on NodePort 32000:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
    nodePort: 32000
```

Use the above NodePort service for quick verification only, because it doesn't communicate over a TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose Grafana over a TLS connection.
Access the Grafana dashboard using any node IP on port 32000. The default credentials are:
User: admin
Pass: admin

Install the Longhorn dashboard
Once inside Grafana, import the prebuilt Longhorn dashboard: https://grafana.com/grafana/dashboards/13032
See https://grafana.com/docs/grafana/latest/reference/export_import/ for instructions on how to import a Grafana dashboard.
After a successful import, you should see the following dashboard:
Integrating Longhorn metrics into the Rancher monitoring system
About the Rancher monitoring system
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution.
See https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/ for instructions on how to deploy/enable the Rancher monitoring system.
Adding Longhorn metrics to the Rancher monitoring system
If you use Rancher to manage your Kubernetes cluster and have already enabled Rancher monitoring, you can add Longhorn metrics to Rancher monitoring by simply deploying the following ServiceMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: longhorn-system
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```

Once the ServiceMonitor is created, Rancher will automatically discover all Longhorn metrics.
You can then set up a Grafana dashboard for visualization.
Longhorn monitoring metrics
Volume
| Metric name | Description | Example |
| --- | --- | --- |
| longhorn_volume_actual_size_bytes | The actual space used by each replica of the volume on the corresponding node | longhorn_volume_actual_size_bytes{node="worker-2",volume="testvol"} 1.1917312e+08 |
| longhorn_volume_capacity_bytes | The configured size of the volume, in bytes | longhorn_volume_capacity_bytes{node="worker-2",volume="testvol"} 6.442450944e+09 |
| longhorn_volume_state | The state of the volume: 1=creating, 2=attached, 3=detached, 4=attaching, 5=detaching, 6=deleting | longhorn_volume_state{node="worker-2",volume="testvol"} 2 |
| longhorn_volume_robustness | The robustness of the volume: 0=unknown, 1=healthy, 2=degraded, 3=faulted | longhorn_volume_robustness{node="worker-2",volume="testvol"} 1 |

Node

| Metric name | Description | Example |
| --- | --- | --- |
| longhorn_node_status | The status of the node: 1=true, 0=false | longhorn_node_status{condition="ready",condition_reason="",node="worker-2"} 1 |
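When consuming these gauges outside Grafana, the numeric codes from the tables above have to be mapped back to readable names. A minimal sketch of that decoding, using only the code-to-name mappings documented in the tables:

```python
# Sketch: map the numeric codes from the metric tables above back to names.
VOLUME_STATE = {1: "creating", 2: "attached", 3: "detached",
                4: "attaching", 5: "detaching", 6: "deleting"}
VOLUME_ROBUSTNESS = {0: "unknown", 1: "healthy", 2: "degraded", 3: "faulted"}

def decode_volume_sample(metric_name, value):
    """Translate an enum-style gauge value to its name; pass others through."""
    code = int(value)
    if metric_name == "longhorn_volume_state":
        return VOLUME_STATE.get(code, "unknown")
    if metric_name == "longhorn_volume_robustness":
        return VOLUME_ROBUSTNESS.get(code, "unknown")
    return value  # e.g. the *_bytes metrics are plain byte gauges

print(decode_volume_sample("longhorn_volume_state", 2))       # attached
print(decode_volume_sample("longhorn_volume_robustness", 1))  # healthy
```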