IoT Edge Clusters: Further Configuration of Kubernetes Events Alert Notifications
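Goals
Previous article:
Alert Notifications for IoT Edge Clusters Based on Kubernetes Events
- Alert recovery notifications - evaluated and found not feasible
  Reason: an alert and its recovery are separate, completely unrelated events. Alerts are Warning level and recoveries are Normal level, so enabling recovery notifications would mean sending every Normal event, and that volume is overwhelming. Moreover, without considerable experience and patience, it is impossible to tell which Normal event corresponds to the recovery of which alert.
- Continuous alerting while an issue remains unrecovered - available by default, no extra configuration needed.
- Alert content shows resource names, such as node and pod names
- Specific nodes and workloads can be muted, and the muting can be adjusted dynamically
  For example, node worker-1 in cluster 001 undergoes planned maintenance; monitoring is stopped during the maintenance and resumed once it is complete.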
Configuration
Showing resource names in alert content
Several typical kinds of events:
apiVersion: v1
count: 101557
eventTime: null
firstTimestamp: "2022-04-08T03:50:47Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{prometheus}
  kind: Pod
  name: prometheus-rancher-monitoring-prometheus-0
  namespace: cattle-monitoring-system
kind: Event
lastTimestamp: "2022-04-14T11:39:19Z"
message: 'Readiness probe failed: Get "http://10.42.0.87:9090/-/ready": context deadline
  exceeded (Client.Timeout exceeded while awaiting headers)'
metadata:
  creationTimestamp: "2022-04-08T03:51:17Z"
  name: prometheus-rancher-monitoring-prometheus-0.16e3cf53f0793344
  namespace: cattle-monitoring-system
reason: Unhealthy
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: master-1
type: Warning
---
apiVersion: v1
count: 116
eventTime: null
firstTimestamp: "2022-04-13T02:43:26Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{grafana}
  kind: Pod
  name: rancher-monitoring-grafana-57777cc795-2b2x5
  namespace: cattle-monitoring-system
kind: Event
lastTimestamp: "2022-04-14T11:18:56Z"
message: 'Readiness probe failed: Get "http://10.42.0.90:3000/api/health": context
  deadline exceeded (Client.Timeout exceeded while awaiting headers)'
metadata:
  creationTimestamp: "2022-04-14T11:18:57Z"
  name: rancher-monitoring-grafana-57777cc795-2b2x5.16e5548dd2523a13
  namespace: cattle-monitoring-system
reason: Unhealthy
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: master-1
type: Warning
---
apiVersion: v1
count: 20958
eventTime: null
firstTimestamp: "2022-04-11T10:34:51Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{lb-port-1883}
  kind: Pod
  name: svclb-emqx-dt22t
  namespace: emqx
kind: Event
lastTimestamp: "2022-04-14T11:39:48Z"
message: Back-off restarting failed container
metadata:
  creationTimestamp: "2022-04-11T10:34:51Z"
  name: svclb-emqx-dt22t.16e4d11e2b9efd27
  namespace: emqx
reason: BackOff
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: worker-1
type: Warning
---
apiVersion: v1
count: 21069
eventTime: null
firstTimestamp: "2022-04-11T10:34:48Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{lb-port-80}
  kind: Pod
  name: svclb-traefik-r5p8t
  namespace: kube-system
kind: Event
lastTimestamp: "2022-04-14T11:44:59Z"
message: Back-off restarting failed container
metadata:
  creationTimestamp: "2022-04-11T10:34:48Z"
  name: svclb-traefik-r5p8t.16e4d11daf0b79ce
  namespace: kube-system
reason: BackOff
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: worker-1
type: Warning
{
  "metadata": {
    "name": "event-exporter-79544df9f7-xj4t5.16e5c540dc32614f",
    "namespace": "monitoring",
    "uid": "baf2f642-2383-4e22-87e0-456b6c3eaf4e",
    "resourceVersion": "14043444",
    "creationTimestamp": "2022-04-14T13:08:40Z"
  },
  "reason": "Pulled",
  "message": "Container image \"ghcr.io/opsgenie/kubernetes-event-exporter:v0.11\" already present on machine",
  "source": {
    "component": "kubelet",
    "host": "worker-2"
  },
  "firstTimestamp": "2022-04-14T13:08:40Z",
  "lastTimestamp": "2022-04-14T13:08:40Z",
  "count": 1,
  "type": "Normal",
  "eventTime": null,
  "reportingComponent": "",
  "reportingInstance": "",
  "involvedObject": {
    "kind": "Pod",
    "namespace": "monitoring",
    "name": "event-exporter-79544df9f7-xj4t5",
    "uid": "b77d3e13-fa9e-484b-8a5a-d1afc9edec75",
    "apiVersion": "v1",
    "resourceVersion": "14043435",
    "fieldPath": "spec.containers{event-exporter}",
    "labels": {
      "app": "event-exporter",
      "pod-template-hash": "79544df9f7",
      "version": "v1"
    }
  }
}
We can add more fields to the alert message, including:
- Node: {{ .Source.Host }}
- Pod: {{ .InvolvedObject.Name }}
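To make the two template fields above concrete, here is a minimal Python sketch that pulls the same values out of an event. It is only an illustration: the `lookup` and `describe` helpers are hypothetical names, and the sample dict is a trimmed copy of the `svclb-emqx-dt22t` event shown earlier. Note that the JSON keys are lower camel case (`source.host`), whereas the template refers to the same data as `.Source.Host`.

```python
def lookup(event, dotted_path):
    """Follow a dotted path such as "source.host" through nested dicts.

    Returns an empty string when any step of the path is missing.
    """
    value = event
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def describe(event):
    """Build a one-line summary with the node and pod of an event."""
    return "Node: {} | Pod: {}".format(
        lookup(event, "source.host"),
        lookup(event, "involvedObject.name"),
    )


# Trimmed copy of one of the Warning events listed above.
EVENT = {
    "type": "Warning",
    "reason": "BackOff",
    "message": "Back-off restarting failed container",
    "source": {"component": "kubelet", "host": "worker-1"},
    "involvedObject": {
        "kind": "Pod",
        "name": "svclb-emqx-dt22t",
        "namespace": "emqx",
    },
}

if __name__ == "__main__":
    print(describe(EVENT))
```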
Putting this together, the modified event-exporter-cfg YAML is as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
  resourceVersion: '5779968'
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: xxx test K3S cluster alerts
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventID:** {{ .UID }}\n**EventNamespace:** {{ .InvolvedObject.Namespace }}\n**EventName:** {{ .InvolvedObject.Name }}\n**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}\n**EventComponent:** {{ .Source.Component }}\n**EventHost:** {{ .Source.Host }}\n**EventLabels:** {{ toJson .InvolvedObject.Labels}}\n**EventAnnotations:** {{ toJson .InvolvedObject.Annotations}}"
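To see roughly what the layout above turns into once the templates are filled in, the following Python sketch assembles a comparable card payload from an event dict. It is a hand-written approximation, not the exporter's own rendering: `build_card` is a hypothetical helper, only a subset of the template fields is included, and nothing is sent to the webhook.

```python
import json


def build_card(event, title):
    """Assemble a Feishu interactive-card payload from an event dict."""
    involved = event.get("involvedObject", {})
    source = event.get("source", {})
    fields = [
        ("EventNamespace", involved.get("namespace", "")),
        ("EventName", involved.get("name", "")),
        ("EventType", event.get("type", "")),
        ("EventKind", involved.get("kind", "")),
        ("EventReason", event.get("reason", "")),
        ("EventTime", event.get("lastTimestamp", "")),
        ("EventMessage", event.get("message", "")),
        ("EventComponent", source.get("component", "")),
        ("EventHost", source.get("host", "")),
    ]
    # One "**Key:** value" line per field, joined the same way as the template.
    content = "\n".join("**{}:** {}".format(key, value) for key, value in fields)
    return {
        "msg_type": "interactive",
        "card": {
            "config": {"wide_screen_mode": True, "enable_forward": True},
            "header": {
                "title": {"tag": "plain_text", "content": title},
                "template": "red",
            },
            "elements": [
                {"tag": "div", "text": {"tag": "lark_md", "content": content}}
            ],
        },
    }


# Trimmed copy of the svclb-traefik event listed earlier in the article.
SAMPLE_EVENT = {
    "type": "Warning",
    "reason": "BackOff",
    "message": "Back-off restarting failed container",
    "lastTimestamp": "2022-04-14T11:44:59Z",
    "source": {"component": "kubelet", "host": "worker-1"},
    "involvedObject": {
        "kind": "Pod",
        "name": "svclb-traefik-r5p8t",
        "namespace": "kube-system",
    },
}

if __name__ == "__main__":
    print(json.dumps(build_card(SAMPLE_EVENT, "demo cluster alerts"), indent=2))
```

Printing the payload locally like this is a cheap way to check that the node and pod names end up in the card text before touching the ConfigMap.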
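Muting specific nodes and workloads
For example, node worker-1 in cluster 001 undergoes planned maintenance; monitoring is stopped during the maintenance and resumed once it is complete.
Continue modifying the event-exporter-cfg YAML as follows: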
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
            - source:
                host: "worker-1"
            - namespace: "cattle-monitoring-system"
            - name: "*emqx*"
            - kind: "Pod|Deployment|ReplicaSet"
            - labels:
                version: "dev"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: xxx test K3S cluster alerts
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventID:** {{ .UID }}\n**EventNamespace:** {{ .InvolvedObject.Namespace }}\n**EventName:** {{ .InvolvedObject.Name }}\n**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}\n**EventComponent:** {{ .Source.Component }}\n**EventHost:** {{ .Source.Host }}\n**EventLabels:** {{ toJson .InvolvedObject.Labels}}\n**EventAnnotations:** {{ toJson .InvolvedObject.Annotations}}"
The default drop rule is - type: "Normal", meaning that Normal-level events do not trigger alerts.
Now add the following rules:
- source:
    host: "worker-1"
- namespace: "cattle-monitoring-system"
- name: "*emqx*"
- kind: "Pod|Deployment|ReplicaSet"
- labels:
    version: "dev"
What each rule does:
- host: "worker-1": no alerts for node worker-1;
- namespace: "cattle-monitoring-system": no alerts for the namespace cattle-monitoring-system;
- name: "*emqx*": no alerts for anything whose name (usually the pod name) contains emqx;
- kind: "Pod|Deployment|ReplicaSet": no alerts for Pod, Deployment, or ReplicaSet resources (that is, alerts related to applications and components are ignored);
- version: "dev": no alerts for resources carrying the label version: "dev" (this can be used to mute alerts for a specific application).
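The effect of the drop list can be pictured with a small Python model. This is a simplified sketch under stated assumptions, not the exporter's actual matching code: it assumes an event is dropped as soon as any one rule matches, that each value is a regular expression matched against the whole field, and it flattens `source.host` to a hypothetical `host` key. The name pattern is written as `.*emqx.*` here because Python's `re` module rejects a pattern that starts with `*`.

```python
import re

# Hypothetical mapping from rule keys to where the value lives in an event.
FIELD_GETTERS = {
    "type": lambda e: e.get("type", ""),
    "host": lambda e: e.get("source", {}).get("host", ""),
    "namespace": lambda e: e.get("involvedObject", {}).get("namespace", ""),
    "name": lambda e: e.get("involvedObject", {}).get("name", ""),
    "kind": lambda e: e.get("involvedObject", {}).get("kind", ""),
}


def rule_matches(rule, event):
    """Return True when every field named in the rule matches the event."""
    for key, pattern in rule.items():
        if key == "labels":
            labels = event.get("involvedObject", {}).get("labels") or {}
            for label, wanted in pattern.items():
                if not re.fullmatch(wanted, labels.get(label, "")):
                    return False
        elif not re.fullmatch(pattern, FIELD_GETTERS[key](event)):
            return False
    return True


def is_dropped(drop_rules, event):
    """An event is dropped as soon as any single drop rule matches it."""
    return any(rule_matches(rule, event) for rule in drop_rules)


DROP_RULES = [
    {"type": "Normal"},
    {"host": "worker-1"},
    {"namespace": "cattle-monitoring-system"},
    {"name": ".*emqx.*"},  # regex form of "name contains emqx" (assumption)
    {"kind": "Pod|Deployment|ReplicaSet"},
    {"labels": {"version": "dev"}},
]


def make_event(host, kind="Node", name="node", namespace="", event_type="Warning"):
    """Build a minimal event dict for experimenting with the rules."""
    return {
        "type": event_type,
        "source": {"component": "kubelet", "host": host},
        "involvedObject": {"kind": kind, "name": name, "namespace": namespace},
    }


if __name__ == "__main__":
    print(is_dropped(DROP_RULES, make_event("master-1")))  # Warning on another node
    print(is_dropped(DROP_RULES, make_event("worker-1")))  # node under maintenance
```

In this model, muting worker-1 for a maintenance window amounts to adding the `host` rule to the list, and removing it afterwards brings the node's Warning events back.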
Final result
[Figure: screenshot of the resulting alert notification]



