
Kubernetes 1.14 Monitoring: Deploying Prometheus Operator in Practice

Operator

Operator was developed by CoreOS to extend the Kubernetes API: an application-specific controller used to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems. An Operator builds on the Kubernetes concepts of resources and controllers, but also encodes application-specific operational knowledge; for example, an Operator that creates a database must understand that database's operational practices in detail. The key to building an Operator is the design of its CRD (Custom Resource Definition).

A CRD is an extension of the Kubernetes API. Every resource in Kubernetes is a collection of API objects; the spec sections we write in YAML files are definitions of those resource objects. All custom resources can be operated on with kubectl just like Kubernetes' built-in resources.

An Operator codifies the operational knowledge a human operator applies to software, while leveraging Kubernetes' powerful abstractions to manage applications at scale. CoreOS officially provides several Operator implementations, among them today's subject: Prometheus Operator. The core of an Operator rests on two Kubernetes concepts:

Resource: the definition of an object's desired state

Controller: observe, analyze, and act, reconciling the actual state of resources toward the desired one

Prometheus-Operator Architecture

The diagram above is the official Prometheus-Operator architecture. The Operator is the core component: acting as a controller, it creates four CRD resource objects, Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule, and then continuously watches and maintains the state of these four objects.

With this in place, deciding what to monitor in the cluster becomes a matter of manipulating Kubernetes resource objects directly, which is far more convenient. The Service and ServiceMonitor in the diagram are both Kubernetes resources: a ServiceMonitor selects a class of Services via a labelSelector, and Prometheus in turn selects multiple ServiceMonitors via a labelSelector.
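A sketch of that label-matching chain, with hypothetical names and labels, just to make it explicit:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app                   # matched by the ServiceMonitor's selector below
spec:
  ports:
  - name: http-metrics
    port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: prometheus-operator  # matched in turn by Prometheus' serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app                 # selects the Service above
  endpoints:
  - port: http-metrics            # scrapes this named port on the Service's endpoints
```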

The four Prometheus-Operator resource objects explained

  • Prometheus controls the creation of the Prometheus server cluster, which shows up as pods in k8s
  • ServiceMonitor is an abstraction over exporters. An exporter is a tool whose sole job is to expose a metrics endpoint; Prometheus pulls its data from the metrics endpoints that ServiceMonitors describe
  • alertmanager controls the creation of the alertmanager cluster
  • PrometheusRule defines the alerting rule files
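A minimal Prometheus object, the first of the four, might look like this (a sketch; the chart generates a much fuller spec, and the serviceAccountName is an assumption):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2                       # the operator renders this as a StatefulSet of pods
  serviceAccountName: prometheus    # assumed to exist with scrape RBAC attached
  serviceMonitorSelector:           # which ServiceMonitors this server picks up
    matchLabels:
      release: prometheus-operator
```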

Installation

Install via the Helm chart

Create the ingress TLS secret

kubectl create secret tls prometheus-yun-cn-tls --cert=yun.cer --key=yun.key -n monitoring
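The command above assumes a certificate/key pair yun.cer/yun.key already exists. If you only need to test the ingress wiring, a self-signed pair can stand in (a sketch; the wildcard subject is an assumption covering the hosts used later):

```shell
# Generate a throwaway self-signed certificate and key (valid for one year).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout yun.key -out yun.cer -subj "/CN=*.yun.cn"
```

Browsers will warn about the self-signed certificate; for production, substitute a certificate issued for your real domain.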

Edit the prometheus-operator chart's values.yaml

helm fetch stable/prometheus-operator --untar
vim values.yaml
# Enable persistent storage for Prometheus data
storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: rook-ceph-block
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
# Set replicas for the prometheus server and Alertmanager to enable high availability
replicas: 3
# Set the grafana admin password
grafana:
  enabled: true
  defaultDashboardsEnabled: true
  adminPassword: admin@123
# Enable the grafana, alertmanager, and prometheus ingresses
$ [K8sDev] grep -w "ingress:" values.yaml -A 32|grep -vE "#|^$"
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: public-nginx
  labels: {}
  hosts:
  - alertmanager.yun.cn
  paths:
  - /
  tls:
  - secretName: prometheus-yun-cn-tls
    hosts:
    - alertmanager.yun.cn
--
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: public-nginx
  labels: {}
  hosts:
  - grafana.yun.cn
  path: /
  tls:
  - secretName: prometheus-yun-cn-tls
    hosts:
    - grafana.yun.cn
--
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: public-nginx
  labels: {}
  hosts:
  - prometheus.yun.cn
  paths:
  - /
  tls:
  - secretName: prometheus-yun-cn-tls
    hosts:
    - prometheus.yun.cn

Deployment

kube-state-metrics collects metrics about Kubernetes cluster objects. Pulling its image may fail during deployment, so you can pre-pull the Aliyun mirror on every node and retag it:

docker pull registry.aliyuncs.com/google_containers/kube-state-metrics:v1.5.0 && docker tag registry.aliyuncs.com/google_containers/kube-state-metrics:v1.5.0 k8s.gcr.io/kube-state-metrics:v1.5.0
kubectl create ns monitoring
helm install --name prometheus-operator -f values.yaml . --namespace monitoring
# Check the pods
kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-operator-alertmanager-0 2/2 Running 0 14m
alertmanager-prometheus-operator-alertmanager-1 2/2 Running 0 14m
alertmanager-prometheus-operator-alertmanager-2 2/2 Running 0 14m
prometheus-operator-grafana-5d74ccd7bd-zvzm9 2/2 Running 0 16m
prometheus-operator-kube-state-metrics-5d7558d7cc-hgbdr 1/1 Running 0 6m5s
prometheus-operator-operator-58f46454f8-nxrns 1/1 Running 0 16m
prometheus-operator-prometheus-node-exporter-8vvgk 1/1 Running 0 16m
prometheus-operator-prometheus-node-exporter-dzh88 1/1 Running 0 16m
prometheus-operator-prometheus-node-exporter-fbmtj 1/1 Running 0 16m
prometheus-operator-prometheus-node-exporter-ln5jk 1/1 Running 0 16m
prometheus-operator-prometheus-node-exporter-p56xb 1/1 Running 0 16m
prometheus-prometheus-operator-prometheus-0 3/3 Running 1 14m
prometheus-prometheus-operator-prometheus-1 3/3 Running 1 14m
# Check the services and ingresses
kubectl get service,ingress -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None 9093/TCP,6783/TCP 60m
service/prometheus-operated ClusterIP None 9090/TCP 60m
service/prometheus-operator-alertmanager ClusterIP 10.100.184.96 9093/TCP 62m
service/prometheus-operator-grafana ClusterIP 10.100.93.41 80/TCP 62m
service/prometheus-operator-kube-state-metrics ClusterIP 10.98.180.207 8080/TCP 62m
service/prometheus-operator-operator ClusterIP 10.103.61.39 8080/TCP 62m
service/prometheus-operator-prometheus ClusterIP 10.97.234.73 9090/TCP 62m
service/prometheus-operator-prometheus-node-exporter ClusterIP 10.106.114.236 9100/TCP 62m
NAME HOSTS ADDRESS PORTS AGE
ingress.extensions/prometheus-operator-alertmanager alertmanager.yun.cn 80, 443 20m
ingress.extensions/prometheus-operator-grafana grafana.yun.cn 80, 443 20m
ingress.extensions/prometheus-operator-prometheus prometheus.yun.cn 80, 443 20m

Hosts entries

10.6.201.174 alertmanager.yun.cn
10.6.201.174 grafana.yun.cn
10.6.201.174 prometheus.yun.cn

Access test

ServiceMonitor

Monitoring etcd

Modify the etcd startup flags

The helm-installed prometheus automatically creates a ServiceMonitor for etcd, but the etcd version shipped with Kubernetes 1.14 is 3.3.10, and from 3.3.10 onward the metrics and health endpoints are exposed through a dedicated flag, --listen-metrics-urls. So we have to edit etcd's manifest to restart etcd with that flag.

# Append to the command list; etcd finishes restarting within about a minute
vim /etc/kubernetes/manifests/etcd.yaml
- --listen-metrics-urls=http://192.168.137.102:2381
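For context, the flag sits alongside etcd's existing flags in the static pod manifest, roughly like this (the neighboring flags are illustrative; the IP is this cluster's master):

```yaml
# /etc/kubernetes/manifests/etcd.yaml (excerpt)
spec:
  containers:
  - command:
    - etcd
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.137.102:2379
    - --listen-metrics-urls=http://192.168.137.102:2381   # the flag we appended
```

Because this is a static pod, the kubelet notices the file change and restarts etcd on its own; no kubectl command is needed.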

Create the ServiceMonitor

helm has already created a ServiceMonitor for etcd:

kubectl get servicemonitors.monitoring.coreos.com -n monitoring prometheus-operator-kube-etcd -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2019-05-23T01:46:55Z
  generation: 1
  labels:
    app: prometheus-operator-kube-etcd
    chart: prometheus-operator-5.0.11
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-kube-etcd
  namespace: monitoring
  resourceVersion: "28021953"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/prometheus-operator-kube-etcd
  uid: a5284d89-7cfc-11e9-97f1-fa163e92035a
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics # matches the Service port name
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: prometheus-operator-kube-etcd # matches the Service labels
      release: prometheus-operator # matches the Service labels

Create the Service and Endpoints

A ServiceMonitor monitors an application by selecting, via labels, the Service in front of its pods and scraping the /metrics endpoint behind it.

So we now have to create a Service and Endpoints for etcd before the ServiceMonitor can collect its metrics.

vim prometheus-etcdService.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    app: prometheus-operator-kube-etcd
    release: prometheus-operator
spec:
  selector:
    component: etcd
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 2381
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    app: prometheus-operator-kube-etcd
    release: prometheus-operator
subsets:
- addresses:
  - ip: 192.168.137.102 # master host IP
    nodeName: master-102 # master hostname
  ports:
  - name: http-metrics
    port: 2381 # must match the --listen-metrics-urls port
    protocol: TCP
kubectl create -f prometheus-etcdService.yaml

Check the prometheus ---> Targets page; the etcd target is now UP.

Open grafana and you can see etcd's monitoring data.

Grafana Configuration

grafana needs almost no configuration; the operator ships with the monitoring dashboards we need built in.

Configure alert receivers

Inspect the secret

  • alertmanager's receiver configuration is stored in a secret, so first we have to find that secret
$ [K8sDev] kubectl get secrets -n monitoring
NAME TYPE DATA AGE
alertmanager-prometheus-operator-alertmanager Opaque 1 3h58m
default-token-fjqg7 kubernetes.io/service-account-token 3 4h2m
prometheus-yun-cn-tls kubernetes.io/tls 2 3h33m
prometheus-operator-alertmanager-token-rtrwt kubernetes.io/service-account-token 3 3h58m
prometheus-operator-grafana Opaque 3 3h58m
prometheus-operator-grafana-token-pllv5 kubernetes.io/service-account-token 3 3h58m
prometheus-operator-kube-state-metrics-token-x7qs4 kubernetes.io/service-account-token 3 3h58m
prometheus-operator-operator-token-sxn65 kubernetes.io/service-account-token 3 3h58m
prometheus-operator-prometheus-node-exporter-token-m7twt kubernetes.io/service-account-token 3 3h58m
prometheus-operator-prometheus-token-5gt5m kubernetes.io/service-account-token 3 3h58m
prometheus-prometheus-operator-prometheus Opaque 1 3h56m
  • The first secret, alertmanager-prometheus-operator-alertmanager, looks like the one; dump it:
$ [K8sDev] kubectl get secrets -n monitoring alertmanager-prometheus-operator-alertmanager -o yaml
apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICByZWNlaXZlcjogIm51bGwiCg==
kind: Secret
metadata:
  creationTimestamp: 2019-05-23T01:46:54Z
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-5.0.11
    heritage: Tiller
    release: prometheus-operator
  name: alertmanager-prometheus-operator-alertmanager
  namespace: monitoring
  resourceVersion: "28021773"
  selfLink: /api/v1/namespaces/monitoring/secrets/alertmanager-prometheus-operator-alertmanager
  uid: a4c7b6fe-7cfc-11e9-97f1-fa163e92035a
type: Opaque
  • base64-decode the data:
$ [K8sDev] echo "Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICByZWNlaXZlcjogIm51bGwiCg==" |base64 -D
global:
  resolve_timeout: 5m
receivers:
- name: "null"
route:
  group_by:
  - job
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 12h
  routes:
  - match:
      alertname: Watchdog
    receiver: "null"
  • Check the alertmanager configuration in the UI; it matches what we just decoded

  • Modify the secret
  • Save the decoded data to a file named "alertmanager.yaml" and edit it:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'chulinx@163.com'
  smtp_auth_username: ''
  smtp_auth_password: '<email authorization code>'
  smtp_hello: '163.com'
  smtp_require_tls: false
route:
  group_by: ['job', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
receivers:
- name: 'default'
  email_configs:
  - to: '547571608@qq.com'
    send_resolved: true
  • Recreate the secret
kubectl delete secrets -n monitoring alertmanager-prometheus-operator-alertmanager
kubectl create secret generic alertmanager-prometheus-operator-alertmanager --from-file=alertmanager.yaml -n monitoring
  • Check the UI; the new configuration has taken effect
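The secret's data value is nothing more than the base64 of the file; the round trip can be sanity-checked locally before touching the cluster (GNU coreutils syntax; on macOS decode with base64 -D, as in the session above):

```shell
# Encode a config snippet the way kubectl stores it, then decode it back.
cfg='global:
  resolve_timeout: 5m'
encoded=$(printf '%s' "$cfg" | base64 | tr -d '\n')
printf '%s' "$encoded" | base64 -d   # prints the original two lines
```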

Check alerting

  • Log in to the mailbox and check for alert emails

rook-ceph monitoring

Create the Service and Endpoints

rook supports prometheus natively. The metrics are exposed by mgr, and installing rook already created the Service/Endpoints for us:

kubectl get endpoints,svc -n rook-ceph
NAME ENDPOINTS AGE
endpoints/rook-ceph-mgr 10.244.4.233:9283 141d
endpoints/rook-ceph-mgr-dashboard 10.244.4.233:443 141d
endpoints/rook-ceph-mon-a 10.244.1.70:6790 141d
endpoints/rook-ceph-mon-c 10.244.3.179:6790 141d
endpoints/rook-ceph-mon-d 10.244.4.232:6790 96d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/rook-ceph-mgr ClusterIP 10.103.95.20 9283/TCP 141d
service/rook-ceph-mgr-dashboard ClusterIP 10.107.244.166 443/TCP 141d
service/rook-ceph-mon-a ClusterIP 10.104.65.218 6790/TCP 141d
service/rook-ceph-mon-c ClusterIP 10.111.140.231 6790/TCP 141d
service/rook-ceph-mon-d ClusterIP 10.103.72.214 6790/TCP 96d

Create the ServiceMonitor

vim prometheus-serviceMonitorRook.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: monitoring
  labels:
    app: rook-ceph-mgr
    chart: prometheus-operator-5.0.11
    heritage: Tiller
    release: prometheus-operator
    rook_cluster: rook-ceph
spec:
  jobLabel: rook-app
  endpoints:
  - port: http-metrics
    interval: 30s
  selector:
    matchLabels:
      app: rook-ceph-mgr
      rook_cluster: rook-ceph
  namespaceSelector:
    matchNames:
    - rook-ceph

Check the endpoint

Import the Rook dashboards

  • The three dashboards ROOK officially recommends:
  • Ceph - Cluster
  • Ceph - Pools
  • Ceph - OSD

PrometheusRule

Prometheus-Operator abstracts alerting rules into a Kubernetes CRD called PrometheusRule. Let's see how to define a rule (PrometheusRule).

A custom rook-ceph disk-usage alert

vim prometheus-rookCephRules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator # copy the labels from the rules the operator already created; the operator selects rules by label
    chart: prometheus-operator-5.0.11
    heritage: Tiller
    release: prometheus-operator
  name: rook-ceph-rules
  namespace: monitoring
spec:
  groups:
  - name: rook-ceph
    rules:
    - alert: CephDiskUsageHigh
      annotations:
        summary: ceph disk usage is above 75%, please handle it promptly
        description: fires when ceph disk usage exceeds 75%
      expr: |
        (ceph_cluster_total_used_bytes/ceph_cluster_total_bytes)*100 > 75
      for: 3m
      labels:
        severity: critical
kubectl create -f prometheus-rookCephRules.yaml

The Prometheus Alerts page now shows the rule we defined.

Simulate a fault to trigger an alert email

Lower the threshold to 10 to trigger the alert:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator # copy the labels from the rules the operator already created; the operator selects rules by label
    chart: prometheus-operator-5.0.11
    heritage: Tiller
    release: prometheus-operator
  name: rook-ceph-rules
  namespace: monitoring
spec:
  groups:
  - name: rook-ceph
    rules:
    - alert: CephDiskUsageHigh
      annotations:
        summary: ceph disk usage is above 75%, please handle it promptly
        description: fires when ceph disk usage exceeds 75%
      expr: |
        (ceph_cluster_total_used_bytes/ceph_cluster_total_bytes)*100 > 10
      for: 3m
      labels:
        severity: critical
kubectl apply -f prometheus-rookCephRules.yaml