Cloud – Page 2 – Claire's Blog

K8S

Sending Prometheus data to ELK

Posted by Claire Chang, 2022-12-16 2:52 PM

Download the Docker version of Metricbeat

Official introduction: https://www.elastic.co/beats/metricbeat
The module used for Prometheus is documented at: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-prometheus.html
Official image: https://hub.docker.com/r/elastic/metricbeat
Pull the image:

docker pull docker.elastic.co/beats/metricbeat:8.5.3

Create a Metricbeat Pod

Define the Metricbeat Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-srs3-metricbeat
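  template:
    metadata:
      labels:
        # Must match spec.selector.matchLabels, otherwise the Deployment is rejected
        workload.user.cattle.io/workloadselector: apps.deployment-srs3-metricbeat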
    spec:
      affinity: {}
      containers:
      - image: dev-registry.xycloud.org/ldr/streaming/metricbeat
        imagePullPolicy: Always
        name: metricbeat
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/metricbeat/metricbeat.yml
          name: vol0
          subPath: metricbeat.yml
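        # metricbeat.yml loads modules from ${path.config}/modules.d/*.yml,
        # so the module file is mounted inside modules.d
        - mountPath: /usr/share/metricbeat/modules.d/prometheus.yml
          name: vol0
          subPath: prometheus.yml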
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: regsecret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: filebeat-config
        name: vol0

Set up the two config files

metricbeat.yml

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
metricbeat.max_start_delay: 10s
output.logstash:
  enabled: true
  hosts: ["logstash-logstash.tool-elk.svc.cluster.local:5043"]
  index: 'metricbeat'
# logging.* options are top-level settings, not part of output.logstash
logging.level: info
logging.metrics.enabled: false
logging.metrics.period: 30s

prometheus.yml

- module: prometheus
  metricsets: ["query"]
  hosts: ["http://rancher-monitoring-prometheus.cattle-monitoring-system.svc.cluster.local:9090"]
  period: 10s
  queries:
  - name: "stream_total_clients_by_pod"
    path: "/api/v1/query"
    params:
      query: "stream_total_clients_by_pod"

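Mount both config files into the Pod under /usr/share/metricbeat. Because metricbeat.yml loads modules from ${path.config}/modules.d/*.yml, the Prometheus module file has to end up inside modules.d for Metricbeat to pick it up.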
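To make the prometheus.yml settings more concrete, here is a minimal Python sketch of the request URL that the hosts, path, and params fields describe. It only illustrates how the three fields combine; it is not Metricbeat's actual implementation, and build_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query_url(host: str, path: str, params: dict) -> str:
    """Join host and path, then append the URL-encoded query parameters."""
    return f"{host.rstrip('/')}{path}?{urlencode(params)}"


# Values copied from prometheus.yml above.
url = build_query_url(
    "http://rancher-monitoring-prometheus.cattle-monitoring-system.svc.cluster.local:9090",
    "/api/v1/query",
    {"query": "stream_total_clients_by_pod"},
)
print(url)
```

Opening the printed URL from inside the cluster (for example with curl) is a quick way to check that the query returns data before wiring it into Metricbeat.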

Prometheus

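The first rule group is named srs-online-member and contains one rule. The rule uses the expression sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"stu-srs",pod=~"srs-edge.+"}) by (pod) to sum the matching series of the stream_clients_clients metric. A series matches when it is in the namespace stu-srs, its container name is json-exporter, and its Pod name matches the regular expression srs-edge.+.

When the condition is met, Prometheus creates a time series named stream_total_clients_by_pod with pod as its label, whose value is the name of the matching Pod. This lets you display and analyze the time series in a graphical interface such as Grafana.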
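As a rough illustration of what sum(...) by (pod) does, the following Python sketch groups samples by one label and adds up their values. The sample data and the sum_by helper are made up for this example; Prometheus performs this aggregation internally.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Add up sample values, grouped by the value of one label."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Series without the label are grouped under an empty label value.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical samples: (labels, value) pairs for stream_clients_clients.
samples = [
    ({"pod": "srs-edge-0", "name": "stream-a"}, 120.0),
    ({"pod": "srs-edge-0", "name": "stream-b"}, 30.0),
    ({"pod": "srs-edge-1", "name": "stream-a"}, 75.0),
]
result = sum_by(samples, "pod")
print(result)
```

Each key of the result corresponds to one series of stream_total_clients_by_pod, labelled with that Pod name.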

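The second rule group is named quay-alert.rules and contains one alerting rule. The rule uses the expression sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"stu-srs",pod=~"srs-edge.+"}) by (pod) > 1000 to check whether the matching value is greater than 1000. If the condition holds for 5 minutes or longer, Prometheus fires an alert named online-member-full and attaches some extra labels and annotations for further analysis.

Configure alert rules

Alerting rules can also be defined inside a Rule. A rule is an alerting rule when it uses the alert field (instead of record), and the severity: warning label marks how serious the alert is. The rule below fires when a Pod has had more than 1000 clients for over five minutes.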

  - name: quay-alert.rules
    rules:
    - alert: online-member-full
      annotations:
        message: online-member-full {{ $labels.pod }} at {{ $value }}%
      expr: sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"default",pod=~"my-pod.+"})
        by (pod) > 1000
      for: 5m
      labels:
        severity: warning

You can find this rule under the Alerts tab of the Prometheus Web UI.
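The effect of for: 5m can be sketched in Python as a small state machine: the alert stays pending while the expression is true and only starts firing once it has been true for the whole duration. This is a simplified model under assumed values (one evaluation every 60 seconds, made-up client counts), not Prometheus's own code.

```python
def alert_states(values, threshold, for_seconds, interval_seconds):
    """Return the alert state after each evaluation of `value > threshold`."""
    states = []
    active_since = None  # time at which the condition first became true
    for step, value in enumerate(values):
        now = step * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now
            held_long_enough = now - active_since >= for_seconds
            states.append("firing" if held_long_enough else "pending")
        else:
            active_since = None  # condition broken: the timer starts over
            states.append("inactive")
    return states


# Hypothetical client counts for one Pod, evaluated once per minute.
client_counts = [900, 1200, 1200, 1200, 1200, 1200, 1200, 800]
states = alert_states(client_counts, threshold=1000, for_seconds=300, interval_seconds=60)
print(states)
```

A short spike above 1000 therefore only shows up as pending in the Alerts tab and never reaches Alertmanager.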

Prometheus

Prometheus: the data display side

Posted by Claire Chang, 2022-12-16 2:41 PM

Where it sits in the architecture diagram

Prometheus Web UI

How to view the Web UI in Rancher

Forward the Web UI to your own computer

kubectl -n cattle-monitoring-system port-forward prometheus-rancher-monitoring-prometheus-0 9090:9090

The Web UI is the status page built into Prometheus. From here you can check the config Prometheus is currently using and its endpoint settings.

Grafana

Grafana fully supports PromQL and offers autocompletion, which is very convenient.
Installation

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key

Add this repository to get stable releases:

echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

Then:

sudo apt-get update
# Install the latest OSS release:
sudo apt-get install grafana
# Install the latest Enterprise release:
sudo apt-get install grafana-enterprise

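For installing on k8s, see: https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/

PromQL basics

A PromQL query mainly returns one of three result types:

Instant vector: a set of time series with a single sample each, for example: http_requests_total
Range vector: a set of time series with multiple samples each, for example: http_requests_total[5m]
Scalar: a single number with no time series attached, for example: scalar(count(http_requests_total)). Note that count(http_requests_total) on its own returns a one-element instant vector, not a scalar.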

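In addition, you can open the Targets page and look for the labels that identify a target, then use them to tell data sources apart. For example, SRS core and SRS edge are both SRS and expose the same metrics, so to have Grafana show them separately you filter by label. This way the exact same exporter can be used in different services.

PromQL provides many operators: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
The =~ operator takes a regular expression as its value for pattern matching (e.g. name=~".+").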
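One detail worth knowing is that =~ patterns are fully anchored: the expression has to match the entire label value, not just a part of it. The Python sketch below imitates that with re.fullmatch. Prometheus uses RE2 syntax, which differs from Python's re module in some details, so treat this as an approximation; label_matches is a hypothetical helper.

```python
import re


def label_matches(pattern: str, value: str) -> bool:
    """Imitate PromQL's =~ matcher: the pattern must match the whole value."""
    return re.fullmatch(pattern, value) is not None


print(label_matches("srs-edge.+", "srs-edge-0"))  # matches the whole value
print(label_matches("edge", "srs-edge-0"))        # substring only, so no match
print(label_matches(".+", ""))                    # .+ needs at least one character
```

This is why name=~".+" is a common way to select only series whose name label is non-empty.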

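Using Prometheus data in a .NET project

Prometheus also provides an HTTP API that programs can call to retrieve data, and third-party packages exist for many languages that make it easier to work with Prometheus.
A reference package for .NET:
https://github.com/prometheus-net/prometheus-net
Note that prometheus-net is mainly an instrumentation library for exposing metrics from a .NET application. To read data back out, a .NET program can call the Prometheus HTTP API and build whatever monitoring views it needs.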
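Whatever the client language, the HTTP API returns JSON in the same shape. The sketch below (in Python, to keep it short) parses an instant-query response into a dictionary keyed by Pod. The sample body is invented for illustration, and parse_instant_vector is a hypothetical helper; a .NET client would follow the same steps.

```python
import json

# Made-up example of what GET /api/v1/query returns for an instant vector.
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"pod": "srs-edge-0"}, "value": [1671173520.0, "150"]},
            {"metric": {"pod": "srs-edge-1"}, "value": [1671173520.0, "75"]}
          ]}}
"""


def parse_instant_vector(body: str) -> dict:
    """Turn an instant-vector response into {pod: value}."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise ValueError("query failed")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each value is a [timestamp, "value-as-string"] pair.
    return {item["metric"].get("pod", ""): float(item["value"][1]) for item in data["result"]}


clients_by_pod = parse_instant_vector(SAMPLE_RESPONSE)
print(clients_by_pod)
```

The resultType field is what distinguishes instant vectors ("vector"), range vectors ("matrix"), and scalars ("scalar") in the response.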

Prometheus

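Relabel configuration

Posted by Claire Chang, 2022-12-16 2:18 PM

When you look at the YAML generated by Prometheus-operator, you will notice that it uses many source_labels entries. They belong to relabeling, which lets labels be processed further (for example, adding or dropping the data and endpoints that get sent).

relabel_config

The endpoint value is composed of __scheme__ + __address__ + __metrics_path__.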
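As a small illustration of that composition, the Python sketch below builds the scrape URL from a target's internal labels, falling back to http and /metrics (the Prometheus defaults) when a label is absent. scrape_url is a hypothetical helper and the addresses are example values.

```python
def scrape_url(labels: dict) -> str:
    """Compose the endpoint from __scheme__, __address__ and __metrics_path__."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    return f"{scheme}://{labels['__address__']}{path}"


print(scrape_url({"__address__": "10.0.0.5:8080"}))
print(scrape_url({"__scheme__": "https", "__address__": "10.0.0.5:10250", "__metrics_path__": "/metrics/cadvisor"}))
```

Because these are ordinary labels during relabeling, a relabel rule that rewrites __address__ or __metrics_path__ changes which URL gets scraped.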

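Add a new label
Update an existing label
Rewrite an existing label
Update a metric name
Drop unneeded labels
Drop unneeded metrics
Drop metrics under specific conditions
Rename labels
Build a label from multiple existing labels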
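Two of the operations listed above, building a label from several existing labels and dropping unneeded labels, can be modelled in a few lines of Python. This is a simplified sketch of the replace and labeldrop actions, assuming the default ";" separator and fully anchored regular expressions; the function names and label values are made up, and edge cases of the real implementation are left out.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Simplified `replace` action: join sources, match, write the target label."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Convert Prometheus-style $1 references into Python-style \1 references.
    template = re.sub(r"\$(\d+)", r"\\\1", replacement)
    new_labels = dict(labels)
    new_labels[target_label] = match.expand(template)
    return new_labels


def labeldrop(labels, regex):
    """Simplified `labeldrop` action: remove labels whose name matches."""
    return {k: v for k, v in labels.items() if re.fullmatch(regex, k) is None}


labels = {"namespace": "stu-srs", "pod": "srs-edge-0", "tmp_debug": "x"}
labels = relabel_replace(labels, ["namespace", "pod"], "workload",
                         regex="(.+);(.+)", replacement="$1/$2")
labels = labeldrop(labels, "tmp_.*")
print(labels)
```

In a real relabel_config the same two steps would be written as two list entries with action: replace and action: labeldrop.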

Setting up prometheus-operator

Posted by Claire Chang, 2022-12-16 2:15 PM

Prerequisites

You need a Kubernetes cluster on which you have administrator permissions.

Install prometheus-operator

Install the prometheus-operator custom resource definitions (CRDs) along with the RBAC resources the Operator itself needs.
Run the following commands to install the CRDs and deploy the Operator into the default namespace:
LATEST=$(curl -s https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest | jq -cr .tag_name)
curl -sL https://github.com/prometheus-operator/prometheus-operator/releases/download/${LATEST}/bundle.yaml | kubectl create -f -
You can check whether the installation has finished with:
kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus-operator -n default

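Deployment example

This example uses the YAML described in the OpenShift documentation to configure the deployment. For more, see: https://docs.openshift.com/container-platform/4.11/rest_api/monitoring_apis/prometheus-monitoring-coreos-com-v1.html
Deploy a simple application with 3 replicas that listens on port 8080 and exposes its metrics there.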

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080

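Expose the application with a Service object that selects all Pods whose app label has the value example-app. The Service object also specifies the port on which the metrics are exposed.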

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

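Finally, we create a ServiceMonitor object that selects all Service objects carrying the app: example-app label. The ServiceMonitor object also has a team label (team: frontend in this example) that identifies which team is responsible for monitoring the application/service.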

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
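The matchLabels selection used by the Service and the ServiceMonitor above follows one simple rule: an object is selected when it carries every label in the selector with the same value. A minimal Python sketch of that rule, with made-up Service objects and a hypothetical matches_selector helper:

```python
def matches_selector(match_labels: dict, object_labels: dict) -> bool:
    """True when the object has every selector label with the same value."""
    return all(object_labels.get(key) == value for key, value in match_labels.items())


# Hypothetical Services in the cluster: name -> labels.
services = {
    "example-app": {"app": "example-app"},
    "other-app": {"app": "other-app"},
    "example-app-canary": {"app": "example-app", "track": "canary"},
}
selector = {"app": "example-app"}
selected = sorted(name for name, labels in services.items() if matches_selector(selector, labels))
print(selected)
```

Extra labels on an object do not prevent a match, which is why new Services only need the app: example-app label to be picked up by the ServiceMonitor.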

Deploy Prometheus

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

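For more information, see: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/rbac.md
Earlier we created a ServiceMonitor object with the team: frontend label. Here we define that the Prometheus object should select all ServiceMonitors with the team: frontend label. This lets the frontend team create new ServiceMonitors and Services without having to reconfigure the Prometheus object.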

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false

To verify that the instance is up and running, run:

kubectl get -n default prometheus prometheus -w

Prometheus

Prometheus Operator

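Posted by Claire Chang, 2022-12-16 12:48 PM

Introduction to Prometheus Operator

Official website: https://prometheus-operator.dev/

Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. The purpose of the project is to simplify and automate the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.

The Prometheus Operator includes, but is not limited to, the following features:

Kubernetes custom resources: use Kubernetes custom resources to deploy and manage Prometheus, Alertmanager, and related components.
Simplified deployment configuration: configure the fundamentals of Prometheus, such as versions, persistence, retention policies, and replicas, from a native Kubernetes resource.
Prometheus target configuration: automatically generate monitoring target configurations based on familiar Kubernetes label queries, with no need to learn a Prometheus-specific configuration language.

Advantages of Prometheus Operator

Below is the most basic kind of Prometheus configuration. With a setup like this, every monitoring target has to be configured by hand, and whenever Pods are added or removed automatically someone has to update the list of targets. That is the problem Prometheus Operator was created to solve.

Its main job is service discovery. For example, we can use a Deployment to manage the creation of Pods and a Service to manage connectivity between Pods, and the Operator can discover the Services that need monitoring through ServiceMonitor objects and adjust dynamically as Pods are added or removed.
Prometheus Operator includes a component called prometheus-config-reloader, which lets Prometheus pick up the configuration (prometheus.yml) generated from ServiceMonitor objects whenever it changes.

What Prometheus Operator can do

Understanding what Prometheus Operator can do really means understanding which custom Kubernetes resources it provides. The four kinds of resources covered here are:

Prometheus: declaratively creates and manages Prometheus Server instances;
ServiceMonitor: declaratively manages the monitoring configuration;
PrometheusRule: declaratively manages the alerting configuration;
Alertmanager: declaratively creates and manages Alertmanager instances.

In short, Prometheus Operator helps users automate the creation and management of Prometheus Server and its configuration.

Architecture diagram of Prometheus Operator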

K8S, Prometheus

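Adding an exporter for Prometheus in K8S: Pushgateway as an example

Posted by Claire Chang, 2022-12-16 12:32 PM

Introduction to Pushgateway

The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.

When to use the Pushgateway

We only recommend using the Pushgateway in certain limited cases. There are several pitfalls when blindly using the Pushgateway instead of Prometheus's usual pull model for general metrics collection:

When monitoring multiple instances through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck.
You lose Prometheus's automatic instance health monitoring via the up metric (generated on every scrape).
The Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless those series are manually deleted via the Pushgateway's API.

The last point is especially relevant when multiple instances of a job differentiate their metrics in the Pushgateway via an instance label or similar. Metrics for an instance will then remain in the Pushgateway even if the originating instance is renamed or removed. This is because the lifecycle of the Pushgateway as a metrics cache is fundamentally separate from the lifecycle of the processes that push metrics to it. Contrast this with Prometheus's usual pull-style monitoring: when an instance disappears (intentionally or not), its metrics automatically disappear along with it. When using the Pushgateway, this is not the case, and you now have to delete any stale metrics manually or automate this lifecycle synchronization yourself.

Usually, the only valid use case for the Pushgateway is capturing the outcome of a service-level batch job. A "service-level" batch job is one that is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label, so that the lifecycle of specific machines or instances is decoupled from the pushed metrics. This reduces the burden of managing stale metrics in the Pushgateway.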

Get the pushgateway image

Official image: https://hub.docker.com/r/prom/pushgateway
Or run the following on the command line:
docker pull prom/pushgateway

Create a Pod that contains pushgateway

Write a Deployment for pushgateway

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pushgateway
  name: pushgateway
  namespace: default
spec:
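  replicas: 1
  # apps/v1 requires a selector that matches the Pod template labels
  selector:
    matchLabels:
      app: pushgateway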
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      containers:
      - image: prom/pushgateway
        imagePullPolicy: Always
        name: pushgateway
        ports:
        - containerPort: 9091
          name: pushgateway
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always

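Create a headless Service for the pushgateway Pod

Point the Service at the corresponding Pod.

Then, from a container that can resolve the same Service name (for example, one in the same namespace), run:
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway:9091/metrics/job/some_job
After that, you can inspect the data with:
curl http://pushgateway:9091/metrics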
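The same push can be done from code. The Python sketch below only builds the HTTP requests with the standard library so that it runs without a cluster; the pushgateway:9091 address and the some_job name are taken from the commands above, and push_request and delete_request are hypothetical helper names. The DELETE request is the manual clean-up of stale metrics mentioned earlier.

```python
from urllib.request import Request


def push_request(base_url: str, job: str, metrics_text: str) -> Request:
    """Build the POST that pushes metrics for one job (same as curl --data-binary)."""
    return Request(f"{base_url}/metrics/job/{job}",
                   data=metrics_text.encode("utf-8"), method="POST")


def delete_request(base_url: str, job: str) -> Request:
    """Build the DELETE that removes everything pushed for one job."""
    return Request(f"{base_url}/metrics/job/{job}", method="DELETE")


# The body uses the text exposition format and must end with a newline.
push = push_request("http://pushgateway:9091", "some_job", "some_metric 3.14\n")
cleanup = delete_request("http://pushgateway:9091", "some_job")
print(push.get_method(), push.full_url)
print(cleanup.get_method(), cleanup.full_url)
# To actually send a request from inside the cluster:
#   from urllib.request import urlopen
#   urlopen(push)
```

A batch job would typically send the push as its last step and leave the clean-up to whoever owns the lifecycle of that job's metrics.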

Prometheus

Prometheus Exporter

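Posted by Claire Chang, 2022-12-16 12:17 PM

Where the data provider sits in the architecture diagram

What the data from a data provider looks like

Counter: a monotonically increasing counter.
Gauge: a single numerical value that can go up and down arbitrarily.
Histogram: samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides the sum of all observed values.
Summary: similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides the total count of observations and the sum of all observed values, it calculates configurable quantiles over a sliding time window.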
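To show how a histogram's buckets, sum, and count relate, here is a minimal Python model. Buckets are cumulative: every observation is counted in all buckets whose upper bound is greater than or equal to the value, and an implicit +Inf bucket counts everything. The class and the bucket bounds are made up for illustration and are not a client library API.

```python
import math


class Histogram:
    """Toy model of a Prometheus histogram with cumulative buckets."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds) + [math.inf]  # +Inf catches all
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.total = 0.0   # exposed as <name>_sum
        self.count = 0     # exposed as <name>_count

    def observe(self, value: float) -> None:
        self.total += value
        self.count += 1
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1  # cumulative: count in every larger bucket


# Hypothetical request durations in seconds.
durations = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.7, 2.0):
    durations.observe(seconds)
print(durations.bucket_counts, durations.count)
```

The count of the +Inf bucket always equals the total number of observations, which is how quantiles can later be estimated from the bucket counts.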

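Checking what information the existing data providers expose

Open Targets in the Prometheus panel.
Click the endpoint link of the target you want to inspect. Except for node-exporter, the data can only be read from inside the k8s internal network. For json-exporter, for example, you can run the following command from inside the namespace:
curl "http://127.0.0.1:7979/probe?module=default&target=http://127.0.0.1:1985/api/v1/streams/"
You will also notice that many of these URLs cannot be reached, because some exporters require a token.

Getting the data provided by pod-exporter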

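rancher-monitoring-kubelet can report the running state of the Pods on every node, but reading Pod state in k8s requires authentication, so the required Secret has to be configured in the YAML. The commands are as follows:

# List all secrets in the namespace
kubectl get secret -n cattle-monitoring-system
# Read the token stored in the secret and keep it in a shell variable
TOKEN=$(kubectl -n cattle-monitoring-system get secret rancher-monitoring-prometheus-token-hvlqt -o jsonpath='{.data.token}' | base64 -d)

# Call the pod-exporter URL, passing the token with -H (-k skips TLS certificate verification)
curl https://172.17.2.22:10250/metrics/cadvisor -k -H "Authorization: Bearer ${TOKEN}"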
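The decoding step exists because Kubernetes stores secret values base64-encoded. A short Python sketch of the same two steps, decoding the value and building the Authorization header, is shown below; the token is a made-up placeholder and bearer_header is a hypothetical helper.

```python
import base64


def bearer_header(encoded_token: str) -> dict:
    """Decode a base64 secret value and wrap it in an Authorization header."""
    token = base64.b64decode(encoded_token).decode("utf-8").strip()
    return {"Authorization": f"Bearer {token}"}


# Placeholder standing in for the value of .data.token in the secret.
encoded = base64.b64encode(b"example-token").decode("ascii")
header = bearer_header(encoded)
print(header)
```

The resulting header is what curl sends with -H in the command above.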

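For more information, see: Accessing the Kubernetes API from a Pod

Why it matters to know what the data provider's output looks like

It tells you how to search for the data you want in Grafana and which data is available at all.

The data above can be retrieved with the following PromQL. sum adds up the numbers of all streams, grouped by the pod label.
(sum(stream_clients_clients{namespace=~"namespace_name", pod=~"pod_name.+", name=~".+"}) by (pod))
Only by knowing which data exists can you use PromQL to retrieve what you need.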

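How to produce this data

There are official client libraries for many languages:
https://prometheus.io/docs/instrumenting/clientlibs/

Here are a few exporters I have tried:

swagger-stats for Node.js: https://github.com/slanatech/swagger-stats
Converting JSON into exporter format: json-exporter
Using Pushgateway: https://github.com/prometheus/pushgateway

Whichever method you use, the end result has to be an output like this page: a static plain-text page listing the values we want to observe.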
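As a minimal illustration of such a plain-text page, the Python sketch below renders one metric in the Prometheus text exposition format. The metric name is borrowed from the queries earlier on this page, while the label values, help text, and the render_metric helper are made up; in practice one of the client libraries linked above would produce this output.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        series = f"{name}{{{label_text}}}" if label_text else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


page = render_metric(
    "stream_clients_clients",
    "gauge",
    "Connected clients per stream",
    [({"name": "live"}, 42)],
)
print(page, end="")
```

Serving this text over HTTP on a path such as /metrics is all Prometheus needs in order to scrape the values.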

Prometheus

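Introduction to Prometheus

Posted by Claire Chang, 2022-12-16 12:15 PM

Prometheus overview

On SoundCloud's official blog there is an article explaining why they needed to build a new monitoring system, Prometheus: Monitoring at SoundCloud. In it they describe the four characteristics their monitoring system had to have:

A multi-dimensional data model
Easy deployment and maintenance
Flexible data collection
A powerful query language

In fact, a multi-dimensional data model and a powerful query language are exactly what a time series database requires, so Prometheus is not only a monitoring system but also a time series database. Why, then, does Prometheus not simply use an existing time series database as its storage backend? Because SoundCloud wanted their monitoring system not only to have the characteristics of a time series database but also to be very easy to deploy and maintain.

Data collection in Prometheus is also very flexible. To collect monitoring data from a target, you first install a data collection component at the target, called an Exporter. It gathers the monitoring data at the target and exposes an HTTP endpoint for Prometheus to query. Prometheus collects the data by pulling, which differs from the traditional push model.

Prometheus does offer a way to support the push model as well: you can push your data to the Push Gateway, and Prometheus then pulls the data from the Push Gateway. Existing Exporters can already collect data from the vast majority of third-party systems, such as Docker, HAProxy, StatsD, and JMX, and the official website has a list of Exporters.

Overall architecture of Prometheus

As the diagram above shows, the Prometheus ecosystem contains several key components: Prometheus server, Pushgateway, Alertmanager, Web UI, and so on. Most of them are optional. The core component is of course the Prometheus server, which collects and stores metric data, supports expression queries, and generates alerts.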
