Prometheus – Claire's Blog

Published: 2023-04-19 7:33 PM. Author: Claire Chang

Prometheus Exporter: json-exporter

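What is JSON Exporter

JSON Exporter is a Prometheus exporter (a metrics collector). It collects metric data from APIs that return JSON and converts it into the format Prometheus supports, so that Prometheus can analyze and visualize it.

JSON Exporter is driven by a configuration file that describes which values to extract from the JSON response and how to expose them as metrics. JSON Exporter is then added to the targets list in the Prometheus configuration, with the API endpoint passed as the probe target.

JSON Exporter can expose the values it extracts as different metric types, such as counters and gauges, and JSONPath expressions let it select and filter the data as needed.

How to set up the exporter

The official "Writing exporters" documentation contains the following passage: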

Each exporter should monitor exactly one instance application, preferably sitting right beside it on the same machine. That means for every HAProxy you run, you run a haproxy_exporter process. For every machine with a Mesos worker, you run the Mesos exporter on it, and another one for the master, if a machine has both.

The theory behind this is that for direct instrumentation this is what you’d be doing, and we’re trying to get as close to that as we can in other layouts. This means that all service discovery is done in Prometheus, not in exporters. This also has the benefit that Prometheus has the target information it needs to allow users probe your service with the blackbox exporter.

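In other words, the exporter should run in the same Pod as the main container whenever possible, as shown in the figure below:

The main reasons are that this avoids a single point of failure and fits better with the microservice architecture.

Implementation overview

Below is a sample of the JSON output: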

{
  "code": 0,
  "server": "vid-69t27o3",
  "streams": [
    {
      "id": "vid-0diw412",
      "name": "livestream",
      "vhost": "vid-y000397",
      "app": "live",
      "tcUrl": "rtmp://172.16.46.86:1935/live",
      "url": "/live/livestream",
      "live_ms": 1681903514993,
      "clients": 4,
      "frames": 0,
      "send_bytes": 45370,
      "recv_bytes": 34930,
      "kbps": {
        "recv_30s": 0,
        "send_30s": 0
      },
      "publish": {
        "active": false
      },
      "video": null,
      "audio": null
    }
  ]
}

Below is the config.yml for json-exporter:

modules:
  default:
    metrics:
    - name: server
      path: "{ .server}"

    - name: stream_clients
      type: object
      help: Example of sub-level value scrapes from a json
      path: '{.streams[?(@.name!="")]}'
      labels:
        name: '{.name}' 
      values:
        clients: '{.clients}' 
        send_bytes: '{.send_bytes}'
        recv_bytes: '{.recv_bytes}'
        frames: '{.frames}'
        publish: '{.publish.active}'

    headers:
      X-Dummy: my-test-header
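
To make the mapping concrete, here is a minimal Python sketch of roughly what this config does with the sample JSON above. It is not json-exporter's actual implementation. The function name render_stream_metrics is hypothetical, and the handling of booleans (true as 1, false as 0) is an assumption. The metric names follow the name_valuekey pattern, which matches the stream_clients_clients series used later in the Prometheus Rule post.

```python
import json

def render_stream_metrics(doc, prefix="stream_clients"):
    """Flatten each named stream into Prometheus-style sample lines."""
    value_keys = ["clients", "send_bytes", "recv_bytes", "frames"]
    lines = []
    for stream in doc.get("streams", []):
        if stream.get("name", "") == "":
            continue  # mirrors the JSONPath filter ?(@.name!="")
        label = 'name="%s"' % stream["name"]
        for key in value_keys:
            lines.append("%s_%s{%s} %s" % (prefix, key, label, float(stream[key])))
        # Assumption: booleans are exported as 1 (true) or 0 (false).
        active = 1.0 if stream["publish"]["active"] else 0.0
        lines.append("%s_publish{%s} %s" % (prefix, label, active))
    return lines

# A trimmed copy of the sample JSON shown earlier.
sample = json.loads(
    '{"streams": [{"name": "livestream", "clients": 4, "frames": 0,'
    ' "send_bytes": 45370, "recv_bytes": 34930, "publish": {"active": false}}]}'
)
for line in render_stream_metrics(sample):
    print(line)
```

Each entry under values becomes its own series, and the labels block decides which JSON fields are attached as labels.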

YAML for the Pod to be monitored

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: srs-core1
  name: srs-core1
  namespace: stu-srs
spec:
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: srs-core1
    spec:
      containers:
      # ... other containers here
      - image: dev-registry.xycloud.org/ldr/streaming/json-exporter
        imagePullPolicy: IfNotPresent
        name: json-exporter
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        args: ["--config.file", "/config.yml"]
        volumeMounts:
        - mountPath: /config.yml
          name: json-exporter
          subPath: config.yml
      volumes:
      - configMap:
          defaultMode: 420
          name: json-exporter
        name: json-exporter

Open a shell inside the Pod and inspect the data that json-exporter emits

curl "http://localhost:7979/probe?target=http://127.0.0.1:1985/api/v1/streams/"
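
The probe URL carries the real API endpoint as a query parameter. When building it in a script rather than by hand, it is safer to URL-encode the target. The following is a small sketch using only the Python standard library. The helper name probe_url is hypothetical, and the module parameter mirrors the module: default entry used in the ServiceMonitor shown in a later post.

```python
from urllib.parse import urlencode

def probe_url(exporter_base, target, module="default"):
    """Build a json-exporter /probe URL with a safely encoded target."""
    query = urlencode({"module": module, "target": target})
    return "%s/probe?%s" % (exporter_base.rstrip("/"), query)

url = probe_url("http://localhost:7979", "http://127.0.0.1:1985/api/v1/streams/")
print(url)
```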

The data can now be wired into Prometheus

See the following articles

Published: 2023-04-14 7:14 PM, updated: 2023-04-18 4:40 PM. Author: Claire Chang

HorizontalPodAutoscaler with custom metrics

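Configure the scaling behavior, i.e. what the HPA does when scaling down and up. With the settings here, the HPA acts only after the recommendation has held over a 300-second stabilization window (stabilizationWindowSeconds), and it adds or removes at most 1 Pod per 60-second period (periodSeconds).

This part sets the metric to scale on. To use a custom metric, describedObject must match what we configured in the Rules, and target sets the value the metric is compared against.

The scaleTargetRef section below specifies what the HPA scales. It must be a scalable workload; here it is the srs-edge Deployment, so the HPA changes the number of its Pods.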

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: srs-edge
  namespace: srs3
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 1
      selectPolicy: Max
      stabilizationWindowSeconds: 300
    scaleUp:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 1
      selectPolicy: Max
      stabilizationWindowSeconds: 300
  maxReplicas: 2
  metrics:
  - object:
      describedObject:
        apiVersion: v1
        kind: Service
        name: eventqueue
      metric:
        name: stream_total_clients_by_pod
      target:
        type: Value
        value: 1k
    type: Object
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: srs-edge
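
As a rough illustration of how these numbers interact, the sketch below applies the replica formula from the Kubernetes documentation, desired = ceil(current * metric / target), together with the default 10% tolerance, the one-Pod-per-period policy, and the min/max bounds. The function names are hypothetical, the quantity parser handles only a few suffixes, and the stabilization window is not modeled, so treat it as a simplification rather than the controller's actual logic.

```python
import math

SUFFIXES = {"k": 1e3, "M": 1e6, "m": 1e-3}

def parse_quantity(text):
    """Parse a small subset of Kubernetes quantities such as '1k' or '500m'."""
    text = str(text)
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)

def desired_replicas(current, metric_value, target, min_replicas, max_replicas,
                     tolerance=0.1, max_step=1):
    """Approximate one HPA decision for an Object metric with a Value target."""
    ratio = metric_value / parse_quantity(target)
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no change
    wanted = math.ceil(current * ratio)
    # The Pods policy (value: 1 per period) caps the change in either direction.
    wanted = max(current - max_step, min(current + max_step, wanted))
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(1, 1500, "1k", 1, 2))  # above target: scale up
print(desired_replicas(2, 400, "1k", 1, 2))   # below target: scale down
```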

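stream_total_clients_by_pod is our custom metric. It comes from Prometheus; for how to make Prometheus metrics available to k8s, see
http://claire-chang.com/2022/12/16/prometheus-rule-for-alert/
An important detail: the describedObject above has name: eventqueue, so the Prometheus Rule must also add labels: service: eventqueue.
Only then can the value be exported from Prometheus for the HPA to use.

As shown in the figure below: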

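Published: 2023-04-13 2:56 PM, updated: 2023-04-18 4:40 PM. Author: Claire Chang

Viewing the exporter output of Prometheus monitoring targets

Prometheus Target

Prometheus is an open-source monitoring system. One of its important functions is monitoring the running state of Services, which is called Service Monitoring.

Service Monitoring works through exporters: HTTP endpoints are defined for the services to be monitored, and Prometheus periodically calls these endpoints and collects the returned metrics to learn whether a service is running normally, along with its throughput, latency, error rate and so on.

Service Monitoring provides the following monitoring capabilities:

Monitoring service availability, e.g. detecting whether a service is still running and responding normally.
Monitoring service performance, e.g. throughput and latency.
Recording service execution state, e.g. error rate, request count and processing time.

With Service Monitoring, administrators can detect and fix problems faster, which reduces downtime and improves the availability, performance and reliability of the system.

How to browse an exporter's output

First go to the Targets page of the Prometheus web UI, which lists the current monitoring targets. If a configured target does not show up correctly, check the Service Discovery tab to find out why.

The content looks like this

Cannot access the output of the rancher-monitoring-kubelet exporter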

serviceMonitor/kube-system/rancher-monitoring-kubelet/1 (7/7 up)
Prometheus obtains Pod status through rancher-monitoring-kubelet. However, we cannot read https://127.0.0.1:10250/metrics/cadvisor directly, because the request must first be authenticated by K8S.
# List all secrets in the namespace
kubectl get secret -n cattle-monitoring-system

# Print the decoded token stored in the secret
kubectl -n cattle-monitoring-system get secret rancher-monitoring-prometheus-token-hvlqt -o jsonpath='{.data.token}' | base64 -d

# Call the exporter URL, passing the token with -H
curl https://127.0.0.1:10250/metrics/cadvisor -k -H "Authorization: Bearer token_content_xxxxxxxx"
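
The same two steps (decode the base64 token from the secret, then send it as a Bearer header) can be sketched in Python with the standard library. The helper names are hypothetical and the token value is a made-up placeholder. The request is only built, not sent; actually sending it would also need TLS settings equivalent to curl's -k.

```python
import base64
import urllib.request

def decode_secret_token(b64_token):
    """Kubernetes stores secret data base64-encoded; decode it to text."""
    return base64.b64decode(b64_token).decode("utf-8")

def build_metrics_request(token, url="https://127.0.0.1:10250/metrics/cadvisor"):
    """Prepare a GET request that carries the token as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})

# Hypothetical secret value standing in for the .data.token field.
encoded = base64.b64encode(b"example-token").decode("ascii")
request = build_metrics_request(decode_secret_token(encoded))
print(request.get_header("Authorization"))
```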

Possible reasons a ServiceMonitor fails to pick up its targets

Below is a simple example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  generation: 4
  labels:
    app.kubernetes.io/instance: srs-json-exporter
    manager: agent
    operation: Update
  name: json-exporter
  namespace: stu-srs
spec:
  endpoints:
  - interval: 30s
    params:
      module:
      - default
      target:
      - http://127.0.0.1:1985/api/v1/streams/
    port: json-exporter
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - stu-srs
  selector:
    matchLabels:
      app.kubernetes.io/instance: srs-json-exporter

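The selector field specifies a label selector for the Services to monitor, not for Pods.

In this YAML, the ServiceMonitor's selector is set to matchLabels: app.kubernetes.io/instance: srs-json-exporter, which means the ServiceMonitor monitors Services carrying the label app.kubernetes.io/instance: srs-json-exporter.
If the Service does not carry that label, the ServiceMonitor cannot monitor it.

If you have confirmed that the Service labels are correct, the cause may be that the ServiceMonitor and the Service are in mismatched namespaces.
Check whether the ServiceMonitor's namespaceSelector is correct. The namespaceSelector field specifies which namespaces to monitor; if the Service's namespace is not in the matchNames list, the ServiceMonitor cannot monitor it.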
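
The two checks described above (labels, then namespace) can be written down as a small Python sketch, which may help when debugging a ServiceMonitor by hand. The function name and the dictionary layout are hypothetical simplifications of the real objects. Only matchLabels and matchNames are handled, and the fallback to the ServiceMonitor's own namespace when matchNames is absent is stated here as an assumption.

```python
def service_monitor_selects(monitor, service):
    """Return True when the Service passes both the namespace and label checks."""
    namespaces = monitor.get("namespaceSelector", {}).get("matchNames")
    if namespaces is None:
        # Assumption: without matchNames, only the monitor's own namespace is used.
        namespaces = [monitor["namespace"]]
    if service["namespace"] not in namespaces:
        return False
    wanted = monitor["selector"].get("matchLabels", {})
    labels = service.get("labels", {})
    return all(labels.get(key) == value for key, value in wanted.items())

monitor = {
    "namespace": "stu-srs",
    "namespaceSelector": {"matchNames": ["stu-srs"]},
    "selector": {"matchLabels": {"app.kubernetes.io/instance": "srs-json-exporter"}},
}
good = {"namespace": "stu-srs", "labels": {"app.kubernetes.io/instance": "srs-json-exporter"}}
wrong_label = {"namespace": "stu-srs", "labels": {"app": "json-exporter"}}
wrong_namespace = {"namespace": "default", "labels": good["labels"]}

print(service_monitor_selects(monitor, good))
print(service_monitor_selects(monitor, wrong_label))
print(service_monitor_selects(monitor, wrong_namespace))
```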

Published: 2022-12-16 2:52 PM, updated: 2023-04-18 4:41 PM. Author: Claire Chang

Sending Prometheus data to ELK

Downloading the Docker version of Metricbeat

Official introduction: https://www.elastic.co/beats/metricbeat
The module for Prometheus: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-prometheus.html
Official image: https://hub.docker.com/r/elastic/metricbeat
Pull the image:

docker pull docker.elastic.co/beats/metricbeat:8.5.3

Creating a Metricbeat Pod

Configure the Metricbeat Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-srs3-metricbeat
  template:
    spec:
      affinity: {}
      containers:
      - image: dev-registry.xycloud.org/ldr/streaming/metricbeat
        imagePullPolicy: Always
        name: metricbeat
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/metricbeat/metricbeat.yml
          name: vol0
          subPath: metricbeat.yml
        - mountPath: /usr/share/metricbeat/prometheus.yml
          name: vol0
          subPath: prometheus.yml
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: regsecret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: filebeat-config
        name: vol0

Set up two config files

metricbeat.yml

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
metricbeat.max_start_delay: 10s
output.logstash:
  enabled: true
  hosts: ["logstash-logstash.tool-elk.svc.cluster.local:5043"]
  index: 'metricbeat'
logging.level: info
logging.metrics.enabled: false
logging.metrics.period: 30s

prometheus.yml

- module: prometheus
  metricsets: ["query"]
  hosts: ["http://rancher-monitoring-prometheus.cattle-monitoring-system.svc.cluster.local:9090"]
  period: 10s
  queries:
  - name: "stream_total_clients_by_pod"
    path: "/api/v1/query"
    params:
      query: "stream_total_clients_by_pod"
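
For reference, the query metricset above corresponds to a plain call to the Prometheus HTTP API. The sketch below builds the equivalent URL and parses an instant-vector response in Python. The helper names, the short host name and the sample response body (including the Pod name) are illustrative assumptions, not output captured from the author's cluster.

```python
import json
from urllib.parse import urlencode

def query_url(base, promql, path="/api/v1/query"):
    """Build the instant-query URL for a PromQL expression."""
    return base.rstrip("/") + path + "?" + urlencode({"query": promql})

def parse_instant_vector(body):
    """Return (labels, value) pairs from a /api/v1/query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]

# Hypothetical response body in the shape the Prometheus API returns.
sample_body = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "stream_total_clients_by_pod", "pod": "srs-edge-0"},
         "value": [1681903514.993, "4"]},
    ]},
})
print(query_url("http://prometheus:9090", "stream_total_clients_by_pod"))
print(parse_instant_vector(sample_body))
```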

Mount these two config files into the Pod under /usr/share/metricbeat

Published: 2022-12-16 2:44 PM, updated: 2023-04-18 4:41 PM. Author: Claire Chang

Prometheus Rule for Alert

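Introduction to Prometheus Rule

A PrometheusRule is a YAML configuration used to define rules in Prometheus. It matches and computes metrics according to the given expressions or conditions, and generates alerts or creates new metrics when the conditions are met.

The main functions of Prometheus Rule are:

Metrics computation: match and compute metrics with an expression to generate new metrics.
Alerting: generate an alert when matching metrics reach a threshold.
Rule binding: bind rules to specific metrics so that alerts are triggered automatically.
Labels and annotations: attach custom labels and annotations to generated alerts for later statistics and analysis.

Used together with a graphical front end such as Grafana, Prometheus Rule lets users define the metrics they need to monitor and then watch and alert on them in Grafana in real time, enabling real-time monitoring and incident handling.

Configuring a Prometheus Rule

This is a PrometheusRule YAML configuration that defines Prometheus rules for checking and alerting on the specified metrics.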

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    name: srs-online-member
  name: srs-online-member
  namespace: stu-srs
spec:
  groups:
  - name: srs-online-member
    rules:
    - expr: sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"stu-srs",pod=~"srs-edge.+"})
        by (pod)
      labels:
        name: online-member-high
        namespace: stu-srs
        service: eventqueue
      record: stream_total_clients_by_pod
  - name: quay-alert.rules
    rules:
    - alert: online-member-full
      annotations:
        message: online-member-full {{ $labels.pod }} at {{ $value }}%
      expr: sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"stu-srs",pod=~"srs-edge.+"})
        by (pod) > 1000
      for: 5m
      labels:
        severity: warning

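The file defines two rule groups, each containing one or more rules.

The first rule group is named srs-online-member and contains one rule. The rule uses the expression sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"stu-srs",pod=~"srs-edge.+"}) by (pod) to sum the matching stream_clients_clients series per Pod. A series matches only if it is in the namespace stu-srs, its container name is json-exporter, and its Pod name matches the regular expression srs-edge.+.

For the matching series, Prometheus records a new time series named stream_total_clients_by_pod. It carries the pod label (the name of the matching Pod) plus the labels set in the rule, and its value is the summed client count. You can then display and analyze this series in a graphical front end such as Grafana.

The second rule group is named quay-alert.rules and contains one alerting rule. It uses the expression sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"stu-srs",pod=~"srs-edge.+"}) by (pod) > 1000 to check whether the per-Pod sum exceeds 1000. If the condition holds for 5 minutes or longer, Prometheus fires an alert named online-member-full, with extra labels and annotations attached for further analysis.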
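
To illustrate what the recording expression computes, here is a minimal Python sketch of the same filter-then-sum-by-pod logic over a handful of in-memory samples. The function name and the sample Pod names are hypothetical, and Python's re module stands in for Prometheus's RE2 matching, which is close but not identical.

```python
import re
from collections import defaultdict

def sum_clients_by_pod(samples):
    """Filter series the way the rule's selector does, then sum values per pod."""
    totals = defaultdict(float)
    for labels, value in samples:
        if labels.get("container") != "json-exporter":
            continue
        if not re.fullmatch(".+", labels.get("name", "")):
            continue
        if not re.fullmatch("stu-srs", labels.get("namespace", "")):
            continue
        if not re.fullmatch("srs-edge.+", labels.get("pod", "")):
            continue
        totals[labels["pod"]] += value
    return dict(totals)

base = {"container": "json-exporter", "namespace": "stu-srs"}
samples = [
    (dict(base, pod="srs-edge-abc", name="livestream"), 4),
    (dict(base, pod="srs-edge-abc", name="backup"), 6),
    (dict(base, pod="srs-core-1", name="livestream"), 9),  # not an edge pod
]
print(sum_clients_by_pod(samples))
```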

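Setting alert rules

Alerts can also be defined in the Rule. A rule with an alert field is an alerting rule, and the label severity: warning marks its severity. The rule below fires an alert when the number of clients on a Pod stays above 1000 for more than five minutes: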

  - name: quay-alert.rules
    rules:
    - alert: online-member-full
      annotations:
        message: online-member-full {{ $labels.pod }} at {{ $value }}%
      expr: sum(stream_clients_clients{container="json-exporter", name=~".+",namespace=~"default",pod=~"my-pod.+"})
        by (pod) > 1000
      for: 5m
      labels:
        severity: warning

You can find this rule on the Alerts tab of the Prometheus Web UI.
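
The effect of for: 5m is that an alert first goes pending and only fires once the condition has held for the whole duration. A simplified Python simulation of that state machine follows. The function name is hypothetical and the 60-second evaluation interval is an assumption, since the interval depends on the rule group or global configuration.

```python
def alert_states(values, threshold=1000, for_seconds=300, interval=60):
    """Return the alert state after each evaluation of the rule."""
    states = []
    active_since = None
    for step, value in enumerate(values):
        now = step * interval
        if value > threshold:
            if active_since is None:
                active_since = now  # condition just became true
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any dip resets the timer
            states.append("inactive")
    return states

print(alert_states([900, 1200, 1200, 1200, 1200, 1200, 1200, 800]))
```

A single evaluation below the threshold resets the timer, so short spikes never reach the firing state.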

Published: 2022-12-16 2:41 PM, updated: 2023-04-18 4:41 PM. Author: Claire Chang

Prometheus data display front ends

Position in the architecture diagram

Prometheus Web UI

Viewing the Web UI in Rancher

Forwarding the Web UI to your own computer

kubectl -n cattle-monitoring-system port-forward prometheus-rancher-monitoring-prometheus-0 9090:9090

The Web UI is the status page built into Prometheus. From here you can inspect the config and endpoint settings Prometheus is currently using.

Grafana

Grafana fully supports PromQL and offers auto-completion, which is very convenient.
Installation

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key

Add this repository for stable releases:

echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

Then

sudo apt-get update
# Install the latest OSS release:
sudo apt-get install grafana
# Install the latest Enterprise release:
sudo apt-get install grafana-enterprise

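For installation on k8s, see: https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/

PromQL basics

PromQL query results mainly come in 3 types:

Instant vector: a set of time series with a single sample per series, e.g. http_requests_total
Range vector: a set of time series with multiple samples per series, e.g. http_requests_total[5m]
Scalar: a single number with no time series, e.g. scalar(count(http_requests_total)); count(http_requests_total) on its own returns an instant vector with one element

In addition, the Targets page shows the identifying labels of each target, which can be used to tell data sources apart. For example, SRS core and SRS edge are both SRS and expose the same data; to display them separately in Grafana, filter by label. This lets the exact same exporter be used in different services.

PromQL provides many operators: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
The =~ operator takes a regular expression and can be used for fuzzy matching (e.g. name=~".+")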
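
One detail worth knowing about =~ is that the pattern is anchored: it has to match the whole label value, not just a part of it. The Python sketch below mimics that with re.fullmatch and uses it to separate core and edge series by label. The helper names and Pod names are hypothetical, and Python's regex dialect is only an approximation of the RE2 syntax Prometheus uses.

```python
import re

def label_matches(pattern, value):
    """PromQL-style =~ : the pattern must match the whole label value."""
    return re.fullmatch(pattern, value) is not None

def select(series, label, pattern):
    """Keep only the series whose label value matches the pattern."""
    return [s for s in series if label_matches(pattern, s.get(label, ""))]

series = [
    {"pod": "srs-core-0", "name": "livestream"},
    {"pod": "srs-edge-0", "name": "livestream"},
    {"pod": "srs-edge-1", "name": ""},
]
print(select(series, "pod", "srs-edge.+"))  # edge pods only
print(select(series, "name", ".+"))         # series with a non-empty name
print(label_matches("edge", "srs-edge-0"))  # a partial match is not enough
```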

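Using Prometheus data in a .NET project

Prometheus also provides an API that programs can call to retrieve data, and third-party packages for many languages make it easy to work with Prometheus.
Reference package for .NET:
https://github.com/prometheus-net/prometheus-net
With .NET you can retrieve data from Prometheus and build the monitoring you want.

Published: 2022-12-16 2:18 PM, updated: 2023-04-18 4:41 PM. Author: Claire Chang

Relabel configuration

When reading the YAML generated by Prometheus-operator, you will notice many source_labels entries. These let the operator further process labels (for example adding or removing the data and endpoints to be sent).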

relabel_config

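The endpoint value is composed of __scheme__ + __address__ + __metrics_path__

Relabeling can be used to:

Add a new label
Update an existing label
Rewrite an existing label
Update a metric name
Drop unneeded labels
Drop unneeded metrics
Drop metrics under specific conditions
Rename labels
Build a label from multiple existing labels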
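
As an illustration of the ideas above, the Python sketch below composes a scrape URL from the special labels and applies a single keep, drop or replace rule. It is a deliberate simplification and not Prometheus's implementation: the function names are hypothetical, only three actions are covered, and replacements use Python's \1 syntax instead of Prometheus's $1.

```python
import re
from urllib.parse import urlencode

def scrape_url(labels):
    """Compose the endpoint from __scheme__, __address__ and __metrics_path__."""
    url = "%s://%s%s" % (labels["__scheme__"], labels["__address__"],
                         labels["__metrics_path__"])
    params = {k[len("__param_"):]: v for k, v in labels.items()
              if k.startswith("__param_")}
    return url + ("?" + urlencode(params) if params else "")

def relabel(labels, rule):
    """Apply one keep/drop/replace rule; return None when the target is dropped."""
    joined = rule.get("separator", ";").join(
        labels.get(name, "") for name in rule["source_labels"])
    match = re.fullmatch(rule.get("regex", "(.*)"), joined)
    action = rule.get("action", "replace")
    if action == "keep":
        return labels if match else None
    if action == "drop":
        return None if match else labels
    if action == "replace" and match:
        updated = dict(labels)
        # Assumption: Python-style "\1" references instead of Prometheus "$1".
        updated[rule["target_label"]] = match.expand(rule.get("replacement", r"\1"))
        return updated
    return labels

target = {"__scheme__": "http", "__address__": "10.0.0.5:7979",
          "__metrics_path__": "/probe", "__param_module": "default"}
print(scrape_url(target))
print(relabel(target, {"source_labels": ["__address__"], "regex": r"([^:]+):\d+",
                       "target_label": "host", "replacement": r"\1"})["host"])
```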

Setting up prometheus-operator

Prerequisites

A Kubernetes cluster with administrator privileges is required.

Installing prometheus-operator

Install the prometheus-operator custom resource definitions (CRDs) together with the RBAC resources required by the Operator itself.
Run the following commands to install the CRDs and deploy the Operator into the default namespace:
LATEST=$(curl -s https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest | jq -cr .tag_name)
curl -sL https://github.com/prometheus-operator/prometheus-operator/releases/download/${LATEST}/bundle.yaml | kubectl create -f -
You can check whether the installation has finished with:
kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus-operator -n default

Deployment example

This example follows the OpenShift YAML reference for the deployment settings; for more, see: https://docs.openshift.com/container-platform/4.11/rest_api/monitoring_apis/prometheus-monitoring-coreos-com-v1.html
Deploy a simple application with 3 replicas that listens and exposes metrics on port 8080.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080

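Expose the application with a Service object that selects all Pods whose app label is example-app. The Service object also specifies the port on which the metrics are exposed.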

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

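Finally, we create a ServiceMonitor object that selects all Service objects with the app: example-app label. The ServiceMonitor also has a team label (team: frontend in this case) identifying which team is responsible for monitoring the application/service.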

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

Deploying Prometheus

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

For more information, see: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/rbac.md
Earlier we created a ServiceMonitor with the team: frontend label; here we define that the Prometheus object should select all ServiceMonitors with the team: frontend label. This lets the frontend team create new ServiceMonitors and Services without reconfiguring the Prometheus object.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false

To verify that the instance is up and running, run:

kubectl get -n default prometheus prometheus -w

Published: 2022-12-16 12:48 PM, updated: 2023-04-18 4:41 PM. Author: Claire Chang

Prometheus Operator

Introduction to Prometheus Operator

Official website: https://prometheus-operator.dev/

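The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. The purpose of the project is to simplify and automate the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.

The Prometheus Operator includes, but is not limited to, the following features:

Kubernetes custom resources: use Kubernetes custom resources to deploy and manage Prometheus, Alertmanager and related components.
Simplified deployment configuration: configure the fundamentals of Prometheus, such as versions, persistence, retention policies and replicas, from a native Kubernetes resource.
Prometheus target configuration: automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus-specific configuration language.

Advantages of Prometheus Operator

Below is a bare-bones Prometheus configuration example. Configured this way, every monitoring target has to be set up by hand, and when Pods are added or removed automatically someone has to update the targets. That is why Prometheus Operator exists.

Its main job is service discovery. For example, we use a Deployment to manage the creation of Pods and a Service to connect Pods; the Operator then discovers the Services to monitor through ServiceMonitors, and follows changes dynamically as Pods come and go.
Prometheus Operator includes a component called prometheus-config-reloader, which makes Prometheus pick up the new prometheus.yml generated from the ServiceMonitors.

What Prometheus Operator can do

To understand what Prometheus Operator can do is really to understand which custom Kubernetes resources it provides. The four kinds of resources covered here are:

Prometheus: declaratively creates and manages Prometheus Server instances;
ServiceMonitor: declaratively manages monitoring configuration;
PrometheusRule: declaratively manages alerting configuration;
Alertmanager: declaratively creates and manages Alertmanager instances.

In short, Prometheus Operator helps users automate the creation and management of Prometheus Server and its configuration.

Architecture diagram of Prometheus Operator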

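Published: 2022-12-16 12:32 PM, updated: 2023-04-18 4:41 PM. Author: Claire Chang

Adding an exporter for Prometheus in K8S: using pushgateway as an example

Introduction to Pushgateway

The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.

When to use the Pushgateway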

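We only recommend using the Pushgateway in certain limited cases. There are several pitfalls when blindly using the Pushgateway instead of Prometheus's usual pull model for general metrics collection:

When monitoring multiple instances through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck.
You lose Prometheus's automatic instance health monitoring via the up metric (generated on every scrape).
The Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless those series are manually deleted via the Pushgateway's API.

The latter point is especially relevant when multiple instances of a job differentiate their metrics in the Pushgateway via an instance label or similar. Metrics for an instance will then remain in the Pushgateway even if the originating instance is renamed or removed. This is because the lifecycle of the Pushgateway as a metrics cache is fundamentally separate from the lifecycle of the processes that push metrics to it. Contrast this with Prometheus's usual pull-style monitoring: when an instance disappears (intentionally or not), its metrics automatically disappear along with it. When using the Pushgateway, this is not the case, and you have to delete any stale metrics manually or automate this lifecycle synchronization yourself.

Usually, the only valid use case for the Pushgateway is capturing the outcome of a service-level batch job. A "service-level" batch job is one that is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label, so that the lifecycle of specific machines or instances is decoupled from the pushed metrics. This decreases the burden of managing stale metrics in the Pushgateway.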

Getting the pushgateway image

Official image: https://hub.docker.com/r/prom/pushgateway
Or run in a terminal:
docker pull prom/pushgateway

Creating a Pod that contains pushgateway

Write a Deployment for pushgateway

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pushgateway
  name: pushgateway
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      containers:
      - image: prom/pushgateway
        imagePullPolicy: Always
        name: pushgateway
        ports:
        - containerPort: 9091
          name: pushgateway
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always

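Create a headless Service for the pushgateway Pod

Point the Service at the corresponding Pod

Then, from a container that can resolve the same Service name, run
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway:9091/metrics/job/some_job
The pushed data can then be read with
curl http://pushgateway:9091/metrics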
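
A batch job written in Python could do the same push without shelling out to curl. The sketch below only builds the request with the standard library so that it runs offline. The helper name build_push_request is hypothetical, and the gateway address is the Service name used in the commands above.

```python
import urllib.request

def build_push_request(gateway, job, metrics):
    """Prepare a POST carrying metrics in the text exposition format."""
    body = "".join("%s %s\n" % (name, value) for name, value in metrics.items())
    url = "%s/metrics/job/%s" % (gateway.rstrip("/"), job)
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")

request = build_push_request("http://pushgateway:9091", "some_job", {"some_metric": 3.14})
print(request.get_method(), request.full_url)
print(request.data)
# urllib.request.urlopen(request) would send it; left out so the sketch runs offline.
```

Each metric line must end with a newline, which is why the body is built line by line rather than joined without a trailing terminator.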