Querying logs with Loki

1.png

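Loki is a log query tool from Grafana Labs. Unlike Elasticsearch (ES), it indexes only the labels and not the log content itself, which makes it more lightweight.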


Helm repository

helm repo add loki https://grafana.github.io/loki/charts
helm repo update

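A nodeSelector can be set for Loki, but do not set one for promtail: promtail has to run on every node in order to collect that node's container logs.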

Query statements

{job="ingress-nginx/nginx-ingress"} |="php-sht-payment-develop-http" |="refund/create"
{job="php-sht/payment-develop",stream="neo-log"} !="ShopNotifyJob"
{job=~"php-sht/payment-develop.*"} |~"shop_refund" !~"15712" # regex match
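
The line filters in the queries above (`|=` contains, `!=` does not contain, `|~` matches regex, `!~` does not match regex) are applied in sequence, and a line is kept only if every filter passes. The following is a minimal Python sketch of that behaviour, not Loki's implementation: the helper `line_matches` and the sample log lines are made up for illustration, and Python's `re` module stands in for the RE2 engine Loki uses.

```python
import re

def line_matches(line, filters):
    """Apply LogQL-style line filters in order; every filter must pass."""
    for op, pattern in filters:
        if op == "|=":      # line contains the string
            ok = pattern in line
        elif op == "!=":    # line does not contain the string
            ok = pattern not in line
        elif op == "|~":    # line matches the regex
            ok = re.search(pattern, line) is not None
        elif op == "!~":    # line does not match the regex
            ok = re.search(pattern, line) is None
        else:
            raise ValueError(f"unknown filter operator: {op}")
        if not ok:
            return False
    return True

# Hypothetical log lines, made up for illustration.
lines = [
    "POST /shop_refund/create order=15712",
    "POST /shop_refund/create order=20001",
    "GET /health",
]

# Mirrors the filters of the third query: |~"shop_refund" !~"15712"
filters = [("|~", "shop_refund"), ("!~", "15712")]
kept = [line for line in lines if line_matches(line, filters)]
print(kept)
```

Only the second sample line survives: the first is dropped by `!~"15712"`, and the third never matches `shop_refund`.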

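promtail is Loki's log collection client. When deployed on Kubernetes it uses service discovery to tail the standard output and standard error of every container. Application log files can be collected by running promtail as a sidecar inside the service's pod: the log files are mounted into the sidecar, which pushes them to Loki.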
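
To make "pushes them to Loki" more concrete, here is a rough Python sketch of the JSON body accepted by Loki's `/loki/api/v1/push` endpoint: a list of streams, each with a label set and `[nanosecond timestamp, line]` pairs. The function `build_push_payload` and the log lines are hypothetical, and the sketch only builds the payload without sending it. promtail itself batches and encodes entries in its own way, so treat this as an illustration of the data shape only.

```python
import json
import time

def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serialisable body in the shape of Loki's push API."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each value is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}

payload = build_push_payload(
    {"job": "php-sht/payment-develop", "stream": "neo-log"},
    ["refund created", "refund failed"],  # made-up log lines
    now_ns=1_600_000_000_000_000_000,     # fixed timestamp for a repeatable example
)
print(json.dumps(payload, indent=2))
```

The labels in `stream` are what a query such as `{job="php-sht/payment-develop",stream="neo-log"}` selects on.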

promtail.yaml: basic configuration

server:
  http_listen_port: 3101
scrape_configs:
  - job_name: payment-develop
    entry_parser: raw
    static_configs:
      - targets:
          - localhost
        labels:
          job: php-sht/payment-develop
          stream: neo-log
          __path__: /var/www/payment/runtime/logs/*.log

Custom metrics pipeline configuration

server:
  http_listen_port: 3101
client:
  url: http://172.16.101.117:3100/api/prom/push
scrape_configs:
  - job_name: payment-develop # not used in queries
    static_configs:
      - targets:
          - localhost
        labels:
          job: php-sht/payment-develop # becomes the label used in queries
          stream: neo-log
          __path__: /var/www/payment/runtime/logs/*.log
    pipeline_stages:
      - match:
          selector: '{stream="neo-log"}'
          stages:
            - regex:
                expression: "^(?P<message>.*)$"
            - regex:
                expression: "^.*(?P<warning_msg>(warning|WARNING)).*$"
            - regex:
                expression: "^.*(?P<error_msg>(error|ERROR)).*$"
            - metrics: # generate metrics from the logs; note these counts only cover the current job
                log_lines_total:
                  type: Counter
                  description: "log total"
                  source: message
                  config:
                    action: inc
                error_log_total: # total number of error log lines
                  type: Counter
                  description: "error message total"
                  source: error_msg
                  config:
                    action: inc
                warning_log_total: # total number of warning log lines
                  type: Counter
                  description: "warning message total"
                  source: warning_msg
                  config:
                    action: inc
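
The pipeline above can be read as follows: each regex stage adds its named capture groups to an extracted map when it matches, and each Counter with `action: inc` goes up by one when its `source` key is present in that map. Below is a minimal Python sketch of that reading, not promtail's actual code: `process` and the sample lines are hypothetical, and Python's `re` replaces RE2 (the three patterns use syntax common to both).

```python
import re
from collections import Counter

# The three regex stages from the config, in order.
STAGES = [
    re.compile(r"^(?P<message>.*)$"),
    re.compile(r"^.*(?P<warning_msg>(warning|WARNING)).*$"),
    re.compile(r"^.*(?P<error_msg>(error|ERROR)).*$"),
]

# Metric name -> extracted key it reads from (the `source` field).
METRICS = {
    "log_lines_total": "message",
    "warning_log_total": "warning_msg",
    "error_log_total": "error_msg",
}

def process(lines):
    """Count, per metric, the lines for which its source key was extracted."""
    counters = Counter()
    for line in lines:
        extracted = {}
        for stage in STAGES:
            m = stage.match(line)
            if m:
                # Only named groups end up in the extracted map.
                extracted.update(m.groupdict())
        for metric, source in METRICS.items():
            if source in extracted:
                counters[metric] += 1
    return counters

# Made-up log lines for illustration.
sample = [
    "[info] payment ok",
    "[warning] slow response",
    "[ERROR] refund failed",
    "[error] timeout",
]
counts = process(sample)
print(dict(counts))
```

Note that the patterns only cover the all-lowercase and all-uppercase spellings, so a line containing `Error` would count toward `log_lines_total` but not `error_log_total`.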

Once the service is running, the custom metrics are exposed on port 3101. Their names start with promtail_custom, for example: promtail_custom_log_lines_total
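
A quick way to check that the custom metrics are present is to filter the text from the metrics endpoint by that prefix. The sketch below parses a sample of the Prometheus text exposition format. The function `custom_metrics`, the sample values, and the label sets shown are assumptions for illustration, and the parser assumes sample lines carry no trailing timestamp.

```python
def custom_metrics(exposition_text, prefix="promtail_custom_"):
    """Return {series: value} for sample lines whose name starts with prefix."""
    result = {}
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            result[series] = float(value)
    return result

# Made-up sample of what the endpoint might return.
sample = """\
# HELP promtail_custom_log_lines_total log total
# TYPE promtail_custom_log_lines_total counter
promtail_custom_log_lines_total{job="php-sht/payment-develop",stream="neo-log"} 128
promtail_custom_error_log_total{job="php-sht/payment-develop",stream="neo-log"} 3
go_goroutines 42
"""

found = custom_metrics(sample)
for series, value in found.items():
    print(series, value)
```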

For Prometheus service discovery in Kubernetes, add the following to the Service:

annotations:
  prometheus.io/port: "3101"
  prometheus.io/scrape: "true"

Create a new panel for the metrics in Grafana:

2.png

Monitor the growth rate of the total log count, the warning logs, and the error logs:

3.png
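
A growth rate for a counter is, roughly, the increase divided by the elapsed time. The sketch below shows that arithmetic on made-up samples. It is a deliberate simplification of what PromQL's `rate()` computes, since it ignores counter resets and window extrapolation, and `per_second_rate` is a hypothetical helper.

```python
def per_second_rate(samples):
    """samples: list of (unix_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    # Increase over the window divided by the window length in seconds.
    return (v1 - v0) / (t1 - t0)

# Made-up scrapes of promtail_custom_error_log_total, 60 seconds apart.
error_samples = [(0, 10), (60, 13), (120, 22)]
rate = per_second_rate(error_samples)
print(f"{rate:.3f} errors/second")
```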

Storing chunks in MinIO

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
  aws:
    s3: http://***:***@172.2.4.3:9000/loki
    s3forcepathstyle: true
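
The `s3` value packs several things into one URL: the MinIO endpoint, the access key and secret key in the userinfo part, and the bucket name as the path. With `s3forcepathstyle: true` the bucket is addressed as a path segment rather than as a subdomain, which is what a MinIO instance reached by IP address needs. The Python sketch below only takes such a URL apart to show which piece is which. `describe_s3_url` is a hypothetical helper, and `EXAMPLE_KEY` / `EXAMPLE_SECRET` are placeholders, not real credentials.

```python
from urllib.parse import urlsplit

def describe_s3_url(url):
    """Split an http(s)://key:secret@host:port/bucket URL into its parts."""
    parts = urlsplit(url)
    return {
        "endpoint": f"{parts.scheme}://{parts.hostname}:{parts.port}",
        "access_key": parts.username,
        "secret_key": parts.password,
        "bucket": parts.path.lstrip("/"),
    }

# Placeholder credentials; substitute the values for your own MinIO instance.
info = describe_s3_url("http://EXAMPLE_KEY:EXAMPLE_SECRET@172.2.4.3:9000/loki")
for key, value in info.items():
    print(f"{key}: {value}")
```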

References:

https://github.com/google/re2/wiki/Syntax