# Redis Sentinel Cluster Deployment
Setting up redis-sentinel with docker-compose
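# Overview
A Redis cluster provides high availability and sharding across a group of Redis nodes. The cluster has 1 master and multiple slave nodes. When the master fails, one of the slaves should be elected as the new master. However, Redis itself (and many of its clients) cannot detect failures and switch between master and slave automatically, so an external monitoring solution is needed for automatic failover. There are many ways to achieve Redis HA; Sentinel is the officially recommended one.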
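# Redis HA Options
HA (High Availability) refers to a highly available cluster system, an effective way to keep a business running continuously. It usually has two or more nodes, divided into active and standby nodes. The node currently serving the business is the active node, and a node that backs it up is a standby node. When the active node fails and the running business can no longer work properly, the standby node detects this and immediately takes over, so the business continues without interruption or with only a brief one.
Redis is usually deployed in master-slave mode (the master serves reads and writes, the slaves are mainly used for backup). The main options for making this setup highly available are:
- keepalived: a keepalived virtual IP gives master and slave a single access point. When the master fails, keepalived runs a script that promotes the slave to master; once the old master recovers, it first synchronizes and then automatically becomes master again. The advantage is that clients do not notice the switch (the virtual IP they access does not change). The drawbacks are that keepalived adds deployment complexity and that data can be lost in some situations.
- zookeeper: ZooKeeper monitors the master and slave instances and maintains the latest valid IP; applications obtain the IP from ZooKeeper and then access Redis. This option requires writing a lot of monitoring code.
- sentinel: Sentinel monitors the master and slave instances and performs failover automatically. The drawback is that master and slave have different addresses (IP & PORT), so after a failover the application does not know the new address. For this reason Jedis 2.2.2 added Sentinel support: a Jedis instance obtained through redis.clients.jedis.JedisSentinelPool.getResource() is updated to the new master address in time.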
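The idea behind Sentinel-aware clients is that the client first asks a Sentinel for the current master address and only then connects to Redis. The following is a minimal Python sketch of that lookup using only the standard library and the `SENTINEL get-master-addr-by-name` command. The helper names `encode_command`, `parse_addr_reply` and `get_master_addr` are hypothetical, and the sketch assumes the Sentinel has no password and that the whole reply arrives in a single read.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_addr_reply(reply: bytes):
    """Parse the two-element array reply into (host, port).

    Returns None for anything else, e.g. a null reply when the
    master name is not monitored by this Sentinel.
    """
    parts = reply.split(b"\r\n")
    if parts[0] != b"*2":
        return None
    # parts layout: [*2, $<len>, host, $<len>, port, ""]
    return parts[2].decode(), int(parts[4])


def get_master_addr(sentinel_host: str, sentinel_port: int, master_name: str):
    """Ask one Sentinel where the current master is (assumes one recv is enough)."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=2) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return parse_addr_reply(conn.recv(4096))
```

With the setup in this article, `get_master_addr("192.168.0.111", 26379, "mymaster")` would be the call to try; a real client library also retries the other Sentinels and re-resolves the address after a failover.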
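PS:
- Sentinel solves the HA problem. Master-slave replication only copies data and cannot switch master and slave by itself; Redis Cluster is a separate feature that solves sharding.
- Since Spring Boot 2.x, the default Redis client is Lettuce rather than Jedis.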
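# Sentinel Overview
Redis Sentinel is the officially recommended high availability solution (the Redis HA solution). It is a monitoring and management tool for Redis deployments that provides node monitoring, notification, automatic failover, and client configuration discovery.
In short, the main functions of Redis Sentinel are:
- When the master fails, it automatically selects a standby node and promotes it to master.
- It acts as the central source of authority for client discovery: clients connect to Sentinel to obtain the address of the master.
This setup uses docker-compose to build a simple Sentinel model (1 master, 2 slaves, 3 sentinels). The Redis version is latest (6.2.6).
Note that everything here is deployed on a single machine in a pseudo-distributed way; a production environment should use multiple machines.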
# Setting Up the Redis Cluster
To set up one master and two slaves, create a redis folder with the following docker-compose.yml:
version: '3.1'
services:
  master:
    environment:
      - TZ=Asia/Shanghai
    image: redis
    container_name: redis-master
    command: redis-server --replica-announce-ip 192.168.0.111 --replica-announce-port 6379 --requirepass 123456 --masterauth 123456 --appendonly yes
    ports:
      - 6379:6379
    volumes:
      - ./master-data:/data
  slave1:
    environment:
      - TZ=Asia/Shanghai
    image: redis
    container_name: redis-slave1
    command: redis-server --slaveof redis-master 6379 --requirepass 123456 --replica-announce-ip 192.168.0.111 --replica-announce-port 6380 --masterauth 123456 --appendonly yes
    ports:
      - 6380:6379
    volumes:
      - ./slave1-data:/data
  slave2:
    environment:
      - TZ=Asia/Shanghai
    image: redis
    container_name: redis-slave2
    command: redis-server --slaveof redis-master 6379 --requirepass 123456 --replica-announce-ip 192.168.0.111 --replica-announce-port 6381 --masterauth 123456 --appendonly yes
    ports:
      - 6381:6379
    volumes:
      - ./slave2-data:/data
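Notes on the command parameters:
- --requirepass: the Redis password. If the master and slaves are password protected, they must all use the same password, otherwise the Sentinel mechanism fails!
- --masterauth: the password used to authenticate to the master. Because Sentinel mode can switch master and slave, the current master may later become a slave, so the current master container must carry the masterauth parameter too.
- --replica-announce-ip: used in port-forwarding or NAT environments, or when Redis has multiple IP addresses, to specify the IP address that Redis announces. 192.168.0.111 above is the address of the author's virtual machine; replace it with the IP of your own machine. Relying only on the container-internal IP causes unnecessary trouble, for example the sentinel containers cannot reach the nodes, or sentinel failover fails.
- --replica-announce-port: used in port-forwarding or NAT environments to specify the port that Redis announces. This should be the port mapped on the host, not the container's default port.
Run the following command to check whether replication is working: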
$ docker exec redis-master redis-cli -a 123456 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.0.111,port=6380,state=online,offset=1055,lag=0
slave1:ip=192.168.0.111,port=6381,state=online,offset=1055,lag=0
master_failover_state:no-failover
master_replid:81fc0e8be130cbd11fb9236fe86c0dc43f326f8d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1055
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1055
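The check above can also be automated. Below is a small Python sketch that parses `info replication` output and verifies that the node is a master with the expected number of online replicas. The function names `parse_replication_info` and `replication_is_healthy` are hypothetical, and the sample text is an abridged copy of the output shown above.

```python
def parse_replication_info(text: str) -> dict:
    """Turn `INFO replication` output into a dict, skipping section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replication_is_healthy(info: dict, expected_replicas: int) -> bool:
    """True when the node is a master and every expected replica is online."""
    if info.get("role") != "master":
        return False
    if int(info.get("connected_slaves", "0")) != expected_replicas:
        return False
    # Replica entries look like: slave0:ip=...,port=...,state=online,...
    replicas = [v for k, v in info.items() if k.startswith("slave") and "state=" in v]
    return len(replicas) == expected_replicas and all("state=online" in r for r in replicas)


sample = """\
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.0.111,port=6380,state=online,offset=1055,lag=0
slave1:ip=192.168.0.111,port=6381,state=online,offset=1055,lag=0
master_repl_offset:1055
"""
healthy = replication_is_healthy(parse_replication_info(sample), expected_replicas=2)
```

In practice the text would come from the `docker exec redis-master redis-cli ... info replication` command rather than from a hard-coded sample.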
# Setting Up the Sentinel Cluster
Create three Sentinel services with the following docker-compose.yml:
version: '3.1'
services:
  sentinel1:
    environment:
      - TZ=Asia/Shanghai
    image: redis
    container_name: redis-sentinel1
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26379:26379
    volumes:
      - ./s1:/usr/local/etc/redis/conf
  sentinel2:
    environment:
      - TZ=Asia/Shanghai
    image: redis
    container_name: redis-sentinel2
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26380:26379
    volumes:
      - ./s2:/usr/local/etc/redis/conf
  sentinel3:
    environment:
      - TZ=Asia/Shanghai
    image: redis
    container_name: redis-sentinel3
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26381:26379
    volumes:
      - ./s3:/usr/local/etc/redis/conf
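Note: if you map the sentinel.conf file directly into the container, Sentinel may not have permission to write the configuration file, and the logs show WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy. Solution: map a folder instead.
Tip: if the Sentinel containers need to reach Redis master/slave containers whose IPs are assigned automatically, the following command shows the IP of the Redis master, which can then be used in the Sentinel configuration. Here the IP is fixed (192.168.0.111), so it is configured directly.
docker inspect redis-master |grep Networks -A 15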
# Sentinel Configuration File
Before starting the containers, write a common sentinel.conf, create the directories s1, s2 and s3, and copy the file into each of them:
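# Keep Sentinel in the foreground: a container stops when its main process exits
daemonize no
port 26379
# sentinel announce-ip 192.168.0.111 (optional)
# sentinel announce-port 26379 (optional); if set, change it to the host port mapped to this container
dir /tmp
# IP and port of the Redis master monitored by Sentinel
# master-name: a custom name for the master (mymaster); only letters A-z, digits 0-9 and the three characters ".-_" are allowed.
# quorum: minimum number of votes; when quorum Sentinels consider the master unreachable, it is objectively considered down
# The usual recommendation is half the number of Sentinels plus 1; with 3 Sentinels it can be set to 2
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 192.168.0.111 6379 2
# When a password is enabled on the Redis instances (e.g. requirepass foobared), every client connecting to them must provide it
# The password Sentinel uses for the master and slaves must be the same as the master's
# sentinel auth-pass <master-name> <password>
sentinel auth-pass mymaster 123456
# requirepass 123456
# Sentinel itself supports a password only since Redis 5.x. It is optional and left unset here:
# with it set, Lettuce clients may fail with "Error: NOAUTH Authentication required",
# which the author found troublesome to handle.
# Number of milliseconds without a reply from the master after which this Sentinel subjectively considers it down; default 30 seconds
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000
# Maximum number of slaves that may synchronize with the new master at the same time during a failover.
# The smaller the number, the longer the failover takes; the larger the number, the more slaves are unavailable because of replication.
# Setting it to 1 ensures that only one slave at a time is unable to serve command requests.
# sentinel parallel-syncs <master-name> <numslaves>
sentinel parallel-syncs mymaster 1
# The failover timeout, failover-timeout (default three minutes), is used in the following ways:
# 1. The interval between two failovers of the same master by the same sentinel.
# 2. The time counted from when a slave starts replicating from a wrong master until it is corrected to replicate from the right master.
# 3. The time needed to cancel a failover that is in progress.
# 4. The maximum time for pointing all slaves at the new master during a failover. Even after this time the slaves are still configured to point at the master, but no longer following the parallel-syncs rule.
# sentinel failover-timeout <master-name> <milliseconds>
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes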
More detailed configuration notes are in the appendix below. Once the files are created, the containers can be started.
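Each Sentinel needs its own private copy of the file because Sentinel rewrites its configuration at runtime. The copy step can be scripted; here is a minimal Python sketch, where `distribute_config` is a hypothetical helper and the demo writes a two-line stand-in config into a throwaway directory rather than the real project folder.

```python
import tempfile
from pathlib import Path


def distribute_config(base_dir: Path, conf_text: str, names=("s1", "s2", "s3")) -> list:
    """Create one directory per Sentinel and write a private sentinel.conf into each."""
    written = []
    for name in names:
        target_dir = base_dir / name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "sentinel.conf"
        target.write_text(conf_text, encoding="utf-8")
        written.append(target)
    return written


# Demo in a throwaway directory; in the article's layout base_dir would be
# the folder that holds the Sentinel docker-compose.yml.
demo_conf = "port 26379\ndir /tmp\n"
with tempfile.TemporaryDirectory() as tmp:
    created = distribute_config(Path(tmp), demo_conf)
    created_dirs = [p.parent.name for p in created]
    contents_match = all(p.read_text(encoding="utf-8") == demo_conf for p in created)
```

For the real setup, something like `distribute_config(Path("."), Path("sentinel.conf").read_text())` run next to the compose file would produce the s1, s2 and s3 folders that the volumes entries expect.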
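# Checking Whether the Sentinel Cluster Works
Enter a Sentinel container and use the Sentinel API to check the monitoring state:
docker exec -it redis-sentinel1 /bin/bash
redis-cli -p 26379
sentinel master mymaster # show information about the Redis master
sentinel slaves mymaster # show information about the Redis slaves
After running the commands above, output like the following means the cluster is working: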
......
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
.......
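Note: to prove that Sentinel mode can really complete a failover (many online tutorials are flawed here, mainly because the Sentinels cannot reach the Redis master/slave containers), manually stop the Redis master and watch the redis-sentinel logs to see which Redis is chosen as the new master. The promoted Redis is then writable.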
$ docker logs -f redis-sentinel1
......
1:X 11 Dec 2021 19:29:22.329 # +monitor master mymaster 192.168.0.111 6379 quorum 2
1:X 11 Dec 2021 19:29:22.331 * +slave slave 192.168.0.111:6380 192.168.0.111 6380 @ mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:29:22.333 * +slave slave 192.168.0.111:6381 192.168.0.111 6381 @ mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:29:24.211 * +sentinel sentinel bf68f0c1d648053a5db608ae56b2d6adb879fbc4 172.21.0.2 26379 @ mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:29:24.277 * +sentinel sentinel 4dc7e383eb36346c880bc42536b2ebfd19b7caaf 172.21.0.3 26379 @ mymaster 192.168.0.111 6379
# === At this point redis-master was taken down, and Sentinel promoted redis-slave1 to master
1:X 11 Dec 2021 19:33:07.835 # -odown master mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:33:07.836 * +slave-reconf-inprog slave 192.168.0.111:6381 192.168.0.111 6381 @ mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:33:07.836 * +slave-reconf-done slave 192.168.0.111:6381 192.168.0.111 6381 @ mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:33:07.912 # +failover-end master mymaster 192.168.0.111 6379
1:X 11 Dec 2021 19:33:07.912 # +switch-master mymaster 192.168.0.111 6379 192.168.0.111 6380
1:X 11 Dec 2021 19:33:07.913 * +slave slave 192.168.0.111:6381 192.168.0.111 6381 @ mymaster 192.168.0.111 6380
......
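The line to look for in these logs is the `+switch-master` event, which lists the master name, the old address and the new address. As a rough illustration, the following Python sketch extracts the new master from log text; `find_new_master` is a hypothetical helper and the sample line is copied from the log above.

```python
import re

# +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_MASTER = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_new_master(log_text: str):
    """Return (name, new_ip, new_port) from the last +switch-master event, or None."""
    result = None
    for match in SWITCH_MASTER.finditer(log_text):
        result = (match["name"], match["new_ip"], int(match["new_port"]))
    return result


log = "1:X 11 Dec 2021 19:33:07.912 # +switch-master mymaster 192.168.0.111 6379 192.168.0.111 6380"
new_master = find_new_master(log)
```

Here the result points at port 6380, which is the host port mapped to redis-slave1 in the compose file.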
# Spring Boot 2.x yml Configuration Reference
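spring:
  redis:
    password: 123456
    lettuce:
      pool:
        max-active: 10 # maximum number of connections in the pool (a negative value means no limit)
        max-idle: 8 # maximum number of idle connections in the pool
        min-idle: 2 # minimum number of idle connections in the pool
        max-wait: -1ms
    sentinel:
      master: mymaster
      nodes: 192.168.0.111:26379,192.168.0.111:26380,192.168.0.111:26381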
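# Appendix: Core Redis Master-Slave Configuration
Notes on some core settings in redis.conf for master-slave setups; reading the English comments in the file directly is recommended.
daemonize yes # change `daemonize` from `no` to `yes`
# bind 127.0.0.1 # bound to the loopback address by default, so other machines cannot connect
protected-mode no # whether protected mode is enabled; change from yes to no
port 6380 # port number
replicaof 127.0.0.1 6379 # master IP and port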
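# How Sentinel Works
# Execution Flow
Sentinel starts and initializes, obtains information about the master and slave servers, and communicates with them through command connections and publish/subscribe. It then checks the master for subjective down and objective down state. When a master is found to be objectively down, a Sentinel Leader is elected to perform the failover.
# Starting and Initializing Sentinel
Sentinel is a special Redis server that does not persist data. After a Sentinel instance starts, each Sentinel creates 2 network connections to the master:
- Command connection: used to send commands to the master and receive the replies;
- Subscription connection: used to subscribe to the master's __sentinel__:hello channel.
# Obtaining Master and Slave Information (Commands)
By default, Sentinel sends the info command to the monitored master every 10s (over the command connection) to obtain information about the master and its slaves.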
# This command shows a lot of information: Clients, Memory, Persistence, Stats, CPU, Cluster, Keyspace....
127.0.0.1:6379> info
# Server
redis_version:5.0.5
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:8be439e7661fc37f
redis_mode:standalone
os:Linux 3.10.0-229.el7.x86_64 x86_64
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=7115,lag=1
master_replid:0aa75695a9fd13040fc3447feadaa62109715c22
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:7115
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:7115
.....
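When Sentinel discovers that the master has a new slave, it also creates a command connection and a subscription connection to that slave.
Once the command connection is established, Sentinel likewise sends the info command to the slave every 10s by default and records the slave's information.
# Sending Messages to Masters and Slaves (Subscription)
By default, every 2s Sentinel publishes a message to the __sentinel__:hello channel of every monitored master and slave. The message carries information about the Sentinel itself and about the master, as eight comma-separated fields.
PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"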
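To make the message layout concrete, here is a small Python sketch that parses such a hello payload into named fields. It assumes the eight fields are joined by commas; the `Hello` type and `parse_hello` function are hypothetical names, and the epoch values in the sample are made up (the IP and run id are taken from the logs earlier in this article).

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    sentinel_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a hello payload into its eight fields (comma-separated, by assumption)."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 comma-separated fields, got {len(fields)}")
    s_ip, s_port, s_runid, s_epoch, m_name, m_ip, m_port, m_epoch = fields
    return Hello(s_ip, int(s_port), s_runid, int(s_epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


# Epoch values (0) are illustrative only.
hello = parse_hello(
    "172.21.0.2,26379,bf68f0c1d648053a5db608ae56b2d6adb879fbc4,0,mymaster,192.168.0.111,6379,0"
)
```

A Sentinel that receives such a message from an unknown run id learns that another Sentinel is monitoring the same master, which is how the Sentinels find each other without being listed in the configuration.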
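# Receiving Channel Messages from Masters and Slaves
After Sentinel establishes a subscription connection to a master or slave, it sends the following command over that connection:
subscribe __sentinel__:hello
Note that Sentinels create only command connections to each other, not subscription connections. By subscribing to the __sentinel__:hello channel of a master or slave, a Sentinel learns about newly joined Sentinels; once a new Sentinel has joined, the Sentinels that know each other only need command connections to communicate.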
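# Detecting Subjective Down State
Once per second, Sentinel sends a PING command to every instance it has a command connection with (masters, slaves, and other Sentinels). Sentinel considers an instance subjectively down (SDown) when its replies match one of the following cases:
- Invalid reply: for down-after-milliseconds milliseconds the instance returns only replies other than +PONG, -LOADING and -MASTERDOWN;
- Timeout: the instance does not reply within down-after-milliseconds milliseconds.
At this point only one Sentinel believes the instance is down, and that is just its own (subjective) opinion. It therefore goes on to ask the others (the other Sentinels) whether the instance is down; the next step is to seek agreement and reach objective down.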
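Both cases come down to one rule: the instance is subjectively down when no acceptable PING reply has been seen for longer than down-after-milliseconds. The following Python sketch models that rule; it is a simplified illustration rather than Sentinel's actual code, and the `InstanceState` class is a hypothetical name.

```python
VALID_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


class InstanceState:
    """Tracks when an instance last gave an acceptable PING reply (times in ms)."""

    def __init__(self, now_ms: int, down_after_ms: int):
        self.last_valid_reply_ms = now_ms
        self.down_after_ms = down_after_ms

    def on_ping_reply(self, reply: str, now_ms: int) -> None:
        # Only acceptable replies reset the timer; invalid ones are ignored,
        # so a stream of invalid replies ends up treated like a timeout.
        if reply.startswith(VALID_PREFIXES):
            self.last_valid_reply_ms = now_ms

    def is_subjectively_down(self, now_ms: int) -> bool:
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


state = InstanceState(now_ms=0, down_after_ms=30000)
state.on_ping_reply("+PONG", now_ms=1000)       # valid: timer resets
state.on_ping_reply("-ERR unknown", now_ms=20000)  # invalid: timer keeps running
```

With the 30000 ms setting used in this article, the instance in the demo would be flagged once the clock passes 31000 ms.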
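# Checking Objective Down State
After a Sentinel judges a master to be subjectively down, it sends a query to all other Sentinels monitoring the same master (asking whether it is down):
SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>
The other Sentinels reply:
<down_state> <leader_runid> <leader_epoch>
Based on the replies, it determines whether they also consider the master down. If the number of Sentinel instances that judge the master subjectively down reaches the quorum configured in Sentinel, the master is judged objectively down (ODown).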
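The quorum check can be written down in a few lines. This Python sketch counts the Sentinel's own opinion plus the down_state values returned by its peers; `is_objectively_down` is a hypothetical name and the model ignores reply expiry.

```python
def is_objectively_down(own_opinion_down: bool, peer_replies: list, quorum: int) -> bool:
    """Decide ODown from this Sentinel's view.

    peer_replies holds the <down_state> value (1 = down, 0 = up)
    reported by each other Sentinel.
    """
    if not own_opinion_down:
        return False  # a Sentinel only asks around once it sees SDown itself
    votes = 1 + sum(1 for state in peer_replies if state == 1)
    return votes >= quorum
```

With 3 Sentinels and quorum 2, as configured in this article, one agreeing peer is enough: `is_objectively_down(True, [1, 0], 2)` holds, while `is_objectively_down(True, [0, 0], 2)` does not.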
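# Electing the Leader Sentinel
Now that enough Sentinels (at least quorum) consider the master down (objectively down), all Sentinels monitoring this master use an election algorithm (based on Raft) to elect a Leader Sentinel that performs the failover.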
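# Sentinel Leader Election
# The Raft Protocol
Raft is a protocol for solving consistency problems in distributed systems.
- Raft describes three node states: Leader, Follower, and Candidate.
- Term: Raft divides time into terms, which can be thought of as a kind of "logical time".
Election process:
- Raft uses a heartbeat mechanism to trigger Leader election.
- After the system starts, all nodes are initialized as Followers with term 0.
- A node that receives RequestVote (an election request) or AppendEntries (an append from the Leader) remains a Follower.
- If a node receives no AppendEntries message for a (random) period, that is, it has not seen a Leader within its timeout, the Follower becomes a Candidate and starts campaigning for Leader. Once it becomes a Candidate, the node immediately does the following:
  - Increments its term.
  - Starts a new (random) timer.
  - Votes for itself.
  - Sends RequestVote to all other nodes and waits for their replies.
- If the node receives votes from a majority of nodes before the timer expires, it becomes Leader and sends AppendEntries to all other nodes to announce it. Each node can cast only one vote per term, first come, first served: the Candidate, as mentioned, has already voted for itself, and a Follower votes for the first node whose RequestVote it receives.
The key to Raft's Leader election is that the timers use random timeouts, so within the same term the node that becomes a Candidate first starts the vote first and thereby collects the majority of votes.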
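The voting rules above can be illustrated with a toy simulation. This Python sketch is heavily simplified: it models a single election round with no network, no heartbeats and no log comparison, and the `Node` class, `run_election` function and the timeout values are all hypothetical. Fixed timeouts are used instead of random ones so the outcome is reproducible.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.term = 0
        self.voted_for = None  # at most one vote per term

    def handle_request_vote(self, candidate: str, term: int) -> bool:
        if term > self.term:
            # A newer term resets the vote.
            self.term = term
            self.voted_for = None
        if term == self.term and self.voted_for in (None, candidate):
            self.voted_for = candidate  # first come, first served
            return True
        return False


def run_election(nodes: dict, timeouts_ms: dict):
    """The node whose timer fires first becomes Candidate and asks for votes."""
    candidate = nodes[min(timeouts_ms, key=timeouts_ms.get)]
    candidate.term += 1
    candidate.voted_for = candidate.name  # votes for itself
    votes = 1
    for node in nodes.values():
        if node is not candidate and node.handle_request_vote(candidate.name, candidate.term):
            votes += 1
    # A strict majority of all nodes is required to become Leader.
    return candidate.name if votes > len(nodes) // 2 else None


nodes = {name: Node(name) for name in ("A", "B", "C")}
leader = run_election(nodes, {"A": 210, "B": 150, "C": 290})
```

Node B has the shortest timeout, so it campaigns first and collects the votes of A and C; a later RequestVote from another node in the same term is refused because each node has already voted.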
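# Sentinel Leader Election Process
- After a Sentinel determines that the master is objectively down, it first checks whether it has already voted. If it has already voted for another Sentinel, it will not become Leader for a certain period of time.
- If the Sentinel has not voted yet, it becomes a Candidate and has to do the following:
  - Update the failover state to start.
  - Increment the current epoch by 1, which amounts to entering a new term; the epoch in Sentinel is the term in Raft.
  - Send the is-master-down-by-addr command to the other nodes to request votes; the command carries its own epoch.
  - Vote for itself (leader, leader_epoch).
- When another Sentinel receives this command, it can agree or refuse to make the sender Leader (decided by comparing the epoch);
- The Candidate keeps counting its votes until it finds that the votes in favor are more than half and also at least the configured quorum; it then becomes Leader.
- The other Sentinels wait for the Leader to choose a new master from the slaves; once they detect that the new master is working properly, they clear the objective down flag.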
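The winning condition combines two thresholds: a majority of the known Sentinels and the configured quorum. A short Python sketch of that check follows; `wins_election` is a hypothetical name.

```python
def wins_election(votes_received: int, known_sentinels: int, quorum: int) -> bool:
    """A Candidate becomes Leader only with a majority AND at least quorum votes."""
    majority = known_sentinels // 2 + 1
    return votes_received >= majority and votes_received >= quorum
```

This is why a low quorum does not allow a failover from a minority partition: with 3 Sentinels and quorum 1, a single vote satisfies the quorum but not the majority of 2, so `wins_election(1, 3, 1)` is false.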
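# Selecting the New Master and Failing Over
The Leader Sentinel selects the new master from the slaves of the objectively down master according to the following rules, that is, it promotes one slave of the failed master to be the new master:
- Filter out the nodes that are subjectively down.
- Select the slave with the highest priority, which is the one with the lowest slave-priority value; if there is one, return it, otherwise continue.
- Select the slave with the largest replication offset, because a larger offset means more complete replicated data; if there is one, return it, otherwise continue.
- Select the node with the smallest run_id. The run_id is a random identifier, so this is only a deterministic tie-breaker.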
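These rules amount to a filter followed by a three-key sort. The Python sketch below shows one way to express them; the `Replica` dataclass and `select_new_master` function are hypothetical names, and the sketch additionally skips replicas with priority 0, which Redis never promotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    run_id: str
    priority: int        # lower value is preferred; 0 means never promote
    repl_offset: int
    subjectively_down: bool = False


def select_new_master(replicas: list) -> Optional[Replica]:
    """Pick the promotion candidate: priority, then offset, then run_id."""
    candidates = [r for r in replicas if not r.subjectively_down and r.priority != 0]
    if not candidates:
        return None
    # Lowest priority value first, then largest offset, then smallest run_id.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


a = Replica("bbb", 100, 1055)
b = Replica("aaa", 100, 1055)
c = Replica("ccc", 100, 2000, subjectively_down=True)
d = Replica("ddd", 0, 9999)
chosen = select_new_master([a, b, c, d])
```

In the demo, c is filtered out because it is subjectively down and d because of its priority 0; a and b tie on priority and offset, so the smaller run_id decides.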
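After the new master has been selected, the Leader Sentinel's failover consists of three main steps:
- (Elect the new master) Promote the selected slave and make the other slaves of the failed master replicate the new master;
- When a client tries to connect to the failed master, the cluster returns the address of the new master to the client, so that the current master replaces the failed one.
- (Rewrite the configuration) After the master and slave switch roles, the contents of the master's redis.conf, the slaves' redis.conf and sentinel.conf all change accordingly: the redis.conf of the former master gains a replicaof line, and the monitoring target in sentinel.conf is switched to the new master.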
# Appendix: Core Redis Sentinel Configuration
Below is the official Redis Sentinel configuration template, sentinel.conf. See the official documentation for more details:
# Example sentinel.conf
# *** IMPORTANT ***
#
# By default Sentinel will not be reachable from interfaces different than
# localhost, either use the 'bind' directive to bind to a list of network
# interfaces, or disable protected mode with "protected-mode no" by
# adding it to this configuration file.
#
# Before doing that MAKE SURE the instance is protected from the outside
# world via firewalling or other means.
#
# For example you may use one of the following:
# Network interfaces (IP addresses) to listen on
# bind 127.0.0.1 192.168.1.1
# Whether protected mode is enabled
# protected-mode no
# port <sentinel-port>
# The port that this sentinel instance will run on
# Port the Sentinel instance runs on; default 26379
port 26379
# By default Redis Sentinel does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis-sentinel.pid when
# daemonized.
# Whether to run as a daemon
daemonize no
# When running daemonized, Redis Sentinel writes a pid file in
# /var/run/redis-sentinel.pid by default. You can specify a custom pid file
# location here.
pidfile /var/run/redis-sentinel.pid
# Specify the log file name. Also the empty string can be used to force
# Sentinel to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
# Log file to use
logfile ""
# sentinel announce-ip <ip>
# sentinel announce-port <port>
#
# The above two configuration directives are useful in environments where,
# because of NAT, Sentinel is reachable from outside via a non-local address.
#
# When announce-ip is provided, the Sentinel will claim the specified IP address
# in HELLO messages used to gossip its presence, instead of auto-detecting the
# local address as it usually does.
#
# Similarly when announce-port is provided and is valid and non-zero, Sentinel
# will announce the specified TCP port.
#
# The two options don't need to be used together, if only announce-ip is
# provided, the Sentinel will announce the specified IP and the server port
# as specified by the "port" option. If only announce-port is provided, the
# Sentinel will announce the auto-detected local IP and the specified port.
#
# Example:
#
# sentinel announce-ip 1.2.3.4
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
# Working directory of Sentinel
dir /tmp
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Replicas are auto-discovered, so you don't need to specify replicas in
# any way. Sentinel itself will rewrite this configuration file adding
# the replicas using additional configuration options.
# Also note that the configuration file is rewritten when a
# replica is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
#
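# sentinel monitor is the core Sentinel directive, where:
# masterName is the name of the master,
# masterIp and masterPort are the master's address,
# quorum is the threshold of Sentinels needed to judge the master objectively down: when the number of Sentinels that consider the master down reaches quorum, the master is marked objectively down.
# The recommended value is half the number of Sentinels plus 1.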
sentinel monitor mymaster 127.0.0.1 6379 2
# sentinel auth-pass <master-name> <password>
#
# Set the password to use to authenticate with the master and replicas.
# Useful if there is a password set in the Redis instances to monitor.
#
# Note that the master password is also used for replicas, so it is not
# possible to set a different password in masters and replicas instances
# if you want to be able to monitor these instances with Sentinel.
#
# However you can have Redis instances without the authentication enabled
# mixed with Redis instances requiring the authentication (as long as the
# password set is the same for all the instances requiring the password) as
# the AUTH command will have no effect in Redis instances with authentication
# switched off.
#
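# When a password is enabled on the Redis instances (e.g. requirepass foobared), every client connecting to them must provide it
# This sets the password Sentinel uses to connect to the master and slaves; note that master and slaves must use the same password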
# Example:
#
# sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
# sentinel auth-user <master-name> <username>
#
# This is useful in order to authenticate to instances having ACL capabilities,
# that is, running Redis 6.0 or greater. When just auth-pass is provided the
# Sentinel instance will authenticate to Redis using the old "AUTH <pass>"
# method. When also an username is provided, it will use "AUTH <user> <pass>".
# In the Redis servers side, the ACL to provide just minimal access to
# Sentinel instances, should be configured along the following lines:
#
# user sentinel-user >somepassword +client +subscribe +publish \
# +ping +info +multi +slaveof +config +client +exec on
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached replica or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
#
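# Subjective-down setting: Sentinel uses the ping command as a heartbeat check on other nodes; if a node does not reply for longer than down-after-milliseconds, Sentinel marks it subjectively down. The setting applies to the subjective-down judgment of masters, slaves and Sentinel nodes alike.
# The default value of down-after-milliseconds is 30000, i.e. 30s;
# It can be tuned to the network environment and application requirements: the larger the value, the more lenient the subjective-down judgment. The benefit is fewer false positives; the cost is that failure detection and failover take longer, and clients wait longer.
# For example, if the application needs high availability, lower the value so that failover completes quickly when a failure occurs; if the network is relatively poor, raise the threshold to avoid frequent false positives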
sentinel down-after-milliseconds mymaster 30000
# requirepass <password>
#
# You can configure Sentinel itself to require a password, however when doing
# so Sentinel will try to authenticate with the same password to all the
# other Sentinels. So you need to configure all your Sentinels in a given
# group with the same "requirepass" password. Check the following documentation
# for more info: https://redis.io/topics/sentinel
# sentinel parallel-syncs <master-name> <numreplicas>
#
# How many replicas we can reconfigure to point to the new replica simultaneously
# during the failover. Use a low number if you use the replicas to serve query
# to avoid that all the replicas will be unreachable at about the same
# time while performing the synchronization with the master.
#
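# Controls slave replication after a failover: the number of slaves that start replicating from the new master at the same time.
# For example, suppose 3 slaves need to replicate from the new master after the switch;
# with parallel-syncs=1 the slaves start replicating one by one;
# with parallel-syncs=3 all 3 slaves start replicating together.
# The larger parallel-syncs is, the sooner the slaves finish replicating, but the greater the network and disk load on the master; set it according to the actual situation.
# For example, if the master's load is low and the slaves must be highly available, parallel-syncs can be raised moderately. The default value of parallel-syncs is 1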
sentinel parallel-syncs mymaster 1
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a replica replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted replica).
#
# - The maximum time a failover in progress waits for all the replicas to be
# reconfigured as replicas of the new master. However even after this time
# the replicas will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
#
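# Controls failover timeouts. The parameter does not time the failover as a whole, but several of its sub-phases.
# For example, if promoting a slave to master takes longer than timeout, or if a slave takes longer than timeout to start replicating from the new master (not counting the time spent copying data), the failover times out and fails.
# The default value of failover-timeout is 180000, i.e. 180s; after a timeout, the value is doubled for the next attempt.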
sentinel failover-timeout mymaster 180000
# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limit is
# reached the script is terminated with a SIGKILL and the execution retried.
# NOTIFICATION SCRIPT
#
# sentinel notification-script <master-name> <script-path>
#
# Call the specified notification script for any sentinel event that is
# generated in the WARNING level (for instance -sdown, -odown, and so forth).
# This script should notify the system administrator via email, SMS, or any
# other messaging system, that there is something wrong with the monitored
# Redis systems.
#
# The script is called with just two arguments: the first is the event type
# and the second the event description.
#
# The script must exist and be executable in order for sentinel to start if
# this option is provided.
#
# Example:
#
# sentinel notification-script mymaster /var/redis/notify.sh
#
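# Notification script: called whenever Sentinel generates a WARNING-level event (for example a Redis instance becoming subjectively or objectively down).
# The script should notify the system administrator by email, SMS or similar that the system is not running normally. It is called with two arguments: the first is the event type, the second the event description.
# If this script path is configured in sentinel.conf, the script must exist at that path and be executable, otherwise Sentinel fails to start.
#
# Rules for the script's result:
# If the script exits with 1, it is retried later, up to a maximum currently set to 10 times.
# If the script exits with 2 or a higher value, it is not retried.
# If the script is terminated by a signal during execution, the behavior is the same as for exit code 1.
# A script may run for at most 60s; beyond that it is terminated with SIGKILL and then retried.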
# CLIENTS RECONFIGURATION SCRIPT
# sentinel client-reconfig-script <master-name> <script-path>
#
# When the master changed because of a failover a script can be called in
# order to perform application-specific tasks to notify the clients that the
# configuration has changed and the master is at a different address.
#
# The following arguments are passed to the script:
#
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
#
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
#
# The arguments from-ip, from-port, to-ip, to-port are used to communicate
# the old address of the master and the new address of the elected replica
# (now a master).
#
# This script should be resistant to multiple invocations.
#
# Example:
#
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
# SECURITY
#
# By default SENTINEL SET will not be able to change the notification-script
# and client-reconfig-script at runtime. This avoids a trivial security issue
# where clients can set the script to anything and trigger a failover in order
# to get the program executed.
sentinel deny-scripts-reconfig yes
# REDIS COMMANDS RENAMING
#
# Sometimes the Redis server has certain commands, that are needed for Sentinel
# to work correctly, renamed to unguessable strings. This is often the case
# of CONFIG and SLAVEOF in the context of providers that provide Redis as
# a service, and don't want the customers to reconfigure the instances outside
# of the administration console.
#
# In such case it is possible to tell Sentinel to use different command names
# instead of the normal ones. For example if the master "mymaster", and the
# associated replicas, have "CONFIG" all renamed to "GUESSME", I could use:
#
# SENTINEL rename-command mymaster CONFIG GUESSME
#
# After such configuration is set, every time Sentinel would use CONFIG it will
# use GUESSME instead. Note that there is no actual need to respect the command
# case, so writing "config guessme" is the same in the example above.
#
# SENTINEL SET can also be used in order to perform this configuration at runtime.
#
# In order to set a command back to its original name (undo the renaming), it
# is possible to just rename a command to itself:
#
# SENTINEL rename-command mymaster CONFIG CONFIG