Orchestrator is a MySQL high-availability and replication topology management tool written in Go. It supports reshaping replication topologies, automatic failover, and manual master/replica switchover. It provides a web interface that shows the replication topology and its state and lets you change replication relationships and some configuration from the browser; it also exposes a command-line client and an HTTP API for day-to-day operations. Compared with MHA, its most important advantage is that it removes the single point of failure on the management node: Orchestrator achieves its own high availability through the Raft protocol.
Install MySQL
Refer to MySQL安装部署(一) (MySQL installation and deployment, part 1).
Install the Orchestrator Service
Download Orchestrator (RPM packages are available from the project's GitHub releases page)
Install dependencies
```bash
$ yum install lib64onig2-5.9.2-4-mdv2012.0.x86_64 -y
$ yum install jq-1.5-1.el7.x86_64 -y
```
Install Orchestrator
```bash
$ rpm -ivh orchestrator-3.1.2-1.x86_64.rpm
$ rpm -ivh orchestrator-client-3.1.2-1.x86_64.rpm
```
Configure environment variables
```bash
$ echo 'export PATH=$PATH:/usr/local/orchestrator' >> ~/.bash_profile
$ echo 'export ORCHESTRATOR_API="localhost:18888/api"' >> ~/.bash_profile
$ source ~/.bash_profile
```
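orchestrator-client resolves the server endpoint from the ORCHESTRATOR_API variable, so after sourcing the profile a quick sanity check is to ask the client which API it will talk to (the same command appears again in the appendix):

```bash
# Should print the endpoint taken from ORCHESTRATOR_API
$ orchestrator-client -c which-api
```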
Configure the backend database
Orchestrator keeps its own state in a backend database, which can be either MySQL or SQLite; here we use MySQL.
```sql
root@(none) 15:14> create database orchestrator;
root@(none) 15:14> grant all on orchestrator.* to 'orchestrator'@'10.0.%' identified by 'Abcd123#';
root@(none) 15:14> flush privileges;
```
Configure the parameter file
```
$ cat orchestrator.conf.json
{
  "Debug": true,
  "ListenAddress": ":18888",
  "MySQLTopologyUser": "admin",
  "MySQLTopologyPassword": "Abcd123#",
  "MySQLOrchestratorHost": "10.0.137.103",
  "MySQLOrchestratorPort": 33006,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "Abcd123#",
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "10.0.137.103",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["10.0.137.103", "10.0.137.104", "10.0.137.105"]
}
```
- ListenAddress: port the web console listens on
- MySQLTopologyUser: the account Orchestrator uses to connect to the monitored MySQL instances (it must exist on those instances; see the sketch after this list)
- MySQLTopologyPassword: password of that account
- RecoverMasterClusterFilters: clusters for which automatic master failover is enabled ("*" matches all)
- RecoverIntermediateMasterClusterFilters: the same, for intermediate masters
- FailureDetectionPeriodBlockMinutes and RecoveryPeriodBlockSeconds are both set to one hour here, meaning that if the master fails again within an hour after a failover, the failure will not be detected again and no new failover will be triggered.
- Orchestrator's own high availability is implemented with the Raft protocol, so the Raft parameters must be enabled and the member nodes listed; the leader is then elected among them.
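Note that MySQLTopologyUser ("admin" above) is not covered by the backend grants: it has to exist on every monitored MySQL instance with enough privileges for discovery and failover. A minimal sketch, with the privilege list taken from the Orchestrator documentation and the user/host/password matching the configuration above:

```bash
# Run against each monitored MySQL instance (here via a local root session)
$ mysql -uroot -p -e "
    GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'admin'@'10.0.%' IDENTIFIED BY 'Abcd123#';
    GRANT SELECT ON mysql.slave_master_info TO 'admin'@'10.0.%';
"
```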
Start the web console
```bash
$ cd /usr/local/orchestrator && ./orchestrator --config=./orchestrator.conf.json http &
```
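Start Orchestrator the same way on all three Raft nodes. To check whether a node is up and currently holds Raft leadership, the HTTP API exposes a leader-check endpoint (endpoint name per the Orchestrator documentation; adjust host and port to your setup):

```bash
# Returns HTTP 200 on the current Raft leader and a non-200 code on followers;
# commonly used as a health check behind a load balancer.
$ curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:18888/api/leader-check
```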
Add the backend MySQL nodes
```bash
orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json -c discover -i t-luhx01-v-szzb:33006
```
Open the web console at http://ip:18888.
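The information shown in the web console is also available through the HTTP API mentioned in the introduction, for example (paths as documented in the Orchestrator API, host/port from this setup):

```bash
# List the clusters Orchestrator currently knows about
$ curl -s http://127.0.0.1:18888/api/clusters
# Show the discovered state of a single instance
$ curl -s http://127.0.0.1:18888/api/instance/t-luhx01-v-szzb/33006
```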
Shut down the master node and test the automatic failover (the old master is fenced off and has to be rejoined manually). After an automatic failover the recovery has to be acknowledged manually, otherwise subsequent failovers are blocked and an error is raised.
The acknowledgement can be done in the web console under Audit -> Recovery,
or with the following command:
```bash
$ orchestrator-client -c ack-all-recoveries --reason='yes'
```
VIP failover script
Add the following to the PostFailoverProcesses section of the Orchestrator configuration file:
```json
"PostFailoverProcesses": [
  "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log",
  "/usr/local/bin/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
],
```
orch_hook.sh (replace the VIP and network interface with your own):
```bash
#!/bin/bash
isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="xxxpassxxx"
logfile="/var/log/orch_hook.log"
# list of cluster names
clusternames=(t-luhx01-v-szzb t-luhx02-v-szzb t-luhx03-v-szzb)
# one array per cluster alias: clustername=( interface IP user Inter_IP )
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201" )

if [[ $isitdead == "DeadMaster" ]]; then
    # $cluster is the cluster alias; "$array[N]" plus ${!...} below is bash
    # indirect expansion into the matching per-cluster array (e.g. ${luhxdb[0]})
    array=$cluster
    interface=$array[0]
    IP=$array[1]
    user=$array[2]
    if [ ! -z ${!IP} ] ; then
        echo $(date)
        echo "Recovering from: $isitdead"
        echo "New master is: $newmaster"
        echo "/usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee $logfile
        /usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster
    else
        echo "Cluster does not exist!" | tee $logfile
    fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then
    array=$cluster
    interface=$array[0]
    IP=$array[3]
    user=$array[2]
    slavehost=`echo $5 | cut -d":" -f1`
    echo $(date)
    echo "Recovering from: $isitdead"
    echo "New intermediate master is: $slavehost"
    echo "/usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee $logfile
    /usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster
elif [[ $isitdead == "DeadIntermediateMaster" ]]; then
    array=$cluster
    interface=$array[0]
    IP=$array[3]
    user=$array[2]
    slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
    showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
    # the new intermediate master is the host present both in the successor list and in SHOW SLAVE HOSTS
    newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`
    echo $(date)
    echo "Recovering from: $isitdead"
    echo "New intermediate master is: $newintermediatemaster"
    echo "/usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee $logfile
    /usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster
fi
```
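The cluster lookup in orch_hook.sh relies on bash indirect expansion: the cluster alias passed as $2 must be the name of one of the per-cluster arrays, and ${!interface}, ${!IP}, ${!user} then resolve into that array's elements. A minimal stand-alone illustration of the mechanism:

```bash
#!/bin/bash
# Per-cluster array exactly as declared in orch_hook.sh
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201" )

cluster="luhxdb"        # what Orchestrator passes as {failureClusterAlias}
interface=$cluster[0]   # the literal string "luhxdb[0]"
IP=$cluster[1]          # the literal string "luhxdb[1]"

echo "${!interface}"    # indirect expansion -> ens192
echo "${!IP}"           # indirect expansion -> 10.0.139.201
```

This also means the cluster alias configured in Orchestrator has to match the shell variable name exactly (lowercase luhxdb here).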
orch_vip.sh
```bash
#!/bin/bash

emailaddress="email@example.com"
sendmail=0

function usage {
  cat << EOF
usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I] [-u SSH user]

OPTIONS:
   -h        Show this message
   -o string Old master hostname or IP address
   -d int    If the master is dead this should be 1, otherwise 0
   -s string SSH options
   -n string New master hostname or IP address
   -i string Interface, for example eth0:1
   -I string Virtual IP
   -u string SSH user
EOF
}

while getopts ho:d:s:n:i:I:u: flag; do
  case $flag in
    o)
      orig_master="$OPTARG";
      ;;
    d)
      isitdead="${OPTARG}";
      ;;
    s)
      ssh_options="${OPTARG}";
      ;;
    n)
      new_master="$OPTARG";
      ;;
    i)
      interface="$OPTARG";
      ;;
    I)
      vip="$OPTARG";
      ;;
    u)
      ssh_user="$OPTARG";
      ;;
    h)
      usage;
      exit 0;
      ;;
    *)
      usage;
      exit 1;
      ;;
  esac
done

if [ $OPTIND -eq 1 ]; then
  echo "No options were passed";
  usage;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "

vip_stop() {
  rc=0
  # ensure the vip is removed
  $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
    "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
  rc=$?
  return $rc
}

vip_start() {
  rc=0
  # ensure the vip is added
  # this command should exit with failure if we are unable to add the vip
  # if the vip already exists always exit 0 (whether or not we added it)
  $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
    "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
  rc=$?
  $cmd_local_arp_fix
  return $rc
}

vip_status() {
  $arping -c 1 -I ${interface} ${vip%/*}
  if ping -c 1 -W 1 "$vip"; then
    return 0
  else
    return 1
  fi
}

if [[ $isitdead == 0 ]]; then
  echo "Online failover"
  if vip_stop; then
    if vip_start; then
      echo "$vip is moved to $new_master."
      if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
    else
      echo "Can't add $vip on $new_master!"
      if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
      exit 1
    fi
  else
    echo $rc
    echo "Can't remove the $vip from orig_master!"
    if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
    exit 1
  fi
elif [[ $isitdead == 1 ]]; then
  echo "Master is dead, failover"
  # make sure the vip is not available
  if vip_status; then
    if vip_stop; then
      if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
    else
      if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
      exit 1
    fi
  fi
  if vip_start; then
    echo "$vip is moved to $new_master."
    if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
  else
    echo "Can't add $vip on $new_master!"
    if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
    exit 1
  fi
else
  echo "Wrong argument, is the master dead or alive?"
fi
```
Note: this script only performs the switchover itself; the VIP has to be attached manually the first time. Also, to be able to fail over back and forth, every node's cluster must be given the same cluster alias, here LUHXDB.
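A sketch of that one-time manual step, using the interface and VIP configured in orch_hook.sh above (run on the node that currently acts as master):

```bash
# Attach the VIP to the current master and announce it with a gratuitous ARP,
# mirroring the cmd_vip_add / cmd_arp_fix commands in orch_vip.sh
$ sudo ip address add 10.0.139.201/32 dev ens192
$ sudo arping -c 1 -I ens192 10.0.139.201
```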
Appendix
Command line and API
List all clusters
```bash
orchestrator-client -c clusters
```
List all cluster aliases
```bash
orchestrator-client -c clusters-alias
```
Discover an instance
```bash
orchestrator-client -c discover -i t-luhx01-v-szzb:33006
```
Forget an instance
```bash
orchestrator-client -c forget -i t-luhx02-v-szzb:33006
```
Print the topology of a given cluster
```bash
orchestrator-client -c topology-tabulated -i t-luhx03-v-szzb:33006
```
Show which API endpoint the client uses
```bash
orchestrator-client -c which-api
```
Search for instances
```bash
orchestrator-client -c search -i luhx
```
List healthy replicas in a cluster that pt-online-schema-change can operate on
```bash
orchestrator-client -c which-cluster-osc-running-replicas -i luhxdb
```
Submit the cluster masters to KV stores, usable for service discovery
```bash
orchestrator-client -c submit-masters-to-kv-stores
```
Relocate a replica under another instance
```bash
orchestrator-client -c relocate -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006
```
Relocate all replicas of an instance under another instance
```bash
orchestrator-client -c relocate-replicas -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006
```
Create master-master (co-master) replication
```bash
orchestrator-client -c make-co-master -i t-luhx01-v-szzb:33006
```
Raise an instance's promotion weight so it is preferred as the new master during a failover (effective for one hour)
```bash
orchestrator-client -c register-candidate -i t-luhx02-v-szzb:33006 --promotion-rule prefer
```
Stop replication on a given instance (nicely)
```bash
orchestrator-client -c stop-replica-nice -i t-luhx02-v-szzb
```
Restart replication on a given instance
```bash
orchestrator-client -c restart-replica -i t-luhx02-v-szzb
```
Run a recovery manually, specifying a dead instance
```bash
orchestrator-client -c recover -i t-luhx01-v-szzb:33006
```
Perform a graceful master takeover
```bash
orchestrator-client -c graceful-master-takeover -a t-luhx01-v-szzb:33006 -d t-luhx03-v-szzb:33006
```
Force a master failover manually
```bash
orchestrator-client -c force-master-failover -i t-luhx01-v-szzb:33006
```
Forcibly discard the current master and designate another instance: the old master is left detached and the designated instance becomes the new master
```bash
orchestrator-client -c force-master-takeover -i t-luhx01-v-szzb:33006 -d t-luhx02-v-szzb:33006
```
Acknowledge cluster recoveries with a reason
```bash
orchestrator-client -c ack-all-recoveries --reason='yes'
```
Orchestrator Hook
①"OnFailureDetectionProcesses": [] —检测故障时执行
②"PreGracefulTakeoverProcesses":[] —在主变为只读节点之前执行
③"PreFailoverProcesses":[] —在执行恢复操作之前执行
④"PostMasterFailoverProcesses":[] —在主恢复成功结束时执行
⑤"PostFailoverProcesses":[] —在任何成功的恢复结束时执行
⑥"PostUnsuccessfulFailoverProcesses":[] —在任何不成功恢复结束时执行
⑦"PostIntermediateMasterFailoverProcesses":[] —在成功的中间恢复结束时执行
⑧"PostGracefulTakeoverProcesses":[] —在旧主位于新晋升的主之后执行
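Each hook receives the placeholder values already substituted as plain command-line arguments, so a hook script such as orch_hook.sh can be smoke-tested outside of Orchestrator by calling it with literal values; a hypothetical example using the hosts from this setup (note that this really invokes orch_vip.sh and will try to move the VIP):

```bash
# Simulate a DeadMaster recovery of cluster alias "luhxdb":
# failed master t-luhx01-v-szzb, promoted master t-luhx02-v-szzb
$ /usr/local/bin/orch_hook.sh DeadMaster luhxdb t-luhx01-v-szzb t-luhx02-v-szzb >> /tmp/orch.log
```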
Scenario 1: the master crashes and an automatic failover is performed
① -> ① -> ③ -> ④ -> ⑤
Scenario 2: graceful master takeover
② -> ① -> ③ -> ④ -> ⑤ -> ⑧
Scenario 3: manual recovery; when the replicas are down or in maintenance mode, a master crash does not trigger automatic failover and recovery must be run by hand
① -> ① -> ③ -> ④ -> ⑤
Scenario 4: forced manual recovery
① -> ③ -> ① -> ④ -> ⑤
References
MySQL高可用复制管理工具 —— Orchestrator介绍