MySQL Installation and Deployment: Orchestrator

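Orchestrator is a MySQL high-availability and replication topology management tool written in Go. It supports adjusting the replication topology, automatic failover, and manual master-replica switchover. Its web interface displays the topology and status of MySQL replication and can also be used to change replication relationships and some configuration settings. A command line and an API are provided as well, which simplifies operations. Compared with MHA, its most important advantage is that it removes the management node as a single point of failure: Orchestrator keeps itself highly available through the Raft protocol.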

Install the MySQL database

See MySQL Installation and Deployment (Part 1).

Install the Orchestrator service

Download Orchestrator

Install dependencies

$ yum install lib64onig2-5.9.2-4-mdv2012.0.x86_64 -y
$ yum install jq-1.5-1.el7.x86_64 -y

Install Orchestrator

$ rpm -ivh orchestrator-3.1.2-1.x86_64.rpm
$ rpm -ivh orchestrator-client-3.1.2-1.x86_64.rpm

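Configure environment variables

The port in ORCHESTRATOR_API must match ListenAddress in the configuration file (18888 in this setup).

$ echo 'export PATH=$PATH:/usr/local/orchestrator' >> ~/.bash_profile
$ echo 'export ORCHESTRATOR_API=http://localhost:18888/api' >> ~/.bash_profile
$ source ~/.bash_profile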
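Running the echo commands above more than once leaves duplicate lines in ~/.bash_profile. The following is a minimal sketch of an idempotent alternative. The helper name append_once is made up for illustration and is not part of Orchestrator.

```shell
#!/bin/bash
# append_once LINE FILE: append LINE to FILE only if FILE does not already
# contain that exact line. (Hypothetical helper, not part of Orchestrator.)
append_once() {
    local line=$1 file=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (single quotes keep $PATH unexpanded until the profile is read):
# append_once 'export PATH=$PATH:/usr/local/orchestrator' ~/.bash_profile
```

The function only defines the helper. Nothing is written until it is called with a target file.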

Configure the database

Orchestrator keeps its own data in a backend database, which can be either MySQL or SQLite. MySQL is used here.

root@(none) 15:14> create database orchestrator;
root@(none) 15:14> grant all on orchestrator.* to 'orchestrator'@'10.0.%' identified by 'Abcd123#';
root@(none) 15:14> flush privileges;
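The GRANT ... IDENTIFIED BY form above is accepted by MySQL 5.7 and earlier. MySQL 8.0 removed it, so there the account has to be created before privileges are granted. The sketch below prints equivalent statements for that case, assuming the same database name, user, and host pattern as above. The function name orch_backend_sql is illustrative only.

```shell
#!/bin/bash
# orch_backend_sql USER HOST PASSWORD: print SQL that creates the Orchestrator
# backend schema and account in a form MySQL 8.0 accepts.
# Values are interpolated verbatim, so they must not contain single quotes.
orch_backend_sql() {
    local user=$1 host=$2 pass=$3
    cat <<EOF
CREATE DATABASE IF NOT EXISTS orchestrator;
CREATE USER IF NOT EXISTS '${user}'@'${host}' IDENTIFIED BY '${pass}';
GRANT ALL ON orchestrator.* TO '${user}'@'${host}';
EOF
}

# Example usage:
# orch_backend_sql orchestrator '10.0.%' 'Abcd123#' | mysql -uroot -p
```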

Configure the parameter file

$ cat orchestrator.conf.json
{
  "Debug":true,
  "ListenAddress":":18888",
  "MySQLTopologyUser":"admin",
  "MySQLTopologyPassword":"Abcd123#",
  "MySQLOrchestratorHost": "10.0.137.103",
  "MySQLOrchestratorPort": 33006,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "Abcd123#",
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "10.0.137.103",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["10.0.137.103", "10.0.137.104", "10.0.137.105"]
}
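  • ListenAddress: the port on which the web console listens
  • MySQLTopologyUser: the user Orchestrator uses to connect to the backend (managed) MySQL instances
  • MySQLTopologyPassword: the password of that user
  • RecoverMasterClusterFilters: the clusters for which automatic master failover is enabled ("*" matches all clusters)
  • RecoverIntermediateMasterClusterFilters: the clusters for which automatic intermediate-master failover is enabled ("*" matches all clusters)
  • FailureDetectionPeriodBlockMinutes and RecoveryPeriodBlockSeconds are both set to one hour: for one hour after a failover, another master failure is not detected and does not trigger a further failover.
  • Orchestrator's own high availability is implemented with the Raft protocol, so the Raft parameters must be set to enable Raft and to define this node's address (RaftBind) and the member list (RaftNodes).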
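Two of the settings above come down to simple arithmetic: the Raft majority for the configured member list, and whether a new failure falls inside the block period. The sketch below illustrates the concepts only and is not Orchestrator's implementation. Both function names are hypothetical.

```shell
#!/bin/bash
# raft_quorum N: number of nodes that must agree in an N-node Raft group
# (a strict majority).
raft_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# recovery_blocked LAST NOW BLOCK_SECONDS: succeed (exit status 0) if a failure
# at epoch NOW falls inside the block period that began at epoch LAST.
recovery_blocked() {
    local last=$1 now=$2 block=$3
    (( now - last < block ))
}

# Example usage:
# raft_quorum 3                                        # three RaftNodes
# recovery_blocked 1000 2800 3600 && echo "still blocked"
```

With the three RaftNodes configured above the majority is two, so the Orchestrator service keeps working if one of its nodes is lost.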

Start the web console

$ cd /usr/local/orchestrator && ./orchestrator --config=./orchestrator.conf.json http &
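Because the service is started in the background, later commands can run before it is ready. A small retry helper, sketched below, can wait for it. The helper name wait_until is hypothetical, and the example probe assumes that Orchestrator's /api/health endpoint is reachable on port 18888 as configured in this setup.

```shell
#!/bin/bash
# wait_until RETRIES DELAY CMD...: run CMD until it succeeds, at most RETRIES
# times, sleeping DELAY seconds after each failed attempt.
# Returns 0 on success and 1 if every attempt failed.
wait_until() {
    local retries=$1 delay=$2 i
    shift 2
    for (( i = 0; i < retries; i++ )); do
        "$@" && return 0
        sleep "$delay"
    done
    return 1
}

# Example usage (assumed URL, adjust host and port to your ListenAddress):
# wait_until 30 1 curl -fsS -o /dev/null http://localhost:18888/api/health
```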

Add the backend data nodes

orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json -c discover -i t-luhx01-v-szzb:33006

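Access the web console (http://ip:18888). [Screenshot: Orchestrator_1]

Shut down the master node to test automatic failover. The old master is isolated and has to be rejoined manually. After an automatic failover, the recovery record must be acknowledged manually. Otherwise subsequent failovers are blocked and the following error appears. [Screenshot: Orchestrator_2]

Recovery records can be reviewed and acknowledged in the web console under Audit -> Recovery. [Screenshot: Orchestrator_3]

They can also be acknowledged with the following command:

$ orchestrator-client -c ack-all-recoveries --reason='yes'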

VIP failover script

In the Orchestrator configuration file, set the PostFailoverProcesses section as follows:

"PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log",
    "/usr/local/bin/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
  ],
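Before wiring a hook into a live cluster, it can help to see what command line results once the {placeholder} tokens are filled in. The sketch below is a local simulation for dry runs. Orchestrator performs the real substitution itself, and the helper name render_hook is hypothetical.

```shell
#!/bin/bash
# render_hook TEMPLATE KEY=VALUE...: replace each {KEY} token in TEMPLATE with
# VALUE and print the result. Values are assumed to be plain host or cluster
# names (no "&" characters).
render_hook() {
    local out=$1 pair key val
    shift
    for pair in "$@"; do
        key=${pair%%=*}   # text before the first "="
        val=${pair#*=}    # text after the first "="
        out=${out//"{$key}"/$val}
    done
    printf '%s\n' "$out"
}

# Example usage:
# render_hook '/usr/local/bin/orch_hook.sh {failureType} {failedHost}' \
#     failureType=DeadMaster failedHost=t-luhx01-v-szzb
```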

orch_hook.sh (remember to replace the VIP and the network interface)

#!/bin/bash
# Failover hook called by Orchestrator through PostFailoverProcesses.
# Arguments: $1 failure type, $2 cluster alias, $3 failed host, $4 successor host.
# The intermediate-master branches also read $5, which the PostFailoverProcesses
# entry shown above does not pass.

isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="xxxpassxxx"

logfile="/var/log/orch_hook.log"

# list of clusternames
clusternames=(t-luhx01-v-szzb t-luhx02-v-szzb t-luhx03-v-szzb)

# clustername=( interface IP user Inter_IP)
# The array name must equal the cluster alias passed in $2.
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201")

if [[ $isitdead == "DeadMaster" ]]; then

    # These variables hold references such as "luhxdb[0]". ${!interface} and
    # friends below use bash indirect expansion to fetch the array element.
    array=$cluster
    interface=$array[0]
    IP=$array[1]
    user=$array[2]

    if [ -n "${!IP}" ] ; then

        echo $(date)
        echo "Recovering from: $isitdead"
        echo "New master is: $newmaster"
        echo "/usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a "$logfile"
        /usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster

    else

        echo "Cluster does not exist!" | tee -a "$logfile"

    fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

    array=$cluster
    interface=$array[0]
    IP=$array[3]
    user=$array[2]
    # keep only the host part of "host:port"
    slavehost=$(echo $5 | cut -d":" -f1)

    echo $(date)
    echo "Recovering from: $isitdead"
    echo "New intermediate master is: $slavehost"
    echo "/usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a "$logfile"
    /usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

    array=$cluster
    interface=$array[0]
    IP=$array[3]
    user=$array[2]
    # strip ports and turn the comma-separated list into a space-separated one
    slavehost=$(echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g")
    showslave=$(mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}')
    # the host that appears in both lists is the new intermediate master
    newintermediatemaster=$(echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d)

    echo $(date)
    echo "Recovering from: $isitdead"
    echo "New intermediate master is: $newintermediatemaster"
    echo "/usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a "$logfile"
    /usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster

fi
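orch_hook.sh relies on bash indirect expansion: a variable holds a string such as luhxdb[1], and ${!var} resolves it to the matching array element. The sketch below isolates that mechanism. The lookup helper is hypothetical, and the array mirrors the one in the script.

```shell
#!/bin/bash
# Per-cluster settings, one array per cluster alias: ( interface VIP ssh_user ).
luhxdb=( ens192 "10.0.139.201" root )

# lookup ALIAS INDEX: print element INDEX of the array named ALIAS.
# Prints an empty line when no such array exists.
lookup() {
    local ref="$1[$2]"       # e.g. "luhxdb[1]"
    printf '%s\n' "${!ref}"  # indirect expansion, same as ${luhxdb[1]}
}

# Example usage:
# lookup luhxdb 0    # interface
# lookup luhxdb 1    # VIP
```

This is why the cluster alias has to be a valid shell variable name and has to match the array name exactly, including case. An unknown alias yields an empty value, which is what triggers the "Cluster does not exist!" branch.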

orch_vip.sh

#!/bin/bash

emailaddress="email@example.com"
sendmail=0

function usage {
  cat << EOF
 usage: $0 [-h] [-d master is dead] [-o old master] [-s ssh options] [-n new master] [-i interface] [-I virtual IP] [-u SSH user]

 OPTIONS:
    -h        Show this message
    -o string Old master hostname or IP address
    -d int    1 if the master is dead, otherwise 0
    -s string SSH options
    -n string New master hostname or IP address
    -i string Interface, for example eth0:1
    -I string Virtual IP
    -u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
  case $flag in
    o)
      orig_master="$OPTARG";
      ;;
    d)
      isitdead="${OPTARG}";
      ;;
    s)
      ssh_options="${OPTARG}";
      ;;
    n)
      new_master="$OPTARG";
      ;;
    i)
      interface="$OPTARG";
      ;;
    I)
      vip="$OPTARG";
      ;;
    u)
      ssh_user="$OPTARG";
      ;;
    h)
      usage;
      exit 0;
      ;;
    *)
      usage;
      exit 1;
      ;;
  esac
done


if [ $OPTIND -eq 1 ]; then
    echo "No options were passed"
    usage
    exit 1
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "

vip_stop() {
    rc=0

    # ensure the vip is removed
    $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
    "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    return $rc
}

vip_start() {
    rc=0

    # ensure the vip is added
    # this command should exit with failure if we are unable to add the vip
    # if the vip already exists always exit 0 (whether or not we added it)
    $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
     "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    $cmd_local_arp_fix
    return $rc
}

vip_status() {
    $arping -c 1 -I ${interface} ${vip%/*}   
    if ping -c 1 -W 1 "$vip"; then
        return 0
    else
        return 1
    fi
}

if [[ $isitdead == 0 ]]; then
    echo "Online failover"
    if vip_stop; then 
        if vip_start; then
            echo "$vip is moved to $new_master."
            if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            echo "Can't add $vip on $new_master!" 
            if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    else
        echo $rc
        echo "Can't remove the $vip from orig_master!"
        if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
        exit 1
    fi


elif [[ $isitdead == 1 ]]; then
    echo "Master is dead, failover"
    # make sure the vip is not available 
    if vip_status; then 
        if vip_stop; then
            if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    fi

    if vip_start; then
          echo "$vip is moved to $new_master."
          if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi

    else
          echo "Can't add $vip on $new_master!" 
          if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
          exit 1
    fi
else
    echo "Wrong argument, the master is dead or live?"

fi

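Note: this script only moves the VIP during a failover. The VIP has to be attached manually the first time. In addition, switch the master over to each node in turn and give the cluster the same cluster alias each time, here luhxdb (the alias must match the array name in orch_hook.sh). [Screenshot: orchestrator_4]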
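For the initial manual attachment of the VIP mentioned in the note, the commands can be derived from cmd_vip_add and cmd_arp_fix in orch_vip.sh. The sketch below only prints them so they can be reviewed before being run on the current master. The function name is hypothetical, and the example values are the VIP and interface used earlier in this post.

```shell
#!/bin/bash
# print_vip_mount_cmds VIP INTERFACE: print (do not run) the commands for
# attaching the VIP by hand, mirroring cmd_vip_add and cmd_arp_fix.
print_vip_mount_cmds() {
    local vip=$1 interface=$2
    echo "sudo ip address add ${vip} dev ${interface}"
    # ${vip%/*} strips a trailing /prefix so arping receives a bare address
    echo "sudo arping -c 1 -I ${interface} ${vip%/*}"
}

# Example usage:
# print_vip_mount_cmds 10.0.139.201/32 ens192
```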

Appendix

Command line and API

List all clusters

orchestrator-client -c clusters

List all cluster aliases

orchestrator-client -c clusters-alias

Discover an instance

orchestrator-client -c discover -i t-luhx01-v-szzb:33006

Forget an instance

orchestrator-client -c forget -i t-luhx02-v-szzb:33006

Print the topology of a given cluster

orchestrator-client -c topology-tabulated -i t-luhx03-v-szzb:33006
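The clusters and topology-tabulated commands can be combined to print every known topology in one pass. The sketch below assumes that clusters prints one cluster name per line. The function name and the ORCH_CLIENT override variable are hypothetical and exist only so the loop can be exercised with a stub.

```shell
#!/bin/bash
# print_all_topologies: print the topology of every cluster Orchestrator knows.
# ORCH_CLIENT may name an alternative client command (for example a test stub).
print_all_topologies() {
    local client=${ORCH_CLIENT:-orchestrator-client} cluster
    "$client" -c clusters | while read -r cluster; do
        [ -n "$cluster" ] || continue
        echo "== $cluster"
        # stdin is redirected so the client cannot consume the cluster list
        "$client" -c topology-tabulated -i "$cluster" < /dev/null
    done
}

# Example usage:
# print_all_topologies
```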

Show which API endpoint is in use

orchestrator-client -c which-api

Search for instances

orchestrator-client -c search -i luhx

Print the healthy replicas in a cluster that pt-online-schema-change can operate against

orchestrator-client -c which-cluster-osc-running-replicas -i luhxdb

Submit the masters of all clusters to the KV stores, which can be used for service discovery

orchestrator-client -c submit-masters-to-kv-stores

Relocate a replica under another instance

orchestrator-client -c relocate -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006

Relocate all replicas of an instance under another instance

orchestrator-client -c relocate-replicas -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006

Set up dual-master (co-master) replication

orchestrator-client -c make-co-master -i t-luhx01-v-szzb:33006

Raise an instance's promotion priority so that it is preferred as the new master during a failover (valid for one hour)

orchestrator-client -c register-candidate -i t-luhx02-v-szzb:33006 --promotion-rule prefer

Stop replication on a given instance

orchestrator-client -c stop-replica-nice -i t-luhx02-v-szzb

Restart replication on a given instance

orchestrator-client -c restart-replica -i t-luhx02-v-szzb

Run a recovery manually, specifying a failed instance

orchestrator-client -c recover -i t-luhx01-v-szzb:33006

Perform a graceful master-replica switchover

orchestrator-client -c graceful-master-takeover -a t-luhx01-v-szzb:33006 -d t-luhx03-v-szzb:33006

Force a manual recovery

orchestrator-client -c force-master-failover -i t-luhx01-v-szzb:33006

Forcibly drop the current master and designate an instance as the new master, leaving the old master standalone

orchestrator-client -c force-master-takeover -i t-luhx01-v-szzb:33006 -d t-luhx02-v-szzb:33006

Acknowledge cluster recoveries, giving a reason

orchestrator-client -c ack-all-recoveries --reason='yes'

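Orchestrator Hook

① OnFailureDetectionProcesses: runs when a failure is detected
② PreGracefulTakeoverProcesses: runs before the master is made read-only
③ PreFailoverProcesses: runs before the recovery actions are carried out
④ PostMasterFailoverProcesses: runs at the end of a successful master recovery
⑤ PostFailoverProcesses: runs at the end of any successful recovery
⑥ PostUnsuccessfulFailoverProcesses: runs at the end of any unsuccessful recovery
⑦ PostIntermediateMasterFailoverProcesses: runs at the end of a successful intermediate-master recovery
⑧ PostGracefulTakeoverProcesses: runs after the old master has been placed under the newly promoted master

Scenario 1: the master crashes and failover is automatic: ① -> ① -> ③ -> ④ -> ⑤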

Scenario 2: graceful master-replica switchover: ② -> ① -> ③ -> ④ -> ⑤ -> ⑧

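Scenario 3: manual recovery. When a replica is down or in maintenance mode, a master crash does not trigger a failover and recovery has to be run manually: ① -> ① -> ③ -> ④ -> ⑤

Scenario 4: forced manual recovery: ① -> ③ -> ① -> ④ -> ⑤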
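The hook order in these scenarios can be checked in a test environment by having each hook list call a small logger with its own name. The sketch below assumes a hypothetical helper and a default log path under /tmp. Neither is part of Orchestrator.

```shell
#!/bin/bash
# log_hook NAME [LOGFILE]: append a timestamped line naming the hook that fired.
log_hook() {
    local name=$1 logfile=${2:-/tmp/orch_hook_order.log}
    printf '%s %s\n' "$(date '+%F %T')" "$name" >> "$logfile"
}

# When saved as a standalone script, forward the arguments by ending the file
# with:  log_hook "$@"
# Each hook list in the configuration would then call that script with its own
# hook name as the first argument.
```

Reading the log after a test failover shows the sequence that actually ran.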

Reference: MySQL High-Availability Replication Management Tool: Introduction to Orchestrator

Licensed under CC BY-NC-SA 4.0