项目概述
Otter是基于数据库增量日志解析,准实时同步到本地机房或异地机房的MySQL/Oracle数据库。

- Otter基于canal实现数据库增量日志数据的获取
- manager运行过程中推送同步配置到node节点
- node节点将同步状态反馈到manager
- 基于zookeeper,解决分布式状态调度,允许多个node节点协同工作
otter的S/E/T/L stage阶段模型
Tips:select阶段为解决数据来源的差异性,比如接入canal获取增量数据,Extract/Transform/Load 阶段类似于数据仓库的ETL模型,具体可为数据join,数据转化,数据Load
Otter应用场景
- 异构数据库同步:目前源端只支持MySQL,目标端可以是MySQL或者Oracle,功能主要取决于canal
- 单机房同步:RTT<1ms,可用于数据库版本升级、数据迁移、异步二级索引
- 异地机房同步:可实现机房容灾
- 双向同步:避免回环算法以及数据最终一致性算法
- 文件同步:进行数据复制的同时,复制相关的图片等文件
名词说明

- pipeline:从源端到目标端的整个过程描述,主要由一些同步映射过程组成
- channel:同步通道,单向同步中有一个,双向同步有两个
- DataMediaPair:根据业务表定义映射关系
- DataMedia:抽象的数据介质概念,可以理解为数据表
- DataMediaSource:抽象的数据介质源信息,补充描述DataMedia
- ColumnPair:定义字段映射关系
- ColumnGroup:定义字段映射组
- Node:处理同步过程的工作节点,对应一个jvm
项目部署
IP |
角色 |
10.0.139.161 |
Otter Manager,MySQL(配置库),Zookeeper集群 |
10.0.139.162 |
Otter Node,MySQL(数据节点),Zookeeper集群 |
10.0.139.163 |
Otter Node,MySQL(数据节点),Zookeeper集群 |
MySQL部署步骤就省略了,其中要求数据库开启binlog,并且binlog_format必须为ROW模式
Zookeeper集群部署
解压安装包
1
2
|
$ tar -xvf apache-zookeeper-3.5.5-bin.tar.gz
$ mv apache-zookeeper-3.5.5-bin /usr/local/zookeeper
|
配置环境变量
1
2
|
$ echo "export PATH=$PATH:/usr/local/zookeeper/bin" >> /etc/profile
$ echo "export ZOOKEEPER_HOME=/usr/local/zookeeper"
|
创建数据目录
1
2
|
$ mkdir /service/zookeeper/data -p
$ mkdir /service/zookeeper/logs -p
|
编辑配置文件
1
2
3
4
5
6
7
8
9
10
|
$ cat /usr/local/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/service/zookeeper/data
dataLogDir=/service/zookeeper/logs
clientPort=2181
server.1=10.0.139.161:2888:3888
server.2=10.0.139.162:2888:3888
server.3=10.0.139.163:2888:3888
|
Tips:在datadir下创建myid文件,将当前节点的编号写入进去(例如161对应的就是1)
启动服务
1
2
3
4
5
6
7
8
9
10
11
12
|
$ zkServer.sh start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
$ zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
|
Manager部署
解压安装包
1
2
|
$ mkdir /usr/local/otter-manager
$ tar -xvf manager.deployer-4.2.18.tar.gz -C /usr/local/otter-manager/
|
初始化manager系统表
1
2
|
root@(none) 16:04: source /media/mysql/otter-otter-4.2.18/manager/deployer/src/main/resources/sql/otter-manager-schema.sql
Query OK, 1 row affected (0.00 sec)
|
修改配置
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
|
$ cat /usr/local/otter-manager/conf/otter.properties
## otter manager domain name
otter.domainName = 10.0.139.161
## otter manager http port
otter.port = 8080
## jetty web config xml
otter.jetty = jetty.xml
## otter manager database config
otter.database.driver.class.name = com.mysql.jdbc.Driver
otter.database.driver.url = jdbc:mysql://10.0.139.161:3306/otter
otter.database.driver.username = root
otter.database.driver.password = Abcd123#
## otter communication port
otter.communication.manager.port = 1099
## otter communication payload size (default = 8388608)
otter.communication.payload = 8388608
## otter communication pool size
otter.communication.pool.size = 10
## default zookeeper address
otter.zookeeper.cluster.default = 10.0.139.161:2181
## default zookeeper sesstion timeout = 60s
otter.zookeeper.sessionTimeout = 60000
## otter arbitrate connect manager config
otter.manager.address = ${otter.domainName}:${otter.communication.manager.port}
## should run in product mode , true/false
otter.manager.productionMode = true
## self-monitor enable or disable
otter.manager.monitor.self.enable = true
## self-montir interval , default 120s
otter.manager.monitor.self.interval = 120
## auto-recovery paused enable or disable
otter.manager.monitor.recovery.paused = true
# manager email user config
otter.manager.monitor.email.host = smtp.gmail.com
otter.manager.monitor.email.username =
otter.manager.monitor.email.password =
otter.manager.monitor.email.stmp.port = 465
|
启动manager
1
|
$ sh /usr/local/otter-manager/bin/startup.sh
|
访问http://10.0.139.161:8090/login.htm,默认用户名密码为admin/admin

添加zookeeper集群

Node部署
在manager添加机器,添加完成会自动生成node对应的唯一标识nid

- 机器名称:可自定义
- 机器ip:对应node节点将要部署的机器ip,如果有多ip时,可选择其中一个ip进行暴露
- 机器端口:node数据通讯端口
- 下载端口:node数据下载端口
- 外部ip:可以通过外部ip走公网处理
- zookeeper集群:为提升通讯效率,不同机房的机器可选择就近的zookeeper集群
node节点进行跨机房传输时,会使用到HTTP多线程传输技术,目前主要依赖了aria2c做为其下载客户端,因此还需要部署一下aria2cDownLoad
1
2
3
4
5
6
|
$ tar -xvf aria2-1.19.0.tar.gz
$ cd aria2-1.19.0
$ ./configure --prefix=/usr/local/aria2
$ make && make install -j 4
$ echo " export PATH=$PATH:/usr/local/aria2/bin" >> /etc/profile
$ source /etc/profile
|
解压node安装包
1
2
|
$ mkdir /usr/local/otter-node
$ tar -xvf node.deployer-4.2.18.tar.gz -C /usr/local/otter-node/
|
nid配置(对应manager上的nid)
1
|
$ echo 1 > /usr/local/otter-node/conf/nid
|
配置修改
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
# otter node root dir
otter.nodeHome = ${user.dir}/../
## otter node dir
otter.htdocs.dir = ${otter.nodeHome}/htdocs
otter.download.dir = ${otter.nodeHome}/download
otter.extend.dir= ${otter.nodeHome}/extend
## default zookeeper sesstion timeout = 60s
otter.zookeeper.sessionTimeout = 60000
## otter communication payload size (default = 8388608)
otter.communication.payload = 8388608
## otter communication pool size
otter.communication.pool.size = 10
## otter arbitrate & node connect manager config
otter.manager.address = 10.0.139.161:1099
|
启动node
1
|
$ sh /usr/local/otter-node/bin/startup.sh
|

同步配置(双A同步,即两个节点都可以写入)
创建用户
1
2
3
4
5
6
7
8
|
root@(none) 10:23: create user 'canal'@'%' identified by 'Abcd123#';
Query OK, 0 rows affected (0.01 sec)
root@(none) 10:24: GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT on *.* to 'canal'@'%';
Query OK, 0 rows affected (0.01 sec)
root@(none) 10:24: flush privileges;
Query OK, 0 rows affected (0.01 sec)
|
数据源配置

数据表配置

添加canal

双向同步需要在node节点的数据库上配置数据表
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
|
/*
供 otter 使用, otter 需要对 retl.* 的读写权限,以及对业务表的读写权限
1. 创建database retl
*/
CREATE DATABASE retl;
/* 2. 用户授权 给同步用户授权 */
CREATE USER retl@'%' IDENTIFIED BY 'retl';
GRANT USAGE ON *.* TO `retl`@'%';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `retl`@'%';
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON `retl`.* TO `retl`@'%';
/* 业务表授权,这里可以限定只授权同步业务的表 */
GRANT SELECT, INSERT, UPDATE, DELETE ON *.* TO `retl`@'%';
/* 3. 创建系统表 */
USE retl;
DROP TABLE IF EXISTS retl.retl_buffer;
DROP TABLE IF EXISTS retl.retl_mark;
DROP TABLE IF EXISTS retl.xdual;
CREATE TABLE retl_buffer
(
ID BIGINT(20) AUTO_INCREMENT,
TABLE_ID INT(11) NOT NULL,
FULL_NAME varchar(512),
TYPE CHAR(1) NOT NULL,
PK_DATA VARCHAR(256) NOT NULL,
GMT_CREATE TIMESTAMP NOT NULL,
GMT_MODIFIED TIMESTAMP NOT NULL,
CONSTRAINT RETL_BUFFER_ID PRIMARY KEY (ID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE retl_mark
(
ID BIGINT AUTO_INCREMENT,
CHANNEL_ID INT(11),
CHANNEL_INFO varchar(128),
CONSTRAINT RETL_MARK_ID PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE xdual (
ID BIGINT(20) NOT NULL AUTO_INCREMENT,
X timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
/* 4. 插入初始化数据 */
INSERT INTO retl.xdual(id, x) VALUES (1,now()) ON DUPLICATE KEY UPDATE x = now();
|
添加channel,一个channel即看作一个同步任务

添加两个pipeline,其中一个需要设置为主节点,并且只支持一个节点进行DDL(高级设置),另外需要注意两个pipeline的select和load选择不一样

针对不同的pipeline配置数据库表映射关系

回到同步管理界面启动channel即可

检查pipeline对应的监控和日志

更多项目文档内容请参考项目Otter Wiki