Introduction to Codis

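Besides the official Redis Cluster, the community offers two other clustering solutions for Redis: Codis and Twemproxy. Codis was developed by the Wandoujia team in China.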

Differences among the three solutions

                                      | Codis      | Twemproxy  | Redis Cluster
Resharding without restarting cluster | Yes        | No         | Yes
Pipeline                              | Yes        | Yes        | No
Hash tags for multi-key operations    | Yes        | Yes        | Yes
Multi-key operations while resharding | Yes        | -          | No
Redis clients supported               | Any client | Any client | Clients must support the cluster protocol

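Codis is written in Go and consists of the following components:

  • Codis Server: a fork of Redis that adds extra data structures and slot-related data migration commands
  • Codis Proxy: the Redis proxy service. Apart from a few unsupported commands, it behaves essentially the same as native Redis. One Codis cluster can run multiple Codis Proxy instances, which Codis Dashboard keeps in sync
  • Codis Dashboard: the cluster management tool. It supports adding and removing Codis Proxy and Codis Server instances and migrating data. Codis Dashboard keeps the state of all Codis Proxy instances in the cluster consistent, so every change to the cluster must go through it, and a cluster can have only one Codis Dashboard
  • Codis Admin: the command-line tool for cluster management. It can control the state of Codis Proxy and Codis Dashboard and access the external storage
  • Codis FE: the cluster management web interface. Multiple clusters can share one front end
  • Storage: external storage for the cluster state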

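How Codis shards data
Codis divides all keys into 1024 slots, and these slots are mapped onto the Redis instances of the cluster. Codis keeps the mapping between the 1024 slots and the Redis instances in memory, and this mapping can be set manually. To place a key, Codis first computes the CRC32 of the key, which yields a 32-bit number, and then takes hash % 1024. The remainder is the slot the key belongs to.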
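
The slot calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Codis code: it assumes the standard IEEE CRC32 (what Python's zlib.crc32 computes) and ignores hash tags. The function name slot_for_key and the sample keys are made up for the example.

```python
import zlib

NUM_SLOTS = 1024  # slot count stated in the text


def slot_for_key(key: bytes) -> int:
    """Return the slot index (0..1023) for a key: CRC32(key) % 1024."""
    return zlib.crc32(key) % NUM_SLOTS


if __name__ == "__main__":
    # Hypothetical keys, only to show the mapping.
    for key in (b"user:1001", b"order:42", b"123456789"):
        print(key.decode(), "->", slot_for_key(key))
```

Because the result depends only on the key, every proxy computes the same slot for the same key without any coordination.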

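Scaling out Codis
Because Codis is only a proxy layer, you scale out by adding Redis instances directly and then assigning slots to them manually through Codis Dashboard. Codis implements the slotsscan command, which scans the keys under a slot so that they can be migrated to the new Redis instance. During migration, both the current node and the new node hold the information about the slot being migrated. When a key is added to that slot, Codis forces it to be migrated to the new node and directs all subsequent new keys to the new node.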
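
The migration behaviour can be pictured with a toy in-memory model. The sketch below is an illustration under stated assumptions, not how Codis is implemented: ToyCluster, the node names "old" and "new", and migrate_step are hypothetical, and two dictionaries stand in for two Redis instances. It shows the two ideas from the paragraph: a background scan moves the keys of a slot one at a time, and a key that is touched while its slot is migrating is moved to the target first and then served from there.

```python
import zlib

NUM_SLOTS = 1024


def slot_for_key(key: str) -> int:
    return zlib.crc32(key.encode()) % NUM_SLOTS


class ToyCluster:
    """Hypothetical model: two in-memory 'instances' and a slot table."""

    def __init__(self):
        self.nodes = {"old": {}, "new": {}}
        self.owner = {}      # slot -> node that owns it (default: "old")
        self.migrating = {}  # slot -> (source, target) while migration runs

    def _route(self, key):
        slot = slot_for_key(key)
        if slot in self.migrating:
            src, dst = self.migrating[slot]
            # Move the touched key first, then serve the request from the target.
            if key in self.nodes[src]:
                self.nodes[dst][key] = self.nodes[src].pop(key)
            return dst
        return self.owner.get(slot, "old")

    def set(self, key, value):
        self.nodes[self._route(key)][key] = value

    def get(self, key):
        return self.nodes[self._route(key)].get(key)

    def start_migration(self, slot, src, dst):
        self.migrating[slot] = (src, dst)

    def migrate_step(self, slot):
        """Move one remaining key of the slot; finish the migration when none is left."""
        src, dst = self.migrating[slot]
        for key in list(self.nodes[src]):
            if slot_for_key(key) == slot:
                self.nodes[dst][key] = self.nodes[src].pop(key)
                return True
        self.owner[slot] = dst
        del self.migrating[slot]
        return False
```

A typical run writes a few keys, calls start_migration for one slot, keeps reading and writing, and loops on migrate_step until it returns False. At that point the slot belongs to the new node and no key was ever unreachable.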

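Codis features

  • Existing Redis deployments can be migrated to Codis seamlessly
  • The cluster can be scaled out or in dynamically
  • It is fully transparent to the application
  • It uses multiple CPU cores, whereas Twemproxy is limited to a single core
  • Some Redis commands are not supported, for example keys *
  • Servers are organized into groups; each group can be configured with a master and replicas and monitored by Sentinel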

Unsupported Redis commands
Codis does not support some Redis commands. For the full list, see the "unsupported commands" document in the Codis project.

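Deploying Codis

Resource inventory

IP             | Service                           | Port
10.240.204.157 | Codis Server                      | 10001/10002
               | Codis Sentinel                    | 10003
               | Codis Proxy                       | 11000 (admin) / 19000 (proxy)
               | Codis Dashboard (157)             | 18080
               | Codis FE (157)                    | 18090
               | ZooKeeper client port             | 21000
               | ZooKeeper peer communication port | 21001
               | ZooKeeper election port           | 21002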

Installing Go

Extract the package

$ tar -xvf go1.8.5.linux-amd64.tar.gz -C /usr/local

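Add environment variables (single quotes keep $PATH and $GOROOT from being expanded before they are written to the file)

$ echo 'export GOROOT=/usr/local/go' >> /etc/profile
$ echo 'export PATH=$PATH:$GOROOT/bin' >> /etc/profile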
$ source /etc/profile

Check the Go version

$ go version
go version go1.8.5 linux/amd64

Installing ZooKeeper

Install the JDK and check its version

$ java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)

Extract ZooKeeper

$ tar -xvf zookeeper-3.4.14.tar.gz
$ mv zookeeper-3.4.14 /usr/local/zookeeper

Create the required directories

$ mkdir /service/zookeeper/data -p
$ mkdir /service/zookeeper/log -p

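Set environment variables (single quotes keep $PATH and $ZOOKEEPER_HOME from being expanded before they are written to the file)

$ echo 'export ZOOKEEPER_HOME=/usr/local/zookeeper' >> /etc/profile
$ echo 'export PATH=$PATH:$ZOOKEEPER_HOME/bin' >> /etc/profile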
$ source /etc/profile

Edit the configuration file (/usr/local/zookeeper/conf/zoo.cfg)

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/service/zookeeper/data
dataLogDir=/service/zookeeper/log
clientPort=21000
#maxClientCnxns=60
#autopurge.snapRetainCount=3
#autopurge.purgeInterval=1
server.1=10.240.204.157:21001:21002
server.2=10.240.204.165:21001:21002
server.3=10.240.204.149:21001:21002

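Configuration parameters

  • tickTime: the basic time unit used by ZooKeeper, in milliseconds; 2000 in the sample configuration
  • initLimit: set to 10, that is, 10 times tickTime; the maximum time followers are allowed to connect to and sync with the leader
  • syncLimit: set to 5, that is, 5 times tickTime; the maximum delay tolerated for heartbeat exchanges between the leader and the followers
  • dataDir: the directory where ZooKeeper stores its in-memory database snapshots. Unless another directory is specified, the transaction log of database updates is stored here as well
  • dataLogDir: the transaction log directory; defaults to dataDir when not set
  • clientPort: the port that clients connect to; 2181 in the sample configuration, 21000 in this deployment
  • maxClientCnxns: limits, at the socket level, the number of concurrent connections between a single client and a single server. Clients are distinguished by IP address. The default is 60; setting it to 0 removes the limit entirely
  • autopurge.snapRetainCount: the number of most recent snapshots, together with their corresponding transaction logs, that ZooKeeper keeps when it purges automatically; the default is 3
  • autopurge.purgeInterval: the interval, in hours, at which ZooKeeper runs the automatic purge. The default is 0, which disables it; a value of 1 or higher enables it
  • server.id: the cluster members. 21001 is the port used for communication between peers and 21002 is the leader election port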
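
To make the tick arithmetic and the server.id layout concrete, here is a small Python sketch that parses the zoo.cfg shown above. It is an illustration only: the helper names parse_zoo_cfg, effective_timeouts_ms and servers are made up, and the parser handles just the plain key=value lines used in this file.

```python
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/service/zookeeper/data
dataLogDir=/service/zookeeper/log
clientPort=21000
#maxClientCnxns=60
server.1=10.240.204.157:21001:21002
server.2=10.240.204.165:21001:21002
server.3=10.240.204.149:21001:21002
"""


def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def effective_timeouts_ms(cfg: dict) -> dict:
    """initLimit and syncLimit are counted in ticks; convert them to milliseconds."""
    tick = int(cfg["tickTime"])
    return {
        "init": tick * int(cfg["initLimit"]),
        "sync": tick * int(cfg["syncLimit"]),
    }


def servers(cfg: dict) -> dict:
    """Map server id -> (host, peer_port, election_port)."""
    result = {}
    for key, value in cfg.items():
        if key.startswith("server."):
            host, peer_port, election_port = value.split(":")
            result[int(key.split(".", 1)[1])] = (host, int(peer_port), int(election_port))
    return result


if __name__ == "__main__":
    cfg = parse_zoo_cfg(ZOO_CFG)
    print(effective_timeouts_ms(cfg))
    print(servers(cfg))
```

With these values a follower gets 20 seconds to connect and sync, and 10 seconds of heartbeat slack. The keys of the servers mapping are the numbers that go into each node's myid file.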

Create the myid file on each node; its content must match that node's server.id in zoo.cfg

[root@t-luhxdb01-p-szzb media]# echo "1" >/service/zookeeper/data/myid
[root@t-luhxdb02-p-szzb media]# echo "2" >/service/zookeeper/data/myid
[root@t-luhxdb03-p-szzb media]# echo "3" >/service/zookeeper/data/myid

Start ZooKeeper

$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Installing Codis Server

Create the directory

$ mkdir /service/codis/ -p

Extract the package

$ tar -xvf codis3.2.2-go1.8.5-linux.tar.gz
$ mv codis3.2.2-go1.8.5-linux /usr/local/codis/

Configure environment variables

$ echo 'export PATH=$PATH:/usr/local/codis' >> /etc/profile
$ source /etc/profile

Configuration file

$ cat /etc/codis/codis-server/codis.conf 
daemonize yes
port 10001
timeout 0
tcp-keepalive 60
loglevel notice
databases 6
protected-mode no
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "codis1.rdb"
dir "/service/codis"
pidfile "/service/codis/pid.file"
logfile "/service/codis/codis-server.log"
requirepass "Abcd123#"
maxmemory 24576000000
bind 10.240.204.157
repl-timeout 3600
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 0 0 0
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
slave-priority 50

Tip: a replica additionally needs the slaveof and masterauth parameters in its configuration file.
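
As a sketch of what those two extra lines look like, the helper below builds them from the master's address and password. The function name replica_directives is hypothetical, and which instance acts as the master (10.240.204.157:10001 in the usage line) is an assumption made only for the example.

```python
def replica_directives(master_host: str, master_port: int, master_password: str) -> list:
    """Return the extra configuration lines a replica needs."""
    return [
        f"slaveof {master_host} {master_port}",
        # Must equal the master's requirepass value.
        f'masterauth "{master_password}"',
    ]


if __name__ == "__main__":
    # Assumed master address; adjust to the actual topology.
    for line in replica_directives("10.240.204.157", 10001, "Abcd123#"):
        print(line)
```

The printed lines can be appended to the replica's copy of codis.conf, which also needs its own port, dbfilename, pidfile and logfile.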

Start Codis Server

$ codis-server /etc/codis/codis-server/codis.conf

Deploying Codis Dashboard (on host 157)

Configuration file

##################################################
# #
# Codis-Dashboard #
# #
##################################################

# Set Coordinator, only accept "zookeeper" & "etcd" & "filesystem".
# for zookeeper/etcd, coordinator_auth accept "user:password"
# Quick Start
#coordinator_name = "filesystem"
#coordinator_addr = "/tmp/codis"
coordinator_name = "zookeeper"
coordinator_addr = "10.240.204.157:21000,10.240.204.165:21000,10.240.204.149:21000"
#coordinator_auth = ""

# Set Codis Product Name/Auth.
product_name = "codis-test"
product_auth = "Abcd123#"


# Set bind address for admin(rpc), tcp only.
admin_addr = "10.240.204.157:18080"

# Set arguments for data migration (only accept 'sync' & 'semi-async').
migration_method = "semi-async"
migration_parallel_slots = 100
migration_async_maxbulks = 200
migration_async_maxbytes = "32mb"
migration_async_numkeys = 500
migration_timeout = "30s"

# Set configs for redis sentinel.
sentinel_client_timeout = "10s"
sentinel_quorum = 2
sentinel_parallel_syncs = 1
sentinel_down_after = "30s"
sentinel_failover_timeout = "5m"
sentinel_notification_script = ""
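sentinel_client_reconfig_script = ""

  • coordinator_name: the type of external storage; accepts zookeeper, etcd, or filesystem
  • coordinator_addr: the address of the external storage
  • product_name: the cluster name
  • product_auth: the cluster password; empty by default
  • admin_addr: the address and port of the RESTful API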

Start the Dashboard

$ codis-dashboard --ncpu=2 --config=/etc/codis/codis-dashboard/dashboard.toml --log=/service/codis/dashboard.log --log-level=warn &

Shut down the Dashboard

$ codis-admin --dashboard=10.240.204.157:18080 --shutdown

Deploying Codis Proxy

Configuration file

$ cat /etc/codis/codis-proxy/proxy.toml
##################################################
# #
# Codis-Proxy #
# #
##################################################

# Set Codis Product Name/Auth.
product_name = "codis-test"
product_auth = "Abcd123#"

# Set auth for client session
# 1. product_auth is used for auth validation among codis-dashboard,
# codis-proxy and codis-server.
# 2. session_auth is different from product_auth, it requires clients
# to issue AUTH <PASSWORD> before processing any other commands.
session_auth = ""

# Set bind address for admin(rpc), tcp only.
admin_addr = "10.240.204.157:11000"

# Set bind address for proxy, proto_type can be "tcp", "tcp4", "tcp6", "unix" or "unixpacket".
proto_type = "tcp4"
proxy_addr = "10.240.204.157:19000"

# Set jodis address & session timeout
# 1. jodis_name is short for jodis_coordinator_name, only accept "zookeeper" & "etcd".
# 2. jodis_addr is short for jodis_coordinator_addr
# 3. jodis_auth is short for jodis_coordinator_auth, for zookeeper/etcd, "user:password" is accepted.
# 4. proxy will be registered as node:
# if jodis_compatible = true (not suggested):
# /zk/codis/db_{PRODUCT_NAME}/proxy-{HASHID} (compatible with Codis2.0)
# or else
# /jodis/{PRODUCT_NAME}/proxy-{HASHID}
jodis_name = "zookeeper"
jodis_addr = "10.240.204.157:21000,10.240.204.165:21000,10.240.204.149:21000"
jodis_timeout = "20s"
jodis_compatible = false
backend_ping_period = 5
session_max_timeout = 1800
session_max_bufsize = 131072
session_max_pipeline = 1024
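session_keepalive_period = 60

  • product_name: the cluster name; must match the dashboard
  • product_auth: the cluster password; empty by default
  • admin_addr: the address and port of the RESTful API
  • proto_type: the protocol type of the Redis-facing listener
  • proxy_addr: the address (or path, for unix sockets) of the Redis-facing listener
  • jodis_timeout: the session timeout used when registering with Jodis, in seconds
  • jodis_compatible: selects the ZooKeeper path under which the proxy registers for Jodis; true uses the Codis 2.0 compatible path /zk/codis/db_{PRODUCT_NAME}/proxy-{HASHID}, false uses /jodis/{PRODUCT_NAME}/proxy-{HASHID}
  • backend_ping_period: the health-check period against codis-server, in seconds; 0 disables it
  • session_max_timeout: the maximum timeout for client connections, in seconds; 0 disables it
  • session_max_bufsize: the read/write buffer size for client connections, in bytes
  • session_max_pipeline: the maximum pipeline size for client connections. The official recommendation is not to exceed 1M, otherwise stalls may be noticeable during migration
  • session_keepalive_period: the TCP keepalive period for client connections; applies to TCP only, 0 disables it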
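
Since the proxy speaks the Redis protocol on proxy_addr, any Redis client can talk to it. The sketch below uses only the Python standard library to show what goes over the wire: it encodes a command as a RESP array of bulk strings and sends it to the proxy. The helper names encode_command and send_command are made up for this example, and the host and port in the usage note are the values from this deployment. When session_auth is non-empty, the client would have to send AUTH with that password before any other command.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def send_command(host: str, port: int, *args: str, timeout: float = 3.0) -> bytes:
    """Send one command and return the raw reply bytes (first recv only)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(*args))
        return sock.recv(4096)
```

For example, send_command("10.240.204.157", 19000, "PING") sends the bytes produced by encode_command("PING"); a reachable proxy would be expected to reply with a short RESP status line. For real workloads an ordinary Redis client library is the better choice.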

Start the proxy

$ codis-proxy --ncpu=2 --config=/etc/codis/codis-proxy/proxy.toml --log=/service/codis/proxy.log --log-level=warn &

Shut down the proxy

$ codis-admin --proxy=10.240.204.157:11000 --shutdown

Deploying Codis FE

Generate the configuration file

$ codis-admin --dashboard-list --zookeeper=10.240.204.157:21000 | tee codis.json
2019/08/09 16:38:34 zkclient.go:23: [INFO] zookeeper - zkclient setup new connection to 10.240.204.157:21000
2019/08/09 16:38:34 zkclient.go:23: [INFO] zookeeper - Connected to 10.240.204.157:21000
2019/08/09 16:38:34 zkclient.go:23: [INFO] zookeeper - Authenticated: id=73835832729600004, timeout=40000
2019/08/09 16:38:34 zkclient.go:23: [INFO] zookeeper - Re-submitting `0` credentials after reconnect
[
{
"name": "codis-test",
"dashboard": "10.240.204.157:18080"
}
]
2019/08/09 16:38:34 zkclient.go:23: [INFO] zookeeper - Recv loop terminated: err=EOF
2019/08/09 16:38:34 zkclient.go:23: [INFO] zookeeper - Send loop terminated: err=<nil>
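
The file that codis-fe reads is just the JSON array in the middle of that output: one object per cluster with a name and a dashboard address. A quick sanity check before starting the FE can be sketched as follows. The function name load_dashboard_list is hypothetical and the checks reflect only the two fields visible above. If any log lines ended up in the file, json.loads fails, which is exactly what the check is meant to catch.

```python
import json

SAMPLE = '[{"name": "codis-test", "dashboard": "10.240.204.157:18080"}]'


def load_dashboard_list(text: str) -> dict:
    """Return {product_name: dashboard_addr} from the dashboard-list JSON."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array")
    result = {}
    for entry in entries:
        name, addr = entry["name"], entry["dashboard"]
        host, _, port = addr.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad dashboard address: {addr!r}")
        result[name] = addr
    return result


if __name__ == "__main__":
    print(load_dashboard_list(SAMPLE))
```

To check a real file, read it with open(path).read() and pass the text to load_dashboard_list.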

Start the FE

$ codis-fe --ncpu=1 --dashboard-list=/etc/codis/codis-fe/codis.json --listen=10.240.204.157:18090 --log=/service/codis/fe.log --log-level=warn --assets-dir=/usr/local/codis/assets/ &

Shut down the FE

$ ps -ef|grep codis-fe|grep -v grep|awk '{print $2}'|xargs kill

Configuring Sentinel

Configuration file

port 10003
dir "/service/codis"
logfile "/service/codis/sentinel.log"
daemonize yes
protected-mode no

Start the Sentinel service

$ codis-server /etc/codis/codis-server/sentinel.conf --sentinel

Configuring the Codis cluster

Add a proxy
Open the web console at http://10.240.204.157:18090
[Screenshot: PROXY]

Click New Proxy to add the proxy
[Screenshot: new proxy]

Add Codis Server
First click NEW GROUP to create a group, then click Add Server to add members to the group
[Screenshot: new group]

Add Sentinel
Add the Sentinel service and click Sync; the master/replica relationships are then detected automatically
[Screenshot: Sentinel]

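Assign slots
There are 1024 slots by default, numbered 0 to 1023. They can be divided among groups, for example 0-300 to group1, 301-800 to group2, and 801-1023 to group3
[Screenshot: get slot]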
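
Before entering slot ranges in the console it can help to check that they cover every slot exactly once. The sketch below does that in Python. It assumes slots are numbered 0 to 1023 (the result of hash % 1024), so the first range starts at 0; the names ASSIGNMENT and build_slot_table are made up for the example.

```python
NUM_SLOTS = 1024

# Inclusive slot ranges per group id (example split from the text).
ASSIGNMENT = {1: (0, 300), 2: (301, 800), 3: (801, 1023)}


def build_slot_table(assignment: dict) -> list:
    """Return a list where table[slot] is the owning group; reject gaps and overlaps."""
    table = [None] * NUM_SLOTS
    for group, (first, last) in assignment.items():
        for slot in range(first, last + 1):
            if table[slot] is not None:
                raise ValueError(f"slot {slot} assigned twice")
            table[slot] = group
    missing = [slot for slot, group in enumerate(table) if group is None]
    if missing:
        raise ValueError(f"unassigned slots, starting at {missing[0]}")
    return table


if __name__ == "__main__":
    table = build_slot_table(ASSIGNMENT)
    for group in sorted(ASSIGNMENT):
        print("group", group, "owns", table.count(group), "slots")
```

A split that starts at slot 1 instead of 0 is rejected, because slot 0 would be left without a group.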

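Migrate slots
If group1 runs short of memory, add group4 and move a chosen number of slots from group1 to group4. The migration does not interrupt service
[Screenshot: move slot]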

Appendix

Dashboard shut down abnormally

After the dashboard has shut down abnormally, the error log looks like this:

2019/02/11 13:31:43 topom.go:189: [ERROR] store: acquire lock of codis-testX failed
[error]: zk: node already exists
6 /opt/gowork/src/github.com/CodisLabs/codis/pkg/models/zk/zkclient.go:247
github.com/CodisLabs/codis/pkg/models/zk.(*Client).create
5 /opt/gowork/src/github.com/CodisLabs/codis/pkg/models/zk/zkclient.go:196
github.com/CodisLabs/codis/pkg/models/zk.(*Client).Create.func1
4 /opt/gowork/src/github.com/CodisLabs/codis/pkg/models/zk/zkclient.go:129
github.com/CodisLabs/codis/pkg/models/zk.(*Client).shell
3 /opt/gowork/src/github.com/CodisLabs/codis/pkg/models/zk/zkclient.go:195
github.com/CodisLabs/codis/pkg/models/zk.(*Client).Create
2 /opt/gowork/src/github.com/CodisLabs/codis/pkg/models/store.go:119
github.com/CodisLabs/codis/pkg/models.(*Store).Acquire
1 /opt/gowork/src/github.com/CodisLabs/codis/pkg/topom/topom.go:188
github.com/CodisLabs/codis/pkg/topom.(*Topom).Start
0 /opt/gowork/src/github.com/CodisLabs/codis/cmd/dashboard/main.go:169
main.main
... ...
[stack]:
1 /opt/gowork/src/github.com/CodisLabs/codis/pkg/topom/topom.go:189
github.com/CodisLabs/codis/pkg/topom.(*Topom).Start
0 /opt/gowork/src/github.com/CodisLabs/codis/cmd/dashboard/main.go:169
main.main

Shut down the dashboard

$ codis-admin --dashboard=10.240.204.157:18080 --shutdown

Remove the lock

$ codis-admin --remove-lock --product=codis-test --zookeeper=10.240.204.157:21000

Start the dashboard

$ codis-dashboard --ncpu=2 --config=/etc/codis/codis-dashboard/dashboard.toml --log=/service/codis/dashboard.log --log-level=warn &