当前业务出现死锁情况,其操作流程为批量DELETE再批量INSERT,经过排查发现是由于二级唯一索引在DELETE之后未完全被purge回收,再批量插入时互相等待造成的死锁。这里需要说明的是,当记录被DELETE删除后该行记录会被标识为DELETE_MARK状态,此时并没有真正地被回收,只有等purge线程回收后才是真正意义的被删除。

接下来,我们就在测试环境模拟并分析其中的原因,为了准确复现当时的情形,我们需要以mysqld_debug启动并关闭purge线程

$ mysqld-debug --user=mysql&
root@(none) 16:36: set global innodb_purge_stop_now=1;

数据库中存在如下数据表,A列为主键,B列为唯一索引,隔离级别为RC

root@test 16:25:  desc z;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| a | int | NO | PRI | NULL | |
| b | int | YES | UNI | NULL | |
+-------+------+------+-----+---------+-------+

root@test 16:25: select * from z;
+---+------+
| a | b |
+---+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
| 5 | 50 |
| 6 | 60 |
| 7 | 70 |
+---+------+

先delete删除b=20和b=50的记录,由于已经关闭了purge线程,这两条记录并未真正删除

root@test 16:39:  delete from z where b=20;
Query OK, 1 row affected (0.01 sec)

root@test 16:40: delete from z where b=40;
Query OK, 1 row affected (0.01 sec)

开启两个事务对表插入下列记录

TIME TX1 TX2
1 begin; begin;
2 insert into z values(8,20); insert into z values(9,40);
3 insert into z values(10,45); insert into z values(11,25);

通过上诉测试结果,我们可以看到TX2最后一个操作会报死锁退出,但我们这插入的数据并不重复,为什么会出现这么多锁呢。一起来看看show engine innodb status的信息

LATEST DETECTED DEADLOCK
------------------------
2020-05-31 16:53:04 0x7f4af9c67700
*** (1) TRANSACTION:
TRANSACTION 6303, ACTIVE 77 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 2
MySQL thread id 81, OS thread handle 139959995107072, query id 150 localhost root update
insert into z values(10,45)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 31 page no 4 n bits 80 index b of table `test`.`z` trx id 6303 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000032; asc 2;;
1: len 4; hex 80000005; asc ;;

*** (2) TRANSACTION:
TRANSACTION 6308, ACTIVE 69 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1160, 5 row lock(s), undo log entries 2
MySQL thread id 88, OS thread handle 139959994840832, query id 152 localhost root update
insert into z values(11,25)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 31 page no 4 n bits 80 index b of table `test`.`z` trx id 6308 lock mode S
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000028; asc (;;
1: len 4; hex 80000004; asc ;;

Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000032; asc 2;;
1: len 4; hex 80000005; asc ;;

Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000003c; asc <;;
1: len 4; hex 80000006; asc ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 31 page no 4 n bits 80 index b of table `test`.`z` trx id 6308 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000001e; asc ;;
1: len 4; hex 80000003; asc ;;

*** WE ROLL BACK TRANSACTION (2)

TX1在等待b=60上的gap lock,而TX2中info bits 32则是我们已经删除的b=40那条记录,但并没有真正回收该记录,重新插入时会在该记录上添加S-Lock,并且对b=50和b=60的记录也添加了S-Lock,TX2则在等待B=30上的gap lock,而TX1持有该锁。因此,出现了互相等待对方的锁,从而导致死锁发生。

这里业务采用并发插入,插入是根据主键顺序组织的插入的,容易造成这种死锁情况,可以针对B列进行排序,按照B列索引顺序插入。