MYSQLINNODBreplaceinto死锁及nextkeylock浅析

原创:全文带入了大量自我认知和理解,可能错误,因为水平有限,但是代表我努力分析过。



一、问题提出
问题是由姜大师提出的、问题如下:
表:
MySQL> show create table c \G
*************************** 1. row ***************************
       Table: c
Create Table: CREATE TABLE `c` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`),
  UNIQUE KEY `b` (`b`)
) ENGINE=InnoDB 
1 row in set (0.01 sec)
开启两个会话不断的执行
replace into c values(NULL,1);
会触发死锁。问死锁触发的原因。

我使用的环境:
MYSQL 5.7.14 debug版本、隔离级别RR、自动提交,很显然这里的c表中的可以select出来的记录始终是1条
只是a列不断的增大,但是这里实际存储空间确不止1条,因为从heap no来看二级索引中,heap no 已经到了
7,也就是有至少7(7-1)条记录,只是其他记录标记为del并且被purge线程放到了page free_list中。

二、准备工作和使用方法
1、稍微修改了源码关于锁的打印部分,我们知道每个事物下显示锁内存结构lock 
   struct会连接成一个链表,只要按照顺序打印出内存lock struct就打印出了
   所有关于这个事物显示锁全部信息和加锁顺序如下:

点击(此处)折叠或打开

10年积累的做网站、成都网站设计经验,可以快速应对客户对网站的新想法和需求。提供各种问题对应的解决方案。让选择我们的客户得到更好、更有力的网络服务。我虽然不认识你,你也不认识我。但先网站设计后付款的网站建设流程,更有邳州免费网站建设让你可以放心的选择与我们合作。

  1. ---TRANSACTION 184771, ACTIVE 45 sec
  2. 4 lock struct(s), heap size 1160, 3 row lock(s)
  3. MySQL thread id 2, OS thread handle 140737154311936, query id 642 localhost root cleaning up
  4. ---lock strcut(1):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
  5. TABLE LOCK table `test`.`c4` trx id 184771 lock mode IX
  6. ---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
  7. RECORD LOCKS space id 413 page no 4 n bits 72 index id2 of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X)
  8. Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
  9.  0: len 4; hex 80000014; asc ;;
  10.  1: len 4; hex 80000014; asc ;;
  11. ---lock strcut(3):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
  12. RECORD LOCKS space id 413 page no 3 n bits 72 index PRIMARY of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP)
  13. Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
  14.  0: len 4; hex 80000014; asc ;;
  15.  1: len 6; hex 00000002d1bd; asc ;;
  16.  2: len 7; hex a600000e230110; asc # ;;
  17.  3: len 4; hex 80000014; asc ;;
  18. ---lock strcut(4):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
  19. RECORD LOCKS space id 413 page no 4 n bits 72 index id2 of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP)
  20. Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
  21.  0: len 4; hex 8000001e; asc ;;
  22.  1: len 4; hex 8000001e; asc ;;


正常的版本只有

点击(此处)折叠或打开

  1. ---TRANSACTION 184771, ACTIVE 45 sec
  2. 4 lock struct(s), heap size 1160, 3 row lock(s)
  3. MySQL thread id 2, OS thread handle 140737154311936, query id 642 localhost root cleaning up
部分后面的都是我加上的,其实修改很简单,innodb其实自己写好了只是没有开启,我开启后加上了序号来表示顺序。
上面是一个 select * from c where  id2= 20 for update; b列为辅助索引的所有4 lock struct(s),可以看到有了这些信息分析
不那么难了。
这里稍微分析一下
表结构为:
mysql> show create table c4;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                  |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| c4    | CREATE TABLE `c4` (
  `id1` int(11) NOT NULL,
  `id2` int(11) DEFAULT NULL,
  PRIMARY KEY (`id1`),
  KEY `id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
数据为:
mysql> select * from c4;
+-----+------+
| id1 | id2  |
+-----+------+
|   1 |    1 |
|  10 |   10 |
|  20 |   20 |
|  30 |   30 |
+-----+------+
4 rows in set (0.00 sec)
语句为:
 select * from c where  id2= 20 for update;
RR模式
从锁结构链表来看,这个语句在辅助索引分别锁定了
id2:20 id1:20 LOCK_X|LOCK_ORDINARY 也就是NEXT KEY LOCK
同时锁定了
id2:30 id1:30 LOCK_X|LOCK_GAP也就是gap lock不包含这一列
那么画个图容易理解黄色部分为锁定部分:
MYSQL INNODB replace into 死锁 及 next key lock 浅析
是不是一目了然?如果是rc那么锁定的只有记录了两个黄色箭头
表示gap没有了就不在画图了

2、在死锁检测回滚前调用这个打印函数打印到err日志文件中,打印出全部的事物的显示内存lock struct如下,这里就
不给出了,后面会有replace触发死锁千事物锁结构的一个输出


3、使用MYSQL TRACE SQL语句得到大部分的函数调用来分析replace的过程

修改出现的问题:修改源码打印出所有lock struct 在线上显然是不能用的。因为打印出来后show engine innodb status 会非常
长,甚至引发其他问题,但是测试是可以,其次修改了打印死锁事物锁链表到日志后,每次只要遇到死锁信息可以打印
到日志,但是每次MYSQLD都会挂掉,但是不影响分析了。

三、预备知识(自我理解)
1、
Precise modes:
#define LOCK_ORDINARY 0 /*!< this flag denotes an ordinary
next-key lock in contrast to LOCK_GAP
or LOCK_REC_NOT_GAP */
默认是LOCK_ORDINARY及普通的next_key_lock,锁住行及以前的间隙
#define LOCK_GAP 512 /*!< when this bit is set, it means that the
lock holds only on the gap before the record;
for instance, an x-lock on the gap does not
give permission to modify the record on which
the bit is set; locks of this type are created
when records are removed from the index chain
of records */
间隙锁,锁住行以前的间隙,不锁住本行
#define LOCK_REC_NOT_GAP 1024 /*!< this bit means that the lock is only on
the index record and does NOT block inserts
to the gap before the index record; this is
used in the case when we retrieve a record
with a unique key, and is also used in
locking plain SELECTs (not part of UPDATE
or DELETE) when the user has set the READ
COMMITTED isolation level */
行锁,锁住行而不锁住任何间隙
#define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waiting
gap type record lock request in order to let
an insert of an index record to wait until
there are no conflicting locks by other
transactions on the gap; note that this flag
remains set when the waiting lock is granted,
or if the lock is inherited record */
插入意向锁,如果插入的记录在某个已经锁定的间隙内为这个锁
2、参数innodb_autoinc_lock_mode的值为1,也许不能保证replace into的顺序。
3、infimum和supremum
   一个page中包含这两个伪列,页中所有的行未删除(删除未purge)的行都连接到这两个虚列之间,其中
   supremum伪列的锁始终为next_key_lock。
4、heap no
   此行在page中的heap no heap no存储在fixed_extrasize 中,heap no 为物理存储填充的序号,页的空闲空间挂载在page free链表中(头插法)可以重用,
   但是重用此heap no不变,如果一直是insert 则heap no 不断增加,并非按照KEY大小排序的逻辑链表顺序,而是物理填充顺序 
5、n bits
   和这个page相关的锁位图的大小如果我的表有9条数据 还包含2个infimum和supremum虚拟列 及 64+11 bits,及75bits但是必须被8整除为一个字节就是
   80 bits
6、隐含锁(Implicit lock)和显示锁(explict)
锁有隐含和显示之分。隐含锁通常发生在 insert 的时候对cluster index和second index 都加隐含锁,如果是UPDATE(DELETE)对cluster index加显示锁 辅助
索引加隐含锁。目的在于减少锁结构的内存开销,如果有事务需要和这个隐含锁而不兼容,这个事务需要帮助 insert或者update(delete)事物将隐含
锁变为显示锁,然后给自己加锁,通常insert主键检查会给自己加上S锁,REPLACE、delete、update通常会给自己加上X锁。

四、replace过程分析
通过replace的trace找到了这些步骤的大概调用:
首先我们假设
TRX1:replace 不提交
TRX2:replace 堵塞
TRX1:replace 提交
TRX2:replace 继续执行直到完成
这样做的目的在于通过trace找到TRX2在哪里等待,确实如我所愿我找到了。

1、检查是否冲突,插入主键 

点击(此处)折叠或打开

  1.     569 T@4: | | | | | | | | >row_ins
  2.     570 T@4: | | | | | | | | | row_ins: table: test/c
  3.     571 T@4: | | | | | | | | | >row_ins_index_entry_step
  4.     572 T@4: | | | | | | | | | | >row_ins_clust_index_entry
  5.     573 T@4: | | | | | | | | | | | >row_ins_clust_index_entry_low
  6.     574 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
  7.     575 T@4: | | | | | | | | | | | | 
  8.     576 T@4: | | | | | | | | | | | | ib_cur: insert PRIMARY (366) by 183808: TUPLE (info_bits=0, 4 fields): {[4] %(0x00000005),[6] (0x000000020E00),[7] (0x 0A000001010100),[4] (0x00000001)}
  9.     577 T@4: | | | | | | | | | | | 
  10.     578 T@4: | | | | | | | | | | 
  11.     579 T@4: | | | | | | | | | 
2、检查是否冲突,插入辅助索引,这里实际上就是会话2被堵塞的地方,如下解释
(如果冲突回滚先前插入的主键内容)

点击(此处)折叠或打开

  1.     580 T@4: | | | | | | | | | >row_ins_index_entry_step 3589
  2.     581 T@4: | | | | | | | | | | >row_ins_sec_index_entry_low
  3.     582 T@4: | | | | | | | | | | | >btr_cur_search_to_nth_level
  4.     583 T@4: | | | | | | | | | | | 
  5.     584 T@4: | | | | | | | | | | | >row_ins_scan_sec_index_for_duplicate
  6.     585 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
  7.     586 T@4: | | | | | | | | | | | | 
  8.     587 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
  9.     588 T@4: | | | | | | | | | | | | 
  10.     589 T@4: | | | | | | | | | | | | >row_vers_impl_x_locked_low
  11.     590 T@4: | | | | | | | | | | | | | info: Implicit lock is held by trx:183803
  12.     591 T@4: | | | | | | | | | | | | 
  13.     592 T@4: | | | | | | | | | | | | >thd_report_row_lock_wait
  14.     593 T@4: | | | | | | | | | | | | 
  15.     594 T@4: | | | | | | | | | | | 
  16.     595 T@4: | | | | | | | | | | 
  17.     596 T@4: | | | | | | | | | 
  18.     597 T@4: | | | | | | | | 
  19.     598 //wait here
  20.     这里我做trace的时候事物的trace停止在了这里我特意加上了598//wait here从下面的输出
  21.     我们也能肯定确实这里触发了锁等待 
  22.     >row_vers_impl_x_locked_low
  23.     | info: Implicit lock is held by trx:183803
  24.     
  25.     >thd_report_row_lock_wait
  26.     
  27.     等待获得锁过后重新检查:
  28.     599 T@4: | | | | | | | | >row_ins
  29.     600 T@4: | | | | | | | | | row_ins: table: test/c
  30.     601 T@4: | | | | | | | | | >row_ins_index_entry_step
  31.     602 T@4: | | | | | | | | | | >row_ins_sec_index_entry_low
  32.     603 T@4: | | | | | | | | | | | >btr_cur_search_to_nth_level
  33.     604 T@4: | | | | | | | | | | | 
  34.     605 T@4: | | | | | | | | | | | >row_ins_scan_sec_index_for_duplicate
  35.     606 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
  36.     607 T@4: | | | | | | | | | | | | 
  37.     608 T@4: | | | | | | | | | | | 
  38.     609 T@4: | | | | | | | | | | 
  39.     610 T@4: | | | | | | | | | 
  40.     611 T@4: | | | | | | | | 
我们可以隐隐约约看到row_ins_sec_index_entry_low和row_ins_clust_index_entry_low回检查是否有重复的行
    分别代表是二级索引和聚集索引的相关检查,因为就这个案例主键不可能出现重复值,而二级索引这个例子中肯定是
    重复的,索引row_ins_sec_index_entry_low触发了等待,其实我们知道这里的锁方式如下列子:
    
---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK chain! for this Trx
RECORD LOCKS space id 406 page no 4 n bits 72 index b of table `test`.`c` trx id 177891 lock_mode X(LOCK_X) waiting(LOCK_WAIT)
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000006; asc     ;;
 
LOCK_X|LOCK_ORDINARY|LOCK_WAIT:需要X的next_key lock处于等待状态他需要锁定(infimum,{1,6}]这个区间。
这也是死锁发生的关键一个环节。

3、这里涉及到了回滚操作,从下面的trace输出我们也能看到确实做了回滚
   实际上事物2会堵塞在这里,因为我做trace的时候他一直停在
   这里不动了。为此我还加上598行说明在这里wait了

点击(此处)折叠或打开

  1.     612 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe804b7d8
  2.     613 T@4: | | | | | | | | ib_que: Execute 12 (ROLLBACK) at 0x7fffe804b6b0
  3.     614 T@4: | | | | | | | | ib_que: Execute 12 (ROLLBACK) at 0x7fffe804b6b0
  4.     615 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe804b7d8
  5.     616 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe800eec8
  6.     617 T@4: | | | | | | | | ib_que: Execute 10 (UNDO ROW) at 0x7fffe801b090
  7.     618 T@4: | | | | | | | | >btr_cur_search_to_nth_level
  8.     619 T@4: | | | | | | | | 
  9.     620 T@4: | | | | | | | | >btr_cur_search_to_nth_level
  10.     621 T@4: | | | | | | | | 
  11.     622 T@4: | | | | | | | | ib_que: Execute 10 (UNDO ROW) at 0x7fffe801b090
  12.     623 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe800eec8
4、这个重复key会传递给SERVER层次,并且貌似重新初始化了事物(只是从trace猜测)

点击(此处)折叠或打开

  1. 639 T@4: | | | | | | >handler::get_dup_key
  2.     640 T@4: | | | | | | | >info
  3.     641 T@4: | | | | | | | | >ha_innobase::update_thd
  4.     642 T@4: | | | | | | | | | ha_innobase::update_thd: user_thd: 0x7fffe8000b90 -> 0x7fffe8000b90
  5.     643 T@4: | | | | | | | | | >innobase_trx_init
  6.     644 T@4: | | | | | | | | | 
  7.     645 T@4: | | | | | | | | 
  8.     646 T@4: | | | | | | | 
  9.     647 T@4: | | | | | | 
  10.     648 T@4: | | | | | | >column_bitmaps_signal
  11.     649 T@4: | | | | | | | info: read_set: 0x7fffc8941da0 write_set: 0x7fffc8941da0
  12.     650 T@4: | | | | | | 
  13.     651 T@4: | | | | | | >innobase_trx_init
  14.     652 T@4: | | | | | | 
  15.     653 T@4: | | | | | | >index_init
  16.     654 T@4: | | | | | | 
5、接下就是真正删除插入主键

点击(此处)折叠或打开

  1.     689 T@4: | | | | | | | | >row_update_for_mysql_using_upd_graph
  2.     690 T@4: | | | | | | | | | >row_upd_step
  3.     691 T@4: | | | | | | | | | | >row_upd
  4.     692 T@4: | | | | | | | | | | | row_upd: table: test/c
  5.     693 T@4: | | | | | | | | | | | row_upd: info bits in update vector: 0x0
  6.     694 T@4: | | | | | | | | | | | row_upd: foreign_id: NULL
  7.     695 T@4: | | | | | | | | | | | ib_cur: delete-mark clust test/c (366) by 183808: COMPACT RECORD(info_bits=32, 4 fields): {[4] $(0x00000004),[6] (0x000000020D 0B),[7] (0x00000001090100),[4] (0x00000001)}
  8.     696 T@4: | | | | | | | | | | | >row_ins_clust_index_entry
  9.     697 T@4: | | | | | | | | | | | | >row_ins_clust_index_entry_low
  10.     698 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
  11.     699 T@4: | | | | | | | | | | | | | 
  12.     700 T@4: | | | | | | | | | | | | | ib_cur: insert PRIMARY (366) by 183808: TUPLE (info_bits=0, 4 fields): {[4] %(0x00000005),[6](0x000000020E00),[7] ( 0x00000001090100),[4] (0x00000001)}
  13.     701 T@4: | | | | | | | | | | | | 
  14.     702 T@4: | | | | | | | | | | | 
  15.     703 T@4: | | | | | | | | | | | >btr_cur_search_to_nth_level
  16.     704 T@4: | | | | | | | | | | | 
  17.     705 T@4: | | | | | | | | | | | ib_cur: delete-mark=1 sec 406:4:2 in b(367) by 183808
6、接下就是真正插入辅助索引

点击(此处)折叠或打开

  1.     706 T@4: | | | | | | | | | | | >row_ins_sec_index_entry_low
  2.     707 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
  3.     708 T@4: | | | | | | | | | | | | 
  4.     709 T@4: | | | | | | | | | | | | >row_ins_scan_sec_index_for_duplicate
  5.     710 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
  6.     711 T@4: | | | | | | | | | | | | | 
  7.     712 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
  8.     713 T@4: | | | | | | | | | | | | | 
  9.     714 T@4: | | | | | | | | | | | | | >row_vers_impl_x_locked_low
  10.     715 T@4: | | | | | | | | | | | | | 
  11.     716 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
  12.     717 T@4: | | | | | | | | | | | | | 
  13.     718 T@4: | | | | | | | | | | | | | >row_vers_impl_x_locked_low
  14.     719 T@4: | | | | | | | | | | | | | | info: Implicit lock is held by trx:183808
  15.     720 T@4: | | | | | | | | | | | | | 
  16.     721 T@4: | | | | | | | | | | | | 
  17.     722 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
  18.     723 T@4: | | | | | | | | | | | | 
  19.     724 T@4: | | | | | | | | | | | | ib_cur: insert b (367) by 183808: TUPLE (info_bits=0, 2 fields): {[4] (0x00000001),[4] %(0x00000005)}
  20.     725 T@4: | | | | | | | | | | | 
  21.     726 T@4: | | | | | | | | | | 
  22.     727 T@4: | | | | | | | | | 
  23.     728 T@4: | | | | | | | | 
  24.     729 T@4: | | | | | | | 
注意:上面只是看trace出来的过程,很多是根据函数调用进行的猜测。

五、死锁前事物锁信息打印分析
打印出死锁前事物的全部信息

点击(此处)折叠或打开

  1. ------------------------
  2. LATEST DETECTED DEADLOCK
  3. ------------------------
  4. 2017-06-29 14:10:30 0x7fa48148b700
  5. *** (1) TRANSACTION:
  6. TRANSACTION 4912797, ACTIVE 0 sec inserting
  7. mysql tables in use 1, locked 1
  8. LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
  9. MySQL thread id 2, OS thread handle 140344520656640, query id 3371 localhost root update
  10. replace into c values(num,1)
  11. *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
  12. RECORD LOCKS space id 598 page no 4 n bits 80 index b of table `test`.`c` trx id 4912797 lock_mode X waiting
  13. Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
  14.  0: len 4; hex 80000001; asc ;;
  15.  1: len 4; hex 800007d5; asc ;;

  16. *** (2) TRANSACTION:
  17. TRANSACTION 4912793, ACTIVE 0 sec updating or deleting
  18. mysql tables in use 1, locked 1
  19. 6 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
  20. MySQL thread id 3, OS thread handle 140344520390400, query id 3365 localhost root update
  21. replace into c values(num,1)
  22. *** (2) HOLDS THE LOCK(S):
  23. RECORD LOCKS space id 598 page no 4 n bits 80 index b of table `test`.`c` trx id 4912793 lock_mode X
  24. Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
  25.  0: len 4; hex 80000001; asc ;;
  26.  1: len 4; hex 800007d5; asc ;;

  27. *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
  28. RECORD LOCKS space id 598 page no 4 n bits 80 index b of table `test`.`c` trx id 4912793 lock_mode X locks gap before rec insert intention waiting
  29. Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
  30.  0: len 4; hex 80000001; asc ;;
  31.  1: len 4; hex 800007d5; asc ;;

  32. **
    本文名称:MYSQLINNODBreplaceinto死锁及nextkeylock浅析
    文章转载:http://scyanting.com/article/ipehec.html