關(guān)于MySQL報警的一次分析處理詳解

更新時間：2021年03月19日 09:07:48 投稿：daisy

這篇文章主要給大家介紹了關(guān)于MySQL報警的一次分析處理的相關(guān)資料，文中通過示例代碼介紹的非常詳細(xì)，對大家的學(xué)習(xí)或者工作具有一定的參考學(xué)習(xí)價值，需要的朋友們下面隨著小編來一起學(xué)習(xí)學(xué)習(xí)吧

最近有一個服務(wù)出現(xiàn)了報警，已經(jīng)讓我到了忍無可忍的地步，報警信息如下：

Metric:mysql.innodb_row_lock_waits Tags:port=4306,service=xxxx diff(#1): 996>900

大概的意思是有一個數(shù)據(jù)庫監(jiān)控指標(biāo) innodb_row_lock_waits 目前超出了閾值900

但是尷尬的是，每次報警后去環(huán)境中查看，得到的信息都很有限，慢日志，錯誤日志里面都沒有充分的信息可以分析，一來二去之后，我開始靜下心來分析這個問題的原因。

首先這個報警信息的時間點貌似是有些規(guī)律的，我拿著最近幾天的報警時間做了比對，發(fā)現(xiàn)還是比較有規(guī)律的，那么在系統(tǒng)層面有哪些任務(wù)可能會觸發(fā)呢，我查找比對了相關(guān)的任務(wù)配置，發(fā)現(xiàn)有一個定時任務(wù)每1分鐘會執(zhí)行一次，但是到了這里疑問就來了，如果每1分鐘執(zhí)行1次，為什么在特定的時間會產(chǎn)生差異較大的處理結(jié)果？當(dāng)然這個現(xiàn)象的解釋是個起始。

其實要證明這一點還是蠻容易的，今天我就采取了守株待兔的模式，我在臨近報警的時間前后打開了通用日志，從日志輸出來看，操作的頻率還是相對有限的。

很快得到了規(guī)律性的報警，于是我開始抓取相關(guān)的通用日志記錄，比如11:18分，我們可以采用如下的模式得到相關(guān)的日志，首先得到一個臨時的通用日志文件，把各種DML和執(zhí)行操作都網(wǎng)羅進(jìn)來。

cat general.log|grep -E "insert|delete|update|select|exec" > general_tmp.log

我們以11:18分為例，可以在前后1兩分鐘做比對，結(jié)果如下：

# less general_tmp.log |grep "11:18"|wc -l

400

# less general_tmp.log |grep "11:17"|wc -l

666

# less general_tmp.log |grep "11:16"|wc -l

15

發(fā)現(xiàn)在報警的那1分鐘前后，數(shù)量是能夠?qū)Φ蒙系摹?/p>

這個表的數(shù)據(jù)量有200多萬，表結(jié)構(gòu)如下：

CREATE TABLE `task_queue` (
 `AccID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '自增ID',
 `TaskStepID` bigint(20) DEFAULT NULL COMMENT '任務(wù)步驟ID task_step_conf',
 `QOrder` int(11) DEFAULT NULL COMMENT '隊列排序 task_step_confi.Step_ID',
 `QState` tinyint(4) DEFAULT '1' COMMENT '隊列狀態(tài) 1：待執(zhí)行 2：執(zhí)行中 3：執(zhí)行成功 4：執(zhí)行失敗',
 `QExcCount` int(11) DEFAULT '1' COMMENT '執(zhí)行次數(shù)',
 `CrtTime` datetime DEFAULT NULL COMMENT '創(chuàng)建時間',
 `ModTime` datetime DEFAULT NULL COMMENT '修改時間',
 PRIMARY KEY (`AccID`),
 KEY `idx_taskstepid` (`TaskStepID`),
 KEY `idx_qstate` (`QState`)
) ENGINE=InnoDB AUTO_INCREMENT=3398341 DEFAULT CHARSET=utf8

在日志中根據(jù)分析和比對，基本能夠鎖定SQL是在一類Update操作上面，SQL的執(zhí)行計劃如下：

>>explain update task_queue set QState=1,QExcCount=QExcCount+1,modtime=now() where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: UPDATE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_taskstepid,idx_qstate
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where; Using temporary

這個執(zhí)行結(jié)果中key_len是2,9，是和以往的ken_len計算法則不一樣的。其中Extra列已經(jīng)給出了明確的提示，這是一個intersect處理，特別的是它是基于二級索引級別的處理，在優(yōu)化器層面是有一個相關(guān)的參數(shù)index_merge_intersection。

我們知道在MySQL中主鍵是一等公民，而二級索引最后都會映射到主鍵層面處理，而索引級別的intersect其實有點我們的左右手，左手對應(yīng)一些數(shù)據(jù)結(jié)果映射到一批主鍵id,右手對應(yīng)一些數(shù)據(jù)結(jié)果映射到另外一批主鍵id,把兩者的主鍵id值進(jìn)行intersect交集計算,所以在當(dāng)前的場景中，索引級別的intersect到底好不好呢？

在此我設(shè)想了3個對比場景進(jìn)行分析，首先這是一個update語句，我們?yōu)榱吮ＷC后續(xù)測試的可重復(fù)性，可以轉(zhuǎn)換為一個select語句。

select * from task_queue where QState=0 and taskstepid =411;

所以我們的對比測試基于查詢語句進(jìn)行比對分析。

場景1:優(yōu)化器保持默認(rèn)index_merge_intersection開啟，基于profile提取執(zhí)行明細(xì)信息

>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where
1 row in set, 1 warning (0.00 sec)

profile信息為：

場景2:優(yōu)化器關(guān)閉index_merge_intersection，基于profile進(jìn)行對比

>set session optimizer_switch='index_merge_intersection=off';


>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: ref
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate
  key_len: 2
   ref: const
   rows: 1451
  filtered: 0.82
  Extra: Using where
1 row in set, 1 warning (0.00 sec)

profile信息為：

場景3：重構(gòu)索引，進(jìn)行比對分析

根據(jù)業(yè)務(wù)邏輯，如果創(chuàng)建一個復(fù)合索引，是能夠大大減少結(jié)果集的量級的，同時依然保留 idx_ qsta te 索引，使得一些業(yè)務(wù)依然能夠正常使用。

>alter table task_queue drop key idx_taskstepid;
>alter table task_queue add key `idx_taskstepid` (`TaskStepID`,QState);
explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: task_queue
  partitions: NULL
     type: ref
possible_keys: idx_qstate,idx_taskstepid
     key: idx_taskstepid
   key_len: 11
     ref: const,const
     rows: 1
   filtered: 100.00
    Extra: NULL
1 row in set, 1 warning (0.00 sec)

profile信息為：