在MySQL中,针对不同的类型的子查询,优化器选择的优化策略也不同。对于IN、=ANY类子查询,优化器会选择如下策略:

  • semijoin
  • Materialization
  • exists

对于NOT IN、<>ALL类子查询,优化器会选择如下策略:

  • Materialization
  • exists

对于派生表,优化器会选择如下策略:

  • derived_merge,将派生表合并到外部查询中
  • 将派生表物化为内部临时表

注:对于update或delete,不支持semijoin和Materialization优化。可以将其转化为使用join关联的方式

Semijoin优化

semijoin是MySQL5.6引入的新的优化策略,在此之前都只有exists一种优化策略,我们先来观察exists的性能情况。

先把semijoin和Materialization策略禁用掉

root@employees 15:35:  set session optimizer_switch="semijoin=off,materialization=off";
Query OK, 0 rows affected (0.00 sec)

接下来查看一个子查询操作的执行计划

root@employees 15:35:  explain select emp_no from dept_emp where dept_no in (select dept_no from departments where dept_name = 'Marketing');
+----+--------------------+-------------+------------+-------+-------------------+-----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------------+------------+-------+-------------------+-----------+---------+-------+--------+----------+--------------------------+
| 1 | PRIMARY | dept_emp | NULL | index | NULL | dept_no | 16 | NULL | 331143 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | departments | NULL | const | PRIMARY,dept_name | dept_name | 162 | const | 1 | 100.00 | Using index |
+----+--------------------+-------------+------------+-------+-------------------+-----------+---------+-------+--------+----------+--------------------------+

可以观察到子查询使用了DEPENDENT SUBQUERY,查看WARNING信息可以看到SQL被转化为exists的方式

root@employees 15:39:  explain select emp_no from dept_emp where dept_no in (select dept_no from departments where dept_name = 'Marketing');
+----+--------------------+-------------+------------+-----------------+---------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------------+------------+-----------------+---------------+---------+---------+------+--------+----------+--------------------------+
| 1 | PRIMARY | dept_emp | NULL | index | NULL | dept_no | 16 | NULL | 331143 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | departments | NULL | unique_subquery | PRIMARY | PRIMARY | 16 | func | 1 | 11.11 | Using where |
+----+--------------------+-------------+------------+-----------------+---------------+---------+---------+------+--------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

root@employees 15:39: show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `employees`.`dept_emp`.`emp_no` AS `emp_no` from `employees`.`dept_emp` where <in_optimizer>(`employees`.`dept_emp`.`dept_no`,<exists>(<primary_index_lookup>(<cache>(`employees`.`dept_emp`.`dept_no`) in departments on PRIMARY where ((`employees`.`departments`.`dept_name` = 'Marketing') and (<cache>(`employees`.`dept_emp`.`dept_no`) = `employees`.`departments`.`dept_no`))))) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

查询会从dept_emp表中取出一行dept_no,然后执行子查询,如果匹配成功就放到结果集中,重复执行该操作。通过慢查询日志可以观察到SQL的扫描行数非常大,时间也会比较长。

接起来我们开启semijoin和materialization观察一下

dba@employees 16:17:  explain select emp_no from dept_emp where dept_no in (select dept_no from departments where dept_name = 'Marketing');
+----+-------------+-------------+------------+------+---------------+---------+---------+-------------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------------------------------+-------+----------+-------------+
| 1 | SIMPLE | departments | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9 | 11.11 | Using where |
| 1 | SIMPLE | dept_emp | NULL | ref | dept_no | dept_no | 16 | employees.departments.dept_no | 41392 | 100.00 | Using index |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

dba@employees 16:17: show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `employees`.`dept_emp`.`emp_no` AS `emp_no` from `employees`.`departments` join `employees`.`dept_emp` where ((`employees`.`dept_emp`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`departments`.`dept_name` = 'Marketing')) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

可以看到优化器将SQL改写为Join方式了,只会返回匹配的记录,通过slow log可以看到扫描行数大大减少了,性能也有所提升。

想要用到semijoin,需要满足下列条件:

  • 它必须是一个不包含union操作的单个查询
  • 它不能包含group by,having
  • 它不能包含任何聚合函数
  • 它不能包含order by,limit
  • 它不能出现STRAIGHT_JOIN
  • 外部表和内部表的数量必须小于联接中允许的最大表数量

Materialization优化

在这种模式下会先执行子查询,物化生成一个临时表(通常在内存中)来提升性能,临时表通过主键(hash索引)去重,这样子查询只需要执行一次,而不是对于外层查询的每一行都得执行一遍,接着从临时表中不断取数据到dept_emp中查找,匹配成功则存入结果集中。

对于物化子查询存在下列限制:

  • 内部表达式不能是BLOB类型
  • 内部表达式和外部表达式的类型必须匹配

Tips:subquery-optimization