PLSQL-ROWID分区实现高效DELETE

概述

在大表中做批量更新或删除时往往会很慢，在操作过程中还需要额外去维护索引，无疑会使得其更慢。如果是分区表还能通过加并行或truncate分区来加速操作，但是如果是普通表的话的就不太好去优化了，nologging和parallel也没有明显的效果。

这时我们可以考虑将ROWID将表分割成多个区间，获取到分割区间的开始rowid和结束rowid，再利用between去构造最终的DML语句。由于这些语句互不冲突，可以通过多个SQL窗口去执行从而实现并行操作。

ROWID分区测试

通过下列语句以ROWID将数据分割成多个区间

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65


REM  rowid_ranges should be at least 21
REM  utilize this script help delete large table
REM  if update large table  Why not online redefinition or CTAS
-- This script spits desired number of rowid ranges to be used for any parallel operations.
-- Best to use it for copying a huge table with out of row lob columns in it or CTAS/copy the data over db links.
-- This can also be used to simulate parallel insert/update/delete operations.
-- Maximum number of rowid ranges you can get here is 255.
-- Doesn't work for partitioned tables, but with minor changes it can be adopted easily.
  
-- Doesn't display any output if the total table blocks are less than rowid ranges times 128.
  
-- It can split a table into more ranges than the number of extents
From Saibabu Devabhaktuni  http://sai-oracle.blogspot.com/2006/03/how-to-split-table-into-rowid-ranges.html
  
  
  
set verify off
undefine rowid_ranges
undefine segment_name
undefine owner
set head off
set pages 0
set trimspool on
  
select 'where rowid between ''' ||sys.dbms_rowid.rowid_create(1, d.oid, c.fid1, c.bid1, 0) ||''' and ''' ||sys.dbms_rowid.rowid_create(1, d.oid, c.fid2, c.bid2, 9999) || '''' ||';'
    from (select distinct b.rn,
                                                first_value(a.fid) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) fid1,
                                                last_value(a.fid) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) fid2,
                                                first_value(decode(sign(range2 - range1),
                                                                                      1,
                                                                                      a.bid +
                                                                                      ((b.rn - a.range1) * a.chunks1),
                                                                                      a.bid)) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) bid1,
                                                last_value(decode(sign(range2 - range1),
                                                                                    1,
                                                                                    a.bid +
                                                                                    ((b.rn - a.range1 + 1) * a.chunks1) - 1,
                                                                                    (a.bid + a.blocks - 1))) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) bid2
                    from (select fid,
                                              bid,
                                              blocks,
                                              chunks1,
                                              trunc((sum2 - blocks + 1 - 0.1) / chunks1) range1,
                                              trunc((sum2 - 0.1) / chunks1) range2
                                    from (select /*+ rule */
                                                  relative_fno fid,
                                                  block_id bid,
                                                  blocks,
                                                  sum(blocks) over() sum1,
                                                  trunc((sum(blocks) over()) / &&rowid_ranges) chunks1,
                                                  sum(blocks) over(order by relative_fno, block_id) sum2
                                                    from dba_extents
                                                  where segment_name = upper('&&segment_name')
                                                      and owner = upper('&&owner'))
                                  where sum1 > &&rowid_ranges) a,
                              (select rownum - 1 rn
                                    from dual
                                connect by level <= &&rowid_ranges) b
                  where b.rn between a.range1 and a.range2) c,
              (select max(data_object_id) oid
                    from dba_objects
                  where object_name = upper('&&segment_name')
                      and owner = upper('&&owner')
                      and data_object_id is not null) d
                      /

调用该脚本

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


SQL> @rowid_chunk
  
Enter value for rowid_ranges: 24                   ==>这里输入要构造的rowid分区个数
  
Enter value for segment_name: table_name           ==> 输入表名
  
Enter value for owner: user_name                   ==> owner名
  
where rowid between 'AAANJFAAEAAEXlBAAA' and 'AAANJFAAEAAEYjdCcP';
where rowid between 'AAANJFAAEAAEYjeAAA' and 'AAANJFAAEAAEZDdCcP';
where rowid between 'AAANJFAAEAAEZDeAAA' and 'AAANJFAAEAAEZhdCcP';
where rowid between 'AAANJFAAEAAEZheAAA' and 'AAANJFAAEAAEaBdCcP';
where rowid between 'AAANJFAAEAAEaBeAAA' and 'AAANJFAAEAAEahdCcP';
where rowid between 'AAANJFAAEAAEaheAAA' and 'AAANJFAAEAAEa3dCcP';
where rowid between 'AAANJFAAEAAEa3eAAA' and 'AAANJFAAEAAEbfdCcP';

利用脚本输出结果构建DELETE语句，例如要删除ID小于1000000的数据

1
2


SQL> DELETE FROM table_name where rowid between ‘AAANJFAAEAAEXlBAAA’ and ‘AAANJFAAEAAEYjdCcP’ and id<1000000; 
SQL> COMMIT;

方案优化

在上述方法中存在两点不足的地方

该脚本不支持分区表
该脚本是根据段的大小分布均匀的切割指定数目的区间，如果要删除的数据大多集中在部分段中则此时该脚本构建出来的DML意义则不大

改进脚本

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


REM   put it in GUI TOOLS! otherwise caused ORA-00933
REM   control commit yourself, avoid ORA-1555
  
select 'and rowid between ''' || ora_rowid || ''' and ''' ||
              lead(ora_rowid, 1) over(order by rn asc) || '''' || ';'
    from (
  
              with cnt as (select count(*) from table_name)             -- 注意更换表名！！
                  select rn, ora_rowid
                      from (select rownum rn, ora_rowid
                                      from (select rowid ora_rowid
                                                      from table_name                       -- 注意更换表名！！
                                                      where id < 1000000
                                                    order by rowid))
                    where rn in (select (rownum - 1) *
                                                            trunc((select * from cnt) / &rowid_ranges) + 1
                                                  from dba_tables
                                                where rownum < &rowid_ranges             --输入分区的数目
                                              union
                                              select * from cnt))

上述脚本也实现了ROWID分区功能，并且支持分区表，还能在分区中定义条件过滤。请将语句放置PL/SQL或Toad下执行，在SQL PLUS中运行可能会碰到ORA-00933。

DBMS_PARALLEL_EXECUTE

Oracle在版本11.2中引入了DBMS_PARALLEL_EXECUTE 的新特性来帮助更新超大表,文档<11.2 New Feature : Using DBMS_PARALLEL_EXECUTE to Update Large Tables in Parallel [ID 1066555.1]>对该特性做了详细的介绍

该DBMS_PARALLEL_EXECUTE新特性的一大亮点就是可以对表上的行数据进行分组为更小的块(chunks),且并行更新这些小的块。

DBMS_PARALLEL_EXECUTE可以以3种方式将数据分块：

CREATE_CHUNKS_BY_NUMBER_COL
CREATE_CHUNKS_BY_ROWID
CREATE_CHUNKS_BY_SQL

其中CREATE_CHUNKS_BY_ROWID就是按照ROWID的方式来分区。

参考链接

利用rowid分块实现非分区表的并行update与delete