I came across this situation just now, even I thought maybe it should be an bug in CBO, before I discovered that the cause is the automatic statistics gathering.
Below is the output of this case.
SQL> delete from A a
2 WHERE exists
3 (SELECT 1 FROM B b
4 WHERE a.ID=b.ID)
5 ;
9888 rows deleted.
SQL> select * from table(dbms_xplan.display_cursor(null,null,’allstats last’));
PLAN_TABLE_OUTPUT
—————————————————————————————————————–
SQL_ID 0ztxv101z6bfu, child number 1
————————————-
delete from A a WHERE exists (SELECT 1 FROM
B b WHERE a.ID=b.ID)
Plan hash value: 106077456
—————————————————————————————————————-
| Id | Operation | Name| Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
—————————————————————————————————————-
| 1 | DELETE | A | 1 | | 0 |00:00:01.47 | 14808 | | | |
|* 2 | HASH JOIN SEMI | | 1 | 1 | 9888 |00:00:01.26 | 2728 | 3178K| 1130K| 5197K (0)|
| 3 | TABLE ACCESS FULL| A | 1 | 1 | 76611 |00:00:00.23 | 2712 | | | |
| 4 | TABLE ACCESS FULL| B | 1 | 74535 | 3906 |00:00:00.01 | 16 | | | |
—————————————————————————————————————-
Predicate Information (identified by operation id):
—————————————————
2 - access(”A”.”ID”=”B”.”ID”)
21 rows selected.
Firstly, dynamic sampling is not happened here, this is unusual as I haven’t gather statistics on these temporary tables before. secondly, the cardinality of A is 1, or to say, it’s empty!
One only explanation is that the statistics gathering job happened to gather statistics at the moment that I truncating the table A. As at that moment, there are more than 10% changes on that table. So when doing query next, the CBO assumes that the table table is almost empty, and use this table as the inner table when joining.
So please keep an eye on the bad side of the automation, the cause of stranger behaviors.
There are one way to disable the automatic statistics gathering job to gather the user tables, use the statement below:
exec DBMS_STATS.SET_PARAM('AUTOSTATS_TARGET','ORACLE');
Why we don’t completely disable automatic statistics gathering job is that it’s necessary for us to gather the system dictionary tables. And automatic statistics gathering job can do it well.
]]>Similarly, firstly we create the table for testing. then insert data, then gather statistics.
SQL> CREATE TABLE fact_a( 2 date_key date, 3 part_col1 NUMBER 4 ) 5 PARTITION BY HASH (part_col1) PARTITIONS 8 6 parallel 7 ; Table created. SQL> SQL> insert into fact_a select null,rownum from dba_objects, dba_objects where rownum<1000000; 999999 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(user,'FACT_A'); PL/SQL procedure successfully completed.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P002 472794 2854042 1 0 Producer P003 527205 3188256 1 0 Consumer P000 500212 3022499 1 0 Consumer P001 499787 3019799 1 1 Producer P000 500212 2021748 1 1 Producer P001 499787 2020028 1 1 Consumer QC 999999 4041776 Execution Plan ---------------------------------------------------------- Plan hash value: 3448586717 ----------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1006K| 4913K| | 159 (26)| 00:00:02 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 1006K| 4913K| | 159 (26)| 00:00:02 | | | Q1,01 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 1006K| 4913K| 23M| 159 (26)| 00:00:02 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 1006K| 4913K| | 126 (6)| 00:00:02 | | | Q1,01 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 1006K| 4913K| | 126 (6)| 00:00:02 | | | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 1006K| 4913K| | 126 (6)| 00:00:02 | 1 | 8 | Q1,00 | PCWC | | | 7 | TABLE ACCESS FULL| FACT_A | 1006K| 4913K| | 126 (6)| 00:00:02 | 1 | 8 | Q1,00 | PCWP | | -----------------------------------------------------------------------------------------------------------------------------------------
On my system, the cpu count is 2, so the default DOP will be 4. But the actual DOP is 2, as the query above indicated, this is governed by the Database Resource Manager, as I have limited the maximum parallelism degree of user peta to be 2.
This is the bad execution plan.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P000 499463 2018712 1 0 Producer P001 500536 2023044 1 0 Consumer QC 999999 4041756 Execution Plan ---------------------------------------------------------- Plan hash value: 2505098799 ----------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1006K| 4913K| | 320 (26)| 00:00:04 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 1006K| 4913K| | 320 (26)| 00:00:04 | | | Q1,00 | P->S | QC (RAND) | | 3 | PX PARTITION HASH ALL| | 1006K| 4913K| | 320 (26)| 00:00:04 | 1 | 8 | Q1,00 | PCWC | | | 4 | HASH GROUP BY | | 1006K| 4913K| 23M| 320 (26)| 00:00:04 | | | Q1,00 | PCWP | | | 5 | TABLE ACCESS FULL | FACT_A | 1006K| 4913K| | 251 (6)| 00:00:04 | 1 | 8 | Q1,00 | PCWP | | -----------------------------------------------------------------------------------------------------------------------------------------
This execution plan is what we expected, in which the DOP is less than half of number of hash partitions.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P002 502709 3037531 1 0 Producer P003 497290 3004747 1 0 Consumer P000 500212 3022499 1 0 Consumer P001 499787 3019779 1 1 Producer P000 500212 2021748 1 1 Producer P001 499787 2020028 1 1 Consumer QC 999999 4041776 Execution Plan ---------------------------------------------------------- Plan hash value: 3448586717 ----------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1006K| 4913K| | 159 (26)| 00:00:02 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 1006K| 4913K| | 159 (26)| 00:00:02 | | | Q1,01 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 1006K| 4913K| 23M| 159 (26)| 00:00:02 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 1006K| 4913K| | 126 (6)| 00:00:02 | | | Q1,01 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 1006K| 4913K| | 126 (6)| 00:00:02 | | | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 1006K| 4913K| | 126 (6)| 00:00:02 | 1 | 8 | Q1,00 | PCWC | | | 7 | TABLE ACCESS FULL| FACT_A | 1006K| 4913K| | 126 (6)| 00:00:02 | 1 | 8 | Q1,00 | PCWP | | -----------------------------------------------------------------------------------------------------------------------------------------
The output should be the same as default parallelism.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P002 519278 3136430 1 0 Producer P003 480721 2905868 1 0 Consumer P000 500212 3022499 1 0 Consumer P001 499787 3019799 1 1 Producer P000 500212 2021748 1 1 Producer P001 499787 2020028 1 1 Consumer QC 999999 4041776 Execution Plan ---------------------------------------------------------- Plan hash value: 3448586717 ----------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1006K| 4913K| | 79 (26)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 1006K| 4913K| | 79 (26)| 00:00:01 | | | Q1,01 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 1006K| 4913K| 23M| 79 (26)| 00:00:01 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 1006K| 4913K| | 63 (7)| 00:00:01 | | | Q1,01 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 1006K| 4913K| | 63 (7)| 00:00:01 | | | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 1006K| 4913K| | 63 (7)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | | 7 | TABLE ACCESS FULL| FACT_A | 1006K| 4913K| | 63 (7)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | -----------------------------------------------------------------------------------------------------------------------------------------
Also a bad execution plan.
]]>First, I will create two tables, fact_a and fact_b, both are partitioned by hash, and have hash partitions 4.
SQL> CREATE TABLE fact_a( 2 date_key date, 3 part_col1 NUMBER 4 ) 5 PARTITION BY HASH (part_col1) PARTITIONS 4 6 parallel 7 ; Table created. SQL> CREATE TABLE fact_b( 2 date_key date, 3 part_col1 NUMBER 4 ) 5 PARTITION BY HASH (part_col1) PARTITIONS 4 6 parallel 7 ; Table created.
Then, insert some data into both tables, and gather statistics.
SQL> insert into fact_a select null,rownum from dba_objects, dba_objects where rownum<1000000; 999999 rows created. SQL> insert into fact_b select null,rownum from dba_objects, dba_objects where rownum<100000; 99999 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(user,'FACT_A'); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(user,'FACT_B'); PL/SQL procedure successfully completed.
Then I will execute the SQL statement below, with different DOPs, using the hint such as “parallel(a 4)”.
select * from fact_a a, fact_b b where a.part_col1 = b.part_col1;
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P000 50099 798746 1 0 Producer P001 49900 795506 1 0 Consumer QC 99999 1594252 Execution Plan ---------------------------------------------------------- Plan hash value: 196040303 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99999 | 976K| 128 (10)| 00:00:02 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 99999 | 976K| 128 (10)| 00:00:02 | | | Q1,00 | P->S | QC (RAND) | | 3 | PX PARTITION HASH ALL| | 99999 | 976K| 128 (10)| 00:00:02 | 1 | 4 | Q1,00 | PCWC | | |* 4 | HASH JOIN | | 99999 | 976K| 128 (10)| 00:00:02 | | | Q1,00 | PCWP | | | 5 | TABLE ACCESS FULL | FACT_B | 99999 | 488K| 12 (9)| 00:00:01 | 1 | 4 | Q1,00 | PCWP | | | 6 | TABLE ACCESS FULL | FACT_A | 995K| 4860K| 112 (7)| 00:00:02 | 1 | 4 | Q1,00 | PCWP | | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("A"."PART_COL1"="B"."PART_COL1")
On my system, the cpu count is 2, so the default DOP will be 4. But the actual DOP is 2, as the query above indicated, this is governed by the Database Resource Manager, as I have limited the maximum parallelism degree of user peta to be 2.
This execution plan is what we want to get.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P002 500024 4031706 1 0 Producer P003 499975 4031490 1 0 Consumer P000 500024 4031706 1 0 Consumer P001 499975 4031490 1 1 Producer P000 50164 799700 1 1 Producer P001 49835 794572 1 1 Consumer QC 99999 1594272 Execution Plan ---------------------------------------------------------- Plan hash value: 2613174272 -------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99999 | 976K| 256 (10)| 00:00:04 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 99999 | 976K| 256 (10)| 00:00:04 | | | Q1,01 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 99999 | 976K| 256 (10)| 00:00:04 | | | Q1,01 | PCWP | | | 4 | PX BLOCK ITERATOR | | 99999 | 488K| 23 (5)| 00:00:01 | 1 | 4 | Q1,01 | PCWC | | | 5 | TABLE ACCESS FULL | FACT_B | 99999 | 488K| 23 (5)| 00:00:01 | 1 | 4 | Q1,01 | PCWP | | | 6 | BUFFER SORT | | | | | | | | Q1,01 | PCWC | | | 7 | PX RECEIVE | | 995K| 4860K| 224 (6)| 00:00:03 | | | Q1,01 | PCWP | | | 8 | PX SEND BROADCAST LOCAL| :TQ10000 | 995K| 4860K| 224 (6)| 00:00:03 | | | Q1,00 | P->P | BCST LOCAL | | 9 | PX BLOCK ITERATOR | | 995K| 4860K| 224 (6)| 00:00:03 | 1 | 4 | Q1,00 | PCWC | | | 10 | TABLE ACCESS FULL | FACT_A | 995K| 4860K| 224 (6)| 00:00:03 | 1 | 4 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("A"."PART_COL1"="B"."PART_COL1")
The execution plan begin to change worse.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P000 50099 798746 1 0 Producer P001 49900 795506 1 0 Consumer QC 99999 1594252 Execution Plan ---------------------------------------------------------- Plan hash value: 196040303 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99999 | 976K| 128 (10)| 00:00:02 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 99999 | 976K| 128 (10)| 00:00:02 | | | Q1,00 | P->S | QC (RAND) | | 3 | PX PARTITION HASH ALL| | 99999 | 976K| 128 (10)| 00:00:02 | 1 | 4 | Q1,00 | PCWC | | |* 4 | HASH JOIN | | 99999 | 976K| 128 (10)| 00:00:02 | | | Q1,00 | PCWP | | | 5 | TABLE ACCESS FULL | FACT_B | 99999 | 488K| 12 (9)| 00:00:01 | 1 | 4 | Q1,00 | PCWP | | | 6 | TABLE ACCESS FULL | FACT_A | 995K| 4860K| 112 (7)| 00:00:02 | 1 | 4 | Q1,00 | PCWP | | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("A"."PART_COL1"="B"."PART_COL1")
The output is the same as the default degree. which is what we expected.
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process; DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES ---------- ---------- --------------- ---------- ---------- ---------- 1 0 Producer P002 50164 399870 1 0 Producer P003 49835 397316 1 0 Consumer P000 50164 399870 1 0 Consumer P001 49835 397316 1 1 Producer P000 50164 799700 1 1 Producer P001 49835 794572 1 1 Consumer QC 99999 1594272 Execution Plan ---------------------------------------------------------- Plan hash value: 378788929 ------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99999 | 976K| 64 (10)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 99999 | 976K| 64 (10)| 00:00:01 | | | Q1,01 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 99999 | 976K| 64 (10)| 00:00:01 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 99999 | 488K| 6 (17)| 00:00:01 | | | Q1,01 | PCWP | | | 5 | PX SEND BROADCAST LOCAL| :TQ10000 | 99999 | 488K| 6 (17)| 00:00:01 | | | Q1,00 | P->P | BCST LOCAL | | 6 | PX BLOCK ITERATOR | | 99999 | 488K| 6 (17)| 00:00:01 | 1 | 4 | Q1,00 | PCWC | | | 7 | TABLE ACCESS FULL | FACT_B | 99999 | 488K| 6 (17)| 00:00:01 | 1 | 4 | Q1,00 | PCWP | | | 8 | PX BLOCK ITERATOR | | 995K| 4860K| 56 (6)| 00:00:01 | 1 | 4 | Q1,01 | PCWC | | | 9 | TABLE ACCESS FULL | FACT_A | 995K| 4860K| 56 (6)| 00:00:01 | 1 | 4 | Q1,01 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("A"."PART_COL1"="B"."PART_COL1")
Also, the execution plan change worse.
]]>cbo@ORCL> select * from t1 left outer join t2 on (t1.x = t2.x); Execution Plan ---------------------------------------------------------- Plan hash value: 1823443478 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 78 | 5 (20)| 00:00:01 | |* 1 | HASH JOIN OUTER | | 6 | 78 | 5 (20)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 5 | 30 | 2 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL| T2 | 8 | 56 | 2 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."X"="T2"."X"(+)) cbo@ORCL> select * from t1 left outer join t2 on (t1.x = t2.x and t1.c = 11); Execution Plan ---------------------------------------------------------- Plan hash value: 2394896828 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 225 | 12 (0)| 00:00:01 | | 1 | NESTED LOOPS OUTER | | 5 | 225 | 12 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T1 | 5 | 30 | 2 (0)| 00:00:01 | | 3 | VIEW | | 1 | 39 | 2 (0)| 00:00:01 | |* 4 | TABLE ACCESS FULL| T2 | 1 | 7 | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("T1"."X"="T2"."X" AND "T1"."C"=11)
The output of the first sample should be like the output of
select * from t1,t2 where t1.x = t2.x(+);
For the second sample, we have no a simple corresponding (+) expression. But from the execution plan, we can guess one possibility, the outline is an left outer join, composite of one table and one view, the view come from t2, and have predicates from t1.c:
select * from t1, (select t2.* from t1, t2 where t1.x = t2.x and c = 11) t2 where t1.x = t2.x(+);
and corresponding execution plan:
Execution Plan ---------------------------------------------------------- Plan hash value: 1247508653 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 225 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN OUTER | | 5 | 225 | 7 (15)| 00:00:01 | | 2 | TABLE ACCESS FULL | T1 | 5 | 30 | 2 (0)| 00:00:01 | | 3 | VIEW | | 1 | 39 | 5 (20)| 00:00:01 | |* 4 | HASH JOIN | | 1 | 13 | 5 (20)| 00:00:01 | |* 5 | TABLE ACCESS FULL| T1 | 1 | 6 | 2 (0)| 00:00:01 | | 6 | TABLE ACCESS FULL| T2 | 8 | 56 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."X"="T2"."X"(+)) 4 - access("T1"."X"="T2"."X") 5 - filter("C"=13)
This is complicated than the sample output, what Oracle really does to this type of left outer join can be investigated through the 10053 trace, below is the transformed SQL.
SELECT T1.C C, T1.X X, T1.Y Y, lv.D_0 D, lv.X_1 X, lv.Y_2 Y FROM CBO.T1 T1, LATERAL((SELECT T2.D D_0, T2.X X_1, T2.Y Y_2 FROM CBO.T2 T2 WHERE T1.X = T2.X AND T1.C = 11))(+) lv
Like the way I show above, we can analyze the statement below, and translate them into the (+) way.
select t1.*,t2.* from t1 left outer join t2 on (t1.x=t2.x and (t2.d>23 or t2.d<23));
Another different issue is translating the statement below:
select t1.*,t2.* from t1 left outer join t2 on (t1.x=t2.x or t1.y = t2.y);
Oracle still use the Lateral View to generate the execution plan, but I failed to find an suitable (+) way to solve this problem, what I found is another way:
select * from t1, t2 where t1.x = t2.x or t1.y = t2.y union all select t1.*, null, null, null from t1 where not exists (select 1 from t2 where t2.x = t1.x or t1.y = t2.y);
This method is similar to the issue about “full outer join”.
SELECT T1.c, T2.d FROM T1 FULL OUTER JOIN T2 ON T1.x = T2.y;
will be translated to
SELECT T1.c, T2.d FROM T1, T2 WHERE T1.x = T2.y (+) UNION ALL SELECT NULL, T2.d FROM T2 WHERE NOT EXISTS (SELECT 1 FROM T1 WHERE T1.x = T2.y);
The general rule is that degrade left outer join to equi-join, and degrade full outer join to left outer join.
Reference:
The articles below would be useful to understand this topic:
OuterJoins in Oracle from http://optimizermagic.blogspot.com/2007/12/outerjoins-in-oracle.html
ANSI Outer Joins And Lateral Views from http://structureddata.org/2008/02/18/ansi-outer-joins-and-lateral-views/
What’s in a Condition? from http://www.oreillynet.com/pub/a/network/2002/10/01/whatsinacondition.html
]]>cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+); C X Y D X Y ---------- ---------- ---------- ---------- ---------- ---------- 11 1 51 21 1 51 13 3 53 23 3 40 15 5 55 25 5 54 14 4 54 12 2 52
An more advanced way is:
cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y(+)>50; C X Y D X Y ---------- ---------- ---------- ---------- ---------- ---------- 11 1 51 21 1 51 15 5 55 25 5 54 14 4 54 13 3 53 12 2 52
Be note, this is different from:
cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y>50; C X Y D X Y ---------- ---------- ---------- ---------- ---------- ---------- 11 1 51 21 1 51 15 5 55 25 5 54
The t2.y(+) > 50, as a join condition, is applied at join time, but t2.y>50 is applied after the result set is return from joining. Also note that, as t2 > 50 make sure that the rows in t2 is not null, so the SQL statement will become an normal equi-join. This is clear in the execution plans:
cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y(+)>50; Execution Plan ---------------------------------------------------------- Plan hash value: 1823443478 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 65 | 5 (20)| 00:00:01 | |* 1 | HASH JOIN OUTER | | 5 | 65 | 5 (20)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 5 | 30 | 2 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL| T2 | 1 | 7 | 2 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."X"="T2"."X"(+)) 3 - filter("T2"."Y"(+)>50) cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y>50; Execution Plan ---------------------------------------------------------- Plan hash value: 2959412835 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | 5 (20)| 00:00:01 | |* 1 | HASH JOIN | | 1 | 13 | 5 (20)| 00:00:01 | |* 2 | TABLE ACCESS FULL| T2 | 1 | 7 | 2 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL| T1 | 5 | 30 | 2 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."X"="T2"."X") 2 - filter("T2"."Y">50)
But this method have its own limitations. The SQL Reference list such limitations:
FROM clause join syntax.WHERE clause or, in the context of left-correlation (when specifying the TABLE clause) in the FROM clause, and can be applied only to a column of a table or view.WHERE condition containing the (+) operator cannot be combined with another condition using the OR logical operator.WHERE condition cannot use the IN comparison condition to compare a column marked with the (+) operator with an expression.
It’s a little confused to read these statements, for the limitation I highlighted, there are two variations:
1. I can’t add OR operand predicates on the right table at join time, such the one below expressed in ANSI-SQL.
select t1.c,t2.d from t1 left outer join t2 on (t1.x = t2.x and (t2.y>23 or t2.y<23) );
2. I can’t combine OR operands as left outer join condition. such the one below expressed in ANSI-SQL.
select t1.c,t2.d from t1 left outer join t2 on (t1.x = t2.x or t1.y = t2.y);
Another limitation is that, I can’t add predicate on the left table at join time, such as the one below expressed in ANSI-SQL.
select * from t1 left outer join t2 on (t1.x = t2.x and t1.c = 11);
Clearly, Oracle recommend use ANSI-SQL style left outer join syntax, just as the examples above.
But another question comes, how Oracle implement ANSI-SQL left outer join syntax, and how to rewrite these unsupported statements in native/complex ways?
]]>For the following statement, 9i, 10g and 11g output the different cardinality and cost.
create table t1 nologging as select trunc(dbms_random.value(0,25)) n1, rpad('x',40) ind_pad, trunc(dbms_random.value(0,20)) n2, lpad(rownum,10,'0') small_vc, rpad('x',200) padding from all_objects where rownum <= 10000 ; update t1 set n1 = n2 where n2 in (1,3,5,7,11,13,15,17,19); create index t1_i1 on t1(n1, ind_pad, n2) nologging pctfree 91 ; -- gather statistics TABLE_NAME BLOCKS NUM_ROWS -------------------- ---------- ---------- T1 371 10000 1 row selected. NUM_ROWS DISTINCT_KEYS BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY ---------- ------------- ---------- ----------- ----------------- ----------------------- ----------------------- 10000 284 2 1111 7830 3 27 1 row selected. COLUMN_NAME NUM_NULLS NUM_DISTINCT DENSITY LOW_VALUE HIGH_VALUE -------------------- ---------- ------------ ---------- ------------------------ ------------------------ IND_PAD 0 1 1 782020202020202020202020 782020202020202020202020 202020202020202020202020 202020202020202020202020 2020202020202020 2020202020202020 N1 0 25 .04 80 C119 N2 0 20 .05 80 C114 set autot trace explain select small_vc from t1 where n1 = 2 and ind_pad = rpad('x',40) and n2 = 4 ;
9i:
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=21 Card=20 Bytes=1160) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=21 Card=20 Bytes=1160) 2 1 INDEX (RANGE SCAN) OF 'T1_I1' (NON-UNIQUE) (Cost=5 Card=20)
10gR2:
--------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 1160 | 34 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 20 | 1160 | 34 | |* 2 | INDEX RANGE SCAN | T1_I1 | 35 | | 6 | --------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("N1"=2 AND "IND_PAD"='x ' AND "N2"=4)
11g:
--------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 35 | 2030 | 34 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 35 | 2030 | 34 | |* 2 | INDEX RANGE SCAN | T1_I1 | 35 | | 6 | ---------------------------------------------------------------------
For 9i:
cost: 5 = blevel + LEAF_BLOCKS* (1/20)*(1/25)
21 - 5 = clustering_factor * (1/20)*(1/25)
Cardinality: 20 = 10000 * (1/20)*(1/25)
For 10gR2:
cost: 6 = blevel + LEAF_BLOCKS * (1/DISTINCT_KEYS)
34 - 6 = clustering_factor *(1/DISTINCT_KEYS)
Cardinality: 35 = 10000 * (1/DISTINCT_KEYS)
20 = 10000 * (1/20)*(1/25)
For 11g:
cost: 6 = blevel + LEAF_BLOCKS * (1/DISTINCT_KEYS)
34 - 6 = clustering_factor *(1/DISTINCT_KEYS)
Cardinality: 35 = 10000 * (1/DISTINCT_KEYS)
So, in 9i, the distinct_keys of user_indexes view is not used when calculate the cardinality or cost for index access. In 10gR2, the cost and cardinality for index line is calculate using distinct_keys, so is the cost for table access cost. In 11G, the cardinality for table access is also calculated with distinct_keys.
Actually, the cost and cardinality are becoming more and more accurate, when the index columns have some dependence with each other.
]]>Recall the index access path cost calculation, selectivity also play as a factor to the number of IO blocks. Below we will focus on selectivity calculation.
For the simplest predicate column = constant, such as month_no = 12, the selectivity is simple:
selectivity = 1/num_distinct; (when histogram non-available) or selectivity = density; (when histogram available)
both of num_distinct and density comes from user_tab_columns/user_tab_col_statistics view. And we can get the “number of input rows” from the num_rows of user_tables view, so we can get the candinality as num_rows * selectivity.
When we have null values in the column, then we must multiply this selectivity with (num_rows - num_nulls)/num_rows. As below
selectivity = (num_rows - num_nulls)/(num_distinct*num_rows); (when histogram non-available) or selectivity = density * (num_rows - num_nulls)/num_rows; (when histogram available)
When we use the in-list predicate, such as month_no in (2,3,4). The correct behavior should be to sum up all the possibilities of each member.
What about the values outside the valid scope? such as month_on in (13, 14)? What Oracle does is to assume that the further away you get from the known low/high value, the less chance you will get the data back. Oracle uses a straight-line decay to predict the variation, decaying to zero when you exceed the range by the difference between low and high.
What about not in predicates? Maybe you could guess that it should be (1 - selectivity(in predicate)), actually, the answer is NO. As Oracle transform the (C not in(A,B)) into (C <>A and C<>B), whose selectivity should be (1 - selectivity(A))*(1- selectivity(B)). There are a minor difference between (1 - selectivity(in predicate)) and (1 - selectivity(A))*(1- selectivity(B)). But Oracle seems to remain this in-consistence in all versions, including 11g. Let’s consider the in predicate again, actually Oracle transform this predicate (C in (A,B)) into (C = A or C = B), whose selectivity should be (selectivity(A) + selectivity(B) - selectivity(A) * selectivity(B). Although we know this value is wrong, as A and B in independent variable. But this wrong value is just the Oracle 8i counts, Oracle 9i and forwards corrects this mistake.
Let’s come to the range predicate, such as month_no between 2 and 5.
selectivity = "required range" divided by "total available range" + n/num_distinct; (n is the number of closed ends)
And then, we come to the multiple predicates, such as A = B or A = C, the following formulas applying.
selectivity(predicate1 AND predicate2) = selectivity(predicate1) * selectivity(predicate2).
selectivity(predicate1 OR predicate2) = selectivity(predicate1) + selectivity(predicate2) - selectivity(predicate1 AND predicate2)
selectivity(NOT predicate1) = 1 – selectivity(predicate1) //except for bind variable problems
Of course, these formula comes from Probability theory, as these formula indicates, the predicate1 and predicate2 should be independent, otherwise, the result will be erroneous. Below are some erroneous examples:
month_no > = 8 or month_no < 8 will not cover all the month_no. month_no in (8,9) will not cover 2/12 in Oracle 8i. month_no not in (8,9) will not cover 10/12 in Oracle 8i onwards.
So, what is Cost? From the Oracle Document, cost is defined as below:
Cost = ( #SRds * sreadtim + #MRds * mreadtim + #CPUCycles / cpuspeed ) / sreadtim = #SRds + #MRds * mreadtim/sreadtim + #CPUCycles/(cpuspeed*sreadtim) where #SRDs - number of single block reads #MRDs - number of multi block reads #CPUCycles - number of CPU Cycles sreadtim - single block read time mreadtim - multi block read time cpuspeed - CPU cycles per second
So, if we don’t count the CPU costing, which is #CPUCycles/(cpuspeed*#SRds), we get a conclusion that the cost is the sum of the number of single block reads and the number of multiple block reads, factored by mreadtime/sreadtime. This is the model what Oracle 8i does. In Oracle 9i and forward, the CPU cost is accounted and the CPU cost part is add to this cost formula.
For full table scan, the cost will be dependant on the total number of blocks in that table and the number of blocks each read can retrieve, or to say, the data_file_multiblock_read_count parameter. The cost for full table scan be can expressed as
full_table_scan_cost = total_number_of_blocks / adjusted_data_file_multiblock_count
Here, adjusted_data_file_multiblock_count is just an adjustment to data_file_multiblock_read_count, and the value can be obtained by repeated experiments with different total_number_of_blocks and the resulted full_table_scan_cost.
And for the index access path, the cost is
btree_index_access_cost = blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling(clustering_factor * effective table selectivity)
Here, the concepts of selectivity, clustering factor, will be explained in the later articles. one thing to highlight is that the cost is usually based the unit of “BLOCK”.
Also, there are cost for join operations, such as nested loops, hash joins, sorting and merge joins.
nested_loop_cost = Cost of acquiring data from first table + Cardinality of result from first table * Cost of single visit to second table
hash_join_cost = (probe passes + 1) * round(Size of large data set/adjusted_data_file_multiblock_count) + round(Size of large data set/adjusted_data_file_multiblock_count)
The information of this article mainly comes from the great book Jonathan Lewis wrote, “Cost-Based Oracle Fundamental”, thanks to Jonathan.
]]>Everything in XQuery is an "expression", one most important expression is the FLWOR expression, composed of the following, in order, from which FLWOR takes its name: for, let, where , order by, return.
SQL functions XMLQuery and XMLTable are defined by the SQL/XML standard as a general interface between the SQL and XQuery languages. As is the case for the other SQL/XML functions, XMLQuery and XMLTable let you take advantage of the power and flexibility of both SQL and XML. Using these functions, you can construct XML data using relational data, query relational data as if it were XML, and construct relational data from XML data.
XMLQuery is the logical evolution of Oracle’s extract() operator, and XMLTable is the logical evolution of Oracle’s table(xmlsequence(extract(…)))
Below is a simple comparison between XML/SQL and XQuery.
create table test (KEY number, XML_INFO xmltype); insert into test (KEY, XML_INFO) values (1, '<?xml version="1.0" encoding="UTF-8"?> <lists> <list name="String"> <display-name >test</display-name> </list> </lists>'); --XQuery version SELECT test.KEY AS KEY, v.name, v.display FROM test, XMLTABLE(' $xml_info/lists/list' PASSING test.XML_INFO AS "xml_info" COLUMNS name VARCHAR2(100) PATH '@name', display VARCHAR2(100) PATH 'display-name') v; --SQL/XML version select test.key, extractValue(value(v), '/list/@name'), extractValue(value(v), '/list/display-name') from test, table(XMLSequence(extract(test.xml_info,'/resource-lists/list'))) v;
APPENDCHILDXML
DELETEXML
DEPTH
EXTRACT (XML)
EXISTSNODE
EXTRACTVALUE
INSERTCHILDXML
INSERTXMLBEFORE
PATH
SYS_DBURIGEN
SYS_XMLAGG
SYS_XMLGEN
UPDATEXML
XMLAGG
XMLCDATA
XMLCOLATTVAL
XMLCOMMENT
XMLCONCAT
XMLFOREST
XMLPARSE
XMLPI
XMLQUERY
XMLROOT
XMLSEQUENCE
XMLSERIALIZE
XMLTABLE
XMLTRANSFORM
Also, XMLType is important in Oracle XML DB, which I have discussed in the last post.
XMLType is a system-defined opaque type for handling XML data. It has predefined member functions on it to extract XML nodes and fragments.
You can create columns of XMLType and insert XML documents into it. You can also generate XML documents as XMLType instances dynamically using the SYS_XMLGEN and SYS_XMLAGG SQL functions.
Table 198-1 summarizes functions and procedures of the XMLType.
Table 198-1 XMLTYPE Subprograms
| Method | Description |
|---|---|
|
Creates a non schema based XML from the input schema based instance. |
|
|
Creates a schema based |
|
|
Static function for creating and returning an |
|
|
Takes a |
|
|
Takes a |
|
|
Returns the value of the |
|
|
Returns the value of the |
|
|
Returns the namespace for the top level element in a schema based document. |
|
|
Returns the value of the |
|
|
Returns the root element of the input instance. Returns |
|
|
Returns the XML schema URL if the input is an XML Schema based. |
|
|
Returns the value of the |
|
|
Checks if the input |
|
|
Returns 1 or 0 indicating if the input |
|
|
Checks if the input instance is schema valid according to the given schema URL. |
|
|
Checks if the instance has been validated against the schema. |
|
|
Validates the input instance according to the XML Schema. Raises error if the input instance is non-schema based. |
|
|
Sets the schema valid flag to avoid costly schema validation. |
|
|
Converts the |
|
|
Takes an |
|
|
Constructs an instance of the |
The SQL/XML SQL functions and their corresponding XMLType methods allow XPath expressions to be used to search collections of XML documents and to access a subset of the nodes contained within an XML document.
Oracle XML DB provides two ways of evaluating XPath expressions that operate on XMLType columns and tables, depending on the XML storage method used:
One example of XPath Rewrite is as following:
SELECT OBJECT_VALUE FROM mypurchaseorders p WHERE extractValue(OBJECT_VALUE, '/PurchaseOrder/Company') = 'Oracle';
will become:
SELECT VALUE(p) FROM mypurchaseorders p WHERE p.xmldata.Company = 'Oracle';