March 2010
M T W T F S S
« Apr    
1234567
891011121314
15161718192021
22232425262728
293031  

Automatic Statistics Gathering Job

Automation usually is a good thing, but sometimes it can make thing goes strange.

I came across this situation just now, even I thought maybe it should be an bug in CBO, before I discovered that the cause is the automatic statistics gathering.

Below is the output of this case.

SQL> delete from A a
  2  WHERE exists
  3  (SELECT 1 FROM B b
  4  WHERE a.ID=b.ID)
  5  ;
9888 rows deleted.

SQL> select * from table(dbms_xplan.display_cursor(null,null,’allstats last’));

PLAN_TABLE_OUTPUT
—————————————————————————————————————–
SQL_ID  0ztxv101z6bfu, child number 1
————————————-
delete from A a WHERE exists (SELECT 1 FROM
B b WHERE a.ID=b.ID)

Plan hash value: 106077456

—————————————————————————————————————-
| Id  | Operation           | Name| Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
—————————————————————————————————————-
|   1 |  DELETE             | A   |      1 |        |      0 |00:00:01.47 |   14808 |       |       |          |
|*  2 |   HASH JOIN SEMI    |     |      1 |      1 |   9888 |00:00:01.26 |    2728 |  3178K|  1130K| 5197K (0)|
|   3 |    TABLE ACCESS FULL| A   |      1 |      1 |  76611 |00:00:00.23 |    2712 |       |       |          |
|   4 |    TABLE ACCESS FULL| B   |      1 |  74535 |   3906 |00:00:00.01 |      16 |       |       |          |
—————————————————————————————————————-
Predicate Information (identified by operation id):
—————————————————
   2 - access(”A”.”ID”=”B”.”ID”)
21 rows selected.

Firstly, dynamic sampling is not happened here, this is unusual as I haven’t gather statistics on  these temporary tables before. secondly, the cardinality of A is 1, or to say, it’s empty!

One only explanation is that the statistics gathering job happened to gather statistics at the moment that I truncating the table A. As at that moment, there are more than 10% changes on that table. So when doing query next, the CBO assumes that the table table is almost empty, and use this table as the inner table when joining.

So please keep an eye on the bad side of the automation, the cause of stranger behaviors.

There are one way to disable the automatic statistics gathering job to gather the user tables, use the statement below:

exec DBMS_STATS.SET_PARAM('AUTOSTATS_TARGET','ORACLE');

Why we don’t completely disable automatic statistics gathering job is that it’s necessary for us to gather the system dictionary tables. And automatic statistics gathering job can do it well.

Relationship between DOP and Number of Hash Partitions (2)

Another relationship between DOP and number of hash partitions is that when doing partition wise sort, the number of partitions should be twice greater than DOP.

Similarly, firstly we create the table for testing. then insert data, then gather statistics.

SQL> CREATE TABLE fact_a(
  2  date_key date,
  3  part_col1 NUMBER
  4  )
  5  PARTITION BY HASH (part_col1) PARTITIONS 8
  6  parallel
  7  ;

Table created.

SQL>
SQL> insert into fact_a select null,rownum from dba_objects, dba_objects where rownum<1000000;

999999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'FACT_A');

PL/SQL procedure successfully completed.

1. parallelism is default

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;

DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P002           472794    2854042
         1          0 Producer        P003           527205    3188256
         1          0 Consumer        P000           500212    3022499
         1          0 Consumer        P001           499787    3019799
         1          1 Producer        P000           500212    2021748
         1          1 Producer        P001           499787    2020028
         1          1 Consumer        QC             999999    4041776

Execution Plan
----------------------------------------------------------
Plan hash value: 3448586717 

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1006K|  4913K|       |   159  (26)| 00:00:02 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  1006K|  4913K|       |   159  (26)| 00:00:02 |       |       |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY        |          |  1006K|  4913K|    23M|   159  (26)| 00:00:02 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |  1006K|  4913K|       |   126   (6)| 00:00:02 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |  1006K|  4913K|       |   126   (6)| 00:00:02 |       |       |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |  1006K|  4913K|       |   126   (6)| 00:00:02 |     1 |     8 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| FACT_A   |  1006K|  4913K|       |   126   (6)| 00:00:02 |     1 |     8 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------- 

On my system, the cpu count is 2, so the default DOP will be 4. But the actual DOP is 2, as the query above indicated, this is governed by the Database Resource Manager, as I have limited the maximum parallelism degree of user peta to be 2.

This is the bad execution plan.

2. parallelism is 2

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;

DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P000           499463    2018712
         1          0 Producer        P001           500536    2023044
         1          0 Consumer        QC             999999    4041756

Execution Plan
----------------------------------------------------------
Plan hash value: 2505098799 

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1006K|  4913K|       |   320  (26)| 00:00:04 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |  1006K|  4913K|       |   320  (26)| 00:00:04 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL|          |  1006K|  4913K|       |   320  (26)| 00:00:04 |     1 |     8 |  Q1,00 | PCWC |            |
|   4 |     HASH GROUP BY       |          |  1006K|  4913K|    23M|   320  (26)| 00:00:04 |       |       |  Q1,00 | PCWP |            |
|   5 |      TABLE ACCESS FULL  | FACT_A   |  1006K|  4913K|       |   251   (6)| 00:00:04 |     1 |     8 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------- 

This execution plan is what we expected, in which the DOP is less than half of number of hash partitions.

 

3. parallelism is 4

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;

DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P002           502709    3037531
         1          0 Producer        P003           497290    3004747
         1          0 Consumer        P000           500212    3022499
         1          0 Consumer        P001           499787    3019779
         1          1 Producer        P000           500212    2021748
         1          1 Producer        P001           499787    2020028
         1          1 Consumer        QC             999999    4041776

Execution Plan
----------------------------------------------------------
Plan hash value: 3448586717 

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1006K|  4913K|       |   159  (26)| 00:00:02 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  1006K|  4913K|       |   159  (26)| 00:00:02 |       |       |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY        |          |  1006K|  4913K|    23M|   159  (26)| 00:00:02 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |  1006K|  4913K|       |   126   (6)| 00:00:02 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |  1006K|  4913K|       |   126   (6)| 00:00:02 |       |       |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |  1006K|  4913K|       |   126   (6)| 00:00:02 |     1 |     8 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| FACT_A   |  1006K|  4913K|       |   126   (6)| 00:00:02 |     1 |     8 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------- 

The output should be the same as default parallelism.

4. parallelism is 8

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;

DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P002           519278    3136430
         1          0 Producer        P003           480721    2905868
         1          0 Consumer        P000           500212    3022499
         1          0 Consumer        P001           499787    3019799
         1          1 Producer        P000           500212    2021748
         1          1 Producer        P001           499787    2020028
         1          1 Consumer        QC             999999    4041776

Execution Plan
----------------------------------------------------------
Plan hash value: 3448586717 

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1006K|  4913K|       |    79  (26)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  1006K|  4913K|       |    79  (26)| 00:00:01 |       |       |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY        |          |  1006K|  4913K|    23M|    79  (26)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |  1006K|  4913K|       |    63   (7)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |  1006K|  4913K|       |    63   (7)| 00:00:01 |       |       |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |  1006K|  4913K|       |    63   (7)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| FACT_A   |  1006K|  4913K|       |    63   (7)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------- 

Also a bad execution plan.

Relationship between DOP and Number of Hash Partitions (1)

In 10gR2 or 11gR1, when doing partition wise join, the number of hash partitions should be equal to DOP. Otherwise, the execution plan must become worse.

First, I will create two tables, fact_a and fact_b, both are partitioned by hash, and have hash partitions 4.

SQL> CREATE TABLE fact_a(
  2  date_key date,
  3  part_col1 NUMBER
  4  )
  5  PARTITION BY HASH (part_col1) PARTITIONS 4
  6  parallel
  7  ;
Table created.
SQL> CREATE TABLE fact_b(
  2  date_key date,
  3  part_col1 NUMBER
  4  )
  5  PARTITION BY HASH (part_col1) PARTITIONS 4
  6  parallel
  7  ;
Table created.

Then, insert some data into both tables, and gather statistics.

SQL> insert into fact_a select null,rownum from dba_objects, dba_objects where rownum<1000000;
999999 rows created.
SQL> insert into fact_b select null,rownum from dba_objects, dba_objects where rownum<100000;
99999 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(user,'FACT_A');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user,'FACT_B');
PL/SQL procedure successfully completed.

Then I will execute the SQL statement below, with different DOPs, using the hint such as “parallel(a 4)”.

select * from
   fact_a a, fact_b b
  where a.part_col1 = b.part_col1;

1. Default DOP

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;
DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P000            50099     798746
         1          0 Producer        P001            49900     795506
         1          0 Consumer        QC              99999    1594252
Execution Plan
----------------------------------------------------------
Plan hash value: 196040303
---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 99999 |   976K|   128  (10)| 00:00:02 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 | 99999 |   976K|   128  (10)| 00:00:02 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL|          | 99999 |   976K|   128  (10)| 00:00:02 |     1 |     4 |  Q1,00 | PCWC |            |
|*  4 |     HASH JOIN           |          | 99999 |   976K|   128  (10)| 00:00:02 |       |       |  Q1,00 | PCWP |            |
|   5 |      TABLE ACCESS FULL  | FACT_B   | 99999 |   488K|    12   (9)| 00:00:01 |     1 |     4 |  Q1,00 | PCWP |            |
|   6 |      TABLE ACCESS FULL  | FACT_A   |   995K|  4860K|   112   (7)| 00:00:02 |     1 |     4 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A"."PART_COL1"="B"."PART_COL1")

On my system, the cpu count is 2, so the default DOP will be 4. But the actual DOP is 2, as the query above indicated, this is governed by the Database Resource Manager, as I have limited the maximum parallelism degree of user peta to be 2.

This execution plan is what we want to get.

2. paralllelism is 2 ( < Number of Partitions)

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;
DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P002           500024    4031706
         1          0 Producer        P003           499975    4031490
         1          0 Consumer        P000           500024    4031706
         1          0 Consumer        P001           499975    4031490
         1          1 Producer        P000            50164     799700
         1          1 Producer        P001            49835     794572
         1          1 Consumer        QC              99999    1594272

Execution Plan
----------------------------------------------------------
Plan hash value: 2613174272
--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          | 99999 |   976K|   256  (10)| 00:00:04 |       |       |        |      |            |
|   1 |  PX COORDINATOR              |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10001 | 99999 |   976K|   256  (10)| 00:00:04 |       |       |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN                 |          | 99999 |   976K|   256  (10)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX BLOCK ITERATOR        |          | 99999 |   488K|    23   (5)| 00:00:01 |     1 |     4 |  Q1,01 | PCWC |            |
|   5 |      TABLE ACCESS FULL       | FACT_B   | 99999 |   488K|    23   (5)| 00:00:01 |     1 |     4 |  Q1,01 | PCWP |            |
|   6 |     BUFFER SORT              |          |       |       |            |          |       |       |  Q1,01 | PCWC |            |
|   7 |      PX RECEIVE              |          |   995K|  4860K|   224   (6)| 00:00:03 |       |       |  Q1,01 | PCWP |            |
|   8 |       PX SEND BROADCAST LOCAL| :TQ10000 |   995K|  4860K|   224   (6)| 00:00:03 |       |       |  Q1,00 | P->P | BCST LOCAL |
|   9 |        PX BLOCK ITERATOR     |          |   995K|  4860K|   224   (6)| 00:00:03 |     1 |     4 |  Q1,00 | PCWC |            |
|  10 |         TABLE ACCESS FULL    | FACT_A   |   995K|  4860K|   224   (6)| 00:00:03 |     1 |     4 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("A"."PART_COL1"="B"."PART_COL1")

The execution plan begin to change worse.

3. paralllelism is 4 ( = Number of Partitions)

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;
DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P000            50099     798746
         1          0 Producer        P001            49900     795506
         1          0 Consumer        QC              99999    1594252
Execution Plan
----------------------------------------------------------
Plan hash value: 196040303
---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 99999 |   976K|   128  (10)| 00:00:02 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 | 99999 |   976K|   128  (10)| 00:00:02 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL|          | 99999 |   976K|   128  (10)| 00:00:02 |     1 |     4 |  Q1,00 | PCWC |            |
|*  4 |     HASH JOIN           |          | 99999 |   976K|   128  (10)| 00:00:02 |       |       |  Q1,00 | PCWP |            |
|   5 |      TABLE ACCESS FULL  | FACT_B   | 99999 |   488K|    12   (9)| 00:00:01 |     1 |     4 |  Q1,00 | PCWP |            |
|   6 |      TABLE ACCESS FULL  | FACT_A   |   995K|  4860K|   112   (7)| 00:00:02 |     1 |     4 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A"."PART_COL1"="B"."PART_COL1")

The output is the same as the default degree. which is what we expected.

4. paralllelism is 8 ( > Number of Partitions)

SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes FROM v$pq_tqstat ORDER BY dfo_number, tq_id, server_type DESC, process;
DFO_NUMBER      TQ_ID SERVER_TYPE     PROCESS      NUM_ROWS      BYTES
---------- ---------- --------------- ---------- ---------- ----------
         1          0 Producer        P002            50164     399870
         1          0 Producer        P003            49835     397316
         1          0 Consumer        P000            50164     399870
         1          0 Consumer        P001            49835     397316
         1          1 Producer        P000            50164     799700
         1          1 Producer        P001            49835     794572
         1          1 Consumer        QC              99999    1594272
Execution Plan
----------------------------------------------------------
Plan hash value: 378788929
-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          | 99999 |   976K|    64  (10)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR             |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)       | :TQ10001 | 99999 |   976K|    64  (10)| 00:00:01 |       |       |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN                |          | 99999 |   976K|    64  (10)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE              |          | 99999 |   488K|     6  (17)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND BROADCAST LOCAL| :TQ10000 | 99999 |   488K|     6  (17)| 00:00:01 |       |       |  Q1,00 | P->P | BCST LOCAL |
|   6 |       PX BLOCK ITERATOR     |          | 99999 |   488K|     6  (17)| 00:00:01 |     1 |     4 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL    | FACT_B   | 99999 |   488K|     6  (17)| 00:00:01 |     1 |     4 |  Q1,00 | PCWP |            |
|   8 |     PX BLOCK ITERATOR       |          |   995K|  4860K|    56   (6)| 00:00:01 |     1 |     4 |  Q1,01 | PCWC |            |
|   9 |      TABLE ACCESS FULL      | FACT_A   |   995K|  4860K|    56   (6)| 00:00:01 |     1 |     4 |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("A"."PART_COL1"="B"."PART_COL1")

Also, the execution plan change worse.

How Oracle implement ANSI left outer join syntax

At least, there will be two ways to handle left outer join, one is the traditional way to handle the case such as t1.x = t2.x (+), and the new way we saw in last article to handle the new ANSI syntax. So we will begin with such two execution plans.

cbo@ORCL> select * from t1 left outer join t2 on (t1.x = t2.x);

Execution Plan
----------------------------------------------------------
Plan hash value: 1823443478

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     6 |    78 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |      |     6 |    78 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |     5 |    30 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   |     8 |    56 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."X"="T2"."X"(+))

cbo@ORCL> select * from t1 left outer join t2 on (t1.x = t2.x and t1.c = 11);

Execution Plan
----------------------------------------------------------
Plan hash value: 2394896828

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     5 |   225 |    12   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER |      |     5 |   225 |    12   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T1   |     5 |    30 |     2   (0)| 00:00:01 |
|   3 |   VIEW              |      |     1 |    39 |     2   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |     1 |     7 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T1"."X"="T2"."X" AND "T1"."C"=11)

The output of the first sample should be like the output of

select * from t1,t2 where t1.x = t2.x(+);

For the second sample, we have no a simple corresponding  (+) expression. But from the execution plan, we can guess one possibility, the outline is an left outer join, composite of one table and one view, the view come from t2, and have predicates from t1.c:

select *
  from t1,
       (select t2.*
          from t1, t2
         where t1.x = t2.x
           and c = 11) t2
 where t1.x = t2.x(+);

and corresponding execution plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 1247508653

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     5 |   225 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN OUTER     |      |     5 |   225 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | T1   |     5 |    30 |     2   (0)| 00:00:01 |
|   3 |   VIEW               |      |     1 |    39 |     5  (20)| 00:00:01 |
|*  4 |    HASH JOIN         |      |     1 |    13 |     5  (20)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T1   |     1 |     6 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |     8 |    56 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."X"="T2"."X"(+))
   4 - access("T1"."X"="T2"."X")
   5 - filter("C"=13)

This is complicated than the sample output, what Oracle really does to this type of left outer join can be investigated through the 10053 trace, below is the transformed SQL.

SELECT T1.C                    C,
       T1.X                    X,
       T1.Y                    Y,
       lv.D_0 D,
       lv.X_1 X,
       lv.Y_2 Y
  FROM CBO.T1 T1,
       LATERAL((SELECT T2.D D_0, T2.X X_1, T2.Y Y_2
                 FROM CBO.T2 T2
                WHERE T1.X = T2.X
                  AND T1.C = 11))(+) lv

A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause. 

Like the way I show above, we can analyze the statement below, and translate them into the  (+) way.

select t1.*,t2.*
from t1 left outer join t2
on (t1.x=t2.x and (t2.d>23 or t2.d<23));

Another different issue is translating the statement below:

select t1.*,t2.* from t1 left outer join t2 on (t1.x=t2.x or t1.y = t2.y);

Oracle still use the Lateral View to generate the execution plan, but I failed to find an suitable (+) way to solve this problem, what I found is another way:

select *
  from t1, t2
 where t1.x = t2.x
    or t1.y = t2.y
union all
select t1.*, null, null, null
  from t1
 where not exists (select 1
          from t2
         where t2.x = t1.x
            or t1.y = t2.y);

This method is similar to the issue about “full outer join”.

SELECT T1.c, T2.d
FROM T1 FULL OUTER JOIN T2
ON T1.x = T2.y;

will be translated to

SELECT T1.c, T2.d
FROM T1, T2
WHERE T1.x = T2.y (+)
UNION ALL
SELECT NULL, T2.d
FROM T2
WHERE NOT EXISTS
(SELECT 1 FROM T1 WHERE T1.x = T2.y);

The general rule is that degrade left outer join to equi-join, and degrade full outer join to left outer join.

Reference:

The articles below would be useful to understand this topic:

OuterJoins in Oracle from http://optimizermagic.blogspot.com/2007/12/outerjoins-in-oracle.html

ANSI Outer Joins And Lateral Views from http://structureddata.org/2008/02/18/ansi-outer-joins-and-lateral-views/

What’s in a Condition? from http://www.oreillynet.com/pub/a/network/2002/10/01/whatsinacondition.html

Limitation of Oracle syntax for left outer join (+)

Oracle guys usually use Oracle native left outer join syntax to do left outer join, such as

cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+);

         C          X          Y          D          X          Y
---------- ---------- ---------- ---------- ---------- ----------
        11          1         51         21          1         51
        13          3         53         23          3         40
        15          5         55         25          5         54
        14          4         54
        12          2         52

An more advanced way is:

cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y(+)>50;

         C          X          Y          D          X          Y
---------- ---------- ---------- ---------- ---------- ----------
        11          1         51         21          1         51
        15          5         55         25          5         54
        14          4         54
        13          3         53
        12          2         52

Be note, this is different from:

cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y>50;

         C          X          Y          D          X          Y
---------- ---------- ---------- ---------- ---------- ----------
        11          1         51         21          1         51
        15          5         55         25          5         54

The t2.y(+) > 50, as a join condition, is applied at join time, but t2.y>50 is applied after the result set is return from joining. Also note that, as t2 > 50 make sure that the rows in t2 is not null, so the SQL statement will become an  normal equi-join. This is clear in the execution plans:

cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y(+)>50;

Execution Plan
----------------------------------------------------------
Plan hash value: 1823443478

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |    65 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |      |     5 |    65 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |     5 |    30 |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     7 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."X"="T2"."X"(+))
   3 - filter("T2"."Y"(+)>50)

cbo@ORCL> select * from t1,t2 where t1.x = t2.x(+) and t2.y>50;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN         |      |     1 |    13 |     5  (20)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |     7 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   |     5 |    30 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."X"="T2"."X")
   2 - filter("T2"."Y">50)

 

But this method have its own limitations. The SQL Reference list such limitations:

  1. You cannot specify the (+) operator in a query block that also contains FROM clause join syntax.
  2. The (+) operator can appear only in the WHERE clause or, in the context of left-correlation (when specifying the TABLE clause) in the FROM clause, and can be applied only to a column of a table or view.
  3. If A and B are joined by multiple join conditions, then you must use the (+) operator in all of these conditions. If you do not, then Oracle Database will return only the rows resulting from a simple join, but without a warning or error to advise you that you do not have the results of an outer join.
  4. The (+) operator does not produce an outer join if you specify one table in the outer query and the other table in an inner query.
  5. You cannot use the (+) operator to outer-join a table to itself, although self joins are valid. For example, the following statement is not valid:
    – The following statement is not valid:
    SELECT employee_id, manager_id
       FROM employees
       WHERE employees.manager_id(+) = employees.employee_id;
    However, the following self join is valid:SELECT e1.employee_id, e1.manager_id, e2.employee_id
       FROM employees e1, employees e2
       WHERE e1.manager_id(+) = e2.employee_id
       ORDER BY e1.employee_id, e1.manager_id, e2.employee_id;
  6. The (+) operator can be applied only to a column, not to an arbitrary expression. However, an arbitrary expression can contain one or more columns marked with the (+) operator.
  7. A WHERE condition containing the (+) operator cannot be combined with another condition using the OR logical operator.
  8. A WHERE condition cannot use the IN comparison condition to compare a column marked with the (+) operator with an expression.

 

It’s a little confused to read these statements, for the limitation I highlighted, there are two variations:

1. I can’t add OR operand predicates on the right table at join time, such the one below expressed in ANSI-SQL.

select t1.c,t2.d from t1 left outer join t2 on (t1.x = t2.x and (t2.y>23 or t2.y<23) );

2. I can’t combine OR operands as left outer join condition. such the one below expressed in ANSI-SQL.

select t1.c,t2.d from t1 left outer join t2 on (t1.x = t2.x or t1.y = t2.y);

 

Another limitation is that, I can’t add predicate on the left table at join time, such as the one below expressed in ANSI-SQL.

select * from t1 left outer join t2 on (t1.x = t2.x and t1.c = 11);

Clearly, Oracle recommend use ANSI-SQL style left outer join syntax, just as the examples above.

But another question comes, how Oracle implement ANSI-SQL left outer join syntax, and how to rewrite these unsupported statements in native/complex ways?

Interesting things about index cardinality and cost calculation

For the following statement, 9i, 10g  and 11g output the different cardinality and cost.


create table t1
nologging
as
select
  trunc(dbms_random.value(0,25))  n1,
  rpad('x',40)      ind_pad,
  trunc(dbms_random.value(0,20))  n2,
  lpad(rownum,10,'0')   small_vc,
  rpad('x',200)     padding
from
  all_objects
where
  rownum  <= 10000
;
update t1 set n1 = n2 where n2 in (1,3,5,7,11,13,15,17,19);
create index t1_i1 on t1(n1, ind_pad, n2)
nologging
pctfree 91
;

-- gather statistics

TABLE_NAME               BLOCKS   NUM_ROWS
-------------------- ---------- ----------
T1                          371      10000

1 row selected.

  NUM_ROWS DISTINCT_KEYS     BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
---------- ------------- ---------- ----------- ----------------- ----------------------- -----------------------
     10000           284          2        1111              7830                       3                      27

1 row selected.

COLUMN_NAME           NUM_NULLS NUM_DISTINCT    DENSITY LOW_VALUE                HIGH_VALUE
-------------------- ---------- ------------ ---------- ------------------------ ------------------------
IND_PAD                       0            1          1 782020202020202020202020 782020202020202020202020
                                                        202020202020202020202020 202020202020202020202020
                                                        2020202020202020         2020202020202020

N1                            0           25        .04 80                       C119
N2                            0           20        .05 80                       C114

 

set autot trace explain

select small_vc from t1 where n1 = 2 and ind_pad = rpad('x',40) and n2 = 4 ;

9i:

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=21 Card=20 Bytes=1160)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=21 Card=20 Bytes=1160)
   2    1     INDEX (RANGE SCAN) OF 'T1_I1' (NON-UNIQUE) (Cost=5 Card=20)

10gR2:

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    20 |  1160 |    34 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    20 |  1160 |    34 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |    35 |       |     6 |
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=2 AND "IND_PAD"='x
                  ' AND "N2"=4)

11g:

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    35 |  2030 |    34 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    35 |  2030 |    34 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |    35 |       |     6 |
---------------------------------------------------------------------

 

For 9i:

cost: 5 = blevel + LEAF_BLOCKS* (1/20)*(1/25)

21 - 5 = clustering_factor * (1/20)*(1/25)

Cardinality: 20 = 10000 * (1/20)*(1/25)

For 10gR2:

cost: 6 = blevel + LEAF_BLOCKS * (1/DISTINCT_KEYS)

34 - 6 = clustering_factor *(1/DISTINCT_KEYS)

Cardinality: 35 = 10000 * (1/DISTINCT_KEYS)

20 = 10000 * (1/20)*(1/25)

For 11g:

cost: 6 = blevel + LEAF_BLOCKS * (1/DISTINCT_KEYS)

34 - 6 = clustering_factor *(1/DISTINCT_KEYS)

Cardinality: 35 = 10000 * (1/DISTINCT_KEYS)

So, in 9i, the distinct_keys of user_indexes view is not used when calculate the cardinality or cost for index access. In 10gR2, the cost and  cardinality for index line is calculate using distinct_keys, so is the cost for table access cost. In 11G, the cardinality for table access is also calculated with distinct_keys.

Actually, the cost and cardinality are becoming more and more accurate, when the index columns have some dependence with each other.

Selectivity and Cardinality and their Formulas

Two important and related concepts inside CBO is selectivity and cardinality. The cardinality means that how many rows should be returned by CBO after evaluating the predicates. And this values is always equal to (number of input rows)* selectivity. So we can guess that selectivity means that how many percents of the input rows will be returned. The selectivity plan an crucial part in choosing the join order and optimal index.

Recall the index access path cost calculation, selectivity also play as a factor to the number of IO blocks. Below we will focus on selectivity calculation.

For the simplest predicate column = constant, such as month_no  = 12, the selectivity is simple:

selectivity = 1/num_distinct; (when histogram non-available)
or
selectivity = density; (when histogram available)

both of num_distinct and density comes from user_tab_columns/user_tab_col_statistics view. And we can get the “number of input rows” from the num_rows of user_tables view, so we can get the candinality as num_rows * selectivity.

When we have null values in the column, then we must multiply this selectivity with (num_rows - num_nulls)/num_rows. As below

selectivity = (num_rows - num_nulls)/(num_distinct*num_rows); (when histogram non-available)
or
selectivity = density * (num_rows - num_nulls)/num_rows; (when histogram available)

When we use the in-list predicate, such as month_no in (2,3,4). The correct behavior should be to sum up all the possibilities of each member.

What about the values outside the valid scope? such as month_on in (13, 14)?  What Oracle does is to assume that the further away you get from the known low/high value, the less chance you will get the data back. Oracle uses a straight-line decay to predict the variation, decaying to zero when you exceed the range by the difference between low and high.

What about not in predicates? Maybe you could guess that it should be (1 - selectivity(in predicate)), actually, the answer is NO. As Oracle transform the (C not in(A,B)) into (C <>A and C<>B), whose selectivity should be (1 - selectivity(A))*(1- selectivity(B)). There are a minor difference between (1 - selectivity(in predicate)) and (1 - selectivity(A))*(1- selectivity(B)). But Oracle seems to remain this in-consistence in all versions, including 11g. Let’s consider the in predicate again, actually Oracle transform this predicate (C in (A,B)) into (C = A or C = B), whose selectivity should be (selectivity(A) + selectivity(B) - selectivity(A) * selectivity(B). Although we know this value is wrong, as A and B in independent variable. But this wrong value is just the Oracle 8i counts, Oracle 9i and forwards corrects this mistake.

Let’s come to the range predicate, such as month_no between 2 and 5.

selectivity = "required range" divided by "total available range" + n/num_distinct;

(n is the number of closed ends)

And then, we come to the multiple predicates, such as A = B or A = C, the following formulas applying.

selectivity(predicate1 AND predicate2) = selectivity(predicate1) * selectivity(predicate2).
selectivity(predicate1 OR predicate2) = selectivity(predicate1) + selectivity(predicate2) - selectivity(predicate1 AND predicate2)
selectivity(NOT predicate1) = 1 – selectivity(predicate1)       //except for bind variable problems

Of course, these formula comes from Probability theory, as these formula indicates, the predicate1 and predicate2 should be  independent, otherwise, the result will be erroneous. Below are some erroneous examples:

month_no > = 8 or month_no < 8 will not cover all the month_no.
month_no in (8,9) will not cover 2/12 in Oracle 8i.
month_no not in (8,9) will not cover 10/12 in Oracle 8i onwards.

Cost for Access Path and Joins

SQL tuning is the most important part of performance tuning. Cost based optimizer, which is used to evaluate the cost for the SQL statement and determine which execution plan is suitable for that specified statement, plays a key role in SQL tuning.

So, what is Cost? From the Oracle Document, cost is defined as below:


Cost = (
#SRds * sreadtim +
#MRds * mreadtim +
#CPUCycles / cpuspeed
) / sreadtim
= #SRds + #MRds * mreadtim/sreadtim + #CPUCycles/(cpuspeed*sreadtim)
where
#SRDs - number of single block reads
#MRDs - number of multi block reads
#CPUCycles - number of CPU Cycles
sreadtim - single block read time
mreadtim - multi block read time
cpuspeed - CPU cycles per second

So, if we don’t count the CPU costing, which is #CPUCycles/(cpuspeed*#SRds), we get a conclusion that the cost is the sum of the number of single block reads and the number of multiple block reads, factored by  mreadtime/sreadtime. This is the model what Oracle 8i does. In Oracle 9i and forward, the CPU cost is accounted and the CPU cost part is add to this cost formula.

For full table scan, the cost will be dependant on the total number of blocks in that table and the number of blocks each read can retrieve, or to say, the data_file_multiblock_read_count parameter. The cost for full table scan be can expressed as

full_table_scan_cost = total_number_of_blocks / adjusted_data_file_multiblock_count

Here, adjusted_data_file_multiblock_count is just an adjustment to data_file_multiblock_read_count, and the value can be obtained by repeated experiments with different total_number_of_blocks and the resulted full_table_scan_cost.

And for the index access path, the cost is

btree_index_access_cost =
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)

Here, the concepts of selectivity, clustering factor, will be explained in the later articles. one thing to highlight is that the cost is usually based the unit of “BLOCK”.

Also, there are cost for join operations, such as nested loops, hash joins, sorting and merge joins.

nested_loop_cost = Cost of acquiring data from first table +
Cardinality of result from first table * Cost of single visit to second table
hash_join_cost =
(probe passes + 1) * round(Size of large data set/adjusted_data_file_multiblock_count)
+ round(Size of large data set/adjusted_data_file_multiblock_count) 

 

The information of this article mainly comes from the great book Jonathan Lewis wrote, “Cost-Based Oracle Fundamental”, thanks to Jonathan.

XQuery vs SQL/XML

XQuery 1.0 is the W3C language designed for querying XML data. It is similar to SQL in many ways, but just as SQL is designed for querying structured, relational data, XQuery is designed especially for querying semistructured, XML data from a variety of data sources. You can use XQuery to query XML data wherever it is found, whether it is stored in database tables, available through Web Services, or otherwise created on the fly. In addition to querying XML data, XQuery can be used to construct XML data. In this regard, XQuery can serve as an alternative or a complement to both XSLT and the other SQL/XML publishing functions, such as XMLElement.

Everything in XQuery is an "expression", one most important expression is the FLWOR expression, composed of the following, in order, from which FLWOR takes its name: for, let, where , order by, return.

SQL functions XMLQuery and XMLTable are defined by the SQL/XML standard as a general interface between the SQL and XQuery languages. As is the case for the other SQL/XML functions, XMLQuery and XMLTable let you take advantage of the power and flexibility of both SQL and XML. Using these functions, you can construct XML data using relational data, query relational data as if it were XML, and construct relational data from XML data.

XMLQuery is the logical evolution of Oracle’s extract() operator, and XMLTable is the logical evolution of Oracle’s table(xmlsequence(extract(…)))

Below is a simple comparison between XML/SQL and XQuery.

create table test (KEY number, XML_INFO xmltype);
insert into test
   (KEY, XML_INFO)
 values
   (1,
    '<?xml version="1.0" encoding="UTF-8"?>
    <lists>
      <list name="String">
                <display-name >test</display-name>
     </list>
   </lists>');

--XQuery version
SELECT test.KEY AS KEY, v.name, v.display
  FROM test, XMLTABLE('
  $xml_info/lists/list'
PASSING test.XML_INFO AS "xml_info"
COLUMNS name VARCHAR2(100) PATH '@name',
display VARCHAR2(100) PATH 'display-name')  v;

--SQL/XML version
select test.key,
extractValue(value(v), '/list/@name'),
extractValue(value(v), '/list/display-name')
from test,
table(XMLSequence(extract(test.xml_info,'/resource-lists/list'))) v;

SQL/XML and XPath Rewrite

SQL/XML provides a series of functions to integrate XML functionality to SQL processing,below is the list of such functions:

APPENDCHILDXML
DELETEXML
DEPTH
EXTRACT (XML)
EXISTSNODE
EXTRACTVALUE
INSERTCHILDXML
INSERTXMLBEFORE
PATH
SYS_DBURIGEN
SYS_XMLAGG
SYS_XMLGEN
UPDATEXML
XMLAGG
XMLCDATA
XMLCOLATTVAL
XMLCOMMENT
XMLCONCAT
XMLFOREST
XMLPARSE
XMLPI
XMLQUERY
XMLROOT
XMLSEQUENCE
XMLSERIALIZE
XMLTABLE
XMLTRANSFORM

Also, XMLType is important in Oracle XML DB, which I have discussed in the last post.

XMLType is a system-defined opaque type for handling XML data. It has predefined member functions on it to extract XML nodes and fragments.

You can create columns of XMLType and insert XML documents into it. You can also generate XML documents as XMLType instances dynamically using the SYS_XMLGEN and SYS_XMLAGG SQL functions.

Table 198-1 summarizes functions and procedures of the XMLType.

Table 198-1 XMLTYPE Subprograms

Method Description

CREATENONSCHEMABASEDXML

Creates a non schema based XML from the input schema based instance.

CREATESCHEMABASEDXML

Creates a schema based XMLType instance from the non-schema based instance using the input schema URL.

CREATEXML

Static function for creating and returning an XMLType instance.

EXISTSNODE

Takes a XMLType instance and a XPath and returns 1 or 0 indicating if applying the XPath returns a non-empty set of nodes.

EXTRACT

Takes a XMLType instance and an XPath, applies the XPath expression and returns the results as an XMLType.

GETBLOBVAL

Returns the value of the XMLType instance as a BLOB

GETCLOBVAL

Returns the value of the XMLType instance as a CLOB.

GETNAMESPACE

Returns the namespace for the top level element in a schema based document.

GETNUMBERVAL

Returns the value of the XMLType instance as a NUMBER. This is only valid if the input XMLType instance contains a simple text node and is convertible to a number.

GETROOTELEMENT

Returns the root element of the input instance. Returns NULL if the instance is a fragment

GETSCHEMAURL

Returns the XML schema URL if the input is an XML Schema based.

GETSTRINGVAL

Returns the value of the XMLType instance as a string.

ISFRAGMENT

Checks if the input XMLType instance is a fragment or not. A fragment is a XML instance, which has more than one root element.

ISSCHEMABASED

Returns 1 or 0 indicating if the input XMLType instance is a schema based one or not.

ISSCHEMAVALID

Checks if the input instance is schema valid according to the given schema URL.

ISSCHEMAVALIDATED

Checks if the instance has been validated against the schema.

SCHEMAVALIDATE

Validates the input instance according to the XML Schema. Raises error if the input instance is non-schema based.

SETSCHEMAVALIDATED

Sets the schema valid flag to avoid costly schema validation.

TOOBJECT

Converts the XMLType instance to an object type.

TRANSFORM

Takes an XMLType instance and an associated stylesheet (which is also an XMLType instance), applies the stylesheet and returns the result as XML.

XMLTYPE

Constructs an instance of the XMLType datatype. The constructor can take in the XML as a CLOB, VARCHAR2 or take in a object type.



The SQL/XML SQL functions and their corresponding XMLType methods allow XPath expressions to be used to search collections of XML documents and to access a subset of the nodes contained within an XML document.

Oracle XML DB provides two ways of evaluating XPath expressions that operate on XMLType columns and tables, depending on the XML storage method used:

  1. Structured-storage XML data: Oracle XML DB attempts to translate the XPath expression in a SQL/XML function into an equivalent SQL query. The SQL query references the object-relational data structures that underpin a schema-based XMLType. This process is referred to as XPath rewrite. It can occur when performing queries and UPDATE operations.
  2. Unstructured-storage XML data: Oracle XML DB evaluates the XPath expression using functional evaluation. Functional evaluation builds a DOM tree for each XML document, and then resolves the XPath programmatically using the methods provided by the DOM API. If the operation involves updating the DOM tree, the entire XML document has to be written back to disc when the operation is completed.

One example of XPath Rewrite is as following:

SELECT OBJECT_VALUE FROM mypurchaseorders p
WHERE extractValue(OBJECT_VALUE, '/PurchaseOrder/Company') = 'Oracle';

will become:

SELECT VALUE(p) FROM mypurchaseorders p WHERE p.xmldata.Company = 'Oracle';