OR Dear Me!  When Cost Based OR  Expansion Gets it Wrong – Part 4

Continuing on from Parts 1, 2 and 3, we are now going to see how the optimizer comes up with the cardinality estimate after adding the index. I’ll be using some values and abbreviations from Part 3 so I suggest you make sure you have read and understood that before tackling this part. I’ll be

SQL> CREATE TABLE t (
  2      c NUMBER(2, 0) NOT NULL,
  3      p VARCHAR2(4000)
  4* );
SQL>
SQL> INSERT /*+append*/ INTO t
  2      SELECT
  3          mod(ROWNUM, 100) c,
  4          rpad('X', 4000)  padding
  5      FROM
  6          dual
  7      CONNECT BY
  8*         ROWNUM <= 100000;
SQL>
SQL> COMMIT;
SQL>
SQL> CREATE INDEX i ON t(c);
SQL>
SQL> EXEC dbms_stats.gather_table_stats(null, 't', method_opt=>'for all columns size 1');
SQL>
SQL> SELECT /*+ gather_plan_statistics */ DISTINCT
  2                       TRIM(p) p
  3                   FROM
  4                       t WHERE
  5                                        (c BETWEEN  2 AND  3) OR (c BETWEEN  4 AND  5) OR (c BETWEEN  6 AND  7) OR (c BETWEEN  8 AND  9)
  6            OR (c BETWEEN 10 AND 11) OR (c BETWEEN 12 AND 13) OR (c BETWEEN 14 AND 15) OR (c BETWEEN 16 AND 17) OR (c BETWEEN 18 AND 19)
  7            OR (c BETWEEN 20 AND 21) OR (c BETWEEN 22 AND 23) OR (c BETWEEN 24 AND 25) OR (c BETWEEN 26 AND 27) OR (c BETWEEN 28 AND 29)
  8            OR (c BETWEEN 30 AND 31) OR (c BETWEEN 32 AND 33) OR (c BETWEEN 34 AND 35) OR (c BETWEEN 36 AND 37) OR (c BETWEEN 38 AND 39)
  9            OR (c BETWEEN 40 AND 41) OR (c BETWEEN 42 AND 43) OR (c BETWEEN 44 AND 45) OR (c BETWEEN 46 AND 47) OR (c BETWEEN 48 AND 49)
 10            OR (c BETWEEN 50 AND 51) OR (c BETWEEN 52 AND 53) OR (c BETWEEN 54 AND 55) OR (c BETWEEN 56 AND 57) OR (c BETWEEN 58 AND 59)
 11            OR (c BETWEEN 60 AND 61) OR (c BETWEEN 62 AND 63) OR (c BETWEEN 64 AND 65) OR (c BETWEEN 66 AND 67) OR (c BETWEEN 68 AND 69)
 12            OR (c BETWEEN 70 AND 71) OR (c BETWEEN 72 AND 73) OR (c BETWEEN 74 AND 75) OR (c BETWEEN 76 AND 77) OR (c BETWEEN 78 AND 79)
 13            OR (c BETWEEN 80 AND 81) OR (c BETWEEN 82 AND 83) OR (c BETWEEN 84 AND 85) OR (c BETWEEN 86 AND 87) OR (c BETWEEN 88 AND 89)
 14*           OR (c BETWEEN 90 AND 91) OR (c BETWEEN 92 AND 93) OR (c BETWEEN 94 AND 95) OR (c BETWEEN 96 AND 97);
   P
____
X
SQL>
SQL> SELECT * FROM dbms_xplan.display_cursor(format=>'allstats last -predicate -cost -bytes');
                                                                                                               PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________________________
SQL_ID  11ucbv2rb2wq5, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ DISTINCT
...
Plan hash value: 31970834
-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |      1 |        |      1 |00:00:05.32 |   96284 |  96193 |
|   1 |  HASH UNIQUE                           |                 |      1 |      1 |      1 |00:00:05.32 |   96284 |  96193 |
|   2 |   VIEW                                 | VW_ORE_1B35BA0F |      1 |  25144 |  96000 |00:00:08.29 |   96284 |  96193 |
|   3 |    UNION-ALL                           |                 |      1 |        |  96000 |00:00:06.52 |   96284 |  96193 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   3010 |   2000 |00:00:00.72 |    2006 |   2006 |
|   5 |      INDEX RANGE SCAN                  | I               |      1 |   3010 |   2000 |00:00:00.02 |       6 |      6 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   2890 |   2000 |00:00:01.49 |    2006 |   2004 |
|   7 |      INDEX RANGE SCAN                  | I               |      1 |   2890 |   2000 |00:00:00.02 |       6 |      4 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   2720 |   2000 |00:00:01.49 |    2006 |   2004 |
|   9 |      INDEX RANGE SCAN                  | I               |      1 |   2720 |   2000 |00:00:00.03 |       6 |      4 |
|  10 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   2512 |   2000 |00:00:01.47 |    2006 |   2004 |
|  11 |      INDEX RANGE SCAN                  | I               |      1 |   2512 |   2000 |00:00:00.02 |       6 |      4 |
|  12 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   2276 |   2000 |00:00:01.50 |    2006 |   2004 |
|  13 |      INDEX RANGE SCAN                  | I               |      1 |   2276 |   2000 |00:00:00.02 |       6 |      4 |
|  14 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   2026 |   2000 |00:00:01.55 |    2006 |   2004 |
|  15 |      INDEX RANGE SCAN                  | I               |      1 |   2026 |   2000 |00:00:00.03 |       6 |      4 |
|  16 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   1771 |   2000 |00:00:01.53 |    2006 |   2004 |
|  17 |      INDEX RANGE SCAN                  | I               |      1 |   1771 |   2000 |00:00:00.02 |       6 |      4 |
|  18 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   1523 |   2000 |00:00:01.63 |    2006 |   2004 |
|  19 |      INDEX RANGE SCAN                  | I               |      1 |   1523 |   2000 |00:00:00.02 |       6 |      4 |
|  20 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   1288 |   2000 |00:00:01.62 |    2005 |   2003 |
|  21 |      INDEX RANGE SCAN                  | I               |      1 |   1288 |   2000 |00:00:00.02 |       5 |      3 |
|  22 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   1073 |   2000 |00:00:01.67 |    2006 |   2004 |
|  23 |      INDEX RANGE SCAN                  | I               |      1 |   1073 |   2000 |00:00:00.02 |       6 |      4 |
|  24 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    880 |   2000 |00:00:01.55 |    2006 |   2004 |
|  25 |      INDEX RANGE SCAN                  | I               |      1 |    880 |   2000 |00:00:00.02 |       6 |      4 |
|  26 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    712 |   2000 |00:00:01.60 |    2006 |   2005 |
|  27 |      INDEX RANGE SCAN                  | I               |      1 |    712 |   2000 |00:00:00.02 |       6 |      5 |
|  28 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    569 |   2000 |00:00:01.54 |    2006 |   2004 |
|  29 |      INDEX RANGE SCAN                  | I               |      1 |    569 |   2000 |00:00:00.02 |       6 |      4 |
|  30 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    449 |   2000 |00:00:01.59 |    2006 |   2004 |
|  31 |      INDEX RANGE SCAN                  | I               |      1 |    449 |   2000 |00:00:00.04 |       6 |      4 |
|  32 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    350 |   2000 |00:00:01.54 |    2006 |   2004 |
|  33 |      INDEX RANGE SCAN                  | I               |      1 |    350 |   2000 |00:00:00.04 |       6 |      4 |
|  34 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    270 |   2000 |00:00:01.61 |    2006 |   2004 |
|  35 |      INDEX RANGE SCAN                  | I               |      1 |    270 |   2000 |00:00:00.05 |       6 |      4 |
|  36 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    206 |   2000 |00:00:01.59 |    2006 |   2004 |
|  37 |      INDEX RANGE SCAN                  | I               |      1 |    206 |   2000 |00:00:00.02 |       6 |      4 |
|  38 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    156 |   2000 |00:00:01.51 |    2006 |   2004 |
|  39 |      INDEX RANGE SCAN                  | I               |      1 |    156 |   2000 |00:00:00.02 |       6 |      4 |
|  40 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |    117 |   2000 |00:00:01.57 |    2005 |   2003 |
|  41 |      INDEX RANGE SCAN                  | I               |      1 |    117 |   2000 |00:00:00.02 |       5 |      3 |
|  42 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     88 |   2000 |00:00:01.62 |    2006 |   2004 |
|  43 |      INDEX RANGE SCAN                  | I               |      1 |     88 |   2000 |00:00:00.02 |       6 |      4 |
|  44 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     65 |   2000 |00:00:01.69 |    2006 |   2004 |
|  45 |      INDEX RANGE SCAN                  | I               |      1 |     65 |   2000 |00:00:00.02 |       6 |      4 |
|  46 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     48 |   2000 |00:00:01.54 |    2006 |   2004 |
|  47 |      INDEX RANGE SCAN                  | I               |      1 |     48 |   2000 |00:00:00.03 |       6 |      4 |
|  48 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     36 |   2000 |00:00:01.63 |    2006 |   2004 |
|  49 |      INDEX RANGE SCAN                  | I               |      1 |     36 |   2000 |00:00:00.03 |       6 |      4 |
|  50 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     26 |   2000 |00:00:01.60 |    2006 |   2004 |
|  51 |      INDEX RANGE SCAN                  | I               |      1 |     26 |   2000 |00:00:00.02 |       6 |      4 |
|  52 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     19 |   2000 |00:00:01.62 |    2006 |   2004 |
|  53 |      INDEX RANGE SCAN                  | I               |      1 |     19 |   2000 |00:00:00.02 |       6 |      4 |
|  54 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     14 |   2000 |00:00:01.63 |    2006 |   2005 |
|  55 |      INDEX RANGE SCAN                  | I               |      1 |     14 |   2000 |00:00:00.02 |       6 |      5 |
|  56 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |     10 |   2000 |00:00:01.55 |    2006 |   2004 |
|  57 |      INDEX RANGE SCAN                  | I               |      1 |     10 |   2000 |00:00:00.03 |       6 |      4 |
|  58 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      8 |   2000 |00:00:01.67 |    2006 |   2004 |
|  59 |      INDEX RANGE SCAN                  | I               |      1 |      8 |   2000 |00:00:00.04 |       6 |      4 |
|  60 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      6 |   2000 |00:00:01.66 |    2006 |   2004 |
|  61 |      INDEX RANGE SCAN                  | I               |      1 |      6 |   2000 |00:00:00.03 |       6 |      4 |
|  62 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      4 |   2000 |00:00:01.61 |    2005 |   2003 |
|  63 |      INDEX RANGE SCAN                  | I               |      1 |      4 |   2000 |00:00:00.03 |       5 |      3 |
|  64 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      3 |   2000 |00:00:01.60 |    2006 |   2004 |
|  65 |      INDEX RANGE SCAN                  | I               |      1 |      3 |   2000 |00:00:00.03 |       6 |      4 |
|  66 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      2 |   2000 |00:00:01.80 |    2006 |   2004 |
|  67 |      INDEX RANGE SCAN                  | I               |      1 |      2 |   2000 |00:00:00.03 |       6 |      4 |
|  68 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      2 |   2000 |00:00:01.55 |    2006 |   2004 |
|  69 |      INDEX RANGE SCAN                  | I               |      1 |      2 |   2000 |00:00:00.04 |       6 |      4 |
|  70 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.56 |    2006 |   2004 |
|  71 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.04 |       6 |      4 |
|  72 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.54 |    2006 |   2004 |
|  73 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.03 |       6 |      4 |
|  74 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.54 |    2006 |   2004 |
|  75 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  76 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.56 |    2006 |   2004 |
|  77 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  78 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.57 |    2006 |   2004 |
|  79 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  80 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.53 |    2006 |   2004 |
|  81 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  82 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.59 |    2006 |   2005 |
|  83 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.03 |       6 |      5 |
|  84 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.52 |    2005 |   2003 |
|  85 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       5 |      3 |
|  86 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.57 |    2006 |   2004 |
|  87 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  88 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.59 |    2006 |   2004 |
|  89 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.03 |       6 |      4 |
|  90 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.57 |    2006 |   2004 |
|  91 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.03 |       6 |      4 |
|  92 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.55 |    2006 |   2004 |
|  93 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  94 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.56 |    2006 |   2004 |
|  95 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.02 |       6 |      4 |
|  96 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.57 |    2006 |   2004 |
|  97 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.04 |       6 |      4 |
|  98 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.53 |    2006 |   2004 |
|  99 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.03 |       6 |      4 |
-----------------------------------------------------------------------------------------------------------------------------L

Lines 4 and 5 in the execution plan simply correspond to B1 so we use the cardinality previously calculated, 3010.

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------
...
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   3010 |   2000 |00:00:00.72 |    2006 |   2006 |
|   5 |      INDEX RANGE SCAN                  | I               |      1 |   3010 |   2000 |00:00:00.02 |       6 |      6 |
...

When it is doing the estimate for the OR Expansion for B2 the optimizer has to exclude rows that might be matched by B1. Whilst we know there is no overlap, the optimizer cannot make this assumption. Let’s call Xn as shorthand for the predicate NOT Bn or internally LNNVL(Bn). Note Column C is declared as NOT NULL I’m skipping over handling NULLs here as they complicate things slightly. As we have previously calculated SEL(Bn) as 0.03010101 for all values of n, you would hope that calculation of SEL(Xn) would be as follows:

SEL(NOT Bn)
=1-SEL(Bn)
=1-0.03010101...
=0.96989898...

Unfortunately is is more complicated, and goes like this:

  SEL(NOT(C >= 2n))
= 1 - SEL(C >= 2n) 
= 1-((99-2n)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT)
= 2n/99-1/100
  SEL(NOT(C <= 2n+1))
= 1-SEL(C <= 2n+1)
= 1-((((2n+1)-LOW_VALUE)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT))
= 99/100-(2n+1)/99
  SEL(NOT Bn)
= SEL(LNNVL(C >= 2n) OR LNNVL(C <= 2n+1))
= SEL(LNNVL(C >= 2n)) + SEL(LNNVL(C <= 2n+1)) - SEL(LNNVL(C >= 2n)) * SEL(LNNVL(C <= 2n+1))
= SEL(NOT(C>=2n)) + SEL(NOT(C<=2n+1))- SEL(NOT(C>=2n)) * SEL(NOT(C<=2n+1))
= (2n/99-1/100) + 99/100-(2n+1)/99 - (2n/99-1/100) * (99/100-(2n+1)/99)
= (4n2-196n)/9801 + 969901/990000

So back to estimate of rows for B2 AND NOT B1 we get

  SEL(NOT B1)
= (4/9801 - 196/9801 + 969901/990000)
= 0.96010814...
  SEL(B2 AND NOT B1) 
= SEL(B2) * SEL(NOT B1) =
= 0.03010101 * 0.96010814
= 0.02890022...
  CARD(B2 AND NOT B1) = 
= ROUND(NUM_ROWS * SEL(B2 AND NOT B1)) = 
= ROUND(100000*0.02890022)=
= 2890

Fortunately this calculation matches the estimate displayed by the execution plan

|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |   2890 |   2000 |00:00:01.49 |    2006 |   2004 |
|   7 |      INDEX RANGE SCAN                  | I               |      1 |   2890 |   2000 |00:00:00.02 |       6 |      4 |

So as the optimizer knows there is no overlap optimizer simply does the following

  CARD(B2 OR B1)
= CARD(B2) + CARD(B2 AND NOT B1)
= 3010 + 2890
= 5900

Note the difference from Part 3 where the calculation resulted in CARD(B2 OR B1)=5930.

Calculation of the next part of the OR expansion continues thus:

  SEL(NOT B2)
= 4*2*2/9801 - 196*2/9801 + 969901/990000
= 0.94133454...

  SEL(B3 AND NOT B2 AND NOT B1)
=SEL(B3) * SEL(NOT B2) * SEL(NOT B1)
=0.03010101*0.94133454*0.96010814
=0.02720477...
  CARD(B3 AND NOT B2 AND NOT B1)
= ROUND(SEL(B3 AND NOT B2 AND NOT B1) * NUM_ROWS)
= ROUND(0.02720477*100000)
= 2720
  CARD(B3 OR B2 OR B1)
= CARD(B3 AND NOT B2 AND NOT B1) + CARD(B2 OR B1)
= 2720+5900
= 8620

Note this corresponds with what the execution plan is telling us, and again compare this with the calculations without OR expansion.

I used a spreadsheet to calculate the remaindering values, and visualize them, and it’s quite striking that after tracking the actual rows reasonably closely with only a few predicates, the estimate starts to tail off to around 25,000 rows. Towards the end, the optimizer is estimating that only one row will be returned from each branch, although actually 2,000 rows will match.

|  98 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |      1 |      1 |   2000 |00:00:01.53 |    2006 |   2004 |
|  99 |      INDEX RANGE SCAN                  | I               |      1 |      1 |   2000 |00:00:00.03 |       6 |      4 |

Leave a Comment

Your email address will not be published. Required fields are marked *