I promised in Part 1 that next we would drill into where the cardinality estimate for the non-transformed query comes about, but, never having been someone to refrain from kicking the Cost Based Optimizer when it’s down, I realised i can fool it even better with some skewed data, and a histogram. Note I am testing on 21c but I believe behaviour should reproduce on any version after 12c.
SQL> CREATE TABLE t ( 2 c NUMBER(2, 0) NOT NULL, 3 p VARCHAR2(4000) 4* ); Table T created. SQL> INSERT /*+append*/ INTO t 2 SELECT 3 CASE WHEN ROWNUM <= 100000 THEN mod(ROWNUM, 100) 4 ELSE 99 END c, 5 rpad('X', 4000) padding 6 FROM 7 dual 8 CONNECT BY 9* ROWNUM <= 199000; 199 000 rows inserted. SQL> CREATE INDEX i ON t (c); Index I created. SQL> EXEC dbms_stats.gather_table_stats(null, 't', method_opt=>'for all columns size 100'); PL/SQL procedure successfully completed.
So we have a table with 199,000 rows, of which for each value of c between 0 and 98 there are 1,000 rows in the table, and the for the remaining 100,000 then c equals 99. In addition there is a a histogram with 100 buckets on c, so that the optimizer has perfect knowledge of this data distribution. Check out how this affects the execution plan and cardinality estimates for our query now.
SQL> SELECT /*+gather_plan_statistics*/ DISTINCT trim(p) p FROM t WHERE 2 (c BETWEEN 0 AND 1) OR (c BETWEEN 2 AND 3) OR (c BETWEEN 4 AND 5) OR (c BETWEEN 6 AND 7) OR (c BETWEEN 8 AND 9) 3 OR (c BETWEEN 10 AND 11) OR (c BETWEEN 12 AND 13) OR (c BETWEEN 14 AND 15) OR (c BETWEEN 16 AND 17) OR (c BETWEEN 18 AND 19) 4 OR (c BETWEEN 20 AND 21) OR (c BETWEEN 22 AND 23) OR (c BETWEEN 24 AND 25) OR (c BETWEEN 26 AND 27) OR (c BETWEEN 28 AND 29) 5 OR (c BETWEEN 30 AND 31) OR (c BETWEEN 32 AND 33) OR (c BETWEEN 34 AND 35) OR (c BETWEEN 36 AND 37) OR (c BETWEEN 38 AND 39) 6 OR (c BETWEEN 40 AND 41) OR (c BETWEEN 42 AND 43) OR (c BETWEEN 44 AND 45) OR (c BETWEEN 46 AND 47) OR (c BETWEEN 48 AND 49) 7 OR (c BETWEEN 50 AND 51) OR (c BETWEEN 52 AND 53) OR (c BETWEEN 54 AND 55) OR (c BETWEEN 56 AND 57) OR (c BETWEEN 58 AND 59) 8 OR (c BETWEEN 60 AND 61) OR (c BETWEEN 62 AND 63) OR (c BETWEEN 64 AND 65) OR (c BETWEEN 66 AND 67) OR (c BETWEEN 68 AND 69) 9 OR (c BETWEEN 70 AND 71) OR (c BETWEEN 72 AND 73) OR (c BETWEEN 74 AND 75) OR (c BETWEEN 76 AND 77) OR (c BETWEEN 78 AND 79) 10 OR (c BETWEEN 80 AND 81) OR (c BETWEEN 82 AND 83) OR (c BETWEEN 84 AND 85) OR (c BETWEEN 86 AND 87) OR (c BETWEEN 88 AND 89) 11* OR (c BETWEEN 90 AND 91) OR (c BETWEEN 92 AND 93) OR (c BETWEEN 94 AND 95) OR (c BETWEEN 96 AND 97) OR (c BETWEEN 98 AND 99); P ____ X SQL> SELECT * FROM dbms_xplan.display_cursor(format=>'allstats last -bytes -cost -predicate'); PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________________________ SQL_ID 1hwut074tng4n, child number 0 ------------------------------------- ... ----------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ----------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:06.81 | 199K| 199K| | 1 | HASH UNIQUE | | 1 | 1 | 1 |00:00:06.81 | 199K| 199K| | 2 | VIEW | VW_ORE_1B35BA0F | 1 | 25785 | 199K|00:00:12.57 | 199K| 199K| | 3 | UNION-ALL | | 1 | | 199K|00:00:09.03 | 199K| 199K| | 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2001 | 2000 |00:00:00.87 | 2005 | 1998 | | 5 | INDEX RANGE SCAN | I | 1 | 2001 | 2000 |00:00:00.02 | 5 | 0 | | 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1980 | 2000 |00:00:01.44 | 2006 | 1998 | | 7 | INDEX RANGE SCAN | I | 1 | 1980 | 2000 |00:00:00.02 | 6 | 0 | | 8 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1940 | 2000 |00:00:01.46 | 2006 | 1998 | | 9 | INDEX RANGE SCAN | I | 1 | 1940 | 2000 |00:00:00.02 | 6 | 0 | | 10 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1883 | 2000 |00:00:01.33 | 2006 | 1998 | | 11 | INDEX RANGE SCAN | I | 1 | 1883 | 2000 |00:00:00.02 | 6 | 0 | | 12 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1810 | 2000 |00:00:01.30 | 2006 | 1998 | | 13 | INDEX RANGE SCAN | I | 1 | 1810 | 2000 |00:00:00.02 | 6 | 0 | | 14 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1722 | 2000 |00:00:01.34 | 2006 | 1998 | | 15 | INDEX RANGE SCAN | I | 1 | 1722 | 2000 |00:00:00.02 | 6 | 0 | | 16 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1624 | 2000 |00:00:01.32 | 2006 | 1998 | | 17 | INDEX RANGE SCAN | I | 1 | 1624 | 2000 |00:00:00.02 | 6 | 0 | | 18 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1516 | 2000 |00:00:01.30 | 2006 | 1998 | | 19 | INDEX RANGE SCAN | I | 1 | 1516 | 2000 |00:00:00.02 | 6 | 0 | | 20 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1403 | 2000 |00:00:01.34 | 2006 | 1998 | | 21 | INDEX RANGE SCAN | I | 1 | 1403 | 2000 |00:00:00.02 | 6 | 0 | | 22 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1286 | 2000 |00:00:01.36 | 2005 | 1998 | | 23 | INDEX RANGE SCAN | I | 1 | 1286 | 2000 |00:00:00.02 | 5 | 0 | | 24 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1169 | 2000 |00:00:01.48 | 2006 | 1998 | | 25 | INDEX RANGE SCAN | I | 1 | 1169 | 2000 |00:00:00.02 | 6 | 0 | | 26 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1053 | 2000 |00:00:01.52 | 2006 | 1998 | | 27 | INDEX RANGE SCAN | I | 1 | 1053 | 2000 |00:00:00.02 | 6 | 0 | | 28 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 940 | 2000 |00:00:01.59 | 2006 | 1998 | | 29 | INDEX RANGE SCAN | I | 1 | 940 | 2000 |00:00:00.02 | 6 | 0 | | 30 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 832 | 2000 |00:00:01.82 | 2006 | 2004 | | 31 | INDEX RANGE SCAN | I | 1 | 832 | 2000 |00:00:00.02 | 6 | 4 | | 32 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 730 | 2000 |00:00:01.49 | 2006 | 2004 | | 33 | INDEX RANGE SCAN | I | 1 | 730 | 2000 |00:00:00.02 | 6 | 4 | | 34 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 635 | 2000 |00:00:01.49 | 2006 | 2004 | | 35 | INDEX RANGE SCAN | I | 1 | 635 | 2000 |00:00:00.02 | 6 | 4 | | 36 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 549 | 2000 |00:00:01.44 | 2006 | 2004 | | 37 | INDEX RANGE SCAN | I | 1 | 549 | 2000 |00:00:00.02 | 6 | 4 | | 38 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 470 | 2000 |00:00:01.44 | 2006 | 2004 | | 39 | INDEX RANGE SCAN | I | 1 | 470 | 2000 |00:00:00.02 | 6 | 4 | | 40 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 399 | 2000 |00:00:01.51 | 2006 | 2004 | | 41 | INDEX RANGE SCAN | I | 1 | 399 | 2000 |00:00:00.02 | 6 | 4 | | 42 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 337 | 2000 |00:00:01.45 | 2005 | 2003 | | 43 | INDEX RANGE SCAN | I | 1 | 337 | 2000 |00:00:00.03 | 5 | 3 | | 44 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 282 | 2000 |00:00:01.45 | 2006 | 2004 | | 45 | INDEX RANGE SCAN | I | 1 | 282 | 2000 |00:00:00.02 | 6 | 4 | | 46 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 235 | 2000 |00:00:01.41 | 2006 | 2004 | | 47 | INDEX RANGE SCAN | I | 1 | 235 | 2000 |00:00:00.02 | 6 | 4 | | 48 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 194 | 2000 |00:00:01.47 | 2006 | 2004 | | 49 | INDEX RANGE SCAN | I | 1 | 194 | 2000 |00:00:00.03 | 6 | 4 | | 50 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 159 | 2000 |00:00:01.50 | 2006 | 2004 | | 51 | INDEX RANGE SCAN | I | 1 | 159 | 2000 |00:00:00.02 | 6 | 4 | | 52 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 129 | 2000 |00:00:01.48 | 2006 | 2004 | | 53 | INDEX RANGE SCAN | I | 1 | 129 | 2000 |00:00:00.02 | 6 | 4 | | 54 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 105 | 2000 |00:00:01.45 | 2006 | 2004 | | 55 | INDEX RANGE SCAN | I | 1 | 105 | 2000 |00:00:00.02 | 6 | 4 | | 56 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 84 | 2000 |00:00:01.48 | 2006 | 2004 | | 57 | INDEX RANGE SCAN | I | 1 | 84 | 2000 |00:00:00.02 | 6 | 4 | | 58 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 67 | 2000 |00:00:01.44 | 2006 | 2004 | | 59 | INDEX RANGE SCAN | I | 1 | 67 | 2000 |00:00:00.03 | 6 | 4 | | 60 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 54 | 2000 |00:00:01.40 | 2006 | 2004 | | 61 | INDEX RANGE SCAN | I | 1 | 54 | 2000 |00:00:00.02 | 6 | 4 | | 62 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 42 | 2000 |00:00:01.52 | 2006 | 2004 | | 63 | INDEX RANGE SCAN | I | 1 | 42 | 2000 |00:00:00.02 | 6 | 4 | | 64 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 33 | 2000 |00:00:01.46 | 2005 | 2003 | | 65 | INDEX RANGE SCAN | I | 1 | 33 | 2000 |00:00:00.02 | 5 | 3 | | 66 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 26 | 2000 |00:00:01.48 | 2006 | 2004 | | 67 | INDEX RANGE SCAN | I | 1 | 26 | 2000 |00:00:00.02 | 6 | 4 | | 68 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 20 | 2000 |00:00:01.42 | 2006 | 2004 | | 69 | INDEX RANGE SCAN | I | 1 | 20 | 2000 |00:00:00.03 | 6 | 4 | | 70 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 16 | 2000 |00:00:01.42 | 2006 | 2004 | | 71 | INDEX RANGE SCAN | I | 1 | 16 | 2000 |00:00:00.02 | 6 | 4 | | 72 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 12 | 2000 |00:00:01.50 | 2006 | 2004 | | 73 | INDEX RANGE SCAN | I | 1 | 12 | 2000 |00:00:00.02 | 6 | 4 | | 74 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 9 | 2000 |00:00:01.53 | 2006 | 2004 | | 75 | INDEX RANGE SCAN | I | 1 | 9 | 2000 |00:00:00.02 | 6 | 4 | | 76 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 7 | 2000 |00:00:01.48 | 2006 | 2004 | | 77 | INDEX RANGE SCAN | I | 1 | 7 | 2000 |00:00:00.02 | 6 | 4 | | 78 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 5 | 2000 |00:00:01.71 | 2006 | 2004 | | 79 | INDEX RANGE SCAN | I | 1 | 5 | 2000 |00:00:00.04 | 6 | 4 | | 80 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 4 | 2000 |00:00:01.59 | 2006 | 2004 | | 81 | INDEX RANGE SCAN | I | 1 | 4 | 2000 |00:00:00.02 | 6 | 4 | | 82 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 3 | 2000 |00:00:01.43 | 2006 | 2004 | | 83 | INDEX RANGE SCAN | I | 1 | 3 | 2000 |00:00:00.03 | 6 | 4 | | 84 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2 | 2000 |00:00:01.44 | 2006 | 2004 | | 85 | INDEX RANGE SCAN | I | 1 | 2 | 2000 |00:00:00.02 | 6 | 4 | | 86 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2 | 2000 |00:00:01.42 | 2005 | 2003 | | 87 | INDEX RANGE SCAN | I | 1 | 2 | 2000 |00:00:00.02 | 5 | 3 | | 88 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.42 | 2006 | 2004 | | 89 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.02 | 6 | 4 | | 90 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.34 | 2006 | 2004 | | 91 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.05 | 6 | 4 | | 92 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.50 | 2006 | 2004 | | 93 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.02 | 6 | 4 | | 94 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.43 | 2006 | 2004 | | 95 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.03 | 6 | 4 | | 96 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.52 | 2006 | 2004 | | 97 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.03 | 6 | 4 | | 98 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.51 | 2006 | 2004 | | 99 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.03 | 6 | 4 | | 100 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 2000 |00:00:01.38 | 2006 | 2004 | | 101 | INDEX RANGE SCAN | I | 1 | 1 | 2000 |00:00:00.03 | 6 | 4 | | 102 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 9 | 101K|00:00:03.62 | 101K| 101K| | 103 | INDEX RANGE SCAN | I | 1 | 9 | 101K|00:00:01.25 | 199 | 197 | -----------------------------------------------------------------------------------------------------------------------------
So although all 199,000 rows from the table are actually examined (A-Rows
), the estimate (E-Rows
) was that only 25,785 would be. This discrepancy is starting to become problematic.
Pay particular attention to the last two lines in the execution plan. We can see (A-Rows
) column that 101K rows were retrieved, so can deduce that these lines correspond to the (c BETWEEN 98 and 99)
predicate. The optimizer has estimated that only 9 rows will be retrieved and has done so using the index.
If I execute the query with that single predicate then the cardinality estimate is correct, and the optimizer (sensibly) choses to do a full table scan rather than using the index.
SQL> SELECT /*+gather_plan_statistics*/ DISTINCT trim(p) p FROM t WHERE (c BETWEEN 98 AND 99); P ____ X SQL> SELECT * FROM dbms_xplan.display_cursor(format=>'allstats last -bytes -cost -predicate'); PLAN_TABLE_OUTPUT _________________________________________________________________________________________________ ... ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.74 | 199K| 199K| | 1 | HASH UNIQUE | | 1 | 1 | 1 |00:00:01.74 | 199K| 199K| | 2 | TABLE ACCESS FULL| T | 1 | 101K| 101K|00:00:02.38 | 199K| 199K| ---------------------------------------------------------------------------------------------- 15 rows selected.
In the next article we will return to the simpler case without histograms, and start to examine where the cardinality estimates are calculated.
Pingback: OR Dear Me! When Cost Based OR Expansion Gets it Wrong – Part 3 | jolliffe.hk