AWR Reports: Don’t Trust the Numbers

After implementing a recent change, I pulled up a quick AWR report to compare performance. At first glance I was impressed by a significant improvement in the “log file parallel write” event, which had apparently halved from 2 milliseconds to 1 millisecond, but something about the numbers didn’t quite look right.

Before: [AWR screenshot]

After: [AWR screenshot]

Let’s calculate the average wait times ourselves:
Before: 2,346 s / 1,482,747 = 0.00158219844 s = 1.58 ms.
After: 2,658 s / 1,785,172 = 0.00148893215 s = 1.49 ms.

So AWR rounds average wait times to the nearest millisecond, and that rounding has turned an actual 6% improvement ((1.58 - 1.49)/1.58 * 100) into an apparent 50% one.
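The distortion is easy to verify from the raw figures in the report (total wait seconds and wait counts); a quick Python sketch:

```python
# Raw figures from the two AWR reports: total wait time (seconds)
# and number of waits for "log file parallel write".
before_s, before_waits = 2346, 1_482_747
after_s, after_waits = 2658, 1_785_172

avg_before_ms = before_s / before_waits * 1000   # ~1.58 ms
avg_after_ms = after_s / after_waits * 1000      # ~1.49 ms
actual_improvement = (avg_before_ms - avg_after_ms) / avg_before_ms * 100

# What pre-12.2 AWR displays: averages rounded to whole milliseconds.
shown_before = round(avg_before_ms)              # 2 ms
shown_after = round(avg_after_ms)                # 1 ms
apparent_improvement = (shown_before - shown_after) / shown_before * 100

print(f"actual:   {avg_before_ms:.2f} ms -> {avg_after_ms:.2f} ms "
      f"({actual_improvement:.0f}% better)")
print(f"reported: {shown_before} ms -> {shown_after} ms "
      f"({apparent_improvement:.0f}% better)")
```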

Note that in Oracle 12.2 AWR handles this much better by reporting wait events in microseconds, one more reason to upgrade!

Please stop (ab)using DUAL!

Update (2018-07-28): Lukas Eder highlighted via Twitter that my comparison was invalid due to some PL/SQL optimizations. I have updated the testcase; there is still a significant improvement, just not as much as in my previous test.

Recently I’ve been noticing a lot of frequently called PL/SQL code that has structure similar to the following:

SELECT TO_CHAR(l_date, 'YYYY-MM-DD') INTO l_string FROM DUAL;

I really don’t know how people got into the habit of doing things this way; isn’t the following much simpler?

l_string := TO_CHAR(l_date, 'YYYY-MM-DD');

Does it really matter? Isn’t it just a coding preference? Let’s benchmark to see the difference:

SQL> declare
  2      l_string VARCHAR2(10);
  3      l_len NUMBER := 0;
  4      l_date DATE := SYSDATE;
  5  begin
  6
  7      for loopy in 1..1000000
  8      loop
  9          SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD') into l_string FROM DUAL;
 10          l_len := l_len + LENGTH(l_string);
 11      end loop;
 12      dbms_output.put_line(l_len);
 13  end;
 14  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.171
SQL>

We executed the statement one million times in just over 17 seconds. Not too shabby. However, let’s try the other way:

SQL> declare
  2      l_string VARCHAR2(10);
  3      l_len NUMBER := 0;
  4  begin
  5      for loopy in 1..1000000
  6      loop
  7          l_string := TO_CHAR(SYSDATE, 'YYYY-MM-DD');
  8          l_len := l_len + LENGTH(l_string);
  9      end loop;
 10      dbms_output.put_line(l_len);
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.082

Same amount of work, eight times faster. Why? Probably something to do with context switches between the PL/SQL and SQL engines, but don’t stress too much about the mechanics; just look at the figures.

Oracle Database CPU cycles are expensive, don’t waste them doing things inefficiently. Please stop abusing the DUAL table.
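The underlying pattern, trivial work wrapped in an expensive dispatch step, is not unique to Oracle. As a rough analogy only (Python's eval standing in for the PL/SQL-to-SQL context switch, not a model of Oracle internals), compare a direct computation with the same expression routed through an extra parse-and-dispatch layer:

```python
import timeit

N = 200_000

# Direct computation, analogous to l_string := TO_CHAR(...)
direct = timeit.timeit("x = str(12345)", number=N)

# Same result routed through an extra parse-and-dispatch layer,
# loosely analogous to SELECT ... INTO ... FROM DUAL
dispatched = timeit.timeit("x = eval('str(12345)')", number=N)

print(f"direct: {direct:.3f}s  dispatched: {dispatched:.3f}s  "
      f"ratio: {dispatched / direct:.1f}x")
```

The absolute numbers are meaningless; the point is that the dispatch overhead dwarfs the work itself, which is exactly the shape of the benchmark above.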

“WARNING: too many parse errors”: dbms_rai_owner

We’ve just upgraded the database containing our recovery catalog to 12.2. Although RMAN operations seemed to be completing OK, we noticed the following slightly worrying error message in the alert log:

PDB(3):WARNING: too many parse errors, count=44106 SQL hash=0x2473a808
PDB(3):PARSE ERROR: ospid=398064, error=6550 for statement:
2018-07-22T17:06:22.313262+08:00
PDB(3):BEGIN :owner := dbms_rai_owner; END;
PDB(3):Additional information: hd=0x26ee29a560 phd=0x26ee122418 flg=0x101476 cisid=343 sid=343 ciuid=343 uid=343
2018-07-22T17:06:22.313360+08:00
PDB(3):----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x26afed3ed8        36  package body RCAT.DBMS_RCVVPC.IS_RA
0x26afed3ed8         8  package body RCAT.DBMS_RCVVPC.__pkg_init
0x240a892a20         1  anonymous block
0x23cb60d9d0         4  anonymous block

I checked the source of the package DBMS_RCVVPC in the recovery catalog (it is unwrapped PL/SQL):

FUNCTION is_ra
  RETURN BOOLEAN
  IS
    l_owner all_users.username%TYPE;
  BEGIN
    EXECUTE IMMEDIATE 'BEGIN :owner := dbms_rai_owner; END;'
      USING OUT l_owner;
    RETURN (l_owner IS NOT NULL);
  EXCEPTION
    WHEN OTHERS
    THEN RETURN FALSE;
  END is_ra;

I cannot find any object named dbms_rai_owner in the database.
It seems the package is designed to fail with a parse error, which is caught by the “WHEN OTHERS” handler so the function returns FALSE; that is what had been happening while this database was at version 12.1.
What has changed is that Oracle 12.2 now writes details about such parse errors to the alert log (see Oracle Support document 2320935.1 for more details).

Ideally the Oracle developers should rewrite the package logic so it doesn’t rely on throwing parse exceptions, but until then I think this message in the alert log can be safely ignored.
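For anyone curious about the pattern itself: probing for an object by executing a statement and treating the failure as the answer looks roughly like the following Python analogy (the names mirror the PL/SQL above; this is an illustration, not Oracle's code):

```python
def is_ra(namespace: dict) -> bool:
    """Return True only if 'dbms_rai_owner' resolves to a non-None value.

    Mirrors the DBMS_RCVVPC.IS_RA approach: execute a reference to the
    name and let the resolution error itself carry the answer.
    """
    try:
        # Analogous to EXECUTE IMMEDIATE 'BEGIN :owner := dbms_rai_owner; END;'
        owner = eval("dbms_rai_owner", {}, namespace)
        return owner is not None
    except NameError:
        # Analogous to WHEN OTHERS THEN RETURN FALSE; on a non-appliance
        # recovery catalog every call takes this path, and each failed
        # resolution is what 12.2 now counts and reports.
        return False

print(is_ra({}))                           # False: name does not resolve
print(is_ra({"dbms_rai_owner": "RAOWN"}))  # True
```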

Update (2018-07-27): Bug 28223871 – RMAN12C.DBMS_RCVVPC.IS_RA WARNING: TOO MANY PARSE ERRORS, COUNT=4519

The symptom can be fixed by disabling VPD on the recovery catalog using dbmsrmanvpc.sql -novpd. Obviously weigh up the pros and cons of this before proceeding.

 

Problems with Kerberos Credentials to 12.2 Databases in Enterprise Manager

Just a quick note in case anyone else hits this issue. We make extensive use of Kerberos to give us Windows single sign-on. For database connections via Enterprise Manager we use the “Database Kerberos Credentials” credential type and have had no problems connecting to 11.2 and 12.1 databases (apart from the slight pain points that it’s not possible to use them as preferred credentials, and the necessity to update the credential whenever the Windows password is updated).

However, as we have been migrating databases to 12.2, the credentials have not worked; testing against such a database sometimes gives the following error message:

Credentials could not be verified. EXCEPTION_WHILE_CREATING_CONN_FROMSUB
Surprisingly, sometimes testing the credential also completes successfully.

Testing through various combinations, I’ve discovered that the problem is the following line in the [libdefaults] section of the Kerberos configuration file, krb5.conf, on the Enterprise Manager server:

forwardable = true

After commenting out this line, Kerberos credentials test successfully against 12.2 databases. I have no idea what change in 12.2 causes this setting to become a problem; if anyone has any ideas, feel free to share.

Histograms & Unpadded NCHAR Predicates

Update (2018-03-16): It seems WordPress or SQL*Plus messed up my testcase; I have updated the post to include the unedited testcase as well as its output.

Update (2018-03-22): Oracle Support have created a new bug for this issue: Bug 27730925 – WRONG CARDINALITY ESTIMATION FROM HYBRID HISTOGRAM ON NCHAR COLUMN

Update (2018-07-24): Tested that the patch for bug 27730925 resolves the issue. Note the following is required for the fix to take effect: alter session set "_fix_control"='27730925:1';

Paraphrasing Jonathan Lewis: it’s very easy to start ‘breaking’ Oracle as soon as you combine a few features that most people are not running. The following demonstrates that if you are using NCHAR datatypes (in our case forced on us by our JD Edwards application) and you have a hybrid histogram, then cardinality estimates vary greatly depending on whether or not you pad your predicates.

Of course, with NCHAR datatypes the results, and hence the cardinality, should be unchanged whether or not predicates are padded. The issue does not reproduce with CHAR datatypes.

SQL>  CREATE TABLE HH_NC_TAB (COL NCHAR(10));

Table created.

SQL> insert /*+APPEND*/ into HH_NC_TAB
  2      SELECT 'AAA'||mod(rownum, 10) FROM DUAL CONNECT BY LEVEL < 100000
  3      union all
  4      SELECT 'AAA'||rownum FROM DUAL CONNECT BY LEVEL < 100000 -- comment to avoid WordPress format issue
  5  ;

199998 rows created.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(NULL, 'HH_NC_TAB', METHOD_OPT=>'FOR COLUMNS COL SIZE 254');

PL/SQL procedure successfully completed.

SQL> EXPLAIN PLAN FOR SELECT * FROM HH_NC_TAB WHERE COL = 'AAA1';

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2981390417

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    21 |   206   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| HH_NC_TAB |     1 |    21 |   206   (2)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter("COL"=U'AAA1')

13 rows selected.

SQL> EXPLAIN PLAN FOR SELECT * FROM HH_NC_TAB WHERE COL = RPAD('AAA1', 10);

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2981390417

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 10236 |   209K|   204   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| HH_NC_TAB | 10236 |   209K|   204   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter("COL"=U'AAA1      ')

13 rows selected.

SQL>

The issue reproduces on 12.1 and 12.2, and I’ve uploaded the testcase to LiveSQL so you can verify for yourself on 18.1.
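One way to picture what the optimizer appears to be doing (a deliberate simplification, not the actual hybrid-histogram code): if the histogram endpoints are stored blank-padded to the NCHAR(10) column width, an unpadded literal matches nothing and falls through to a minimal estimate, which is exactly the 10,236-versus-1 split in the two plans above:

```python
# Simplified model: endpoint values stored padded to the NCHAR(10)
# width, each popular value covering ~10,236 rows (the estimate the
# padded-predicate plan showed).
histogram = {f"AAA{i}".ljust(10): 10_236 for i in range(10)}

def estimate(literal: str) -> int:
    # A real hybrid histogram interpolates between endpoints; this
    # sketch just does an exact lookup, which is enough to show the
    # padded-versus-unpadded mismatch.
    return histogram.get(literal, 1)

print(estimate("AAA1"))            # 1      (unpadded literal misses)
print(estimate("AAA1".ljust(10)))  # 10236  (padded literal matches)
```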

Multi-Tenant Resource Manager Oddity

I’ll try to write this up further when I make progress, but for now I just wanted somewhere to share the testcase. I’m guessing it’s a multi-tenant bug, but if that’s the case I’m a little surprised that no-one else has hit it.

SQL> CREATE TABLE DUMMY AS SELECT * FROM DBA_OBJECTS WHERE 1=0;

Table created.

SQL> SELECT resource_consumer_group
  2  FROM v$session
  3  WHERE sid IN (SELECT sid FROM v$mystat);

RESOURCE_CONSUMER_GROUP
--------------------------------
OTHER_GROUPS

SQL> BEGIN
  2    DBMS_SCHEDULER.create_job (
  3      job_name        => 'RC_JOBA',
  4      job_type        => 'PLSQL_BLOCK',
  5      job_action      => 'BEGIN INSERT INTO DUMMY SELECT * FROM DBA_OBJECTS; DELETE FROM DUMMY; COMMIT; END;',
  6      enabled         => TRUE);
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> SELECT DISTINCT consumer_group
  2  FROM
  3      v$active_session_history ash,
  4      dba_rsrc_consumer_groups cg
  5  WHERE
  6      action = 'RC_JOBA'
  7      AND   ash.consumer_group_id = cg.consumer_group_id;

CONSUMER_GROUP
--------------------------------------------------------------------------------
SYS_GROUP

SQL>

Edit: If you are testing, you may need to change the work the job does to ensure it writes something into v$active_session_history; if you’ve got a suggestion for a better way to do that, let me know, as I can’t think of one right now.

Update (2018-03-02): Worryingly, this doesn’t seem to be simply a reporting error. I updated the current database resource plan such that any SQL estimated to take over 1 second to complete would be cancelled.
[resource plan screenshot]

Running the bad SQL directly as a user caused the query to be cancelled as expected; running it via dbms_scheduler attempted to execute the bad SQL regardless.

SQL> CREATE OR REPLACE PROCEDURE KILL_ME_PROC AS
  2  BEGIN
  3     FOR LOOPY IN (SELECT NULL FROM ALL_OBJECTS A1, ALL_OBJECTS A2, ALL_OBJECTS A3, ALL_OBJECTS A4)
  4     LOOP
  5        NULL;
  6     END LOOP;
  7  END;
  8  /

Procedure created.

SQL>
SQL> EXEC KILL_ME_PROC;
BEGIN KILL_ME_PROC; END;

*
ERROR at line 1:
ORA-00040: active time limit exceeded - call aborted
ORA-06512: at "C##PATRICKJOLLIFFE.KILL_ME_PROC", line 3
ORA-06512: at line 1


SQL> SELECT resource_consumer_group
  2  FROM v$session
  3  WHERE sid IN (SELECT sid FROM v$mystat);

RESOURCE_CONSUMER_GROUP
--------------------------------
OTHER_GROUPS

SQL> BEGIN
  2      DBMS_SCHEDULER.create_job (
  3        job_name        => 'KILL_ME_JOB2',
  4        job_type        => 'STORED_PROCEDURE',
  5        job_action      => 'KILL_ME_PROC',
  6        enabled         => TRUE);
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> SELECT ERRORS FROM DBA_SCHEDULER_JOB_RUN_DETAILS WHERE JOB_NAME = 'KILL_ME_JOB2' AND LOG_DATE > SYSDATE-1/24;

ERRORS
--------------------------------------------------------------------------------
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
ORA-06512: at "C##PATRICKJOLLIFFE.KILL_ME_PROC", line 3


SQL>

 

Windows Single Sign-On to an Oracle database using Kerberos

I’ve recently been investigating Kerberos authentication as a means of providing Windows Single Sign-On from Windows clients into Oracle database servers. Maybe you are wondering one of the following questions.
What on earth does Kerberos authentication have to do with Microsoft Windows?
Don’t you have to pay extra money to Oracle for such functionality?
How do I configure this, and are any additional components required?

To answer those questions in order: firstly, Kerberos has been the default authentication protocol since Windows 2000 (type klist at a Windows command prompt while logged into a Windows domain to see evidence of this).
Secondly, while initially part of the Advanced Security Option, Kerberos Authentication no longer requires this extra cost option.
To answer the final question, read on; the following are the steps I took to enable Kerberos Authentication on my lab environment running on AWS.

First of all, the initial setup. The following machines all belong to a domain called patrick.domain:

  • Host MSWDC runs Windows Server 2016 and is the Domain Controller
  • Host WINCLIENT runs Windows Server 2016, and holds Oracle (instant) client binaries
  • Host linoradb runs Oracle Linux Server release 7.4, holding an Oracle 11.2.0.4 Enterprise Edition Database.  Kerberos client packages are installed

I have created two domain user accounts on patrick.domain:

  • patrickjolliffe – my personal account, with which I will try to SSO to the database.
  • orasvc – a service account that will be used to generate the keytab file used by the database server

First I add the following entries to sqlnet.ora on the database server:

SQLNET.KERBEROS5_CONF=/etc/krb5.conf
SQLNET.KERBEROS5_CONF_MIT=true

And update /etc/krb5.conf so that it contains the following:

[libdefaults]
  default_realm = PATRICK.DOMAIN
[realms]
  PATRICK.DOMAIN = {
    kdc = mswdc.patrick.domain
    admin_server = mswdc.patrick.domain
  }

At this point it’s worth doing some initial checks to validate Kerberos communication between the Oracle Database Server and the Domain Controller.

First let’s obtain, view and then destroy a Kerberos ticket through the Linux Kerberos stack:

[oracle@linoradb ~]$ kinit patrickjolliffe
Password for patrickjolliffe@PATRICK.DOMAIN:
[oracle@linoradb admin]$ klist
Ticket cache: FILE:/tmp/krb5cc_54321
Default principal: patrickjolliffe@PATRICK.DOMAIN

Valid starting Expires Service principal
02/24/2018 06:58:44 02/24/2018 16:58:44 krbtgt/PATRICK.DOMAIN@PATRICK.DOMAIN
renew until 02/25/2018 06:58:41
[oracle@linoradb admin]$ kdestroy

Next let’s perform the same task via the Oracle Kerberos stack:

[oracle@linoradb admin]$ okinit patrickjolliffe

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:01:34

Copyright (c) 1996, 2013 Oracle. All rights reserved.

Password for patrickjolliffe@PATRICK.DOMAIN:
[oracle@linoradb admin]$ oklist

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:01:45

Copyright (c) 1996, 2013 Oracle. All rights reserved.

Ticket cache: /tmp/krb5cc_54321
Default principal: patrickjolliffe@PATRICK.DOMAIN

Valid Starting Expires Principal
24-Feb-2018 07:01:38 24-Feb-2018 15:01:34 krbtgt/PATRICK.DOMAIN@PATRICK.DOMAIN
[oracle@linoradb admin]$ okdstry

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:01:52

Copyright (c) 1996, 2013 Oracle. All rights reserved.

[oracle@linoradb admin]$

Next we generate a keytab file on the domain controller for deployment to the database server.

ktpass -princ oracle/linoradb.patrick.domain@PATRICK.DOMAIN -ptype KRB5_NT_PRINCIPAL
-crypto RC4-HMAC-NT -mapuser orasvc@patrick.domain -out krb5.keytab -pass

The file krb5.keytab is created, and I copy it to the /etc directory on the database server linoradb.
I’ll add a couple of entries to sqlnet.ora on the database server so the Oracle Kerberos stack can find this keytab, and also to provide the service name “oracle” which I specified when generating the keytab.

SQLNET.KERBEROS5_KEYTAB=/etc/krb5.keytab
SQLNET.AUTHENTICATION_KERBEROS5_SERVICE=oracle

Let’s validate the keytab through the Linux stack:

[oracle@linoradb admin]$ klist -k -e
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
4 oracle/linoradb.patrick.domain@PATRICK.DOMAIN (arcfour-hmac)
[oracle@linoradb admin]$

All good, so let’s do the same check using the Oracle stack:

[oracle@linoradb admin]$ oklist -k

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:16:14

Copyright (c) 1996, 2013 Oracle. All rights reserved.

Service Key Table: /etc/krb5.keytab

Ver Timestamp Principal
4 01-Jan-1970 00:00:00 oracle/linoradb.patrick.domain@PATRICK.DOMAIN
[oracle@linoradb admin]$

Next I’ll add the following entry to sqlnet.ora on the database server to allow Kerberos in addition to BEQ (local) connections:

SQLNET.AUTHENTICATION_SERVICES=(BEQ,KERBEROS5)

On the database I create a new user that will be authenticated using Kerberos. Note the double quotes and the capitalization.

SQL> create user "PATRICKJOLLIFFE@PATRICK.DOMAIN" identified externally;

User created.

SQL> GRANT CREATE SESSION TO "PATRICKJOLLIFFE@PATRICK.DOMAIN";

Grant succeeded.

Before we complicate matters by getting the Windows client involved, let’s first check Kerberos authentication for a connection locally on the database server (note I create and use a tnsnames entry to make sure we don’t use local BEQ authentication):

[oracle@linoradb ~]$ kinit patrickjolliffe
Password for patrickjolliffe@PATRICK.DOMAIN:
[oracle@linoradb admin]$ sqlplus /@SSOTEST

SQL*Plus: Release 11.2.0.4.0 Production on Sat Feb 24 07:49:19 2018

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> SHOW USER;
USER is "PATRICKJOLLIFFE@PATRICK.DOMAIN"
SQL>

That’s working fine, so finally let’s try the Windows client. I have installed the 11.1, 11.2, 12.1 and 12.2 instant client versions and ensured the following settings work with all of them.

For 11.x clients, the KERBEROS5 authentication service is used, with credential cache (CC_NAME) OSMSFT:

For 12.x clients, in theory the KERBEROS5 service should be used with MSLSA: for the CC_NAME; however, due to bug 18895651, KERBEROS5PRE is required with CC_NAME OSMSFT:

Sources on the internet specify “OSMSFT:” or “OSMSFT://”; testing shows that any characters after the “:” are ignored, so I use the simplest possible configuration.

SQLNET.AUTHENTICATION_SERVICES=(KERBEROS5PRE,KERBEROS5)
SQLNET.KERBEROS5_CONF_MIT=true
SQLNET.KERBEROS5_CONF=C:\Users\patrickjolliffe\krb5.conf
SQLNET.KERBEROS5_CC_NAME=OSMSFT:

Testing with the instant clients shows that KERBEROS5_CONF just needs to point to a valid Kerberos configuration file (it can even be empty); for consistency, I configure it exactly the same as the krb5.conf on the database server.

Below I show connecting via the 11.1 instant client. For demonstration purposes I purge Kerberos tickets before connecting, and after a successful connection I show that a ticket has been obtained for the access to the database:

C:\Users\patrickjolliffe\instantclient_11_1>klist purge

Current LogonId is 0:0xa69ed
Deleting all tickets:
Ticket(s) purged!

C:\Users\patrickjolliffe\instantclient_11_1>sqlplus /@ssotest

SQL*Plus: Release 11.1.0.7.0 - Production on Sat Feb 24 08:19:15 2018

Copyright (c) 1982, 2008, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show user
USER is "PATRICKJOLLIFFE@PATRICK.DOMAIN"
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

C:\Users\patrickjolliffe\instantclient_11_1>klist

Current LogonId is 0:0xa69ed

Cached Tickets: (2)

#0> Client: patrickjolliffe @ PATRICK.DOMAIN
Server: krbtgt/PATRICK.DOMAIN @ PATRICK.DOMAIN
KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
Ticket Flags 0x40e10000 -> forwardable renewable initial pre_authent name_canonicalize
Start Time: 2/24/2018 8:19:15 (local)
End Time: 2/24/2018 18:19:15 (local)
Renew Time: 3/3/2018 8:19:15 (local)
Session Key Type: AES-256-CTS-HMAC-SHA1-96
Cache Flags: 0x1 -> PRIMARY
Kdc Called: MSWDC.patrick.domain

#1> Client: patrickjolliffe @ PATRICK.DOMAIN
Server: oracle/linoradb.patrick.domain @ PATRICK.DOMAIN
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
Ticket Flags 0x40a10000 -> forwardable renewable pre_authent name_canonicalize
Start Time: 2/24/2018 8:19:15 (local)
End Time: 2/24/2018 18:19:15 (local)
Renew Time: 3/3/2018 8:19:15 (local)
Session Key Type: RSADSI RC4-HMAC(NT)
Cache Flags: 0
Kdc Called: MSWDC.patrick.domain

C:\Users\patrickjolliffe\instantclient_11_1>

Of course it’s never this easy in practice; there are always complications, debugging and investigation needed to get this working, but when you consider the benefits to a large organization of providing Single Sign-On to Oracle databases, I think it’s time well spent.

Sangam17 – The All India Usergroup Conference

Being a four-hour red-eye flight away, Sangam, the All India Oracle Usergroup conference, is the closest significant Oracle conference to Hong Kong. It is included on the OTN APAC Tour, and Oracle Corporation supports it by sending some of its big guns to present, resulting in a fantastic line-up of speakers; the fact that all sessions are conducted in English is a real bonus too.

Last year it was hosted in Bangalore, and I was fortunate to be able to attend; this year it was hosted in Hyderabad and I was pleased to be accepted as a speaker. Checking the schedule, I saw that I was due to speak at the same time as one of the Oracle big guns, Connor McDonald, so I was expecting no-one to attend my session; however, in the end I presented to a pretty full room.

[conference photo]

One thing that impressed me was the level of engagement from the audience: they weren’t hesitant to ask for clarification during the session, or to discuss points informally throughout the rest of the conference.

I enjoyed many of the other sessions, highlights included:

  • Chris Saxon using packets of M&Ms to demonstrate the differences between index access and full table scans, and the effect of clustering factor. This was stuff I already knew, but I had never heard it explained so clearly, and it’s great to see someone presenting the basics for a change
  • Andrew Holdsworth from the Real World Performance Team suggesting that it might make sense NOT to gather system statistics, which initially seemed counter-intuitive, but I am starting to see the sense of it
  • Bawa and Dinesh talking about the importance of rest, and leading all the attendees through a guided meditation

[conference photo]

A trip to India wouldn’t be complete without enjoying some of the food; unfortunately this trip I hardly got out of the hotel and conference-centre bubble, but I loved the buffet put on for lunch and the dosas for breakfast.

Thanks so much to all the organizers and volunteers for giving me the opportunity to participate in such a great conference, and also to the attendees and speakers for making it so special.

Direct Path Reads and Chained Rows

Another DBA was investigating a performance difference for a query between two systems and asked me to take a look. The execution plan, a full table scan, was the same in both cases. We ran Snapper on the problem system and observed that most of the time was spent on the event ‘db file sequential read’.

-- Session Snapper v4.24 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------------------------
     .98     (98%) |    1 | 2kcksks9ht14t   | 0         | db file sequential read             | User I/O
     .03      (3%) |    1 | 2kcksks9ht14t   | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2017-11-24 16:12:18, seconds=5, samples_taken=40, AAS=1

‘db file sequential read’ is a single-block access, not usually associated with a full table scan; however, from previous experience I guessed that we had chained rows (one row split among multiple blocks), so we reran Snapper and checked the value of the statistic ‘table fetch continued row’.

-- Session Snapper v4.24 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
30, GLOGOWNER , STAT, session logical reads                                     ,          8477,      1.56k,         ,             ,          ,           ,      8.48k total buffer visits
30, GLOGOWNER , STAT, user I/O wait time                                        ,           531,      98.03,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, non-idle wait time                                        ,           531,      98.03,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, non-idle wait count                                       ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical read total IO requests                           ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical read total bytes                                 ,      69214208,     12.78M,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, cell physical IO interconnect bytes                       ,      69214208,     12.78M,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, consistent gets                                           ,          8477,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, consistent gets direct                                    ,          8477,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical reads                                            ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical reads direct                                     ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical read IO requests                                 ,          8449,      1.56k,         ,             ,          ,           ,      8.19k bytes per request
30, GLOGOWNER , STAT, physical read bytes                                       ,      69214208,     12.78M,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, file io service time                                      ,         16898,      3.12k,         ,             ,          ,           ,        2us bad guess of IO service time per IO request
30, GLOGOWNER , STAT, file io wait time                                         ,       5285794,    975.84k,         ,             ,          ,           ,   625.61us bad guess of IO wait time per IO request
30, GLOGOWNER , STAT, no work - consistent read gets                            ,          8478,      1.57k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, table scan rows gotten                                    ,          3798,     701.17,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, table scan blocks gotten                                  ,           882,     162.83,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, table fetch continued row                                 ,          7567,       1.4k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, buffer is not pinned count                                ,          8477,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , TIME, DB CPU                                                    ,        716890,   132.35ms,    13.2%, [@@        ],          ,           ,
30, GLOGOWNER , TIME, sql execute elapsed time                                  ,       6001826,      1.11s,   110.8%, [##########],          ,           ,
30, GLOGOWNER , TIME, DB time                                                   ,       6001826,      1.11s,   110.8%, [##########],          ,           ,      -10.8 % unaccounted time
30, GLOGOWNER , WAIT, db file sequential read                                   ,       5300427,   978.54ms,    97.9%, [WWWWWWWWWW],      8409,      1.55k,   630.33us average wait

--  End of Stats snap 1, end=2017-11-24 16:12:29, seconds=5.4

We were starting to understand the problem we were facing, especially when the DBA observed that the table in question had more than 255 columns (rows in such tables are always chained); however, we still didn’t understand why the query ran very fast, mostly on CPU, on the well-performing system, which also had chained rows.
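The 255-column limit means every row in such a table is split into multiple row pieces regardless of row length, so each fetched row can cost extra block visits. A back-of-envelope sketch of the minimum piece count (assuming only the documented 255-columns-per-piece limit; the actual physical layout also depends on block size and update history):

```python
import math

MAX_COLS_PER_PIECE = 255  # documented per-row-piece column limit

def min_row_pieces(num_columns: int) -> int:
    """Minimum number of row pieces needed to store one row."""
    return math.ceil(num_columns / MAX_COLS_PER_PIECE)

for cols in (254, 255, 256, 400, 1000):
    print(f"{cols:4d} columns -> {min_row_pieces(cols)} piece(s)")
```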

I reran the query with SQL trace on the problem system; a small portion of the trace is attached below:

WAIT #139940851571352: nam='direct path read' ela= 3354 file number=11 first dba=257026 block cnt=62 obj#=131201 tim=1988902071975
WAIT #139940851571352: nam='db file sequential read' ela= 316 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902073088
WAIT #139940851571352: nam='db file sequential read' ela= 576 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902073711
WAIT #139940851571352: nam='db file sequential read' ela= 472 file#=11 block#=118100 blocks=1 obj#=131201 tim=1988902074209
WAIT #139940851571352: nam='db file sequential read' ela= 382 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902074629
WAIT #139940851571352: nam='db file sequential read' ela= 558 file#=11 block#=118140 blocks=1 obj#=131201 tim=1988902075214
WAIT #139940851571352: nam='db file sequential read' ela= 479 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902075732
WAIT #139940851571352: nam='db file sequential read' ela= 532 file#=11 block#=141498 blocks=1 obj#=131201 tim=1988902076290
WAIT #139940851571352: nam='db file sequential read' ela= 605 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902076933
WAIT #139940851571352: nam='db file sequential read' ela= 512 file#=11 block#=12950 blocks=1 obj#=131201 tim=1988902077472
WAIT #139940851571352: nam='direct path read' ela= 4326 file number=11 first dba=259074 block cnt=62 obj#=131201 tim=1988902081885
WAIT #139940851571352: nam='db file sequential read' ela= 754 file#=11 block#=13035 blocks=1 obj#=131201 tim=1988902082905
WAIT #139940851571352: nam='db file sequential read' ela= 544 file#=11 block#=13035 blocks=1 obj#=131201 tim=1988902083483
WAIT #139940851571352: nam='db file sequential read' ela= 617 file#=11 block#=12955 blocks=1 obj#=131201 tim=1988902084132
WAIT #139940851571352: nam='db file sequential read' ela= 546 file#=11 block#=13035 blocks=1 obj#=131201 tim=1988902084723
WAIT #139940851571352: nam='db file sequential read' ela= 976 file#=11 block#=141559 blocks=1 obj#=131201 tim=1988902085730
WAIT #139940851571352: nam='direct path read' ela= 3240 file number=11 first dba=264450 block cnt=62 obj#=131201 tim=1988902089070
WAIT #139940851571352: nam='db file sequential read' ela= 380 file#=11 block#=13036 blocks=1 obj#=131201 tim=1988902090605
WAIT #139940851571352: nam='db file sequential read' ela= 812 file#=11 block#=13036 blocks=1 obj#=131201 tim=1988902091452
WAIT #139940851571352: nam='db file sequential read' ela= 452 file#=11 block#=118118 blocks=1 obj#=131201 tim=1988902091936
WAIT #139940851571352: nam='db file sequential read' ela= 608 file#=11 block#=13036 blocks=1 obj#=131201 tim=1988902092588
WAIT #139940851571352: nam='db file sequential read' ela= 566 file#=11 block#=13008 blocks=1 obj#=131201 tim=1988902093186

While trying to figure out why the same block (block# 13034, for example) was being reread from disk on subsequent waits rather than being accessed from the buffer cache, another of my colleagues (thanks, Kei) pointed out that the event associated with the table scan was a ‘direct path read’ rather than the usual ‘db file scattered read’.

Direct path reads bypass the buffer cache and read blocks directly into the session’s PGA. It seems this behavior also affects the subsequent ‘db file sequential read’ waits needed to fetch the chained row pieces, which can cause significant performance issues.
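A possible workaround, sketched below purely for illustration: serial direct path reads can be disabled at the session level so the full scan goes through the buffer cache again. Note that this relies on an undocumented underscore parameter and should only be set under Oracle Support guidance:

```sql
-- Disable adaptive serial direct path reads for this session only;
-- the full scan then uses conventional (buffered) reads. Undocumented
-- parameter: use only under Oracle Support guidance.
ALTER SESSION SET "_serial_direct_read" = never;
```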

As is often the case, Jonathan Lewis has already blogged about this issue; however, it’s always more rewarding and fun to (re)discover these issues oneself!

12.2 controlfile backup never gets marked as obsolete

I’ve blogged about this bug in passing, but I thought it would be worthwhile to document my testcase. Basically, controlfile backups never get marked as obsolete. The issue reproduces for both image copies and backupsets; my testcase uses the latter for simplicity.

RMAN> show all;

RMAN configuration parameters for database with db_unique_name PVJTEST are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF;
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/12.2.0.1/dbhome/dbs/snapcf_PVJTEST.f'; # default

RMAN> list backup of controlfile;

specification does not match any backup in the repository

RMAN> backup database;

Starting backup at 2017-11-01 11:31:38
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u02/oradata/PVJTEST/sysaux01.dbf
input datafile file number=00001 name=/u02/oradata/PVJTEST/system01.dbf
input datafile file number=00004 name=/u02/oradata/PVJTEST/undotbs01.dbf
input datafile file number=00007 name=/u02/oradata/PVJTEST/users01.dbf
channel ORA_DISK_1: starting piece 1 at 2017-11-01 11:31:38
channel ORA_DISK_1: finished piece 1 at 2017-11-01 11:31:53
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/agsifi8q_1_1 tag=TAG20171101T113138 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
Finished backup at 2017-11-01 11:31:54

Starting Control File and SPFILE Autobackup at 2017-11-01 11:31:54
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-07 comment=NONE
Finished Control File and SPFILE Autobackup at 2017-11-01 11:31:55

RMAN> backup database;

Starting backup at 2017-11-01 11:31:59
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u02/oradata/PVJTEST/sysaux01.dbf
input datafile file number=00001 name=/u02/oradata/PVJTEST/system01.dbf
input datafile file number=00004 name=/u02/oradata/PVJTEST/undotbs01.dbf
input datafile file number=00007 name=/u02/oradata/PVJTEST/users01.dbf
channel ORA_DISK_1: starting piece 1 at 2017-11-01 11:31:59
channel ORA_DISK_1: finished piece 1 at 2017-11-01 11:32:14
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/aisifi9f_1_1 tag=TAG20171101T113159 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
Finished backup at 2017-11-01 11:32:14

Starting Control File and SPFILE Autobackup at 2017-11-01 11:32:14
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-08 comment=NONE
Finished Control File and SPFILE Autobackup at 2017-11-01 11:32:15

RMAN> delete obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
using channel ORA_DISK_1
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time     Filename/Handle
-------------------- ------ ------------------- --------------------
Backup Set           207    2017-11-01 11:31:53
  Backup Piece       207    2017-11-01 11:31:53 /u01/app/oracle/product/12.2.0.1/dbhome/dbs/agsifi8q_1_1

Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piece
backup piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/agsifi8q_1_1 RECID=207 STAMP=958908699
Deleted 1 objects

RMAN> list backup of controlfile;

List of Backup Sets
===================

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
208     Full    10.47M     DISK        00:00:00     2017-11-01 11:31:54
        BP Key: 208   Status: AVAILABLE  Compressed: NO  Tag: TAG20171101T113154
        Piece Name: /u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-07
  Control File Included: Ckp SCN: 12426398     Ckp time: 2017-11-01 11:31:54

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
210     Full    10.47M     DISK        00:00:00     2017-11-01 11:32:14
        BP Key: 210   Status: AVAILABLE  Compressed: NO  Tag: TAG20171101T113214
        Piece Name: /u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-08
  Control File Included: Ckp SCN: 12426426     Ckp time: 2017-11-01 11:32:14

RMAN>

Oracle Support initially identified the following bug:
Bug 25943271 : RMAN REPORT OBSOLETE DOES NOT REPORT CONTROLFILE BACKUP AS OBSOLETE

However, after waiting for the patch, applying it still did not resolve the issue. It seems this may now be tracked as the following bug:
Bug 26771767 : SPFILE BACKUP OR BACKUP PIECE INCLUDES SPFILE ARE NOT REPORTED AS OBSOLETE

We are working around the issue at the moment by modifying our RMAN scripts to delete controlfile backups older than a certain age. We are still waiting for the fix, having supplied a reproducible testcase more than six months ago 🙂
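The workaround boils down to a single RMAN command. A sketch, assuming a hypothetical 14-day window (adjust to match your own retention policy):

```
# Explicitly remove controlfile backups (which also carry the SPFILE
# autobackup) that DELETE OBSOLETE fails to catch. The 14-day window
# here is illustrative only.
DELETE NOPROMPT BACKUP OF CONTROLFILE COMPLETED BEFORE 'SYSDATE-14';
```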