Problems with Kerberos Credentials to 12.2 Databases in Enterprise Manger

Just a quick note in case anyone else hits this issue. We use make extensive use of Kerberos to give us Windows single-sign on. For database connections via Enterprise Manger we use “Database Kerberos Credentials” credential type and have no problems connecting to 11.2 and 12.1 databases (apart from slight pain points that it’s not possible to use them as preferred credentials, and the necessity to update the credential whenever Windows password is updated).

However as we have been migrating databases to 12.2, the credentials have not worked, testing against such a database sometimes give the following error message:

Credentials could not be verified. EXCEPTION_WHILE_CREATING_CONN_FROMSUB
Surprisingly sometimes testing the credential also completes successfully.

Testing through various combinations I’ve discovered that problem is the following line in the [libdefaults] section of the Kerberos configuration file, krb5.conf, on the Enterprise Manger server

forwardable = true

After commenting out this line, Kerberos credentials test successfully against 12.2 databases.  I have no idea what change in 12.2 causes this setting to cause a problem, if anyone has any ideas welcome to share.

Hybrid Histograms & Unpadded NCHAR Predicates

EDIT:2018-03-16 Seems WordPress or SQLPlus messed up my testcase, I have updated tihs post to include unedited testcase as well as output.

EDIT:2018-03-22 Oracle Support have created a new bug for this issue, Bug 27730925 – WRONG CARDINALITY ESTIMATION FROM HYBRID HISTOGRAM ON NCHAR COLUMN

Paraphrasing Jonathan Lewis it’s very easy to start ‘breaking’ Oracle as soon as you start combining a few features that most people are not running. The following demonstrates that if you are using NCHAR datatypes (in our case this is forced on us because due to our JDEdwards application) and if you have a hybrid histogram then cardinality estimates vary greatly depending on whether or not you pad your predicates.

Of course with NCHAR datatypes, the results and hence cardinality should be unchanged whether or not predicates are padded.  The issue does not reproduce the symptoms with CHAR datatypes.

SQL> CREATE TABLE HH_NC_TAB (COL NCHAR(10));

Table created.

SQL> insert /*+APPEND*/ into HH_NC_TAB
  2  SELECT 'AAA'||mod(rownum, 10) FROM DUAL CONNECT BY LEVEL < 100000
  3  union all
  4  SELECT 'AAA'||rownum FROM DUAL CONNECT BY LEVEL < 100000; -- > comment to avoid WP format issue

199998 rows created.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(NULL, 'HH_NC_TAB', METHOD_OPT=>'FOR COLUMNS COL SIZE 254');

PL/SQL procedure successfully completed.

SQL> EXPLAIN PLAN FOR SELECT * FROM HH_NC_TAB WHERE COL = 'AAA1';

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2981390417

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    21 |   206   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| HH_NC_TAB |     1 |    21 |   206   (2)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter("COL"=U'AAA1')

13 rows selected.

SQL> EXPLAIN PLAN FOR SELECT * FROM HH_NC_TAB WHERE COL = RPAD('AAA1', 10);

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2981390417

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 10236 |   209K|   204   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| HH_NC_TAB | 10236 |   209K|   204   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter("COL"=U'AAA1      ')

13 rows selected.

SQL>

Issue reproduces on 12.1, 12.2 and I’ve uploaded to LiveSQL so you can verify for yourself on 18.1

Multi-Tenant Resource Manager Oddity

I’ll try to write this up further when I make progress, but just wanted somewhere to share testcase. I’m guessing it’s a multi-tenant bug, but if that’s the case I’m a little surprised why no-one else has hit it.

SQL> CREATE TABLE DUMMY AS SELECT * FROM DBA_OBJECTS WHERE 1=0;

Table created.

SQL> SELECT resource_consumer_group
  2  FROM v$session
  3  WHERE sid IN (SELECT sid FROM v$mystat);

RESOURCE_CONSUMER_GROUP
--------------------------------
OTHER_GROUPS

SQL> BEGIN
  2    DBMS_SCHEDULER.create_job (
  3      job_name        => 'RC_JOBA',
  4      job_type        => 'PLSQL_BLOCK',
  5      job_action      => 'BEGIN INSERT INTO DUMMY SELECT * FROM DBA_OBJECTS; DELETE FROM DUMMY; COMMIT; END;',
  6      enabled         => TRUE);
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> SELECT DISTINCT consumer_group
  2  FROM
  3      v$active_session_history ash,
  4      dba_rsrc_consumer_groups cg
  5  WHERE
  6      action = 'RC_JOBA'
  7      AND   ash.consumer_group_id = cg.consumer_group_id;

CONSUMER_GROUP
--------------------------------------------------------------------------------
SYS_GROUP

SQL>

Edit: I guess if you are testing you may need to update the task the job does to ensure it writes something into v$active_session_history, if you’ve got suggestion for better way to do that let me know, I can’t think of one right now.

Update 2018-03-02: Worryingly this doesn’t seem to be simply a reporting error. I updated the current database resource plan such that any SQL estimated to take over 1 second to complete would be cancelled.
plan

Running a bad SQL directly as user caused query to be cancelled as expected, running via dbms_scheduler attempted to execute the bad SQL.

SQL> CREATE OR REPLACE PROCEDURE KILL_ME_PROC AS
  2  BEGIN
  3     FOR LOOPY IN (SELECT NULL FROM ALL_OBJECTS A1, ALL_OBJECTS A2, ALL_OBJECTS A3, ALL_OBJECTS A3)
  4     LOOP
  5        NULL;
  6     END LOOP;
  7  END;
  8  /

Procedure created.

SQL>
SQL> EXEC KILL_ME_PROC;
BEGIN KILL_ME_PROC; END;

*
ERROR at line 1:
ORA-00040: active time limit exceeded - call aborted
ORA-06512: at "C##PATRICKJOLLIFFE.KILL_ME_PROC", line 3
ORA-06512: at line 1


SQL> SELECT resource_consumer_group
  2  FROM v$session
  3  WHERE sid IN (SELECT sid FROM v$mystat);

RESOURCE_CONSUMER_GROUP
--------------------------------
OTHER_GROUPS

SQL> BEGIN
  2      DBMS_SCHEDULER.create_job (
  3        job_name        => 'KILL_ME_JOB2',
  4        job_type        => 'STORED_PROCEDURE',
  5        job_action      => 'KILL_ME_PROC',
  6        enabled         => TRUE);
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> SELECT ERRORS FROM DBA_SCHEDULER_JOB_RUN_DETAILS WHERE JOB_NAME = 'KILL_ME_JOB2' AND LOG_DATE > SYSDATE-1/24;

ERRORS
--------------------------------------------------------------------------------
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
ORA-06512: at "C##PATRICKJOLLIFFE.KILL_ME_PROC", line 3


SQL>

 

Windows Single Sign-On to an Oracle database using Kerberos

I’ve recently been investigating Kerberos authentication as a means for providing Windows Single Sign On from Windows clients into an Oracle database servers. Maybe you are wondering one of the following questions.
What on earth does Kerberos authentication have to do with Microsoft Windows?
Don’t you have to pay extra money to Oracle for such functionality?
How to I configure this, are any additional components required?

To answer those questions in order, firstly Kerberos is the default authentication protocol since Windows 2000 (type klist at your Windows command prompt if you’re logged into a Windows domain to see evidence of this).
Secondly, while initially part of the Advanced Security Option, Kerberos Authentication no longer requires this extra cost option.
To answer the final question, read on; the following are the steps I took to enable Kerberos Authentication on my lab environment running on AWS.

First of all the initial setup.  The following machines all belong to a domain called patrick.domain.

  • Host MSWDC runs Windows Server 2016 is the Domain Controller
  • Host WINCLIENT runs Windows Server 2016, and holds Oracle (instant) client binaries
  • Host linoradb runs Oracle Linux Server release 7.4, holding an Oracle 11.2.0.4 Enterprise Edition Database.  Kerberos client packages are installed

I have created the two domain user accounts on patrick.domain:

  • patrickjolliffe – my personal account with which will try to SSO to the database.
  • orasvc – a service account that will be used to generate the keytab file used by the database server

First I add the following entries to sqlnet.ora on the database server:

SQLNET.KERBEROS5_CONF=/etc/krb5.conf
SQLNET.KERBEROS5_CONF_MIT=true

And update /etc/krb5.conf so that it contains the following:

[libdefaults]
  default_realm = PATRICK.DOMAIN
[realms]
  PATRICK.DOMAIN = {
    kdc = mswdc.patrick.domain
    admin_server = mswdc.patrick.domain
  }

At this point it’s worth doing some initial checks to validate Kerberos communication between the Oracle Database Server and the Domain Controller.

First let’s obtain, view and then destroy a Kerberos ticket through Linux
Kerberos Stack:

[oracle@linoradb ~]$ kinit patrickjolliffe
Password for patrickjolliffe@PATRICK.DOMAIN:
[oracle@linoradb admin]$ klist
Ticket cache: FILE:/tmp/krb5cc_54321
Default principal: patrickjolliffe@PATRICK.DOMAIN

Valid starting Expires Service principal
02/24/2018 06:58:44 02/24/2018 16:58:44 krbtgt/PATRICK.DOMAIN@PATRICK.DOMAIN
renew until 02/25/2018 06:58:41
[oracle@linoradb admin]$ kdestroy

Next let’s perform the same task via the Oracle Kerberos stack:

[oracle@linoradb admin]$ okinit patrickjolliffe

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:01:34

Copyright (c) 1996, 2013 Oracle. All rights reserved.

Password for patrickjolliffe@PATRICK.DOMAIN:
[oracle@linoradb admin]$ oklist

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:01:45

Copyright (c) 1996, 2013 Oracle. All rights reserved.

Ticket cache: /tmp/krb5cc_54321
Default principal: patrickjolliffe@PATRICK.DOMAIN

Valid Starting Expires Principal
24-Feb-2018 07:01:38 24-Feb-2018 15:01:34 krbtgt/PATRICK.DOMAIN@PATRICK.DOMAIN
[oracle@linoradb admin]$ okdstry

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:01:52

Copyright (c) 1996, 2013 Oracle. All rights reserved.

[oracle@linoradb admin]$

Next we generate a keytab file on the domain controller for deployment to the database server.

ktpass -princ oracle/linoradb.patrick.domain@PATRICK.DOMAIN -ptype KRB5_NT_PRINCIPAL
-crypto RC4-HMAC-NT -mapuser orasvc@patrick.domain -out krb5.keytab -pass

File krb5.keytab is created and I copy to /etc directory on the database server linoradb.
I’ll add a couple of entries to sqlnet.ora on the database server so the Oracle Kerberos Stack can find this keytab, and also provide the service name “oracle” which I specified when generating the keytab.

SQLNET.KERBEROS5_KEYTAB=/etc/krb5.keytab
SQLNET.AUTHENTICATION_KERBEROS5_SERVICE=oracle

Let’s validate the keytab through the Linux stack:

[oracle@linoradb admin]$ klist -k -e
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
4 oracle/linoradb.patrick.domain@PATRICK.DOMAIN (arcfour-hmac)
[oracle@linoradb admin]$

All good, so let’s do the same check using the Oracle stack:

[oracle@linoradb admin]$ oklist -k

Kerberos Utilities for Linux: Version 11.2.0.4.0 - Production on 24-FEB-2018 07:16:14

Copyright (c) 1996, 2013 Oracle. All rights reserved.

Service Key Table: /etc/krb5.keytab

Ver Timestamp Principal
4 01-Jan-1970 00:00:00 oracle/linoradb.patrick.domain@PATRICK.DOMAIN
[oracle@linoradb admin]$

Next I’ll add the following entry to sqlnet.ora on the database server to allow Kerberos in addition to BEQ (local) connection

SQLNET.AUTHENTICATION_SERVICES=(BEQ,KERBEROS5)

On the database I create a new user that will be authenticated using Kerberos. Note the double quotes and the capitalization.

SQL> create user "PATRICKJOLLIFFE@PATRICK.DOMAIN" identified externally;

User created.

SQL> GRANT CREATE SESSION TO "PATRICKJOLLIFFE@PATRICK.DOMAIN";

Grant succeeded.

Before we complicate matters by getting the Windows client involved, lets first check Kerberos authentication for a connection locally on the database server (note I create and use a tnsnames entry to make sure we don’t use local BEQ authentication):

[oracle@linoradb ~]$ kinit patrickjolliffe
Password for patrickjolliffe@PATRICK.DOMAIN:
[oracle@linoradb admin]$ sqlplus /@SSOTEST

SQL*Plus: Release 11.2.0.4.0 Production on Sat Feb 24 07:49:19 2018

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> SHOW USER;
USER is "PATRICKJOLLIFFE@PATRICK.DOMAIN"
SQL>

That’s working fine so finally lets try the Windows client. I have installed instant client 11.1, 11.2, 12.1 and 12.2 versions and ensured the following settings work with all of them.

For 11.x clients authentication service KERBEROS5 is used, with Credential Cache (CC_NAME) OSMSFT:

For 12.x client 12.x in theory, KERBROS5 service should be used with MSLSA: for the CC_NAME, however due to bug 18895651, KERBEROS5PRE is required with CC_NAME OSMSFT:

Sources on the internet specify “OSMSFT:” or “OSMSFT://”, testing shows that any characters after the “:” are ignored so I provide the simplest possible configuration.

SQLNET.AUTHENTICATION_SERVICES=(KERBEROS5PRE,KERBEROS5)
SQLNET.KERBEROS5_CONF_MIT=true
SQLNET.KERBEROS5_CONF=C:\Users\patrickjolliffe\krb5.conf
SQLNET.KERBEROS5_CC_NAME=OSMSFT:

Testing with instant clients show that KERBEROS5_CONF just needs to point to a valid kerberos configuration file, it can even be empty, however for consistency, I just configure exactly the same as the krb5.conf on the database server.

Below I show connecting via 11.2 instant client. For demonstration purposes I purge Kerberos tickets before connecting, and after successful connection show that a ticket has been obtained for the access to the database:

C:\Users\patrickjolliffe\instantclient_11_1>klist purge

Current LogonId is 0:0xa69ed
Deleting all tickets:
Ticket(s) purged!

C:\Users\patrickjolliffe\instantclient_11_1>sqlplus /@ssotest

SQL*Plus: Release 11.1.0.7.0 - Production on Sat Feb 24 08:19:15 2018

Copyright (c) 1982, 2008, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show user
USER is "PATRICKJOLLIFFE@PATRICK.DOMAIN"
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

C:\Users\patrickjolliffe\instantclient_11_1>klist

Current LogonId is 0:0xa69ed

Cached Tickets: (2)

#0> Client: patrickjolliffe @ PATRICK.DOMAIN
Server: krbtgt/PATRICK.DOMAIN @ PATRICK.DOMAIN
KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
Ticket Flags 0x40e10000 -> forwardable renewable initial pre_authent name_canonicalize
Start Time: 2/24/2018 8:19:15 (local)
End Time: 2/24/2018 18:19:15 (local)
Renew Time: 3/3/2018 8:19:15 (local)
Session Key Type: AES-256-CTS-HMAC-SHA1-96
Cache Flags: 0x1 -> PRIMARY
Kdc Called: MSWDC.patrick.domain

#1> Client: patrickjolliffe @ PATRICK.DOMAIN
Server: oracle/linoradb.patrick.domain @ PATRICK.DOMAIN
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
Ticket Flags 0x40a10000 -> forwardable renewable pre_authent name_canonicalize
Start Time: 2/24/2018 8:19:15 (local)
End Time: 2/24/2018 18:19:15 (local)
Renew Time: 3/3/2018 8:19:15 (local)
Session Key Type: RSADSI RC4-HMAC(NT)
Cache Flags: 0
Kdc Called: MSWDC.patrick.domain

C:\Users\patrickjolliffe\instantclient_11_1>

Of course it’s never so easy in practice, there are always complications, debugging and investigation needed to get this working, but when you consider the benefits to a large organization of providing Single Sign-On to Oracle databases, I think it’s time well spent.

Sangam17 – The All India Usergroup Conference

Being a four hour red-eye flight away, Sangam, the All India Oracle Usergroup conference, is the closest significant Oracle conference to Hong Kong.  It is included on the OTN APAC Tour, and Oracle corporation supports it by sending some of it’s big guns to present, resulting in a fantastic line-up of speakers; the fact that all sessions are conducted in English is a real bonus too.

Last year it was hosted in Bangalore, and I was fortunate to be able to attend; this year it was hosted in Hyderabad and was pleased to be accepted as a speaker.  Checking the schedule I saw that I was due to speak at the same time as one of the Oracle big guns, Connor McDonald, so I was expecting no-one to be attending my session, however in the end I presented to a pretty full room.

20171208_132740.jpg

One thing which impressed me was the level of engagement from the audience, they weren’t hesitant to ask for clarification during the session, and also to discuss points informally throughout rest the conference.

I enjoyed many of the other sessions, highlights included:

  • Chris Saxon using packets of M&Ms to demonstrating the differences between index access vs full table scans, and the effect of clustering factor. This was stuff I already knew, but I had never heard them explained so clearly, and it’s great to see someone presenting the basics for a change
  • Andrew Holdsworth from Real World Performance Team suggesting that it might make sense NOT to gather system statistics on a system, which initially seemed counter-intuitive, but am starting to see the sense of it
  • Bawa and Dinesh talking about the importance of rest, and leading all the attendees through a guided meditation

20171208_202702.jpg

A trip to India wouldn’t be complete without enjoying some of the food, unfortunately this trip I hardly got out from the hotel and conference center bubble, however I loved the buffet put on for lunch and the dosas for breakfast.

20171209_081106.jpgThanks so much for the all the organizers and volunteers for giving me the opportunity  to participate in such a great conference, and also the attendees and speakers for making it so special.

Direct Path Reads and Chained Rows

Another DBA was investigating a performance difference of a query between two different systems and asked me to take a look. The execution plan, a full table scan, was the same in both cases.  We ran Snapper on the problem system and observed that most of the time was spent on event ‘db file sequential read’.

-- Session Snapper v4.24 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------------------------
     .98     (98%) |    1 | 2kcksks9ht14t   | 0         | db file sequential read             | User I/O
     .03      (3%) |    1 | 2kcksks9ht14t   | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2017-11-24 16:12:18, seconds=5, samples_taken=40, AAS=1

‘db file sequential read’ is a single block access, not usually associated with a full table scan, however from previous experience I guessed that we had chained rows (one row split among multiple blocks), so we reran snapper and checked the value of statistic ‘table fetch continued row’.

-- Session Snapper v4.24 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
30, GLOGOWNER , STAT, session logical reads                                     ,          8477,      1.56k,         ,             ,          ,           ,      8.48k total buffer visits
30, GLOGOWNER , STAT, user I/O wait time                                        ,           531,      98.03,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, non-idle wait time                                        ,           531,      98.03,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, non-idle wait count                                       ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical read total IO requests                           ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical read total bytes                                 ,      69214208,     12.78M,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, cell physical IO interconnect bytes                       ,      69214208,     12.78M,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, consistent gets                                           ,          8477,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, consistent gets direct                                    ,          8477,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical reads                                            ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical reads direct                                     ,          8449,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, physical read IO requests                                 ,          8449,      1.56k,         ,             ,          ,           ,      8.19k bytes per request
30, GLOGOWNER , STAT, physical read bytes                                       ,      69214208,     12.78M,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, file io service time                                      ,         16898,      3.12k,         ,             ,          ,           ,        2us bad guess of IO service time per IO request
30, GLOGOWNER , STAT, file io wait time                                         ,       5285794,    975.84k,         ,             ,          ,           ,   625.61us bad guess of IO wait time per IO request
30, GLOGOWNER , STAT, no work - consistent read gets                            ,          8478,      1.57k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, table scan rows gotten                                    ,          3798,     701.17,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, table scan blocks gotten                                  ,           882,     162.83,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, table fetch continued row                                 ,          7567,       1.4k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , STAT, buffer is not pinned count                                ,          8477,      1.56k,         ,             ,          ,           ,          ~ per execution
30, GLOGOWNER , TIME, DB CPU                                                    ,        716890,   132.35ms,    13.2%, [@@        ],          ,           ,
30, GLOGOWNER , TIME, sql execute elapsed time                                  ,       6001826,      1.11s,   110.8%, [##########],          ,           ,
30, GLOGOWNER , TIME, DB time                                                   ,       6001826,      1.11s,   110.8%, [##########],          ,           ,      -10.8 % unaccounted time
30, GLOGOWNER , WAIT, db file sequential read                                   ,       5300427,   978.54ms,    97.9%, [WWWWWWWWWW],      8409,      1.55k,   630.33us average wait

--  End of Stats snap 1, end=2017-11-24 16:12:29, seconds=5.4

We were starting to understand the problem we were facing, especially when the DBA observed that the table in question had more than 255 columns (rows in such tables will always be chained), however we still didn’t understand why on the good performing system, also with chained rows, the query ran very fast, mostly on CPU.

I reran the query with SQL trace on the problem system, and attach a small portion below:

WAIT #139940851571352: nam='direct path read' ela= 3354 file number=11 first dba=257026 block cnt=62 obj#=131201 tim=1988902071975
WAIT #139940851571352: nam='db file sequential read' ela= 316 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902073088
WAIT #139940851571352: nam='db file sequential read' ela= 576 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902073711
WAIT #139940851571352: nam='db file sequential read' ela= 472 file#=11 block#=118100 blocks=1 obj#=131201 tim=1988902074209
WAIT #139940851571352: nam='db file sequential read' ela= 382 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902074629
WAIT #139940851571352: nam='db file sequential read' ela= 558 file#=11 block#=118140 blocks=1 obj#=131201 tim=1988902075214
WAIT #139940851571352: nam='db file sequential read' ela= 479 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902075732
WAIT #139940851571352: nam='db file sequential read' ela= 532 file#=11 block#=141498 blocks=1 obj#=131201 tim=1988902076290
WAIT #139940851571352: nam='db file sequential read' ela= 605 file#=11 block#=13034 blocks=1 obj#=131201 tim=1988902076933
WAIT #139940851571352: nam='db file sequential read' ela= 512 file#=11 block#=12950 blocks=1 obj#=131201 tim=1988902077472
WAIT #139940851571352: nam='direct path read' ela= 4326 file number=11 first dba=259074 block cnt=62 obj#=131201 tim=1988902081885
WAIT #139940851571352: nam='db file sequential read' ela= 754 file#=11 block#=13035 blocks=1 obj#=131201 tim=1988902082905
WAIT #139940851571352: nam='db file sequential read' ela= 544 file#=11 block#=13035 blocks=1 obj#=131201 tim=1988902083483
WAIT #139940851571352: nam='db file sequential read' ela= 617 file#=11 block#=12955 blocks=1 obj#=131201 tim=1988902084132
WAIT #139940851571352: nam='db file sequential read' ela= 546 file#=11 block#=13035 blocks=1 obj#=131201 tim=1988902084723
WAIT #139940851571352: nam='db file sequential read' ela= 976 file#=11 block#=141559 blocks=1 obj#=131201 tim=1988902085730
WAIT #139940851571352: nam='direct path read' ela= 3240 file number=11 first dba=264450 block cnt=62 obj#=131201 tim=1988902089070
WAIT #139940851571352: nam='db file sequential read' ela= 380 file#=11 block#=13036 blocks=1 obj#=131201 tim=1988902090605
WAIT #139940851571352: nam='db file sequential read' ela= 812 file#=11 block#=13036 blocks=1 obj#=131201 tim=1988902091452
WAIT #139940851571352: nam='db file sequential read' ela= 452 file#=11 block#=118118 blocks=1 obj#=131201 tim=1988902091936
WAIT #139940851571352: nam='db file sequential read' ela= 608 file#=11 block#=13036 blocks=1 obj#=131201 tim=1988902092588
WAIT #139940851571352: nam='db file sequential read' ela= 566 file#=11 block#=13008 blocks=1 obj#=131201 tim=1988902093186

While trying to figure out why the same block, say block# 13034, in the trace was being reread from disk during subsequent events, rather than being accessed from the buffer cache, another of my colleagues (thanks Kei) pointed out that the event associated with the table scan was a ‘direct path read’, rather than the usual ‘db file scattered read’.

‘direct path reads’ bypass the buffer cache, reading blocks directly into the session’s PGA. It seems this behavior also affects subsequent ‘db file sequential read’s necessary for accessing chained row pieces, potentially causing some significant performance issues.

As is often the case, Jonathan Lewis has already blogged about this issue however it’s always more rewarding and fun (re)discovering these issues oneself!

12.2 controlfile backup never gets marked as obsolete

I’ve blogged about this bug in passing, but I thought would be worthwhile to document my testcase.    Basically backups of controlfiles never get marked as obsolete. The issue reproduces for both image copies and backupsets, my test-case uses the latter for simplicity.

RMAN> show all;

RMAN configuration parameters for database with db_unique_name PVJTEST are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF;
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/12.2.0.1/dbhome/dbs/snapcf_PVJTEST.f'; # default

RMAN> list backup of controlfile;

specification does not match any backup in the repository

RMAN> backup database;

Starting backup at 2017-11-01 11:31:38
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u02/oradata/PVJTEST/sysaux01.dbf
input datafile file number=00001 name=/u02/oradata/PVJTEST/system01.dbf
input datafile file number=00004 name=/u02/oradata/PVJTEST/undotbs01.dbf
input datafile file number=00007 name=/u02/oradata/PVJTEST/users01.dbf
channel ORA_DISK_1: starting piece 1 at 2017-11-01 11:31:38
channel ORA_DISK_1: finished piece 1 at 2017-11-01 11:31:53
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/agsifi8q_1_1 tag=TAG20171101T113138 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
Finished backup at 2017-11-01 11:31:54

Starting Control File and SPFILE Autobackup at 2017-11-01 11:31:54
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-07 comment=NONE
Finished Control File and SPFILE Autobackup at 2017-11-01 11:31:55

RMAN> backup database;

Starting backup at 2017-11-01 11:31:59
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u02/oradata/PVJTEST/sysaux01.dbf
input datafile file number=00001 name=/u02/oradata/PVJTEST/system01.dbf
input datafile file number=00004 name=/u02/oradata/PVJTEST/undotbs01.dbf
input datafile file number=00007 name=/u02/oradata/PVJTEST/users01.dbf
channel ORA_DISK_1: starting piece 1 at 2017-11-01 11:31:59
channel ORA_DISK_1: finished piece 1 at 2017-11-01 11:32:14
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/aisifi9f_1_1 tag=TAG20171101T113159 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
Finished backup at 2017-11-01 11:32:14

Starting Control File and SPFILE Autobackup at 2017-11-01 11:32:14
piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-08 comment=NONE
Finished Control File and SPFILE Autobackup at 2017-11-01 11:32:15

RMAN> delete obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
using channel ORA_DISK_1
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set 207 2017-11-01 11:31:53
Backup Piece 207 2017-11-01 11:31:53 /u01/app/oracle/product/12.2.0.1/dbhome/dbs/agsifi8q_1_1

Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piece
backup piece handle=/u01/app/oracle/product/12.2.0.1/dbhome/dbs/agsifi8q_1_1 RECID=207 STAMP=958908699
Deleted 1 objects

 

RMAN> list backup of controlfile;

 

List of Backup Sets
===================

 

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
208 Full 10.47M DISK 00:00:00 2017-11-01 11:31:54
BP Key: 208 Status: AVAILABLE Compressed: NO Tag: TAG20171101T113154
Piece Name: /u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-07
Control File Included: Ckp SCN: 12426398 Ckp time: 2017-11-01 11:31:54

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
210 Full 10.47M DISK 00:00:00 2017-11-01 11:32:14
BP Key: 210 Status: AVAILABLE Compressed: NO Tag: TAG20171101T113214
Piece Name: /u01/app/oracle/product/12.2.0.1/dbhome/dbs/c-2122366327-20171101-08
Control File Included: Ckp SCN: 12426426 Ckp time: 2017-11-01 11:32:14

RMAN>

Oracle Support initially identified the following bug:
Bug 25943271 : RMAN REPORT OBSOLETE DOES NOT REPORT CONTROLFILE BACKUP AS OBSOLETE

However after waiting for the patch, it still did not resolve the issue. Seems this may now be the following bug:
Bug 26771767 : SPFILE BACKUP OR BACKUP PIECE INCLUDES SPFILE ARE NOT REPORTED AS OBSOLETE

We are working around the issue at the moment by modifying our rman scripts to delete controlfile backups older than a certain age.  Still waiting for the fix, after supplying reproducible testcase over six months ago 🙂

Slovenia & Croatia Oracle User Group Conferences 2017

20171014_183311.jpgThe fact that these two conferences run back-to-back seemed a perfect opportunity to experience both, at the expense of a single long-haul flight.  Their taking place at amazing locations on the Istrian coast which I had never visited before sealed the deal. I submitted abstracts and was delighted when they both accepted me to speak.

I landed in Zagreb after eighteen hours in transit, picked up my rental car, and crossed the Croatian-Slovenian border on my way to the Slovenian capital Ljubljana.  The city was  beautiful, but I was only able to spend about half an hour looking around before exhaustion kicked in and I retired for an early night.

20171015_113318.jpgThe Slovenian conference was held on the town of Portoroz, and I had heard about some cave systems, Postojna and Skocjan on the way.  I had planned to visit both, but on the way to the Postojna, I saw a sign to a cave at Planina so I decided to check that out.  The public accessible area only extends a couple of hundred meters inside (much more is accessible with a guide)  but it was really cool just being alone in the quiet and pitch darkness.

20171015_131242.jpgPostojna Cave  is a complete contrast; you take a 20 minute train journey with hundreds of other sightseers deep into an amazing cave system, followed by a 40 minute guided tour, before taking the train back out again.  After these two experiences I’d had more then enough of caves, so I decided too skip Skocjan and head direct to the conference venue.

20171015_201945.jpg

20171016_203427.jpgThe speakers dinner was held at an fantastic restaurant, Na Burji, and the meal was accompanied by bread baked by the multi-talented Jože Senegačnik.  I was thrilled to finally catch up in person with Jure Bratina, the first subscriber to my blog (!).

Both were fantastic hosts, epitomizing the warmth of the all Slovenian and Croatian people I met on the trip.

After the two day conference I drove with ‘Mr SmartScan’ Roger McNicol back across the border to Croatia and the town of Rovinj.

The Croatian conference is held at Hotel Istra, on Red Island, a 20 minute ferry trip from Rovinj.  There are actually two islands joined together by a causeway, and you can walk the circumference of both in about forty minutes.  They must have been spectacular before the hotel was built!

20171018_072500.jpg

Most days during the conference I managed to swim in the morning and evening, one morning completing the 75 minute swim around the smaller island with Kamil “Let’s do it quickly before we realize it’s a stupid idea” Stawiarski!

One of the highlights was a guided trip to the historic town of Pula, followed by drinking and go-karting, organized by the larger than life figure of Davor Ranković.

20171018_150440.jpg

The above description makes everything seem like an amazing holiday, which indeed it was.  However in addition, I saw so many excellent and informative presentations on topics such as Oracle database, Application Express and Oracle JET.  Being able to interact and make connections in person with so many incredibly knowledgeable and talented people from both inside, and outside of, Oracle corporation is very empowering, and something that is just not possible through the internet. Without exception I found that everyone was so welcoming to me as a new speaker.

Thanks everyone that made it possible, or made it special just by being part of it!

 

 

 

 

Identifying invalid SQL with event 10035

I recently learned from Jonathan Lewis’s blog that in version 12.2 details of invalid SQL statements are written to the alert log (after every 100 occurrences for a particular SQL statement).

Its worth to  mention that similar functionality can be achieved in earlier versions by setting event 10035 (not to be confused with it’s more (in)famous sibling, event 10053).

I will demonstrate with version 12.1 using the same invalid SQL from Jonathan’s post (I have tested that this works at least as far back as 10.2):

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> ALTER SESSION SET EVENTS='10035 trace name context forever, level 1';

Session altered.

SQL> select count(*) frm dual;

Error starting at line : 1 in command -
select count(*) frm dual
Error at Command Line : 1 Column : 21
Error report -
SQL Error: ORA-00923: FROM keyword not found where expected
00923. 00000 - "FROM keyword not found where expected"
*Cause:
*Action:

The following information is written in the alert log:

PARSE ERROR: ospid=14737, error=923 for statement:
Sun Oct 08 01:14:43 2017
select count(*) frm dual
Additional information: hd=0x6bffaa18 phd=0x71c88e18 flg=0x28 cisid=8 sid=8 ciuid=8 uid=8
PARSE ERROR: ospid=14737, error=923 for statement:
Sun Oct 08 01:14:43 2017
select count(*) frm dual
Additional information: hd=0x6bffaa18 phd=0x71c88e18 flg=0x28 cisid=8 sid=8 ciuid=8 uid=8
Sun Oct 08 01:14:43 2017
----- PL/SQL Call Stack -----
object line object
handle number name
0x6f9e6a10 1120 package body SYS.DBMS_SQL
0x727c2ba8 15 anonymous block

Due to the extra overhead this would add to the parse of every invalid SQL, I would suggest to only set this event temporarily at session or system level after identifying that this is a problem from the statistic “parse count (failures)”.

Update

I just found out that Tanel covered this back in 2009 as part of an deep-dive article into diagnostic events, that is well worth a read.

In-Memory Column Store and tables with more than 255 columns

Last week, developers reported some unexpected behavior on a test system.  I thought I would take advantage of the long weekend here in Hong Kong to build the simplest possible reproducible testcase.  I find this is useful to not only in allowing others to easily test on their system, but also to distill exactly what is and is not relevant to the problem, an essential first step for understanding what is going on.

The result confirms something I think I heard from Jonathan Lewis, in that when you start to mix certain features that majority of customers are not using, in this case in-memory column store, tables with >255 columns, and NCHAR columns, then you can start to see some ‘interesting’ results.

For my testcase I am running 12.1.0.2 with 170418 DBBP.

Observe the first part of the setup, I simply use PL/SQL block to create a table with 256 columns (note the first is an NCHAR) and populate it with a couple of rows.  The rows both have c1 set to ‘X’, and on one I populate the last field c256 with value 0, on the other it is NULL.  A query from the the table where c1=’X’ returns both rows as expected.

SQL> DECLARE
  2      sqltext   VARCHAR2(32767);
  3  BEGIN
  4      sqltext := 'CREATE TABLE IM256 (C1 NCHAR(1), ';
  5      FOR loopy IN 2..256 LOOP
  6          sqltext := sqltext
  7           || 'c'
  8           || loopy
  9           || ' NUMBER,';
 10      END LOOP;
 11      -- Remove trailing comma and add close braket
 12      sqltext := SUBSTR(sqltext, 1, LENGTH(SQLTEXT)-1) || ')' ;
 13      EXECUTE IMMEDIATE sqltext;
 14  END;
 15  /

PL/SQL procedure successfully completed.

SQL>
SQL> INSERT INTO IM256(C1, C256) VALUES ('X' , 0);

1 row inserted.

SQL> INSERT INTO IM256(C1) VALUES ('X');

1 row inserted.

SQL> COMMIT;

Commit complete.

SQL>
SQL> SELECT * FROM IM256 WHERE C1 = 'X';

X














                                                                                                                                                                   0

X

















2 rows selected.

SQL>

As soon as I move the table into In-Memory Column Store, the query with the predicate starts to return only one row. The query without predicate still returns both rows.

SQL> ALTER TABLE IM256 INMEMORY NO MEMCOMPRESS;

Table IM256 altered.

SQL> SELECT * FROM IM256 WHERE C1 = 'X';

X














                                                                                                                                                                   0


1 row selected.

SQL> SELECT * FROM IM256;

X














                                                                                                                                                                   0

X














                                                                                                                                                                   0


2 rows selected.

SQL>

Stefan Koehler pointed me towards event 12099 which is mentioned in a couple of bug reports in this area, as a way to disable ‘turboscan’ and indeed setting this event causes the expected results to appear.

SQL>  alter session set events '12099 trace name context forever, level 1';

Session altered.

SQL> SELECT * FROM IM256 WHERE C1 = 'X';

X














                                                                                                                                                                   0

X

















2 rows selected.

Oracle support have identified this as bug 20508559 which is described in further detail in document 20508559.8 – Wrong Results / ORA-600 [kdmv_check_row] with more than 255 columns with nulls.

Edit: I have just tested the patch and after applying it the issue is resolved.