Do not set job_queue_processes at PDB level in 12.1.0.2

MOS Doc ID 2175443.1 goes into more detail, including the following quote:

It is possible to set job_queue_processes at pdb level on 12.1.0.2 multitenant database, but it does not work correctly.

I have managed to reproduce the behavior. Of particular concern is that it allows a local user on one PDB to disrupt job scheduling on other PDBs.

My test system is running 12.1.0.2, multi-tenant, with two PDBs, PDB1 and PDB2. Each PDB has a local user with DBA privileges, dba1 and dba2 respectively.
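
For reference, the local users were created along these lines (a sketch of the setup; password and grants simplified):

-- Connected to PDB1 (repeat in PDB2 for dba2)
create user dba1 identified by oracle;
grant dba to dba1;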

[oracle@awrw admin]$ sqlplus dba1/oracle@pdb1

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:34:41 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:33:46 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter job_queue_processes

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes		     integer	 1000
SQL> alter system set job_queue_processes=0 container=current;

System altered.

SQL> show parameter job_queue_processes

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes		     integer	 0
SQL>

So dba1 appears to have changed job_queue_processes in their own PDB, pdb1.

The alert log reveals some cause for concern:

CJQ0 started with pid=34, OS id=31546
Stopping background process CJQ0
Sun Apr 02 13:35:01 2017
ALTER SYSTEM SET job_queue_processes=0 SCOPE=BOTH;

Sure enough, in PDB2, even though the job_queue_processes parameter appears to be set to its default value, we are unable to use the scheduler:

[oracle@awrw admin]$ sqlplus dba2/oracle@pdb2

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:44:19 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:43:25 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter job_queue_processes

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes		     integer	 1000
SQL> exec sys.dbms_scheduler.create_job( 'DUMMY_JOB', job_type=>'PLSQL_BLOCK', job_action=>'BEGIN NULL; END;');

PL/SQL procedure successfully completed.

SQL> exec sys.dbms_scheduler.run_job( 'DUMMY_JOB', false);
BEGIN sys.dbms_scheduler.run_job( 'DUMMY_JOB', false); END;

*
ERROR at line 1:
ORA-27492: unable to run job "DBA2"."DUMMY_JOB": scheduler unavailable
ORA-06512: at "SYS.DBMS_ISCHED", line 209
ORA-06512: at "SYS.DBMS_SCHEDULER", line 594
ORA-06512: at line 1
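
Before fixing anything, it is worth checking where the changed value actually lives. Below is a sketch of a diagnostic query against V$SYSTEM_PARAMETER, run from the root; my understanding is that a PDB-level override appears as an extra row with that PDB's CON_ID, while CON_ID 0 is the CDB-wide value:

-- Run from CDB$ROOT
select con_id, value, ispdb_modifiable
  from v$system_parameter
 where name = 'job_queue_processes';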

I connect back to PDB1 and reset the parameter:

[oracle@awrw admin]$ sqlplus dba1/oracle@pdb1

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:46:40 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:34:41 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> alter system set job_queue_processes=1000 container=current;

System altered.

In the alert log I can see the Job Queue Co-ordinator process is restarted:

Starting background process CJQ0
Sun Apr 02 13:47:08 2017
ALTER SYSTEM SET job_queue_processes=1000 SCOPE=BOTH;
Sun Apr 02 13:47:08 2017
CJQ0 started with pid=34, OS id=2227

And sure enough, I am now able to run scheduler jobs in PDB2:

[oracle@awrw ~]$ sqlplus dba2/oracle@pdb2

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:48:51 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:44:19 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> exec sys.dbms_scheduler.run_job( 'DUMMY_JOB', false);

PL/SQL procedure successfully completed.
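
Finally, to tidy up, the dummy job can be dropped (a sketch):

exec sys.dbms_scheduler.drop_job( 'DUMMY_JOB');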

Adaptive Dynamic Sampling and the Result Cache: RC Latch

After we migrated one of our instances from Oracle 11.2.0.4 to 12.1.0.2, we ran into some serious latch contention that basically took down the instance. From online research and discussion with other DBAs, I don’t think we are the only customer to have run into this problem, so I wanted to share my investigation and findings in the hope that they will help others, or spur some fixes from Oracle in this area.

If you don’t want to drill into the details, the TL;DR is that Adaptive Dynamic Sampling (ADS) and frequent hard parsing (for example, due to not using bind variables) do not play well together. The most likely way for ADS to be triggered is a SQL Plan Directive appearing on a table; these directives are created automatically with default database settings.
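
To check whether any SQL Plan Directives exist on your application tables, a query along these lines can be used (a sketch; MYSCHEMA is a placeholder for the schema of interest):

select d.directive_id, o.object_name, d.type, d.state, d.reason
  from dba_sql_plan_directives d
  join dba_sql_plan_dir_objects o
    on o.directive_id = d.directive_id
 where o.owner = 'MYSCHEMA'
   and o.object_type = 'TABLE';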

For an excellent introduction to ADS, I suggest you check out this presentation on the subject by Christian Antognini.

To reproduce the behavior that I was seeing, the script below creates a procedure that I can call to perform lots of hard parsing, forcing ADS with a hint. The purpose of the identifier parameter is to embed it in the SQL text, so that each time I execute the procedure I can pass in a different value, forcing a reparse on each execution.

-- Create a 10,000-row test table with an index on id
create table t1 nologging as
select  rownum        id,
        rpad('x',100) padding
    from dual
    connect by level <= 1e4;

create index i1 on t1(id);

-- Each loop iteration builds a unique SQL text (a literal predicate plus the
-- caller-supplied identifier embedded in the hint), so every execution is a
-- hard parse; dynamic_sampling(11) forces ADS
CREATE OR REPLACE PROCEDURE lhps(identifier IN VARCHAR2, parse_count IN NUMBER) IS
BEGIN
   FOR loopy IN 1..parse_count
   LOOP
      EXECUTE IMMEDIATE 'select /*+dynamic_sampling(11) '|| identifier || ' */ * from t1 where id = ' || loopy;
   END LOOP;
END;
/

Using the mystats package, we can see that in this case, for 100 hard parses, the RC Latch is taken 401 times.

SQL> exec mystats_pkg.ms_start; 
PL/SQL procedure successfully completed.

SQL> EXEC lhps('A', 100);                                                                               
PL/SQL procedure successfully completed.    
                                                             
SQL> exec mystats_pkg.ms_stop(p_statname_like=>'%RC%');
 ==========================================================================================
 MyStats report : 04-SEP-2016 11:18:49
 ==========================================================================================
 2. Statistics Report
 ----------------------------------------------------------------
 Type    Statistic Name                                     Value
 ------  --------------------------------------  ----------------
 LATCH   Result Cache: RC Latch                                           401
 ----------------------------------------------------------------
 4. About
 ------------------------------------------------------------------------------------------
 - MyStats v3.0 by Adrian Billington (http://www.oracle-developer.net)
 - Based on the SNAP_MY_STATS utility by Jonathan Lewis
 ==========================================================================================

Examination of the SQL trace shows that, as you would expect (from Christian’s presentation), the reason for the high RC Latch usage is the internal ADS SQL statements.

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T1")  */ 1 AS C1 FROM "T1" "T1" WHERE ("T1"."ID"=1)) innerQuery
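
We can also confirm that these results are landing in the result cache by querying V$RESULT_CACHE_OBJECTS, filtering on the DS_SVC comment that Oracle adds to these statements (a sketch):

select type, status, substr(name, 1, 60) name
  from v$result_cache_objects
 where name like '%DS_SVC%';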

MOS note 2002089.1 describes this behavior and suggests that it can be disabled (obviously there are pros and cons to doing so) with the following setting:

alter session set "_optimizer_ads_use_result_cache" = FALSE;

As a side note, it is slightly surprising that there appear to be 4 RC Latch gets per execution of the dynamic sampling SQL. Note that this database is multi-tenant. As there is only one RC Latch shared by all containers, my guess is that when the SQL is parsed, the RC Latch is taken once for each PDB.
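
Since there is a single RC Latch for the instance, its activity can also be watched directly via V$LATCH; taking a snapshot of this query before and after a test gives the latch gets attributable to it (a sketch):

select gets, misses, sleeps
  from v$latch
 where name = 'Result Cache: RC Latch';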

Let’s retest after applying that underscore setting at session level:

SQL> alter session set "_optimizer_ads_use_result_cache" = FALSE;

SQL> exec mystats_pkg.ms_start;                                                                        
PL/SQL procedure successfully completed.                                                               

SQL> EXEC lhps('B', 100);                                                                             
PL/SQL procedure successfully completed.                                                               

SQL> exec mystats_pkg.ms_stop(p_statname_like=>'%RC%');
 ==========================================================================================
 MyStats report : 04-SEP-2016 11:28:53
 ==========================================================================================
 ------------------------------------------------------------------------------------------
 2. Statistics Report
 ------------------------------------------------------------
 Type    Statistic Name                                 Value
 ------  ----------------------------------  ----------------
 LATCH   Result Cache: RC Latch                            50
 ------------------------------------------------------------
 4. About
 ------------------------------------------------------------------------------------------
 - MyStats v3.0 by Adrian Billington (http://www.oracle-developer.net)
 - Based on the SNAP_MY_STATS utility by Jonathan Lewis
 ==========================================================================================

RC Latch usage is much reduced, but it still seems quite high: the latch is taken 50 times for 100 hard parses.

Further analysis of the SQL Trace identifies the following SQL as the culprit:

SELECT /*+ RESULT_CACHE (SYSOBJ=TRUE SHELFLIFE=30) */ SNAP_ID,DBID,INSTANCE_NUMBER,SQL_ID,PARSING_SCHEMA_NAME,END_OF_FETCH_COUNT_DELTA,EXECUTIONS_DELTA,PX_SERVERS_EXECS_DELTA,BUFFER_GETS_DELTA,CPU_TIME_DELTA,ELAPSED_TIME_DELTA,FLAG,CON_ID FROM "SYS"."INT$DBA_HIST_SQLSTAT" "INT$DBA_HIST_SQLSTAT" WHERE ("INT$DBA_HIST_SQLSTAT"."SQL_ID"=:1 AND "INT$DBA_HIST_SQLSTAT"."PARSING_SCHEMA_NAME"=:2 AND ("INT$DBA_HIST_SQLSTAT"."CON_ID"=0 OR "INT$DBA_HIST_SQLSTAT"."CON_ID"=3) AND BITAND(NVL("INT$DBA_HIST_SQLSTAT"."FLAG",0),1)=0) AND ("INT$DBA_HIST_SQLSTAT".CON_ID=0 OR "INT$DBA_HIST_SQLSTAT".CON_ID=3)

Paraphrasing Christian’s presentation, this lookup is performed to check in AWR whether the SQL statement has been seen previously, in order to determine how long to spend on ADS during optimization. I can’t see any method to disable the result cache for this statement (apart from disabling the result cache itself). Note also that I suspect this behavior is related to multi-tenant, as I can’t reproduce it (either the RC Latch usage or even the lookup itself) on a non-multi-tenant instance.

In our case we have disabled ADS at the application schema level by setting the following in a logon trigger, while we work with the application teams to reduce hard parsing.

alter session set optimizer_dynamic_sampling = 0;
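
For illustration, the logon trigger might look something like the sketch below; the trigger name and the APP_USER schema are placeholders:

-- Hypothetical example: disable ADS for sessions of one application schema
create or replace trigger disable_ads_logon_trg
   after logon on database
begin
   if sys_context('USERENV', 'SESSION_USER') = 'APP_USER' then
      execute immediate 'alter session set optimizer_dynamic_sampling = 0';
   end if;
end;
/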