AWR Warehouse and Data Guard – Conclusions

Unfortunately AWR Warehouse and Data Guard do not work together seamlessly, however by understanding some of the implementation details, and following some simple steps you can help to mitigate most problems that may be encountered.

It is not possible to add an instance to AWRW when it is in standby mode, worse still if you do configure the primary and then switch over to an instance not configured in AWRW, the extract jobs will continue to dump files to local storage, but those files will never be transferred or loaded.

For this reason, I would suggest if setting up a pair of instances in data guard configuration from scratch to consider adding both to AWRW as part of initial configuration process. If this is an existing system where you cannot switchover to add the standby to AWRW, you should consider adding it the next time you do switchover.

Whenever you add a standby instance to AWRW make sure to backup the extract metadata table beforehand, and restore it before performing the first extract (remember to fix the sequence on this table too), to make sure you do not extract the same snapshots from multiple instances.

While it is possible to update the mapping table to force AWRW to upload from both instances to the same DBID, I think it is not worth the risk you run of breaking things by performing such changes. The other downside is that removal of either instance from AWRW triggers the removal of AWR history from both instances from the repository.

The downside of having the different instances of the same database mapped to different DBIDs is extra complexity in queries you run against the AWRW for analysis, especially if you switchover for significant periods of time. In addition, AWR reports or ASH analytics will not be work on consolidation of both instances, you will have to perform those tasks against each instance separately.

In addition as long as you are on DB plugins 13.2.2 or later (and if you are not you should think about upgrading), you may also consider updating the properties file such that data is extracted to a location accessible from both instances. That way files extracted from one instance will still get uploaded from the other after switchover. If you do not, you may need to perform some manual steps to ensure snapshots are not missed during switchover.

AWR Warehouse and Data Guard (Part Five)

Continuing the investigation, I wanted to check the effect of deleting one of the two data guard instances from AWRW; remember I have updated the mapping table such that both DGA and DGA both map to the same DBID.

We can check from load metadata and AWR itself which snapshots have been loaded

SQL> SELECT
    dump_id,
    dbid,
    target_name,
    begin_snap_id,
    end_snap_id
FROM
    dbsnmp.caw_load_metadata
ORDER BY dump_id;  

   DUMP_ID     DBID TARGET_NAM BEGIN_SNAP_ID END_SNAP_ID
---------- ---------- ---------- ------------- -----------
    14 1852644603 DGA            12        38
    15 1852644603 DGB            39        41
    16 1852644603 DGB            42        44

SQL> SELECT
    MIN(snap_id),
    MAX(snap_id),
    COUNT(*)
FROM
    dba_hist_snapshot
WHERE
    dbid = 1852644603
ORDER BY snap_id;  

MIN(SNAP_ID) MAX(SNAP_ID)   COUNT(*)
------------ ------------ ----------
      12           44      33

So we have 33 snapshots, loaded from both DGA and DGB.
I then delete DGB from AWRW via the GUI. After waiting a few minutes for the scheduler task that actually does the deletion to kick in, I recheck the AWR data.

SQL> SELECT
    MIN(snap_id),
    MAX(snap_id),
    COUNT(*)
FROM
    dba_hist_snapshot
WHERE
    dbid = 1852644603
ORDER BY snap_id;
  
MIN(SNAP_ID) MAX(SNAP_ID)   COUNT(*)
------------ ------------ ----------
                   0

Because databases are mapped to the same DBID, all AWR history for both databases has been removed from AWR Warehouse.

One final thing I wanted to check was that with the updated mapping record, AWRW data was really accessible via OEM.

I re-added DGB back to AWRW, mapped to its real DBID using the steps before.
From the AWRW page I highlight DGA and select “Compare Periods->Compare Periods Report”. (Note this is a powerful feature of AWRW probably deserving of it’s own entry):

compare_periods

I make sure AWR Data Source is “Historical – AWR Warehouse”:

select_snapshot

I then check I can pick a pair of uploaded snapshots, in this case 49 and 50.
For the second period I can select a different database. Note in particular how this screen shows the AWRW mappings  I am then able to pick the exact same snapshots, 49 and 50, for DGB.
I am able to run the AWR comparison report for the same snapshots, but associated with the different databases.

Index WRH$_SGASTAT_U has status UNUSABLE after upgrade to 12.2

Update 2016-04-26: this is new bug 25954054 – WRH$_SGASTAT_U BECOMING UNUSABLE STATE IN UPGRADED DB.

After upgrading our OMR and AWRW instances from 12.1 to 12.2, our monitoring revealed that index SYS.WRH$_SGASTAT_U was becoming unusable at regular intervals.
The first couple of times, we just rebuilt it in the hope that the problem would just go away(!); of course it didn’t, so I had to do a little bit of investigation.

Reviewing the alert log identified the exact time that the problem had occurred:

2017-04-12T08:11:43.674485+08:00
Some indexes or index [sub]partitions of table SYS.WRH$_SGASTAT have been marked unusable

Grepping for “SGASTAT” in the trace directory, also revealed that the index being unusable was causing inserts into AWR had been failing:

*** 2017-04-12T12:00:18.338061+08:00
*** SESSION ID:(289.8864) 2017-04-12T12:00:18.338092+08:00
*** CLIENT ID:() 2017-04-12T12:00:18.338103+08:00
*** SERVICE NAME:(SYS$BACKGROUND) 2017-04-12T12:00:18.338108+08:00
*** MODULE NAME:(MMON_SLAVE) 2017-04-12T12:00:18.338119+08:00
*** ACTION NAME:(Auto-Flush Slave Action) 2017-04-12T12:00:18.338125+08:00
*** CLIENT DRIVER:() 2017-04-12T12:00:18.338132+08:00

*** KEUUXS - encountered error: (ORA-01502: index 'SYS.WRH$_SGASTAT_U' or partition of such index is in unusable state
)
  *** SQLSTR: total-len=464, dump-len=464,
      STR={insert into  wrh$_sgastat  (dbid, per_pdb, con_dbid, snap_id, instance_number, pool, name, bytes)  select distinct  :dbid, 0 per_pdb,  :srcdbid  con_dbid,  :snap_id, :instance_number, pool, name, bytes FROM     (select con_id, pool, name, bytes,             100*(bytes) / (sum(bytes) over (partition by pool)) part_pct        from  v$sgastat      ) sgastat  WHERE (1 = 1)   and part_pct >= 1      or pool is null      or name = 'free memory' or name = 'row cache' }
*** KEWRAFM1: Error=13509 encountered by kewrfteh

As we still had the redo logs for the period of time identified by the alert log, I fired up logminer, and right after a bunch of inserts into WRH$_SGASTAT, I found the following statement:

alter table WRH$_SGASTAT drop partition WRH$_SGASTA_1521710113_17057;

I am guessing this is part of the standard mechanism for purging AWR data based on configured retention settings.
A quick check reveals that of the partitioned AWR base tables, 3 have indexes that are not partitioned:

SQL> SELECT                                                                                                                                                                                                          
  2     partitioned,                                                                                                                                                                                                 
  3     COUNT(*)                                                                                                                                                                                                     
  4  FROM                                                                                                                                                                                                            
  5     dba_indexes                                                                                                                                                                                                  
  6  WHERE                                                                                                                                                                                                           
  7     index_type = 'NORMAL'                                                                                                                                                                                        
  8  AND                                                                                                                                                                                                             
  9     table_name IN (                                                                                                                                                                                              
 10        SELECT                                                                                                                                                                                                    
 11           table_name                                                                                                                                                                                             
 12        FROM                                                                                                                                                                                                      
 13           dba_tables                                                                                                                                                                                             
 14        WHERE                                                                                                                                                                                                     
 15           table_name LIKE 'WRH$\_%' ESCAPE '\'                                                                                                                                                                   
 16        AND                                                                                                                                                                                                       
 17           partitioned = 'YES'                                                                                                                                                                                    
 18     )                                                                                                                                                                                                            
 19  GROUP BY                                                                                                                                                                                                        
 20     partitioned;                                                                                                                                                                                                 
                                                                                                                                                                                                                     
PAR   COUNT(*)                                                                                                                                                                                                       
--- ----------                                                                                                                                                                                                       
NO           3                                                                                                                                                                                                       
YES         35                                                                                                                                                                                                       

SQL> SELECT                                                                                                                                                                                                          
  2     table_owner,                                                                                                                                                                                                 
  3     table_name,                                                                                                                                                                                                  
  4     index_name                                                                                                                                                                                                   
  5  FROM                                                                                                                                                                                                            
  6     dba_indexes                                                                                                                                                                                                  
  7  WHERE                                                                                                                                                                                                           
  8     index_type = 'NORMAL'                                                                                                                                                                                        
  9  AND                                                                                                                                                                                                             
 10     table_name IN (                                                                                                                                                                                              
 11        SELECT                                                                                                                                                                                                    
 12           table_name                                                                                                                                                                                             
 13        FROM                                                                                                                                                                                                      
 14           dba_tables                                                                                                                                                                                             
 15        WHERE                                                                                                                                                                                                     
 16           table_name LIKE 'WRH$\_%' ESCAPE '\'                                                                                                                                                                   
 17        AND                                                                                                                                                                                                       
 18           partitioned = 'YES'                                                                                                                                                                                    
 19     )                                                                                                                                                                                                            
 20  AND partitioned = 'NO';                                                                                                                                                                                         
                                                                                                                                                                                                                     
TABLE_OWNER                    TABLE_NAME                     INDEX_NAME                                                                                                                                             
------------------------------ ------------------------------ ------------------------------                                                                                                                         
SYS                            WRH$_SGASTAT                   WRH$_SGASTAT_U                                                                                                                                         
SYS                            WRH$_CON_SYSMETRIC_HISTORY     WRH$_CON_SYSMET_HISTORY_INDEX                                                                                                                          
SYS                            WRH$_SYSMETRIC_HISTORY         WRH$_SYSMETRIC_HISTORY_INDEX                                                                                                                           

Running the same query on both 11.2 and 12.1 instances reveals no such mis-configured indexes We are following up with Oracle Support, but I would be interested to know if anyone else can check my findings on their 12.2 instances.

Working with Oracle Support we have worked around the issue using the following script:

ALTER TABLE wrh$_sgastat DROP CONSTRAINT wrh$_sgastat_u;

ALTER TABLE wrh$_sgastat ADD CONSTRAINT wrh$_sgastat_u UNIQUE ( dbid,snap_id,instance_number,name,pool,con_dbid )
   USING INDEX
      LOCAL
   TABLESPACE sysaux;

In addition, I believe I have found the bug in file $ORACLE_HOME/rdbms/admin/c1201000.sql at line 3036:

alter table wrh$_sgastat drop constraint wrh$_sgastat_u;

alter table wrh$_sgastat add constraint wrh$_sgastat_u
        unique (dbid, snap_id, instance_number, name, pool, con_dbid, stattype);

AWR Warehouse and Data Guard (Part Four)

After a brief interlude I wanted to follow up on something I noticed on John Hallas’s blog.

He mentioned that specifying a shared location for the extract metadata did not work, and was waiting on a fix for bug number 21826657. I checked that this bug should be fixed in db plugins version 13.2.2 and that my test hosts dga and dgb are both running this version.

To test whether the fix was effective, I first updated the extract properties table to specify a location  shared between both hosts, /home/oracle/awrw, generated a couple of snapshots (current primary is DGA) and then ran an extract process.

SQL> update dbsnmp.caw_extract_properties
  2     set property_value = '/home/oracle/awrw'
  3     where property_name = 'dump_dir_1';

1 row updated.

SQL> commit;

Commit complete.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbsnmp.mgmt_caw_extract.run_job_now;

PL/SQL procedure successfully completed.

I double-check the files are extracted to /home/oracle/awrw as expected. You can see from the filename that this corresponds to snapshots with ids between 42 and 44.

[oracle@dgb awrw]$ ls -alrt /home/oracle/awrw
total 12796
drwx------. 20 oracle oinstall     4096 Apr  8 12:21 ..
drwxr-xr-x.  1 oracle oinstall      136 Apr  9  2017 .
-rw-r-----.  1 oracle oinstall 13066240 Apr  9  2017 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp
-rw-r--r--.  1 oracle oinstall    31656 Apr  9  2017 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.log
[oracle@dgb awrw]$

I then switch-over to DGB and then performed “Upload Snapshots Now”.
Reviewing the steps for the Transfer Task looks promising, below are the outputs of a couple of the transfer subtask; first of all createArchive:

conn_url: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dgb.localdomain)(PORT=1521)))(CONNECT_DATA=(SID=DG))), user: system, role: normal
Connected to database
Number of files ready for transfer: 1
Maximum total size of dump files: 5368709120
file: 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp size(bytes): 13066240
Number of files picked up for transfer: 1
Updated status of dump files, ready to transfer
Creating archive file using [/app/oracle/agent/agent_13.2.0.0.0/bin/zip -v 4CA58C2B03FF5F03E0530201A8C0052E.zip 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp]
  adding: 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp .	(in=13066240) (out=1116898) (deflated 91%)
total bytes=13066240, compressed=1116898 -> 91% savings
Created archive 4CA58C2B03FF5F03E0530201A8C0052E.zip
Disconnected from database

Next prepareDestinationFiles:

Extracting dump files from /app/oracle/agent/agent_inst/awr_t/4CA58C2B03FF5F03E0530201A8C0052E.zip
Extracted 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp
extract complete
Deleted archive 4CA58C2B03FF5F03E0530201A8C0052E.zip
conn_url: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.2)(PORT=1521)))(CONNECT_DATA=(SID=OMR))), user: sys, role: sysdba
Connected to database
dumpfile: 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp, emid: 1, dbid: 1852644603, bid: 42, eid: 44, size: 13066240
source target: DGB, type: oracle_database
Added dump files to the import queue
Disconnected from database

The output of the loadAWR task reveals that 3 snapshots have been successfully loaded to AWRW:

conn_url: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.2)(PORT=1521)))(CONNECT_DATA=(SID=OMR))), user: sys, role: sysdba
Connected to database
Loaded 3 snapshots in the AWR Warehouse.
Disconnected from database

Reviewing load metadata you can see that the snapshots that were extracted from DGA have been successfully loaded from DGB.

SQL> select dump_id, target_name, begin_snap_id, end_snap_id, status from dbsnmp.caw_load_metadata order by dump_id;

 SQL> select begin_snap_id, end_snap_id, target_name, status from dbsnmp.caw_load_metadata order by dump_id;

	   12	       38 DGA		      3
	   39	       41 DGB		      3
	   42	       44 DGB		      3
SQL>

So it seems that it is now OK to update the location that the dumpfiles are extracted to. If this is a shared location accessible by both hosts in dataguard configuration, then as we have proved, any data extracted from one host, can subsequently be transferred and loaded from the another host without requiring manual intervention.
This may be of use with AWRW in a RAC configuration (something I haven’t yet investigated).

AWR Warehouse and Data Guard (Interlude)

Note, in preparing next entry, I reviewed the results from previous entries and found that although extracts were being transferred successfully they had all failed to load; I should have noticed they had status 4 (IMPORT_FAILED), in the load metadata table.

Reviewing dbms_scheduler_job_run_details of the load task revealed the following error:

"move error: 
ORA-20105: Unable to move AWR data to SYS
1852644603 1_4CB614B101EE2E5EE0530401A8C066F0_1852644603_12_37.dmp 1 DGA oracle_database 1852644603"

I removed both databases from ARWR, and playing around with the internal functions used to register and un-register the databases revealed that the partitions had got in a mess.

 
SQL> exec dbms_swrf_internal.unregister_database(1852644603);
SQL> exec dbms_swrf_internal.register_database(1852644603);
BEGIN dbms_swrf_internal.register_database(1852644603); END;

*
ERROR at line 1:
ORA-13509: error encountered during updates to a AWR table
ORA-14012: resulting partition name conflicts with that of an existing partition
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 105
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 184
ORA-06512: at line 1

To resolve this, I unregistered the database again using the internal function, then identified the ‘orphan’ partitions causing problems using the following SQL (1852644603 is the DBID of the problem database):

 
SELECT * FROM DBA_TAB_PARTITIONS WHERE PARTITION_NAME LIKE '%1852644603%'

Partitions for the unregistered database still existed for the following tables:

  • WRH$_CON_SYSMETRIC_HISTORY
  • WRH$_CON_SYSSTAT
  • WRH$_CON_SYSTEM_EVENT
  • WRH$_CON_SYS_TIME_MODEL
  • WRM$_PDB_IN_SNAP

I then dropped these problem partitions (this is just a play database, that is why I am able to be so reckless modifying internal Oracle tables, do you own research and testing before making such changes, and make sure you understand the risks.)

After I did this, I was able re-add DGA to ARWR and upload some snapshots successfully; this time I actually checked the records existed in AWRW using the following query:

SQL> SELECT DISTINCT DBID FROM DBA_HIST_SNAPSHOT;

1585153544
1852644603
SQL> 

I performed the same steps as before (backup extract metadata table, add DGB to AWRW, switchover to DGB, change the mapping record for DGA, restore the extract record, Upload Snapshots Now, then restore the mapping record for DGA). Snapshots from DGB also uploaded successfully with the original DBID>

It distinctly possible that I messed up the partitioning of the AWR tables by my hacks however there are a few things that make me think this is not the case. The original upload from DGA failed (this was before I had made any non-standard changes). Also repeating my steps after manually fixing the partitions, everything worked fine. It is also worth noting that the tables causing the problem are all new in 12.2 (DGA and DGB are fresh installed 12.2 instances, the AWR Warehouse repository is 12.2 upgraded from 12.1). We also had some similar issues on our production AWR Warehouse after we upgraded it from 12.1 to 12.2 (a different error, but also related to AWRW partitions). So I suspect some bug where AWRW is not properly handling partitions on new 12.2 AWR tables.

Messing around with the register/un-register functions also reminds me of another thing to be aware of regards my ‘hack’ on the CAW_DBID_MAPPING table. If you subsequently remove the either database from AWRW (via the GUI) it will also remove all records from AWR history from both databases. This may or may not be what you want, you should be aware of it.

TNS-12599: TNS:cryptographic checksum mismatch

Checking the alert log of some of one of the instances we have recently upgraded to 12.2 I noticed the following error message occurring frequently:

NI cryptographic checksum mismatch error: 12599.

VERSION INFORMATION:
TNS for Linux: Version 12.2.0.1.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
Time: 06-APR-2017 11:38:05
Tracing to file:
Tns error struct:
ns main err code: 12599

TNS-12599: TNS:cryptographic checksum mismatch
ns secondary err code: 12656
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0

Comparing with entries in listener.log file, I identified that these corresponded to connections from Enterprise Manager. Reproducing the issue was easy, I navigated to the database in Enterprise Manager, went through the menus “Oracle Database->Target Setup->Monitoring Configuration” and clicked on the “Test Connection button”.

Every time the test succeeded, but the message was appended to the alert log.

MOS note 1927120.1 goes into some detail about the error message, including the fact that is is safe to ignore. It should be resolved when Enterprise Manager is updated to include latest JDBC driver.

One option listed to prevent the issue appearing in the alert log is to set DIAG_ADR_ENABLED=OFF in sqlnet.ora (it will be then instead be logged to sqlnet.log).

I found that setting the following two parameters in sqlnet.ora also meant the warning did not occur (setting either parameter by itself was not sufficient).

  • SQLNET.CRYPTO_CHECKSUM_SERVER=rejected
  • SQLNET.ENCRYPTION_SERVER=rejected

AWR Warehouse and Data Guard (Part Three)

This entry continues with the setup configured in parts one and two. I stated that ideally you should perform “Upload Snapshots Now” on the primary database before performing switch-over to prevent remove the possibility of missing snapshots in AWRW. We will see what happens if you don’t, either because you forgot, or it was not possible due to fail-over, and review some options to resolve the situation.

Do your own due diligence before updating any internal AWR data, be prepared to accept the consequences if you break things.

Primary database is currently DGA, I generate a couple of snapshots and perform a manual extract (note during the normal course of operations, this extract runs independently of the transfer and load components, so it is possible for there to be snapshots which have been extracted, but not transferred to AWRW host). The extract metadata table shows one extraction with status 2, the data has been extracted but has not yet been transferred to AWRW.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbsnmp.mgmt_caw_extract.run_extract;

PL/SQL procedure successfully completed.

SQL> select extract_id, begin_snap_id, end_snap_id, status from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID      STATUS
---------- ------------- ----------- ----------
      1            1       10           5
     22           11       13           5
     41           14       15           5
     61           16       18           5
     81           19       24           2

I perform switch-over so that DGB is primary. Because the extract metadata references a file on host dga, the transfer task on dga will fail (as the database is in standby mode), and the one on dgb will also fail because the extract file is actually on dga. As no additional extracts have taken place, this ‘orphan’ extract record can be deleted (backup the table first if you are playing it safe). The next extract on DGB will re-extract the missing snapshots.

SQL> delete from dbsnmp.caw_extract_metadata where extract_id = 81;

1 row deleted.

SQL> commit;

Commit complete.

Sure enough, when I run “Upload Snapshots Now” against DGB, the job succeeds, and reviewing the load metadata table reveals that the snapshots initially extracted from DGA have been re-extracted successfully from DGB (along with some other snapshots that have since taken place).

SQL> select dump_id, target_name, begin_snap_id, end_snap_id, status from dbsnmp.caw_load_metadata order by dump_id;

   DUMP_ID TARGET_NAM BEGIN_SNAP_ID END_SNAP_ID     STATUS
---------- ---------- ------------- ----------- ----------
	 4 DGA			  1	     10 	 4
	 5 DGB			 11	     13 	 4
	 6 DGB			 14	     15 	 4
	 7 DGA			 16	     18 	 4
	 8 DGB			 19	     28 	 4

However if you don’t do this in time,  subsequent snapshots may get extracted, transferred and loaded successfully creating a gap in the snapshots held in AWRW. To reproduce this situation, I create some more snapshots and kick of the extract process.
Notice this time that the run_extract doesn’t actually do anything (in the output we can see that as it has run recently so it doesn’t actually do anything). This is part of the design of the ETL process to have minimal impact on the source database. I override this behavior by calling the procedure run_job_now.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> set serveroutput on size unlimited
SQL> exec dbsnmp.mgmt_caw_extract.run_extract;
Starting CAW Extract...
Fetched CAW properties.
Retrying most recent failure...
No recent failure to retry.
Preparing for new Extract...
Upload interval is 24 hr.
Time elapsed since last successful extract: 0 hr.
Not enough time elapsed. Quitting.
No non-recoverable extracts to retry.

SQL> exec dbsnmp.mgmt_caw_extract.run_job_now;
run_job_now completed.

PL/SQL procedure successfully completed.

SQL> select extract_id, begin_snap_id, end_snap_id, status from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID	 STATUS
---------- ------------- ----------- ----------
	 1	       1	  10	      5
	22	      11	  13	      5
	41	      14	  15	      5
	61	      16	  18	      5
       101	      19	  28	      5
       121	      29	  30	      2

6 rows selected.

I then switch-back (primary reverts to DGA), create a couple more snapshots and perform “Upload Snapshots Now”. The process completes successfully, however checking the load metadata reveals that some snapshots have indeed been missed.

SQL> select target_name, dbid, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARGET_NAM	 DBID BEGIN_SNAP_ID END_SNAP_ID
---------- ---------- ------------- -----------
DGA	   1852644603		  1	     10
DGB	   1852644603		 11	     13
DGB	   1852644603		 14	     15
DGA	   1852644603		 16	     18
DGB	   1852644603		 19	     28
DGA	   1852644603		 31	     32

6 rows selected.

Checking the extract metadata in reveals the status of this extract is 5 (transferred successfully). (I don’t know why; the load metadata showed that it has not been received into AWRW).

SQL> select extract_id, BEGIN_SNAP_ID, END_SNAP_ID, STATUS from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID	 STATUS
---------- ------------- ----------- ----------
	 1	       1	  10	      5
	22	      11	  13	      5
	41	      14	  15	      5
	61	      16	  18	      5
       101	      19	  28	      5
       121	      29	  30	      5
       141	      31	  32	      5

7 rows selected.

To resolve this situation, I update the extract metadata record to error status, to force these snapshots to be are re-extracted.

SQL> update dbsnmp.caw_extract_metadata set status = 3, e_retry_count = 0 where extract_id = 121;

1 row updated.

SQL> commit;

Commit complete.

After running Upload Snapshots Now, reviewing the extract metadata reveals that the missing snapshots have been re-extracted (the record has been updated to status 5).

SQL> select extract_id, BEGIN_SNAP_ID, END_SNAP_ID, STATUS from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID	 STATUS
---------- ------------- ----------- ----------
	 1	       1	  10	      5
	22	      11	  13	      5
	41	      14	  15	      5
	61	      16	  18	      5
       101	      19	  28	      5
       121	      29	  30	      5
       141	      31	  32	      5

7 rows selected.

Reviewing load metadata reveals that the missing snapshots have finally been loaded successfully into AWRW, the gap has been eliminated.

SQL> select target_name, dbid, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARGET_NAM	 DBID BEGIN_SNAP_ID END_SNAP_ID
---------- ---------- ------------- -----------
DGA	   1852644603		  1	     10
DGB	   1852644603		 11	     13
DGB	   1852644603		 14	     15
DGA	   1852644603		 16	     18
DGB	   1852644603		 19	     28
DGA	   1852644603		 31	     32
DGA	   1852644603		 29	     30

7 rows selected.

In summary even though out of the box, AWR Warehouse does not support data-guard configuration, with a little bit of hacking it is possible to get it to work reasonably well.

Here’s hoping that in a later release these issues will have been worked out, and it will work out of the box.

Reasons to love AWR Warehouse

I realize that the majority of blog posts I have made are for problems I have encountered. I think is only natural as most of my time is spent investigating the few occasions where things don’t work as expected rather than the 99.99% of the time when things just work. Even when there are problems the code is normally very well instrumented, and there is such a wealth of great technical information available, troubleshooting issues can be almost enjoyable.

With a view to re-balance this state of affairs somewhat, I thought I would put down a few thoughts on AWR Warehouse. I am continually surprised that so few people seem to be using it, so I will try to do my part address this, and hopefully persuade you to give it a shot. If you are not using it, feel free to share via a comment why not; if you are using it also feel free to share any thoughts, good or bad about the product.

Licensing

One of the main reasons to consider using AWRW can be found in the Database Licensing Information User Manual. I quote the relevant section:

A separate single instance Oracle Database can be installed and used as an infrastructure repository for RMAN, Oracle Enterprise Manager Cloud Control, Automatic Workload Repository (AWR) Warehouse, Global Data Services Catalog, and Grid Infrastructure Management Repository without additional license requirements, provided that all the targets are correctly licensed. It may not be used or deployed for other uses.

The infrastructure repositories for RMAN, Oracle Enterprise Manager Cloud Control, AWR Warehouse, and Global Data Services Catalog can be shared in one database, or deployed in separate databases. Enterprise Edition must be used for the infrastructure repository database(s).

If Oracle RAC is used for the infrastructure database, then Oracle RAC licenses for all servers, and Enterprise Edition licenses for all but the first server, are required.

If Data Guard is used for the infrastructure database, then an Enterprise Edition license for the standby server is required.

So paraphrasing (note I am neither a lawyer nor a licensing expert), as long as the source databases are correctly licensed,  you don’t use RAC or DataGuard, and you don’t use it for anything else, you don’t need to worry to worry licensing on AWR Warehouse Repository.  This can (in fact must be) Enterprise Edition as it makes extensive use of the partitioning option.

There are not many occasions when Oracle corporation gives stuff out for free, when they do I suggest to take advantage.

The licensing advantages become even more attractive when you realize that AWRW actually saves you licensing on source databases by allowing you to offload AWR data storage and analysis from the source database.

Note, don’t even think about using an existing database for this purpose, especially not the OEM Repository database; trust me, on this, you’ll thank me later.

Tight Integration into Enterprise Manager

Database administrators  that have not embraced Enterprise Manager (OEM) for managing their databases, may consider this disadvantage, however I believe that all companies that manage more than a handful of databases, should be fully utilizing the power of OEM; its benefits far outweigh any overhead in administration.

AWRW initial configuration and the addition of database sources is basically a case of making sure they are discovered within OEM, defining preferred credentials, and then clicking a few buttons.

Once databases are contributing to AWRW, making use of the historical data is again just a case of clicking a few buttons.

AWR reports can be easily run for data in AWRW rather than just local AWR data, comparisons between different databases from at the same, or different, periods of time can be made.

ASH analytics is another area where the power of AWRW and OEM combine to provide some powerful tools for in-depth analysis.

Database Versions

The AWRW documentation states that:

(The AWR Warehouse Repository) must be an equal or higher database version of the source databases it accommodates.

You might consider this a disadvantage, however I see it differently. It is an opportunity to try out upgrading and using new database versions with non-trivial (actually substantial) volumes of data before rolling out to databases where developers or users complain in case things go wrong.

We recently upgraded our AWRW Repository to 12.2.  We had a few issues (as expected), but we could take our time resolving them, because we (the DBAs) were the only ones who noticed any outage.  After this upgrade completed, we then felt brave enough to upgrade our OMR Repository database which went through much more smoothly.

Of course perform due diligence, practice on a test system, and make sure you have backups  which will allow you to roll back in case things go wrong.

Summary

It’s license free, extends the power of Enterprise Manager, and gives you low-risk exposure to new database versions.

Why aren’t you using it?  Seriously;  let me know via comments.

AWR Warehouse and Data Guard (Part Two)

We will continue with setup configured for the previous blog entry.
This post includes details about updating some internal AWRW tables. These steps are  provided for education purposes only; if you do this on a real system make sure you know exactly what you are doing and understand the risks. If you break anything, Oracle Support or I may not be able or willing to help you.

First I delete the two instances DGA and DGB previously configured from AWR Warehouse with a view to re-configuring them to resolve some of the problems we encountered last time.

I re-add DGA and perform “Upload Snapshots Now” and make sure an entry is created in the DBID mapping table.

SQL> SELECT target_name, old_dbid, new_dbid FROM DBSNMP.CAW_DBID_MAPPING order by mapping_id;

TARGET_NAME   OLD_DBID	 NEW_DBID
----------- ---------- ----------
cawr	    1585153544 1585153544
DGA	    1852644603 1852644603

I switch-over to instance DGB. Before re-adding DGB to AWR Warehouse, I backup the extract metadata table.

SQL> create table dbsnmp.caw_extract_metadata_backup as select * from dbsnmp.caw_extract_metadata;

Table created.

I then re-add DGB to AWRW, but before uploading snapshots I update the mapping for DGA temprarily to set new_dbid column to a different value 1.

SQL> update DBSNMP.CAW_DBID_MAPPING set new_dbid=1 where target_name='DGA';
1 row updated.
SQL> COMMIT;
Commit complete.

I then verify that caw_extract_metadata is empty (it has been  recreated when adding DGB to AWRW) and restore the record from my backup table. This should fool AWR extract process into realizing which snapshots have already been uploaded from DGA. I also update the relevant sequence so there will not get an error when creating  a new extract metadata record.  I create a couple of snapshots for testing.

SQL> select count(*) from dbsnmp.caw_extract_metadata;
  COUNT(*)
----------
	 0
SQL> insert into dbsnmp.caw_extract_metadata select * from dbsnmp.caw_extract_metadata_backup;
1 row created.
SQL> commit;
Commit complete.
SQL> select dbsnmp.caw_extract_id_seq.nextval from dual;
21
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.

Finally I perform “Upload Snapshots Now” on DGB.  The process completes successfully.  Let’s check the contents of the dbid mapping table:

SQL> select mapping_id, target_name, old_dbid, new_dbid from DBSNMP.CAW_DBID_MAPPING;
MAPPING_ID TARGET_NAME	 OLD_DBID   NEW_DBID
---------- ----------- ---------- ----------
	 1 cawr        1585153544 1585153544
	 4 DGA	       1852644603	   1
	 5 DGB	       1852644603 1852644603

Because we had temporarily re-mapped DGA, OEM has not had to remap the dbid for DGB.  I now revert the  change we made to DGA mapping and review the mappings:

SQL> update DBSNMP.CAW_DBID_MAPPING set new_dbid = 1852644603 where target_name = 'DGA';
1 row updated.
SQL> commit;
Commit complete.
SQL> select mapping_id, target_name, old_dbid, new_dbid from DBSNMP.CAW_DBID_MAPPING;
MAPPING_ID TARGET_NAME	 OLD_DBID   NEW_DBID
---------- ----------- ---------- ----------
	 1 cawr        1585153544 1585153544
	 4 DGA	       1852644603 1852644603
	 5 DGB	       1852644603 1852644603

To double check everything is OK after this change, I create some new snapshots, and perform “Upload Snapshots Now” again on DGB.  I then switch back to DGA, create some more snapshots, and again perform “Upload Snapshots Now”.
Both tasks complete successfully, and I can check everything is as hoped from the load metadata table:

SQL> select target_name, dbid, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARGET_NAME	  DBID BEGIN_SNAP_ID END_SNAP_ID
----------- ---------- ------------- -----------
DGA	    1852644603		   1	      10
DGB	    1852644603		  11	      13
DGB	    1852644603		  14	      15
DGA	    1852644603		  16	      18

One thing to note with this ‘hack’ is that is best to perform “Upload Snapshots Now” on the active database before a switch-over is performed to make sure there are no extracts ‘left behind’ when the database become standby. Obviously this may not be possible in fail-over situations. I will try to cover work-arounds for this in a follow-up blog post.

Do not set job_queue_processes at PDB level in 12.1.0.2

MOS Doc ID 2175443.1 goes into more detail, including the following quote:

It is possible to set job_queue_processes at pdb level on 12.1.0.2 multitenant database, but it does not work correctly.

I have managed to reproduce the actual behavior. Of particular concern is that this allows a local user on one PDB to disrupt things on other PDBs.

My test system is running 12.1.0.2, multi-tenant with two PDBs, PDB1 and PDB2. Each PDB has a local user with DBA privileges, dba1, and dba2 respectively.

[oracle@awrw admin]$ sqlplus dba1/oracle@pdb1

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:34:41 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:33:46 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter job_queue_processes

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes		     integer	 1000
SQL> alter system set job_queue_processes=0 container=current;

System altered.

SQL> show parameter job_queue_processes

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes		     integer	 0
SQL>

So dba1 seems to have changed job_queue_processes, in their own PDB, pdb1.

The alert log reveals some cause for concern:

CJQ0 started with pid=34, OS id=31546
Stopping background process CJQ0
Sun Apr 02 13:35:01 2017
ALTER SYSTEM SET job_queue_processes=0 SCOPE=BOTH;

Sure enough, in PDB2, even though job_queue_processes parameter seems to be set to it’s default value, we are unable to use scheduler:

[oracle@awrw admin]$ sqlplus dba2/oracle@pdb2

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:44:19 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:43:25 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter job_queue_processes

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes		     integer	 1000
SQL> exec sys.dbms_scheduler.create_job( 'DUMMY_JOB', job_type=>'PLSQL_BLOCK', job_action=>'BEGIN NULL; END;');

PL/SQL procedure successfully completed.

SQL> exec sys.dbms_scheduler.run_job( 'DUMMY_JOB', false);
BEGIN sys.dbms_scheduler.run_job( 'DUMMY_JOB', false); END;

*
ERROR at line 1:
ORA-27492: unable to run job "DBA2"."DUMMY_JOB": scheduler unavailable
ORA-06512: at "SYS.DBMS_ISCHED", line 209
ORA-06512: at "SYS.DBMS_SCHEDULER", line 594
ORA-06512: at line 1

I connect back to PDB1 and reset the parameter:

[oracle@awrw admin]$ sqlplus dba1/oracle@pdb1

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:46:40 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:34:41 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> alter system set job_queue_processes=1000 container=current;

System altered.
In the alert log I can see Job Queue Co-ordinator process is restarted:
Starting background process CJQ0
Sun Apr 02 13:47:08 2017
ALTER SYSTEM SET job_queue_processes=1000 SCOPE=BOTH;
Sun Apr 02 13:47:08 2017
CJQ0 started with pid=34, OS id=2227

And sure enough I am now able to run schedule jobs in PDB2.

[oracle@awrw ~]$ sqlplus dba2/oracle@pdb2

SQL*Plus: Release 12.1.0.2.0 Production on Sun Apr 2 13:48:51 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sun Apr 02 2017 13:44:19 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> exec sys.dbms_scheduler.run_job( 'DUMMY_JOB', false);

PL/SQL procedure successfully completed.