Friday, August 28, 2020

Diagnose DB performance issue from OS Process

Following steps are very useful when diagnosing performance issues related to Oracle processes
running on the server. When you run multiple databases on one server and are experiencing server-performance issues, it can be difficult to identify which database and session are consuming the most system resources. In these situations, use the top utility or the ps command to identify the highest-consuming processes. The top utility provides a dynamic interface that periodically refreshes to give you the current snapshot of resource consumption, whereas the ps command provides a quick one time snapshot of top resource usage. You may be wondering whether it is better to use top or ps to identify resource-consuming processes.
If your server has top installed, top will probably be your first choice; its ease of use and interactive output is hard to beat. Having said that, the ps command is universally available, so you may have to use ps if the top command is not available. Also the ps command may show you a more descriptive program name associated with the process.
Once you have a process identified, if it’s an Oracle process use the SQL query given below to further identify the type of Oracle process. You then have the option of trying to tune the operation
(whether it be SQL, RMAN, and so on), or you might want to terminate the process.

1. Run the below topcpu command to get the list of processes that are consuming high CPU. 

# alias topcpu='ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head'

# topcpu

[oracle@ed-ol-raclin1:MYDB1:~]$ topcpu
99.3 140618 oracle   ?        oracleMYDB1 (LOCAL=NO)
16.0   2223 oracle   ?        ora_j061_mydb1
 4.4  48053 root     ?        /u01/app/19.0.0.0/grid/bin/osysmond.bin
 2.3 319689 grid     ?        oracle+ASM1 (LOCAL=NO)

2. Now you can use the top consuming PID (140618) from the output as an input to the following query to show information about the Oracle session responsible for the high consumption of CPU resources,

SET LINES 200 PAGES 0 HEAD OFF LONG 100000
COL dummy_value NOPRINT
--
SELECT 'dummy_value' dummy_value,
'USERNAME : ' || s.username || CHR(10) ||
'SCHEMA : ' || s.schemaname || CHR(10) ||
'OSUSER : ' || s.osuser || CHR(10) ||
'MODULE : ' || s.program || CHR(10) ||
'ACTION : ' || s.schemaname || CHR(10) ||
'CLIENT INFO : ' || s.osuser || CHR(10) ||
'PROGRAM : ' || s.program || CHR(10) ||
'SPID : ' || p.spid || CHR(10) ||
'SID : ' || s.sid || CHR(10) ||
'SERIAL# : ' || s.serial# || CHR(10) ||
'KILL STRING : ' || '''' || s.sid || ',' || s.serial# || '''' || CHR(10) ||
'MACHINE : ' || s.machine || CHR(10) ||
'TYPE : ' || s.type || CHR(10) ||
'TERMINAL : ' || s.terminal || CHR(10) ||
'CPU : ' || q.cpu_time/1000000 || CHR(10) ||
'ELAPSED_TIME: ' || q.elapsed_time/1000000 || CHR(10) ||
'BUFFER_GETS : ' || q.buffer_gets || CHR(10) ||
'SQL_ID : ' || q.sql_id || CHR(10) ||
'CHILD_NUM : ' || q.child_number || CHR(10) ||
'START_TIME : ' || TO_CHAR(s.sql_exec_start,'dd-mon-yy hh24:mi') || CHR(10) ||
'STATUS : ' || s.status || CHR(10) ||
'SQL_TEXT : ' || q.sql_fulltext
FROM v$session s
JOIN v$process p ON (s.paddr = p.addr)
LEFT OUTER JOIN v$sql q ON (s.sql_id = q.sql_id)
WHERE s.username IS NOT NULL -- eliminates background procs
AND NVL(q.sql_text,'x') NOT LIKE '%dummy_value%' -- eliminates this query from output
AND p.spid = '&PID_FROM_OS'
ORDER BY q.cpu_time;


Enter value for pid_from_os: 161918
For this example, when you run the prior query and supply to it the high cpu consuming PID of 161918, you get the following output:

USERNAME : APPS
SCHEMA : APPS
OSUSER : applprod
MODULE : STANDARD@test-app01 (TNS V1-V3)
ACTION : APPS
CLIENT INFO : applprod
PROGRAM : STANDARD@test-app01 (TNS V1-V3)
SPID : 161918
SID : 645
SERIAL# : 57255
KILL STRING : '645,57255'
MACHINE : test-app01
TYPE : USER
TERMINAL :
CPU : 3150.605024
ELAPSED_TIME: 3227.807927
BUFFER_GETS : 693186617
SQL_ID : 86mp84saarg09
CHILD_NUM : 0
START_TIME : 28-aug-20 17:21
STATUS : ACTIVE
SQL_TEXT : SELECT PVS.VENDOR_SITE_CODE FROM WIP_DISCRETE_JOBS WDJ , BOM_DEPARTMENTS BD , PO_VENDOR_SITES_ALL PVS , PO_VENDORS PV WHERE WDJ.WIP_ENTITY_ID = :B1 AND WDJ.OWNING_DEPARTMENT = BD.DEPARTMENT_ID AND PVS.VENDOR_SITE_CODE = BD.DEPARTMENT_CODE AND PV.VENDOR_ID = PVS.VENDOR_ID

From the prior output, you see that a SQL*Plus session is consuming a great deal of CPU time. You can
also determine when the query started, the username, the kill string, and the SQL identifier from the output.

Once you identify information regarding the process, you can drill down further to display the execution plan and the resources the process is waiting for. For example, you can view the SQL execution plan with the SQL ID and child number:

SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id',&child_num));

Here’s another useful query that uses the SID and serial number to display the state of a session and
whether it is working or waiting for a resource:

SELECT sid,
DECODE(state, 'WAITING','Waiting', 'Working') state,
DECODE(state, 'WAITING', 'So far '||seconds_in_wait,
'Last waited '|| wait_time/100)|| ' seconds for '||event
FROM v$session
WHERE sid = '&&session_id'
AND serial# = '&&serial';

Enter value for session_id: 645
Enter value for serial: 29301

SID STATE   DECODE(STATE,'WAITING','SOFAR'||SECONDS_IN_WAIT,'LASTWAITED'||WAIT_TIME/100)||'S
      1766 Working
Last waited -12345.67 seconds for gc current block 2-way

This information is very useful when diagnosing performance issues related to Oracle processes
running on the server.