Read Pro Oracle Database 11g Administration Online

Authors: Darl Kuhn

Tags: #Oracle DBA

Pro Oracle Database 11g Administration (100 page)

Let me paint a slightly different picture. I have an environment where one machine has a dozen databases running on it. There's a MySQL database, a PostgreSQL database, and a mix of Oracle version 9i, version 10g, and version 11g databases. Furthermore, many of these old databases are on non-terminal releases of Oracle and are therefore technically not supported by Oracle Support. There are no plans to upgrade any of these unsupported databases because the business can't take the risk of potentially breaking the applications that depend on these databases.

So what does one do in this type of environment when somebody reports that a database application is performing poorly? In this scenario, it's often something else in a different database that is causing other applications on the box to behave poorly. It may not be an Oracle process or an Oracle databases that is causing problems.

602

CHAPTER 22 ■ DATABASE TROUBLESHOOTING

In this situation, it's almost always more effective to start investigating issues by using an operating system tool. The OS tools are database agnostic. OS performance utilities help pinpoint where the most resources are consumed regardless of database vendor or version.

In Linux/Unix environments, there are several tools available for monitoring resource usage.

Table 22–1 summarizes the most commonly used OS utilities for diagnosing performance issues. Being familiar with how these operating system commands work and how to interpret the output will allow you to better diagnose server performance issues, especially when it's a non-Oracle or even a non-database process that is tanking performance for every other application on the box.

Table 22–1.
Performance and Monitoring Utilities

Tool

Purpose

vmstat

Monitors processes, CPU, memory, or disk I/O bottlenecks.

top

Identifies sessions consuming the most resources.

watch

Periodically runs another command.

ps

Identifies highest CPU- and memory-consuming sessions. Used to identify Oracle sessions consuming the most system resources.

mpstat

Reports CPU statistics.

sar

Displays CPU, memory, disk I/O, and network usage, both current and

historical.

free

Displays free and used memory.

df

Reports on free disk space.

du

Displays disk usage.

iostat

Displays disk I/O statistics.

netstat

Reports on network statistics.

When diagnosing performance issues, it's useful to determine where the operating system is constrained. For example, try to identify whether the issue is related to CPU, memory, I/O, or a combination of these resources.

Identifying System Bottlenecks

Whenever there are application performance issues or availability problems, seemingly (from the DBA’s perspective) the first question asked is, what’s wrong with the database? Regardless of the source of the problem, the onus is often on the DBA to either prove or disprove whether the database is behaving well.

I usually approach this issue by determining what system-wide resources are being consumed. There are two Linux/Unix operating system tools that are particularly useful for displaying system wide resource usage:

603

CHAPTER 22 ■ DATABASE TROUBLESHOOTING

• vmstat

• top

The vmstat (virtual memory statistics) tool is intended to help you quickly identify bottlenecks on your server. The top utility provides a dynamic real-time view of system resource usage. These two utility are discussed in the next two subsections.

Using vmstat

The vmstat utility displays real-time performance information about processes, memory, paging, disk I/O, and CPU usage. This example shows using vmstat to display the default output with no options specified:

$ vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa 14 0 52340 25272 3068 1662704 0 0 63 76 9 31 15 1 84 0

Here are some general heuristics you can use when interpreting the output of vmstat:

• If the wa (time waiting for I/O) column is high, this is usually an indication that the storage subsystem is overloaded.

• If b (processes sleeping) is consistently greater than 0, then you may not have enough CPU processing power.

• If so (memory swapped out to disk) and si (memory swapped in from disk) are consistently greater than 0, you may have a memory bottleneck.

By default, only one line of server statistics is displayed when running vmstat (without supplying any options). This one line of output displays average statistics calculated from the last time the system was rebooted. This is fine for a quick snapshot. However, if you want to gather metrics over a period of time, use vmstat with this syntax:

$ vmstat

While in this mode, vmstat reports statistics sampling from one interval to the next. For example, if you wanted to report system statistics every two seconds for ten intervals, you’d issue this command: $ vmstat 2 10

You can also send the vmstat output to a file. This is useful for analyzing historical performance over a period of time. This example samples statistics every 5 seconds for a total of 60 reports and records the output in a file:

$ vmstat 5 60 > vmout.perf

Another useful way to use vmstat is with the watch tool. The watch command is used to execute another program on a periodic basis. This example uses watch to run the vmstat command every five seconds and to highlight on the screen any differences between each snapshot: $ watch –n 5 –d vmstat

Every 5.0s: vmstat Thu Aug 9 13:27:57 2007

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 144 15900 64620 1655100 0 0 1 7 16 4 0 0 99 0

604

CHAPTER 22 ■ DATABASE TROUBLESHOOTING

When running vmstat in watch -d (differences) mode, you’ll visually see changes on your screen as they alter from snapshot to snapshot. To exit from watch, press Ctrl+C.

Note that the default unit of measure for the memory columns of vmstat is in kilobytes. If you want to view memory statistics in megabytes, then use the –S m (statistics in megabytes) option: $ vmstat –S m

For reference, Table 22–2 details the meanings of the columns displayed in the default output of vmstat.

Table 22–2.
Column Descriptions of vmstat Output

Column

Description

r

Number of processes waiting for runtime

b

Number of processes in uninterruptible sleep

swpd

Total virtual memory (swap) in use (KB)

free

Total idle memory (KB)

buff

Total memory used as buffers (KB)

cache

Total memory used as cache (KB)

si

Memory swapped in from disk (KB/s)

so

Memory swapped out to disk (KB/s)

bi

Blocks read in (blocks/s) from block device

bo

Blocks written out (blocks/s) per second to block device

in

Interrupts per second

cs

Context switches per second

us

User-level code time as a percentage of total CPU time

sy

System level code time as a percentage of total CPU time

id

Idle time as a percentage of total CPU time

wa

Time waiting for I/O completion

605

CHAPTER 22 ■ DATABASE TROUBLESHOOTING

Using top

Another tool for identifying resource-intensive processes is the top command. Use this utility to quickly identify which processes are the highest consumers of resources on the server. By default, top will repetitively refresh (every three seconds) information regarding the most CPU-intensive processes.

Here’s the simplest way to run top:

$ top

Here’s a fragment of the output:

top - 21:05:39 up 43 days, 23:45, 8 users, load average: 1.10, 0.87, 0.72

Tasks: 576 total, 2 running, 574 sleeping, 0 stopped, 0 zombie

Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 98.8%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16100352k total, 12480204k used, 3620148k free, 38016k buffers Swap: 18481144k total, 380072k used, 18101072k free, 8902940k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

9236 mscd642 15 0 13000 1468 812 R 0.7 0.0 0:00.03 top

3179 oracle 16 0 2122m 1.9g 1.9g S 0.3 12.3 97:54.00 oracle

4116 oracle 16 0 618m 133m 124m S 0.3 0.8 0:08.62 oracle

20763 mscd642 15 0 609m 91m 88m S 0.3 0.6 0:00.26 oracle

1 root 15 0 10344 684 572 S 0.0 0.0 0:25.98 init

2 root RT -5 0 0 0 S 0.0 0.0 0:16.01 migration/0

3 root 34 19 0 0 0 S 0.0 0.0 0:03.16 ksoftirqd/0

The process IDs of the top-consuming sessions are listed in the first column (PID). You can use this process ID to see if it maps to a database process (see the section in this chapter on mapping a PID to a database process).

While top is running, you can interactively change its output. For example, if you type >, this will move the column that top is sorting one position to the right. Table 22–3 lists some key features that you can use to alter the top display to the desired format.

Table 22–3.
Commands to Interactively Change the top Output
Command

Function

Spacebar

Immediately refreshes the output.

< or >

Moves the sort column one to the left or to the right. By default, top sorts on the CPU column.

D

Changes the refresh time.

R

Reverses the sort order.

Z

Toggles the color output.

H

Displays help menu.

F or O

Chooses a sort column.

606

e

CHAPTER 22 ■ DATABASE TROUBLESHOOTING

Type q or press Ctrl+C to exit top. Table 22–4 describes several of the columns displayed in the default output of top.

Table 22–4.
Column Descriptions of the top Output

Column

Description

PID

Unique process identifier.

USER

OS username running the process.

PR

Priority of the process.

NI

Nice value or process. Negative value means high priority. Positive value means low priority.

VIRT

Total virtual memory used by process.

RES

Nonswapped physical memory used.

SHR

Shared memory used by process.

S Process

status.

%CPU

Processes percent of CPU consumption since last screen refresh.

%MEM

Percent of physical memory the process is consuming.

TIME

Total CPU time used by process.

TIME+

Total CPU time, showing hundredths of seconds.

COMMAND

Command line used to start a process.

You can also run top using the -b (batch mode) option and send the output to a file for later analysis: $ top –b > tophat.out

While running in batch mode, the top command will run until you kill it (with a Ctrl+C) or until it reaches a specified number of iterations. You could run the previous top command in batch mode with a combination of nohup and & to keep it running regardless if you were logged onto the system. The danger there is that you might forget about it and eventually create a very large output file (and an angry system administrator).

If you have a particular process that you’re interesting in monitoring, use the -p option to monitor a process ID or the -U option to monitor a specific username. You can also specify a delay and number of iterations by using the -d and -n options. The following example monitors the oracle user with a delay of 5 seconds for 25 iterations:

$ top –u oracle –d 5 –n 25

607

CHAPTER 22 ■ DATABASE TROUBLESHOOTING


Tip
Use the man top or top --help commands to list all the options available in your operating system version.

Mapping an Operating System Process to a SQL Statement

When identifying operating system processes, it's useful to view which processes are consuming the most amount of CPU. If the resource hog is a database process, it's also useful to map the operating system process to a database job or query. To determine the ID of the processes consuming the most CPU resources, use a command like ps, like so:

$ ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head

Here is some sample output:

72.4 25922 mscd642 ? oracleE64215 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) 1.5 28215 oracle ? oracleemrep (LOCAL=NO)

0.2 24764 oracle ? /u01/oracle/product/11.0.0/grid/agent10g/bin/emagent 0.1 3179 oracle ? ora_j000_emrep

From the output, the operating system session of 25922 is consuming the most CPU resources at 72.4 percent. In this example, the 25922 process is associated with the E64215 database. Next, log onto the appropriate database and use the following SQL statement to determine what type of program is associated with the operating system process of 25922:

select

'USERNAME : ' || s.username|| chr(10) ||

'OSUSER : ' || s.osuser || chr(10) ||

'PROGRAM : ' || s.program || chr(10) ||

'SPID : ' || p.spid || chr(10) ||

'SID : ' || s.sid || chr(10) ||

'SERIAL# : ' || s.serial# || chr(10) ||

'MACHINE : ' || s.machine || chr(10) ||

'TERMINAL : ' || s.terminal

from v$session s,

v$process p

where s.paddr = p.addr

and p.spid = '&PID_FROM_OS';

When you run the example, SQL*Plus will prompt you for the value to use in place of &PID_FROM_OS.

Other books

Greasing the Piñata by Tim Maleeny
Damsels in Distress by Joan Hess
Longarm 242: Red-light by Evans, Tabor
Briar's Book by Pierce, Tamora
The Greek Tycoon's Secret Heir by Katherine Garbera
The Shadow at the Gate by Christopher Bunn
¡A los leones! by Lindsey Davis