Showing posts with label performance tuning. Show all posts
Showing posts with label performance tuning. Show all posts

Thursday, February 14, 2013

AWR Analysis


http://www.toadworld.com/Experts/GuyHarrisonsImprovingOraclePerformance/ResolvingOracleContention/tabid/257/Default.aspx
http://www.dba-oracle.com/art_sql_tune.htm
http://www.pafumi.net/Tips_for_Effective_Queries.html


Rows processed—Queries that process a large number of rows will have high I/O and may also have impact on the TEMP tablespace.

Buffer gets—High buffer gets may indicate a resource-intensive query.

Disk reads—High disk reads indicate a query that is causing excessive I/O.

Memory KB—The memory allocation of a SQL statement is useful for identifying statements that are doing in-memory table joins.

CPU secs—This identifies the SQL statements that use the most processor resources.

Sorts—Sorts can be a huge slowdown, especially if they’re being done on a disk in the TEMP tablespace.

Executions—The more frequently executed SQL statements should be tuned first, since they will have the greatest impact on overall performance.

http://docs.oracle.com/cd/B10500_01/server.920/a96536/apc2.htm
http://savvinov.com/2012/04/06/awr-reports-interpreting-cpu-usage/
http://dbaregistry.blogspot.in/2010/01/oracle-wait-events.html
http://www.dba-oracle.com/t_online_redo_logs_raid.htm
http://samadhandba.wordpress.com/


STATISTICS_LEVEL should be TYPICAL or ALL for getting AWR report

---------------How to reduce buffer gets
Buffer gets—High buffer gets may indicate a resource-intensive query.

you should be able to reduce buffer gets by reducing read operations This can sometimes be done by indexed lookups, sometimes not. It depends on the situation.

Buffer gets means that the data block is in memory and Oracle is returning data from the buffer cache and NOT having to go to disk, which is bad

when the value of buffer gets for SQL statement increased ,The amount of Oracle CPU has been increased .
-------------------

RAM is indirectly proportional to disk IO
if RAM has high memory then disk IO will be low

RAID 1 is best for Oracle DB.
----------------
Check DB Time
If DB time is significantly higher than the elapsed time then things are waiting.
 example, the Elapsed Time is around 60 minutes while the DB Time is around 700 minutes. This means that 700 minutes of time is spent by the sessions on waiting.
---------------------

Look for the Top 5 Timed Foreground Events. Investigate what is triggering these events? (Ignore idle events)

SQL ordered by Elapsed Time => Always helps to find the time consuming sql (Observe the elapsed time as well as number of executions also).
SQL ordered by Reads should also be noticed for I/O intensive queries.


------------------------
PGA Memory Advisory
When using Auto Memory Mgmt, minimally choose a pga_aggregate_target value where Estd PGA Overalloc Count is 0
when Estd PGA Cache Hit %=99% starts means this is sufficient pga for DB.
------------------------
Tablespace IO Stats

5-10 msec is normal
not exceed from 20msec

------------------------
File IO Stats depends on disk storage
8-10 msec is normal

------------------------
Enqueue Activity
Enqueue Type (Request Reason)
 TM lock
this could be caused by missing indexes on foreign key columns.
Problem with standard edition, fine with enterprise edition
The foreign-key issue is that DML on the child table locks the parent table
-----------------

“Starting with Oracle 10.2,  you may notice a significant degradation in relative performance when you combine a GROUP BY with an ORDER BY on the same columns.

When a GROUP BY is associated with an ORDER BY on the same columns,  the Oracle optimizer may choose a SORT GROUP BY rather than the usually more efficient HASH GROUP BY.

Using the SORT GROUP BY avoids adding a SORT ORDER BY to the plan, but the overall result is usually disappointing.

To get a better result, you can perform the GROUP BY in an in-line view and perform the ORDER BY in the outer query.  Use the NO_MERGE hint to prevent the two operations from being combined.”
-------------

Performance issue is caused by soft parsing, which can be solved by holding cursors (session_cached_cursors). It may be due to oversized shared pool, which is causing delay in searching child cursors. My second thought is, there is large number of reloads, which can be due to under-sized shared pool, if invalidation activities are not going (CBO statistics collection, DDL etc), cursors are being flushed frequently.

-------------
Recursive calls are also generated due to the inavailability of dictionary info in the dictionary cache, firing of database triggers, execution of DDL, execution of SQL within PL/SQL blocks, functions or stored procedures and enforcement of referential integrity constraints.

If the recursive CPU usage percentage is a large proportion of total CPU usage, you may want to make sure the shared pool memory allocation is adequate. However, a PL/SQL-based application will always have a significant amount of recursive CPU usage.

To reduce the recursive CPU usage, make sure that shared pool is sized correctly. Also you can check the parameters like

CURSOR_SHARING – values can be EXACT, SIMILAR, FORCE
SESSION_CACHED_CURSORS – value can be anything numeric like 500.
-------------------
Remove unnecessary large-table full table scans Unnecessary full table scans cause a huge amount of unnecessary I/O, and can drag down an entire database.
---------------------
Load Profile

Parse should be near 0

Parse activity statistics should be checked carefully because they can immediately indicate a problem within the application. For example, a database has been running several days with a fixed set of applications, it should, within a course of time, parse most SQLs issued by the applications, and these statistics should be near zero.

If there are high values of Soft Parses  or especially Hard Parses statistics, such values should be taken as an indication that the applications make little use of bind variables and produce large numbers of unique SQLs. However, if the database serves developmental purposes, high vales of these statistics are not bad.

% Blocks changed per Read:
The % Blocks changed per Read statistic indicates that only 4.85 percent of all blocks are retrieved for update, and in this example, the Recursive Call % statistic is extremely high with about 90 percent. However, this fact does not mean that nearly all SQL statements executed by the database are caused by parsing activity, data dictionary management, space management, and so on.

It is also useful to check the value of the Rollback per transaction % statistic. This statistic reports the percent of transactions rolled back. In a production system, this value should be low. If the output indicates a high percentage of transactions rolled back, the database expends a considerable amount of work to roll back changes made.  This should be further investigated in order to see why the applications roll back so often.

Rollback per transaction %  should be low.

---------------------

CPU time represents time spent on CPU and does not include time waiting for CPU.

CPU usage is described by “CPU time” (or “DB CPU”) statistics. Somewhat counterintuitively, AWR report showing CPU time close to 100% in the top timed events section does not necessarily indicate a problem. It simply means that database is busy using CPU to do work for its users. However, if CPU time (expressed in CPU seconds) becomes commensurate to the total CPU power available on the host (or shows consistent growth patterns), then it becomes a problem, and a serious one: this means that at best, Oracle processes will wait lots of time to get on CPU runqueue. In the worst case scenario, the host OS won’t have adequate resources to run and may eventually hang.

DB CPU usage (% of CPU power available) = CPU time / NUM_CPUS / elapsed time

If DB CPU usage is at 80-90% of the capacity (or 70-80% and growing) then you try to reduce CPU usage or if not possible, buy more CPU power before the system freezes.

----------------------Instance Efficiency Percentage
 these all will be above 90%

Parse CPU to Parse Elapsd %: 0.00 % shows how much time the CPU is parsing SQL statments, so the lower the better.

The Instance Efficiency Percentage report section contains ratios or calculations that may provide information regarding different structures and operations in the Oracle instance. Database tuning must never be driven by hit ratios. Hit ratios only provide additional information to help the DBA understand how the instance is operating.

http://www.dba-oracle.com/oracle10g_tuning/t_instance_efficiency_percentage.htm


The following list includes the meanings of particular hit ratios:
Buffer Hit Ratio: Measures how many times a required block was found in memory rather than having to execute an expensive read operation on disk to get the block.
Buffer Nowait %: Shows the percentage of times when data buffers were accessed directly without any wait time.
Library Hit %: Shows the percentage of times when SQL statements and PL/SQL packages were found in the shared pool.
Execute to Parse %: Shows how often parsed SQL statements are reused without reparsing.
Parse CPU to Parse Elapsd %: Gives the ratio of CPU time spent to parse SQL statements.
Redo NoWait %: Shows whether the redo log buffer has sufficient size.
In-memory Sort %: Shows the percentage of times when sorts are performed in memory instead of using temporary tablespaces.
Soft Parse %: Shows how often sessions issued a SQL statement that is already in the shared pool and how it can use an existing version of that statement.
Latch Hit %: Shows how often latches were acquired without having to wait.
% Non-Parse CPU: Shows the percentage of how much CPU resources were spent on the actual SQL execution.

In the above list of statistics, special attention should be paid to parse-related statistics. The Instance Efficiency Percentage report provided previously shows that about 95 percent of the parses are soft as indicated by the Soft Parse %. This is good enough, indicating that the SQL statements are actively reused by Oracle.


The next interesting item to review is the Parse CPU to Parse Elapsd % statistic. In this case, it is about three percent, which is very low. This fact reveals that Oracle waits for some resources during parsing of SQL statements. This should be investigated further to find the cause.

In this case, % Non-Parse CPU statistic is about 97 percent, which is quite high.  This indicates Oracle utilizes the CPU mostly for statement execution but not for parsing.

As a rule of thumb, one should always minimize the number of hard parses in your production database. This reduction yields the benefit of minimizing CPU overhead spent performing costly parse work.


------------------Shared Pool Statistics ------------------------

Idealy this should be lesser. If it is very high like beyond 90, this shows the contention in the shared pool.

This following sample report section shows shared pool related statistics:

Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   92.70   92.49
    % SQL with executions>1:   86.73   84.20
  % Memory for SQL w/exec>1:   84.12   71.86

In this example, the Memory Usage % statistic shows that almost all, approximately 92 percent, of the shared pool memory is consumed. This could indicate that the system experiences some overhead while aging out old shared memory structures like cursors, PL/SQL programs, and so on. This places additional overhead on the CPU to perform reparsing aging-out. The size of the shared pool should be increased appropriately to eliminate such overhead. In general, this statistic should be near 70 percent after the database has been running a long time.
If it is quite low, memory is being wasted.

Memory Usage % >: The sweet point for the shared pool Memory Usage % is at about ~70%-80% because of aging out. You are at ~90%. It would be advisable to increase the shared pool, but if you're using 10g SGA_TARGET and SGA_MAX_SIZE, then that might require increasing those parameters.

Memory Usage % >90% then increase sga

The % SQL with executions>1 statistic indicates how many SQL statements are executed more than one time. This measures how well production applications are tuned and how well they make use of bind variables.

% Memory for SQL w/exec>1: the percentage of SQLs executed more then once, that were in Memory


-------------------------------------------Buffer Pool Advisory-----------
If you check the column Size for Est(M) for Size Factor 1.0 it is 596M which means 596MB of memory has been allocated by Oracle either automatically if sga_target > 0 OR manually in the parameter file.

Simultaneously we need to check the column “Estimated Physical Reads” which has a value of 37,612,741.

You will notice that increasing buffer cache size will reduce the number of “Estimated Physical Reads”.

Size Factor ”1.0” also has Estimated Phys Read Fator ”1.0” with “Estimated Physical Reads” as37,612,741.

Note how ‘Size Factor’ increases and ‘Estimated Phys Read Factor’ decreases.


 In cases when sga_target > 0 and db_cache_size is determined automatically, similar data would mean that allocated by Oracle size of the buffer cache is fairly balanced, i.e. not too small to have excessive number of physical reads and not too big to have unnecessary overhead while serving a bigger buffer cache.


--------------------------------Top 5 Timed Events
Sometimes, a major external problem such as bad disk RAID can cause the entire database to run slowly, so it is always a good idea to start with a listing of the AWR top five wait event listing.

This tell you where the highest resources are being consumed.

db file sequential read would indicate that there where full table scans being performed.

 Interpretation of CPU Time(s) 1033

We have 60*60=3600 CPU Seconds to use in a particular interval for single CPU in 1 hour snap


In the example we have 8 CPU (Num_CPUs under the Operating System Statistics of AWR) which relates to    60*60*8= 28800 CPU seconds to use in 1 hr interval. ( Single Database Machine is running on machine)

(1033/28800)*100 = 3.58% of Total CPU

Network Bottleneck
A slow or un-tuned network can be a nightmare.  Network overload can also be caused by Oracle TNS issues or by an inefficient application that makes too many trips to Oracle to fetch screen data.

A CPU-bound database may be due to a real overload of the CPU banks, in which case the problem can be remedied with more or faster processors.  High CPU consumption can also be due to un-tuned SQL causing excessive logical I/O in the form of consistent gets or by contention within the shared pool, such as the total lack of SQL bind variables.



----------------------------Tablespace IO Stats
This shows the IO Statistics for each tablespaces in the database.

As the thumb rule, the Av Rd(ms) [Average Reads in milliseconds] should not cross beyond 30, add myself(not greater that 30) which is considered to be IO bottleneck.

Tablespace Reads Av Reads/s Av Rd(ms) Av Blks/Rd Writes Av Writes/s Buffer Waits Av Buf Wt(ms)
TEMP 3,316,0 4.91 1 28,840 8 0 0
DAT1 520,120 146 16.06 1.21 185,846 52 902 13
DAT3 93,411 26 42.82 2.98 13,442 4 16 23.13
DAT2 98,171 28 91.97 7.97 5,333 2 325 34.89

In the above example, the Av Rd(ms) is high in all tablespaces indicating the IO contention.

------------------------------------------------Wait Events
---------------db file sequential reads
Reason:
Use of an unselective index
Fragmented Indexes
High I/O on a particular disk or mount point
Bad application design
Index reads performance can be affected by  slow I/O subsystem and/or poor database files layout, which result in a higher average  wait time

Action:
Check indexes on the table to ensure that the right index is being used
Check the column order of the index with the WHERE clause of the Top SQL statements
Rebuild indexes with a high clustering factor
Use partitioning to reduce the amount of blocks being visited
Make sure optimizer statistics are up to date
Relocate ‘hot’ datafiles
Consider the usage of multiple buffer pools and cache frequently used indexes/tables in the KEEP pool

Suggestion
The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.
Significant db file sequential read wait time is most likely an application issue.
If the DBA_INDEXES.CLUSTERING_FACTOR of the index approaches the number of blocks in the table, then most of the rows in the table are ordered. This is desirable.
However, if the clustering factor approaches the number of rows in the table, it means the rows in the table are randomly ordered and thus it requires more I/Os to complete the operation. You can improve the index’s clustering factor by rebuilding the table so that rows are ordered according to the index key and rebuilding the index thereafter.

----------------------db file scattered reads
Reason:
The Oracle session has requested and is waiting for multiple contiguous database blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be  read into the SGA from disk.
Full Table scans
Fast Full Index Scans

Action:
Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT
Partition pruning to reduce number of blocks visited
Optimize the SQL statement that initiated most of the waits. The goal is to minimize the number of physical and logical reads.
The objective is to reduce the demands for both the logical and physical I/Os, and this is best achieved through SQL and application tuning.
Make sure all statistics are representative of the actual data. Check the LAST_ANALYZED date

----------------log file parallel write
Reason:
LGWR waits while writing contents of the redo log buffer cache to the online log files on disk
I/O wait on sub system holding the online redo log files

Action:
Reduce the amount of redo being generated
Do not leave tablespaces in hot backup mode for longer than necessary
Do not use RAID 5 for redo log files
Use faster disks for redo log files
Ensure that the disks holding the archived redo log files and the online redo log files are separate so as to avoid contention
Consider using NOLOGGING or UNRECOVERABLE options in SQL statements

----------------------------------------------log file sync
Reason:
Oracle foreground processes are waiting for a COMMIT or ROLLBACK to complete
 Oracle metric indicates the process is waiting for LGWR to finish flushing the log buffer to disk.

Action:
Tune LGWR to get good throughput to disk eg: Do not put redo logs on RAID5 Reduce overall number of commits by batching transactions so that there are fewer distinct COMMIT operations
Move the redo logs to faster disks or a faster I/O subsystem (for example, switch from RAID 5 to RAID 1).

See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options in order to reduce the amount of redo being written.
Check the size of the log buffer as it may be so large that LGWR is writing too many blocks at one time.


Log File Sync waits occur when sessions wait for redo data to be written to disk
typically this is caused by slow writes
or committing too frequently in the application
CPU overburning(very high demand => LGWR on run queue)
improper Operating System configuration(check 169706.1)
BUGs in Oracle(especially with RAC option) and 3rd Party software(like ODM/DISM)


Recommendations:
tune LGWR process to get good throughput, especially when ‘log file parallel write‘ high too:
do not put redo logs on RAID 5 without good write cache
do not put redo logs on Solid State Disk (SSD)
if CPUs are overburned(check runqueue with vmstat):
check for non-oracle system activity, like GZIP or BZIP2 running in business hours…
lower instance’s CPU usage(for example, tune SQL for LIOs)
increase LGWR priority(renice or _high_priority_processes),
decrease COMMITs count for applications with many short transactions
use COMMIT [BATCH] NOWAIT(10g+) when possible
do some processing with NOLOGGING(or may be even with _disable_logging=TRUE if just testing performance benchmark/impact), but think about database recoverability
lower system’s CPU usage or increase LGWR priority
check if there is some 3rd party software, or utilities like RMAN, activity on the same disks as redo logs placed, like trace/systemstate dump files, e.t.c
trace LGWR as the last option for troubleshooting OS/3rd party issues


---------------buffer busy waits
Reason:
Buffer busy waits are common in an I/O-bound Oracle system.
The two main cases where this can occur are:
Another session is reading the block into the buffer
Another session holds the buffer in an incompatible mode to our request
These waits indicate read/read, read/write, or write/write contention.
The Oracle session is waiting to pin a buffer.
A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.
This wait can be intensified by a large block size as more rows can be contained within the block
This wait happens when a session wants to access a database block in the buffer cache but it cannot as the buffer is "busy
It is also often due to several processes repeatedly reading the same blocks (eg: if lots of people scan the same index or data block)

Action:
The main way to reduce buffer busy waits is to reduce the total I/O on the system
Depending on the block type, the actions will differ
Eliminate HOT blocks from the application.
Check for repeatedly scanned / unselective indexes.
Try rebuilding the object with a higher PCTFREE so that you reduce the number of rows per block.
 Check for 'right- hand-indexes' (indexes that get inserted into at the same point by many processes).
 Increase INITRANS and MAXTRANS and reduce PCTUSED This will make the table less dense .
Reduce the number of rows per block
Increase of number of FREELISTs and FREELIST GROUPs

----------------------------------------------free buffer waits
Reason:
This means we are waiting for a free buffer but there are none available in the cache because there are too many dirty buffers in the cache
Either the buffer cache is too small or the DBWR is slow in writing modified buffers to disk
DBWR is unable to keep up to the write requests
Checkpoints happening too fast – maybe due to high database activity and under-sized online redo log files
Large sorts and full table scans are filling the cache with modified blocks faster than the DBWR is able to write to disk
If the  number of dirty buffers that need to be written to disk is larger than the number that DBWR can write per batch, then these waits can be observed

Action:
Reduce checkpoint frequency  - increase the size of the online redo log files
Examine the size of the buffer cache – consider increasing the size of the buffer cache in the SGA
Set disk_asynch_io = true set If not using asynchronous I/O increase the number of db writer processes or dbwr slaves
Pre-sorting or reorganizing data can help

------------------------enqueue waits
Reason:
This wait event indicates a wait for a lock that is held by another session (or sessions) in an incompatible mode to the requested mode.
TX Transaction Lock
Generally due to table or application set up issues
This indicates contention for row-level lock. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction.
This usually is an application issue.

TM DML enqueue lock
Generally due to application issues, particularly if foreign key constraints have not been indexed.

ST lock
Database actions that modify the UET$ (usedextent) and FET$ (free extent) tables require the ST lock, which includes actions such as drop, truncate, and coalesce.
Contention for the ST lock indicates there are multiple sessions actively performing dynamic disk space allocation or deallocation in dictionary managed tablespaces

Action:
Reduce waits and wait times
Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and if there are multiple waiters for the same resource
Waits for TM enqueue in Mode 3 are primarily due to unindexed foreign key columns
Create indexes on foreign keys  < 10g
Following are some of the things you can do to minimize ST lock contention in your database:
Use locally managed tablespaces
Recreate all temporary tablespaces using the CREATE TEMPORARY TABLESPACE TEMPFILE… command.

--------------------------------Cache buffer chain latch
Reason:
This latch is acquired when searching for data blocks Buffer cache is a chain of blocks and each chain is protected by a child latch when it needs to be scanned
Hot blocks are another common cause of cache buffers chains latch contention. This happens when multiple sessions repeatedly access one or more blocks that are protected by the same child cache buffers chains latch.
SQL statements with high BUFFER_GETS (logical reads) per EXECUTIONS are the main culprits
Multiple concurrent sessions are executing the same inefficient SQL that is going after the same data set

Action:
Reducing contention for the cache buffer chains latch will usually require reducing logical I/O rates by tuning and minimizing the I/O requirements of  the SQL involved. High I/O rates could be a sign of a hot block (meaning a block highly accessed).
Exporting the table, increasing the PCTFREE significantly, and importing the data. This minimizes the number of rows per block, spreading them over many blocks. Of course, this is at the expense of storage and full table scans operations will be slower
Minimizing the number of records per block in the table
For indexes, you can rebuild them with higher PCTFREE values, bearing in mind that this may increase the height of the index.


--------------------Direct Path Reads
Reason:
These waits are associated with direct read operations which read data directly into the sessions PGA bypassing the SGA
The "direct path read" and "direct path write" wait events are related to operations that are performed in PGA like sorting, group by operation, hash join
In DSS type systems, or during heavy batch periods, waits on "direct path read" are quite normal
However, for an OLTP system these waits are significant
These wait events can occur during sorting operations which is not surprising as direct path reads and writes usually occur in connection with temporary tsegments
SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA

Action:
Ensure the OS asynchronous IO is configured correctly.
Check for IO heavy sessions / SQL and see if the amount of IO can be reduced.
Ensure no disks are IO bound.
Set your PGA_AGGREGATE_TARGET to appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO)
Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.
Make sure the optimizer selects the right driving table. Check to see if the composite index’s columns can be rearranged to match the ORDER BY clause to avoid sort entirely.


------------------------Direct Path  Writes
These are waits that are associated with direct write operations that write data from users’ PGAs to data files or temporary tablespaces
Direct load operations (eg: Create Table as Select (CTAS) may use this)
Parallel DML operations
Sort IO (when a sort does not fit in memory

Action:
If the file indicates a temporary tablespace check for unexpected disk sort operations.
Ensure <Parameter:DISK_ASYNCH_IO> is TRUE . This is unlikely to reduce wait times from the wait event timings but may reduce sessions elapsed times (as synchronous direct IO is not accounted for in wait event timings).
Ensure the OS asynchronous IO is configured correctly.
Ensure no disks are IO bound


-------------------------Latch Free Waits
This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
A latch is a fast, inexpensive and non-sophisticated lock. The latch is used when you need to serialize access to operations, functions and data structures in Oracle.
We could check the shared pool tuning to determine whether that might be a valid suggestion to increase the size of the shared pool. Cursor sharing can be implemented to prevent thrashing within the shared pool and the session_cached_cursors can be used to reduce the soft parses. The following Oracle initialization parameter changes resolved the performance degradation that resulted from the latch free wait event.


alter system set session_cached_cursors=150 scope=spfile;
alter system set shared_pool_size=900M scope=spfile;
alter system set cursor_sharing='FORCE' scope=spfile;

--------------------Library cache latch
The library cache latches protect the cached SQL statements and objects definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache
Application is making heavy use of literal SQL- use of bind variables will reduce this latch considerably
Action:
You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter
Note:
Larger shared pools tend to have long free lists and processes that need to allocate space in them must spend extra time scanning the long free lists while holding the shared pool latch
Oversize shared pool is also problem.

--------------------Shared pool latch
The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool

Action:
Ways to reduce the shared pool latch are, avoid hard parses when possible, parse once, execute many.
The workaround is to set the initialization parameter CURSOR_SHARING to FORCE. This allows statements that differ in literal values but are otherwise identical to share a cursor and therefore reduce latch contention, memory usage, and hard parse.

-------------Log Buffer Space
Log Buffer Space wait event occurs when server processes write data into the log buffer faster than the LGWR process can write it out. The LGWR process begins writing entries to the online redo log file if any of the following conditions are true:

The log buffer reaches the _log_io_size threshold. By default, this parameter is set to one third of the log buffer size.
A server process performing a COMMIT o ROLLBACK posts to the LGWR process.
The DBWR process posts to the LGWR process before it begins writing.


Solution
Put log files on faster disks.
Look at tuning checkpoint or archive processes (log files switches).
Review application design, use NOLOGGING operations where appropriate, and avoid changing more data than required.
Finally, check refresh methods when using Materialized Views to reduce logging activity.


----------------------------------------------------------------------Wait Event Meaning----------------------------------------------------------------


db file sequential read  => tune indexing, tune SQL (to do less I/O), tune disks, increase buffer cache. This event is indicative of disk contention on index reads. Make sure all objects are analyzed. Redistribute I/O across disks. The wait that comes from the physical side of the database. It related to memory starvation and non selective index use. Sequential read is an index read followed by table read because it is doing index lookups which tells exactly which block to go to.
db file scattered read => disk contention on full table scans. Add indexes, tune SQL, tune disks, refresh statistics, and create materialized view. caused due to full table scans may be because of insufficient indexes or unavailability of updated statistics.
Oracle monitoring tools Make money blogging Web hosting servers Oracle Dba Job Interview
db file parallel read  => tune SQL, tune indexing, tune disk I/O, increase buffer cache. if you are doing a lot of partition activity then expect to see that wait even. it could be a table or index partition.
db file parallel write  => if you are doing a lot of partition activity then expect to see that wait even. it could be a table or index partition.
db file single write  => if you see this event than probably you have a lot of data files in your database.

control file sequential read
control file parallel write

log file sync    => committing too often, archive log generation is more. Tune applications to commit less, tune disks where redo logs exist, try using nologging/unrecoverable options, log buffer could be too large.
log file switch completion => May need more log files per group.
log file parallel write  => Deals with flushing out the redo log buffer to disk. Disks may be too slow or have an I/O bottleneck. Look for log file contention.
log buffer space   => Increase LOG_BUFFER parameter or move log files to faster disks. Tune application, use NOLOGGING, and look for poor behavior that updates an entire row when only a few columns change.
log file switch (checkpoint incomplete) => May indicate excessive db files or slow IO subsystem.
log file switch (archiving needed)    => Indicates archive files are written too slowly.
redo buffer allocation retries  => shows the number of times a user process waited for space in the redo log buffer.
redo log space wait time  => shows cumulative time (in 10s of milliseconds) waited by all processes waiting for space in the log buffer.

buffer busy waits/ read by other session  => Increase DB_CACHE_SIZE. Tune SQL, tune indexing, we often see this event along with full table scans, if the SQL is inserting data, consider increasing FREELISTS and/or INITRANS, if the waits are on segment header blocks, consider increasing extent sizes.
free buffer waits  => insufficient buffers, process holding buffers too long or i/o subsystem is over loaded. Also check you db writes may be getting clogged up.
cache buffers lru chain  => Freelist issues, hot blocks.
no free buffers   => Insufficient buffers, dbwr contention.

latch free
latch: session allocation
latch: in memory undo latch  => If excessive could be bug, check for your version, may have to turn off in memory undo.
latch: cache buffer chains  => check hot objects.
latch: cache buffer handles  => Freelist issues, hot blocks.

direct path write => You wont see them unless you are doing some appends or data loads.
direct path reads => could happen if you are doing a lot of parallel query activity.
direct path read temp or direct path write temp => this wait event shows Temp file activity (sort,hashes,temp tables, bitmap) check pga parameter or sort area or hash area parameters. You might want to increase them.

library cache load lock
library cache pin => if many sessions are waiting, tune shared pool, if few sessions are waiting, lock is session specific.
library cache lock  => need to find the session holding the lock, look for DML manipulating an object being accessed, if the session is trying to recompile PL/SQL, look for other sessions executing the code.

undo segment extension  => If excessive, tune undo.
wait for a undo record   => Usually only during recovery of large transactions, look at turning off parallel undo recovery.

enque wait events   => Look at V$ENQUEUE_STAT





Trace Connect Hang at startup


 http://www.oracleangels.com/2011/05/useful-tracing-commands-oradebug-oracle.html
-- There is an option that is helpful when the databse is hanging and we can't connect
-- to database but we need to execute commands like oradebug

-- With sqlplus -prelim
-- we can connect to the sga but not to the database, in this case no session is created.


1) sqlplus -prelim / as sysdba

2) sqlplus /nolog
set _prelim on
conn / as sysdba

Example:

$sqlplus -prelim /nolog

SQL*Plus: Release 10.2.0.4.0 - Production on Wed Oct 26 19:35:42 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

SQL> connect / as sysdba
Prelim connection established
SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 12
Hang Analysis in C:\oracle\admin\orcl\udump\orcl_ora_5564.trc
SQL> oradebug dump systemstate 10
Statement processed.
SQL> oradebug tracefile_name
C:\oracle\admin\orcl\udump\orcl_ora_5598.trc



sqlplus '/ as sysdba'
oraclebug unlimit
oradebug dump systemstate 266
wait 90 seconds
oradebug dump systemstate 266
wait 90 seconds
oradebug dump systemstate 266

oraclebug tracefile_name




-----------------------------------------Identify the process id consuming 100% CPU and take errorstack as below

connect / as sysdba
oradebug setospid 1234 << replace 1234 with actual process id
oradebug unlimit
oradebug dump errorstack 3
wait 1 min
oradebug dump errorstack 3
wait 1 min
oradebug dump errorstack 3
oradebug tracefile_name

Killed Session Hung Consuming 100% CPU
Doc ID: Note:458881.1
215858.1



Top Buffer Gets


set serverout on size 1000000

declare
top5 number;
text1 varchar2(4000);
x number;
len1 number;
Cursor c1 is
select buffer_gets,substr(sql_text,1,4000)
from v$sqlarea
order by buffer_gets desc;
begin
dbms_output.put_line('Reads'||'  '||'                          Text');
dbms_output.put_line ('-----'||'  '||'---------------------------------------------------');
dbms_output.put_line('      ');
open c1;
for i in 1 .. 5 loop
fetch c1 into top5, text1;
dbms_output.put_line(rpad(to_char(top5),9)|| ' '||substr(text1,1,66));
len1 :=length(text1);
x := 66;
while len1 > x-1 loop
dbms_output.put_line('"         '||substr(text1,x,64));
x := x+64;
end loop;
end loop;
end;
/

CPU taking SQL


If you noticed that a session is using too much CPU, you can identify the actions performed by that session using top and Explain Plan.
So first, use TOP to identify the session using high CPU and take a note of the PID.

set linesize 140
set pagesize 100
col username format a15
col machine  format a20
ACCEPT Process_ID prompt 'Pid : '
select s.inst_id,p.spid,s.sid,s.serial#,s.username,s.machine
from gv$session s, gv$process p
where s.paddr=p.addr
and p.spid=&proceso;

Once you got the SID associated to that PID, then you can use it with explain plan:
set lines 140
set pages 10000
set long 1000000
ACCEPT Process_SID prompt 'Sid : '
SELECT a.sql_id, a.sql_fulltext
FROM v$sqlarea a, v$session s
WHERE a.address = s.sql_address
 AND s.sid = &proceso;


set lines 150
set pages 40000
col operation format a55
col object format a25
ACCEPT sentencia prompt 'Identificador de SQL ejecutado : '
select lpad(' ',2*depth)||operation||' '||options||decode(id, 0, substr(optimizer,1, 6)||'
                   Cost='||to_char(cost)) operation,
object_name object, cpu_cost, io_cost
from v$sql_plan where sql_id='&sentencia';



temporary tablespace usage


– Listing of temp segments.–

SELECT A.tablespace_name tablespace, D.mb_total,
SUM (A.used_blocks * D.block_size) / 1024 / 1024 mb_used,
D.mb_total – SUM (A.used_blocks * D.block_size) / 1024 / 1024 mb_free
FROM v$sort_segment A,
(
SELECT B.name, C.block_size, SUM (C.bytes) / 1024 / 1024 mb_total
FROM v$tablespace B, v$tempfile C
WHERE B.ts#= C.ts#
GROUP BY B.name, C.block_size
) D
WHERE A.tablespace_name = D.name
GROUP by A.tablespace_name, D.mb_total;

– Temp segment usage per session.–

SELECT S.sid || ‘,’ || S.serial# sid_serial, S.username, S.osuser, P.spid, S.module,
P.program, SUM (T.blocks) * TBS.block_size / 1024 / 1024 mb_used, T.tablespace,
COUNT(*) statements
FROM v$sort_usage T, v$session S, dba_tablespaces TBS, v$process P
WHERE T.session_addr = S.saddr
AND S.paddr = P.addr
AND T.tablespace = TBS.tablespace_name
GROUP BY S.sid, S.serial#, S.username, S.osuser, P.spid, S.module,
P.program, TBS.block_size, T.tablespace
ORDER BY sid_serial;

Top 30 I/O Intensive SQL Statements Identification


-- Top 30 I/O Intensive SQL Statements Identification : top30iosql.sql
set linesize 80
set pagesize 58
set heading on
set serveroutput on
spool c:\temp\top30iosql.txt
declare cursor curs1 is
select executions, disk_reads, buffer_gets, first_load_time, sql_text from v$sqlarea order by disk_reads / decode(executions,0,1,executions) desc;
stmnt_ctr number(3);
wrt1 number(3);
begin dbms_output.enable(50000); stmnt_ctr := 0;
for inrec in curs1 loop stmnt_ctr := stmnt_ctr + 1;
if stmnt_ctr >= 31 then
exit;
end if;
 dbms_output.put_line('--------------------------------------' || '--------------------------------------');
dbms_output.put_line('SQL Stmnt Number: ' || to_char(stmnt_ctr)); dbms_output.put_line('--------------------------------------' || '--------------------------------------');
 dbms_output.put_line('Executions : ' || to_char(inrec.executions));
 dbms_output.put_line('Disk Reads : ' || to_char(inrec.disk_reads) || ' Buffer Gets : ' || to_char(inrec.buffer_gets));
dbms_output.put_line('First Load Time: ' || inrec.first_load_time);
dbms_output.put_line('SQL Statement-------->');
end loop;
end;
/

Tuesday, July 10, 2012

RAID recommendation for Oracle Database


Redo logs are extremely write intensive, and high bandwidth storage such as SSD are ideas for relieving I/O bottlenecks.  Also, online redo logs should always be multiplexed, and RAID1 provides few additional benefits.
RAID
Type of Raid
Control File
Database File
Redo Log File
Archive Log File
0
Striping
Avoid
OK
Avoid
Avoid
1
Shadowing
Best
OK
Best
Best
1+0
Striping and Shadowing
OK
Best
Avoid
Avoid
3
Striping with static parity
OK
OK
Avoid
Avoid
5
Striping with rotating parity
OK
Best if RAID0-1 not available
Avoid
Avoid

RAID Recommendations (From MOSC NOTE: 45635.1)

Tuesday, July 3, 2012

insert update without unod redo


Is it possible to issue an INSERT statement or an UPDATE statement without generating Redo Logs and Undo?

This UNDO information itself generates REDO. There is nothing you can do about this situation: temporary tables need UNDO and that's the end of it.

To minimize the amount of UNDO is quite simple: just insert records and select records. INSERT generates the smallest amount of UNDO, because rolling back an INSERT requires simply the rowid. Conversely DELETE statements generate the most UNDO, because the database has to store the entire record. Basically, to rollback an INSERT issue a DELETE, to rollback a DELETE issue an INSERT. An UPDATE generates a variable amount of UNDO, because we need the old versions of the changed columns; the more columns changed and the bigger they are, the larger the amount of UNDO generated.

UNDO is always protected by redo.

If you direct path the global temporary table ( insert /*+ APPEND */) you can bypass undo ON THE TABLE - but not on the indexes. Hence you can reduce (marginally typically as it is usually indexes that generate the most undo) the amount of redo, but you cannot eliminate it.

The append hint suggests that Oracle should use DIRECT-PATH operations, which can result in faster inserts. If I recall correctly you should have exclusive access to the table. It is important to commit after the insert, so that you can select information from it.


More archives and more UNDOs in direct load on table with indexes.

Putting a primary key index in NOLOGGING mode did not help because NOLOGGING applies only to a limited number of operations.

http://www.dba-oracle.com/oracle_news/news_plsql_forall_performance_insert.htm


1.The undo would normally be used to un-insert the rows in the event of a
failure or rollback - with DIRECT LOAD, undo is not necessary since the new rows are added entirely above the high water mark for the table.
2. When "Direct load" (/*+ append */) is used, Oracle can skip undo generation for the TABLE data - but not on the indexes.
3. In archivelog mode , REDO is normally generated with "Direct load" (/*+ APPEND */), it is UNDO that is skipped and then only for the table itself.
4.If table is placed into "nologging" mode (or use nologging in insert
command), then redo for the table as well as undo can be skipped..Again, only for the table - not for any indexes on the table itself.
5. Small redos which are generated in nologgin/DIRECT LOAD is used to protect the data dictionary.
6. To prevent archivelog generation in "Direct load", database and tablespace should not be in "Force logging mode". (Check v$database and dba_tablespaces).


http://oradbatips.blogspot.in/2007/11/tip-63-direct-load-and-redo-log.html

insert /*+ append */ into tbl1 nologging select * from dba_objects;


select a.value,b.name from v$sesstat a,v$statname b where a.statistic#=b.statistic# and a.sid in (select sid from v$mystat) and name like '%redo size%';
select sum(undoblks)*8192/1024/1024 UNDOMB from v$undostat where trunc(begin_time,'DD')=trunc(sysdate,'DD');


Final Solution:
To use direct load and to take advantage of less archivelog generation and better performance, always consider the followings :
1. Disable indexes during direct load.
2. Make sure to use both /*+ append */ with nologging at the same time.
3. Make sure database and tablespace are not in nologging mode.




Use a combination of the following two features to maximize the speed of insert statements:


Set the table’s logging attribute to NOLOGGING; this minimizes the generation redo
for direct path operations (this feature has no effect on regular DML operations).

Use a direct path loading feature, such as the following:

INSERT /*+ APPEND */ on queries that use a subquery for determining which
records are inserted

INSERT /*+ APPEND_VALUES */ on queries that use a VALUES clause





Analyze Fast



exec  DBMS_SCHEDULER.drop_job ('analyze');

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'analyze',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.gather_schema_stats(''LDBO'',CASCADE=>TRUE,degree => DBMS_STATS.DEFAULT_DEGREE); END;',
    start_date      => '01-APR-12 06.00.00 AM ASIA/CALCUTTA',
    repeat_interval=> 'FREQ=HOURLY;INTERVAL=2',
    end_date        => '02-APR-16 11.00.00 PM ASIA/CALCUTTA',
    enabled         => TRUE,
    comments        => 'JOB to gather LDBO statistics every 2 hours');
END;
/

exec  DBMS_SCHEDULER.run_job ('analyze');

SESSION_CACHED_CURSORS


alter system set SESSION_CACHED_CURSORS=500 scope=spfile;


If an application repeatedly issues parse calls on the same set of SQL statements, the reopening of the session cursors can affect system performance. Session cursors can be stored in a

session cursor cache. This feature can be particularly useful for applications designed using Oracle Forms because switching from one form to another closes all session cursors associated with

the first form.
The Oracle Server uses the shared SQL area to determine whether more than three parse requests have been issued on a given statement. If so, the Oracle Server assumes the session cursor

associated with the statement should be cached and moves the cursor into the session cursor cache. Subsequent requests to parse that SQL statement by the same session will then find the

cursor in the session cursor cache.
To enable caching of session cursors, use SESSION_CACHED_CURSORS to specify the maximum number of session cursors kept in the cache. A Least Recently Used (LRU) algorithm

removes entries in the session cursor cache to make room for new entries when required.
To determine whether the session cursor cache is sufficiently large for your instance, examine the session statistic "session cursor cache hits" in the V$SESSTAT view. This statistic counts the

number of times a parse call found a cursor in the session cursor cache. If this statistic is a relatively low percentage of the total parse call count for the session, you should consider setting

SESSION_CACHED_CURSORS to a larger value.

if SESSION_CACHED_CURSORS is not set, it defaults to 0 and no cursors will be cached for your session. (Your cursors will still be cached in the shared pool, but your session will have to find

them there.) If it is set, then when a parse request is issued, Oracle checks the library cache to see whether more than 3 parse requests have been issued for that statement. If so, Oracle moves

the session cursor associated with that statement into the session cursor cache. Subsequent parse requests for that statement by the same session are then filled from the session cursor cache,

thus avoiding even a soft parse. (Technically, a parse can't be completely avoided; a "softer" soft parse is done that's faster and requires less CPU.)
In the session cursor cache, Oracle manages the cached cursors using a LRU list. Once more than SESSION_CACHED_CURSORS closed cursors are cached, Oracle starts dropping cached

cursors off the LRU end of the list whenever it needs to make room to cache a new cursor.


If the session cursor cache count is maxed out, session_cursor_cache_hits is low compared to all parses, and you suspect that the application is re-submitting the same queries for parsing

repeatedly, then increasing SESSION_CURSOR_CACHE_COUNT may help with latch contention and give a slight boost to performance. Note that if your application is not resubmitting the same

queries for parsing repeatedly, then session_cursor_cache_hits will be low and the session cursor cache count may be maxed out, but caching cursors by session won't help at all. For example, if

your application is using a lot of unsharable SQL, raising this parameter won't get you anything.



if this parameter is set to a high value, the amount of fragmentation in the shared pool may be increased. The primary cause of library cache latch contention is fragmentation of the shared pool, a

common symptom of which is the ORA-04031 error.


select 'session_cached_cursors' parameter, lpad(value, 5) value,
   decode(value, 0, ' n/a', to_char(100 * used / value, '9990') || '%') usage
   from ( select max(s.value) used from V$STATNAME n, V$SESSTAT s
  where n.name = 'session cursor cache count' and s.statistic# = n.statistic# ),
   ( select value from V$PARAMETER where name = 'session_cached_cursors' )
  union all
   select 'open_cursors', lpad(value, 5), to_char(100 * used / value, '9990') || '%'
  from ( select max(sum(s.value)) used from V$STATNAME n, V$SESSTAT s
   where n.name in ('opened cursors current', 'session cursor cache count') and s.statistic# = n.statistic# group by s.sid ),
   ( Select Value From V$parameter Where Name = 'open_cursors' );




SELECT a.value curr_cached, p.value max_cached,
 s.username, s.sid, s.serial#
 FROM v$sesstat a, v$statname b, v$session s, v$parameter2 p
 Where A.Statistic# = B.Statistic# And S.Sid=A.Sid 
 ---and a.sid=&sid
 And P.Name='session_cached_cursors'
 AND b.name = 'session cursor cache count';


Above query gave a result that CUR_CACHED=MAX_CACHED . If the session cursor cache count is maxed out, session_cursor_cache_hits is low compared to all parses, and you suspect that

the application is re-submitting the same queries for parsing repeatedly, then increasing SESSION_CURSOR_CACHE_COUNT will help with latch contention and can give boost to performance.



Evaluate cached cursors for sessions as compared to parsing

select a.sid, a.value parse_cnt, 
       (select x.value
        from   v$sesstat x, v$statname y
        where  x.sid = a.sid
        and    y.statistic# = x.statistic#
        and    y.name = 'session cursor cache hits') cache_cnt
from   v$sesstat a, v$statname b
where  b.statistic# = a.statistic#
and    b.name = 'parse count (total)'
and    value > 0;

The CACHE_CNT ('session cursor cache hits') of a session should be compared to the PARSE_CNT ('parse count (total)'), if the difference is high, consider increasing the

SESSION_CACHED_CURSORS parameter.


------------------------------------------

If you shoot this query against your DB you’ll see what kind of cursor caching is your system doing.

The sum of all the tree values must be 100%:

select
    to_char(100 * sess / calls, '999999999990.00') || '%' cursor_cache_hits,
    to_char(100 * (calls - sess - hard) / calls, '999990.00') || '%' soft_parses,
      to_char(100 * hard / calls, '999990.00') || '%' hard_parses
    from
     ( select value calls from v$sysstat where name = 'parse count (total)' ),
    ( select value hard  from v$sysstat where name = 'parse count (hard)' ),
   ( select value sess  from v$sysstat where name = 'session cursor cache hits' );


If the value of the first column is bellow 80% then you must raise the parameter SESSION_CACHED_CURSORS.

----------------------------------------

steps for execution a SELECT query in Oracle:
1) Parse
  1.1) Syntatical check - Oracle checks whether this is a valid SQL.
  1.2) Semantical check - Oracle checks the SQL in terms of data structures, privileges, etc.
  1.3) Shared Pool check - Oracle checks whether the statement is already parsed and optimized. If yes, goes directly to step (4) - Oracle will do "soft parsing". If not, goes through step (2) and (3)

- Oracle will do "hard parsing".
2) Optimize - the CBO generates different execution plans and chooses the one with the lowest cost;
3) Generate source - a raw-source generator prepares the code to be executed;
4) Execute;
5) Output the result.

Item 1 to 4 form part of the parsing of the query, while that Section 5 is the execution itself. When you run a query, always performed at least steps 1 and 2. After executing these steps, Oracle

transforms the query into a hash and sends it to the Shared Pool Library Cache in sought if there is any query with the same hash value (if any session and used it at some point). If there is, it

compares the query text with the one found in the Library Cache to validate if they are exactly the same (this extra step is performed because it may be several consultations with the same hash

value) in case that they are, it proceeds to execute the query. This is what we call a Soft Parse . If the query does not exist, Oracle performs the steps 3 and 4. This is known as a Hard Parse . The

Hard Parse is very costly for Oracle Server since it involves making a number of latches (loqueos) in the SGA and CPU-intensive. As we know, every query run involves using a cursor (a cursor is a

memory space allocated to the execution of our query). The ideal is that our application opens the cursor to be used, run the sentences x times and then close them. Many applications do not

usually run as Forms cursors thus implying that we can not reuse the cursors and always have to reopen the longer run. To reduce this problem, we can use the initialization parameter
SESSION_CACHED_CURSORS that will allow us make a "Softer" Soft Parse . If the parameter setemos 100, 100 Oracle open cursors remain for us to reuse and avoid us having to open each

time. This memory space for the handling of cursors is maintained with a LRU list. Oracle recommend that the parameter setee in the first instance in 50 and be monitoring it to see if it should

increase its value. This parameter must setearse considering the value of OPEN_CURSORS.




soft parse's reduces the scalability of an system and a SQL statement should be more likely parsed once and executed many times.

soft parse will always consume less resources and less time-memory latches which is a good thing and more over many dba's expect the session connecting to database issue sql statements that

are already present in shared pool, so can you please explain why many soft parses would reduce the scalability of the system.

Thursday, June 7, 2012


Following Steps to perform. SYSAUX tablespace


SELECT *  FROM dba_jobs WHERE SCHEMA_USER='SYSMAN';

SELECT *  FROM SYSMAN.mgmt_metrics_1hour;

SELECT *  FROM SYSMAN.MGMT_SEVERITY WHERE LOAD_TIMESTAMP<TRUNC(ADD_MONTHS(SYSDATE,-2));

DELETE FROM SYSMAN.MGMT_SEVERITY WHERE LOAD_TIMESTAMP<TRUNC(ADD_MONTHS(SYSDATE,-2));
COMMIT;

SELECT trigger_name FROM dba_triggers WHERE trigger_name LIKE '%SEV%DEL%';

spool c:\del_severity.sql
SELECT 'exec sysman.em_severity.delete_current_severity('''|| target_guid || ''','''|| metric_guid ||''','''|| key_value ||''');' FROM sysman.MGMT_SEVERITY;
spool off
@c:\del_severity.sql
commit;

-------------
EXEC em_severity.delete_current_severity('142E273EE3BDA54ECF9C42EF7CCB7616','6E65075DA52ACA744B4B8C3FCB018289','/billing01');

EXEC em_severity.delete_current_severity('85DE1181E5B760BBB8F70F97FFBD7070','10633143D11A8FCF6CB1A732CEE5352A','Sat Dec 31 20:03:38 2011/115843');
COMMIT;

EXEC em_severity.delete_current_severity('85DE1181E5B760BBB8F70F97FFBD7070','10633143D11A8FCF6CB1A732CEE5352A','Sat Dec 31 20:47:33 2011/134506');

EXEC em_severity.delete_current_severity('85DE1181E5B760BBB8F70F97FFBD7070','3E6F70DB22758B7B9756EF342180E7BB','SYSAUX');

EXEC em_severity.delete_current_severity('85DE1181E5B760BBB8F70F97FFBD7070','F95BA0D95585002889E1ABF92B2DA7C3','ARBORAD');

EXEC em_severity.delete_current_severity('85DE1181E5B760BBB8F70F97FFBD7070','F95BA0D95585002889E1ABF92B2DA7C3','SYSMAN');

COMMIT;
-----------------------------------
SELECT s.target_guid,
           s.metric_guid,
           s.key_value
       FROM mgmt_targets t JOIN mgmt_current_severity s
           ON s.target_guid != t.target_guid
      WHERE LOWER(t.target_name) LIKE '%myhost%';
     

SELECT DBMS_STATS.get_stats_history_retention FROM dual;
EXEC DBMS_STATS.alter_stats_history_retention(15);
EXEC DBMS_STATS.PURGE_STATS(SYSDATE-15);

col Mb FORM 9,999,999
col SEGMENT_NAME FORM a40
col SEGMENT_TYPE FORM a6
SET lines 120
SELECT SUM(bytes/1024/1024)Mb,segment_name,segment_type FROM dba_segments
 WHERE  tablespace_name='SYSAUX'
AND segment_name LIKE 'WRI$_OPTSTAT%'
AND segment_type='TABLE'
GROUP BY segment_name,segment_type ORDER BY 1 ASC;


spool c:\rebuild_index.sql
SELECT 'alter index '||segment_name||' rebuild online parallel (degree 14);' FROM dba_segments WHERE tablespace_name='SYSAUX'
AND segment_name LIKE '%OPT%' AND segment_type='INDEX';
spool off
@c:\rebuild_index.sql


SELECT *  FROM dba_indexes WHERE owner='SYS' AND status!='VALID';




Friday, May 18, 2012

Resolving common Oracle Wait Events

Resolving common Oracle Wait Events

Wait Event
Possible Causes
Actions
Remarks

db file sequential reads

Use of an unselective index

Fragmented Indexes

High I/O on a particular disk or mount point

Bad application design

Index reads performance can be affected by
 slow I/O subsystem and/or poor database
files layout, which result in a higher average
 wait time


Check indexes on the table to ensure
that the right index is being used

Check the column order of the index
with the WHERE clause of the Top
SQL statements

Rebuild indexes with a high clustering
factor

Use partitioning to reduce the amount
of blocks being visited

Make sure optimizer statistics are up
to date

Relocate ‘hot’ datafiles

Consider the usage of multiple buffer
pools and cache frequently used
indexes/tables in the KEEP pool

Inspect the execution plans of the
SQL statements that access data
through indexes

Is it appropriate for the SQL
statements to access data through
index lookups?

Is the application an online transaction
 processing (OLTP) or decision
support system (DSS)?

Would full table scans be more
efficient?

Do the statements use the right driving
 table?

The optimization goal is to minimize
 both the number of logical and
physical I/Os.


The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.
Significant db file sequential read wait time is most likely an application issue.
If the
DBA_INDEXES.CLUSTERING_FACTOR of the index approaches the number of blocks in the table, then most of the rows in the table are ordered. This is desirable.

 However, if the clustering factor approaches the number of rows in the table, it means the rows in the table are randomly ordered and thus it requires more I/Os to complete the operation. You can improve the index’s clustering factor by rebuilding the table so that rows are ordered according to the index key and rebuilding the index thereafter.

The OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING initialization parameters can influence the optimizer to favour the nested loops operation and choose an index access path over a full table scan.

Tuning I/O related waits Note# 223117.1

db file sequential read Reference Note# 34559.1


db file scattered reads

The Oracle session has requested and is
waiting for multiple contiguous database
blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be
 read into the SGA from disk.
Full Table scans

Fast Full Index Scans

Optimize multi-block I/O by setting the
parameter DB_FILE_MULTIBLOCK_READ_COUNT

Partition pruning to reduce number of
blocks visited

Consider the usage of multiple buffer
pools and cache frequently used
indexes/tables in the KEEP pool
Optimize the SQL statement that
initiated most of the waits. The goal is
to minimize the number of physical
and logical reads.
Should the statement access the data
by a full table scan or index FFS?
Would an index range or unique scan
 be more efficient?
Does the query use the right driving
table?
Are the SQL predicates appropriate
for hash or merge join?
 If full scans are appropriate, can
parallel query improve the response
time?
The objective is to reduce the
demands for both the logical and
physical I/Os, and this is best
achieved through SQL and application tuning.
Make sure all statistics are
representative of the actual data.
Check the LAST_ANALYZED date


If an application that has been running fine for a while suddenly clocks a lot of time on the db file scattered read event and there hasn’t been a code change, you might want to check to see if one or more indexes has been dropped or become unusable.
db file scattered read Reference Note# 34558.1

log file parallel write

LGWR waits while writing contents of the
redo log buffer cache to the online log files
on disk
I/O wait on sub system holding the online
 redo log files

Reduce the amount of redo being
generated

Do not leave tablespaces in hot
backup mode for longer than
necessary

Do not use RAID 5 for redo log files

Use faster disks for redo log files

Ensure that the disks holding the
archived redo log files and the online
redo log files are separate so as to
avoid contention

Consider using NOLOGGING or
UNRECOVERABLE options in SQL
statements


Reference Note# 34583.1

log file sync

Oracle foreground processes are waiting
for a COMMIT or ROLLBACK to complete


Tune LGWR to get good throughput to
 disk eg: Do not put redo logs on
RAID5

Reduce overall number of commits by
batching transactions so that there
are fewer distinct COMMIT operations

Reference Note# 34592.1

High Waits on log file sync Note# 125269.1

Tuning the Redolog Buffer Cache and Resolving Redo Latch Contention
Note# 147471.1


buffer busy waits

Buffer busy waits are common in an I/O-
bound Oracle system.
The two main cases where this can occur
are:
Another session is reading the block into the
buffer
Another session holds the buffer in an
incompatible mode to our request
These waits indicate read/read, read/write,
 or write/write contention.
The Oracle session is waiting to pin a buffer.
A buffer must be pinned before it can be
read or modified. Only one process can pin a
buffer at any one time.

This wait can be intensified by a large block
 size as more rows can be contained within
the block

This wait happens when a session wants to
access a database block in the buffer cache
but it cannot as the buffer is "busy

It is also often due to several processes
repeatedly reading the same blocks (eg: if
lots of people scan the same index or data
 block)

The main way to reduce buffer busy
waits is to reduce the total I/O on the
system

Depending on the block type, the
actions will differ

Data Blocks

Eliminate HOT blocks from the
application.

Check for repeatedly scanned /
unselective indexes.

Try rebuilding the object with a higher
PCTFREE so that you reduce the
number of rows per block.

 Check for 'right- hand-indexes'
(indexes that get inserted into at the
same point by many processes).

 Increase INITRANS and MAXTRANS
and reduce PCTUSED This will make
the table less dense .

Reduce the number of rows per block

Segment Header

Increase of number of FREELISTs
  and FREELIST GROUPs

Undo Header

Increase the number of Rollback
Segments


A process that waits on the buffer busy waits event publishes the reason code in the P3 parameter of the wait event.

The Oracle Metalink note # 34405.1 provides a table of reference - codes 130 and 220 are the most common.

Resolving intense and random buffer busy wait performance problems. Note# 155971.1


free buffer waits

This means we are waiting for a free buffer
but there are none available in the cache
because there are too many dirty buffers in
 the cache

Either the buffer cache is too small or the
DBWR is slow in writing modified buffers to
disk

DBWR is unable to keep up to the write
requests

Checkpoints happening too fast – maybe due
 to high database activity and under-sized
 online redo log files

Large sorts and full table scans are filling the
 cache with modified blocks faster than the
 DBWR is able to write to disk
If the  number of dirty buffers that need to be
 written to disk is larger than the number that
DBWR can write per batch, then these waits
 can be observed


Reduce checkpoint frequency  -
increase the size of the online redo
log files

Examine the size of the buffer cache
– consider increasing the size of the
buffer cache in the SGA

Set disk_asynch_io = true set
 
If not using asynchronous I/O 

increase the number of db writer 

processes or dbwr slaves
 
Ensure hot spots do not exist by
spreading datafiles over disks and
disk controllers

Pre-sorting or reorganizing data can
help


Note# 163424.1

enqueue waits

This wait event indicates a wait for a lock
that is held by another session (or sessions)
in an incompatible mode to the requested
mode.

TX Transaction Lock

Generally due to table or application set up
issues

This indicates contention for row-level lock.
 This wait occurs when a transaction tries to
update or delete rows that are currently
 locked by another transaction.

This usually is an application issue.

TM DML enqueue lock

Generally due to application issues, 

particularly if foreign key constraints have 

not been indexed.
 
ST lock
 
Database actions that modify the UET$ (used

extent) and FET$ (free extent) tables require 

the ST lock, which includes actions such as 

drop, truncate, and coalesce.
 
Contention for the ST lock indicates there are 

multiple sessions actively performing 

dynamic disk space allocation or deallocation 

in dictionary managed tablespaces



Reduce waits and wait times

The action to take depends on the lock
 type which is causing the most problems

Whenever you see an enqueue wait
event for the TX enqueue, the first
step is to find out who the blocker is
and if there are multiple waiters for
the same resource

Waits for TM enqueue in Mode 3 are primarily due to unindexed foreign key columns.

Create indexes on foreign keys  < 10g

Following are some of the things you
can do to minimize ST lock contention
in your database:
 
Use locally managed tablespaces
Recreate all temporary tablespaces
using the CREATE TEMPORARY
TABLESPACE TEMPFILE… command.
 


Maximum number of enqueue resources that can be concurrently locked is controlled by the ENQUEUE_RESOURCES parameter.

Reference Note# 34566.1

Tracing sessions waiting on an enqueue Note# 102925.1

Details of V$LOCK view and lock modes Note:29787.1






Cache buffer chain latch

This latch is acquired when searching
for data blocks
Buffer cache is a chain of blocks and
each chain is protected by a child
latch when it needs to be scanned
Hot blocks are another common
cause of cache buffers chains latch
contention. This happens when
multiple sessions repeatedly access
 one or more blocks that are
protected by the same child cache
buffers chains 
latch.
 SQL statements with high
BUFFER_GETS (logical reads) per
EXECUTIONS are the main culprits

Multiple concurrent sessions are
executing the same inefficient SQL
that is going after the same data set

Reducing contention for the cache
buffer chains latch will usually require
reducing logical I/O rates by tuning
and minimizing the I/O requirements of
 the SQL involved. High I/O rates could
be a sign of a hot block (meaning a
block highly accessed).  
Exporting the table, increasing the
PCTFREE significantly, and importing
the data. This minimizes the number of
 rows per block, spreading them over
many blocks. Of course, this is at the
expense of storage and full table
scans operations will be slower

Minimizing the number of records per
block in the table
For indexes, you can rebuild them
with higher PCTFREE values, bearing
in mind that this may increase the
height of the index.
Consider reducing the block size
 Starting in Oracle9i Database, Oracle
supports multiple block sizes. If the
current block size is 16K, you may
move the table or recreate the index in
a tablespace with an 8K block size.
This too will negatively impact full
table scans operations. Also, various
 block sizes increase management
complexity.


The default number of hash latches is usually 1024

The number of hash latches can be adjusted by the parameter _DB_BLOCKS_HASH_LATCHES


Cache buffer LRU chain latch

Processes need to get this latch when they
need to move buffers based on the LRU
block replacement policy in the buffer cache
The cache buffer lru chain latch is acquired
in order to introduce a new block into the
buffer cache and when writing a buffer
back to disk, specifically when trying  to
scan the LRU (least recently used) chain
containing all the dirty blocks in the buffer
cache.
Competition for the cache buffers lru chain 

latch is symptomatic of intense buffer cache

 activity caused by inefficient SQL 

statements. Statements that repeatedly scan

 large unselective indexes or perform full 

table scans are the prime culprits.  
Heavy contention for this latch is generally 

due to heavy buffer cache activity which 

can be caused, for example, by:

 Repeatedly scanning large unselective 

indexes
 

Contention in this latch can be
avoided implementing multiple
buffer pools or increasing the
number of LRU latches with the
 parameter DB_BLOCK_LRU_LATCHES
(The default value is generally
 sufficient for most systems).

Its possible to reduce
contention for the cache buffer
lru chain
 latch by increasing the
size of the buffer cache and
thereby reducing the rate at
which new blocks are
introduced into the buffer cache




Direct Path Reads

These waits are associated with direct read operations which read data directly into the sessions PGA bypassing the SGA

The "direct path read" and "direct path write" wait events are related to operations that are performed in PGA like sorting, group by operation, hash join

In DSS type systems, or during heavy batch periods, waits on "direct path read" are quite normal

However, for an OLTP system these waits are significant
These wait events can occur during sorting operations which is not surprising as direct path reads and writes usually occur in connection with temporary tsegments
SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA


Ensure the OS asynchronous IO is configured correctly.

Check for IO heavy sessions / SQL and see if the amount of IO can be reduced.

Ensure no disks are IO bound.

Set your PGA_AGGREGATE_TARGET to appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO)

Or set *_area_size manually (like sort_area_size and then you have to set WORKAREA_SIZE_POLICY = MANUAL

Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.


 Make sure the optimizer selects the right driving table. Check to see if the composite index’s columns can be rearranged to match the ORDER BY clause to avoid sort entirely.

Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database.



Default size of HASH_AREA_SIZE  is twice that of SORT_AREA_SIZE

Larger HASH_AREA_SIZE will influence optimizer to go for hash joins instead of nested loops

Hidden parameter DB_FILE_DIRECT_IO_COUNT can impact the direct path read performance.It sets the maximum I/O buffer size of direct read and write operations. Default is 1M in 9i


Direct Path  Writes

These are waits that are associated with
direct write operations that write data from
users’ PGAs to data files or temporary
tablespaces

Direct load operations (eg: Create Table as
 Select (CTAS) may use this)

Parallel DML operations

Sort IO (when a sort does not fit in memory

If the file indicates a temporary
tablespace check for unexpected disk
sort operations.

Ensure
<Parameter:DISK_ASYNCH_IO> is
TRUE . This is unlikely to reduce wait
times from the wait event timings but
may reduce sessions elapsed times
(as synchronous direct IO is not
accounted for in wait event timings).

Ensure the OS asynchronous IO is
configured correctly.

Ensure no disks are IO bound




Latch Free Waits


















This wait indicates that the process is
waiting for a latch that is currently busy
(held by another process).

When you see a latch free wait event in the
V$SESSION_WAIT view, it means the
process failed to obtain the latch in the
willing-to-wait mode after spinning
_SPIN_COUNT times and went to sleep.
When processes compete heavily for
latches, they will also consume more CPU
resources because of spinning. The result is
a higher response time


If the TIME spent waiting for latches is
significant then it is best to determine
which latches are suffering from
contention
.



A latch is a kind of low level lock.

Latches apply only to memory
structures in the SGA. They do not
apply to database objects. An Oracle
SGA has many latches, and they
exist to protect various memory
structures from potential corruption
 by concurrent access.

The time spent on latch waits is an
effect, not a cause; the cause is that
you are doing too many block gets,
and block gets require
cache buffer chain latching




Library cache latch

The library cache latches protect the
cached SQL statements and objects
definitions held in the library cache within the
shared pool. The library cache latch must be
acquired in order to add a new statement to
the library cache

Application is making heavy use of literal
SQL- use of bind variables will reduce this
latch considerably


Latch is to ensure that the application
is reusing as much as possible SQL
statement representation. Use bind
variables whenever possible in the
application

You can reduce the library cache
latch hold time by properly setting the
SESSION_CACHED_CURSORS parameter

Consider increasing shared pool

Larger shared pools tend to have
long free lists and processes that
need to allocate space in them must
 spend extra time scanning the long
free lists while holding the shared
pool
 latch

if your database is not yet on
Oracle9i Database, an oversized
shared pool can increase the
contention for the shared pool latch.

Shared pool latch

The shared pool latch is used to protect
critical operations when allocating and
freeing memory in the shared pool

Contentions for the shared pool and library
cache
 latches are mainly due to intense hard
 parsing. A hard parse applies to new
cursors and cursors that are aged out and
must be re-executed

The cost of parsing a new SQL statement is
expensive both in terms of
CPU requirements and the number of times
the library cache and shared pool latches
may need to be acquired and released.

Ways to reduce the shared pool latch
are, avoid hard parses when
possible, parse once, execute many.

Eliminating literal SQL is also useful to
avoid the shared pool latch. The size
 of the shared_pool and use of MTS
(shared server option) also greatly
influences the shared pool latch.

The workaround is to set the
initialization parameter
CURSOR_SHARING to FORCE. This
allows statements that differ in literal
 values but are otherwise identical to
share a cursor and therefore reduce
latch contention, memory usage, and
 hard parse.


<Note 62143.1> explains how to
identify and correct problems with the
shared pool, and shared pool latch.


Row cache objects latch


This latch comes into play when user
processes are attempting to  access the
cached data dictionary values.



It is not common to have contention in
this latch and the only way to reduce
contention for this latch is by
increasing the size of the shared pool
(SHARED_POOL_SIZE).

Use Locally Managed tablespaces for
your application objects especially
indexes

Review and amend your database
logical design , a good example is to
merge or decrease the number of
indexes on tables with heavy inserts

Configuring the library cache to an
acceptable size usually ensures that
the data  dictionary cache is also
properly sized. So tuning Library
Cache will tune Row Cache indirectly


Followers