Database News You Can Use
A monthly newsletter for Oracle users
January 2005
From Database Specialists
Expert onsite and remote DBA services
for your Oracle databases
Tel: 415-344-0500 • Toll-Free: 1-888-648-0500
Ask DB Specialists: DBMS_STATS versus ANALYZE
You’re invited to submit your Oracle-related questions to us at email@example.com. Include your name, telephone number, and problem description. If your question is answered in The Specialist next month, you will receive a free copy of our Oracle Wait Events reference poster (US shipping addresses only).
This month’s question comes to us from Carl in Chicago: I’ve heard that Oracle recommends using DBMS_STATS instead of the ANALYZE TABLE command, but I’m wondering whether it makes any real difference in the statistics obtained.
Terry Sutton of the Database Specialists team responds: The ANALYZE command and the DBMS_STATS package offer two different ways to collect statistics about the objects in your database. Oracle’s query optimizer uses these statistics when determining the most efficient way to perform a query. For instance, if the statistics say one table has only 20 rows, then Oracle may opt for a full table scan instead of an index lookup when accessing that table. If the statistics on another table say a certain column has very few distinct values, Oracle may choose not to use an index on that column.
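You can see the statistics the optimizer is working from by querying the data dictionary. The owner and table names below are placeholders; substitute your own:

```sql
-- What the optimizer currently believes about a table (names are illustrative)
SELECT num_rows, blocks, last_analyzed
  FROM dba_tables
 WHERE owner = 'SCOTT'
   AND table_name = 'EMP';

-- Per-column detail, including the distinct-value count the optimizer relies on
SELECT column_name, num_distinct, num_nulls, last_analyzed, sample_size
  FROM dba_tab_columns
 WHERE owner = 'SCOTT'
   AND table_name = 'EMP';
```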
The DBMS_STATS package was introduced in Oracle 8i, and for the last few years Oracle Corporation has been strongly advising customers to use DBMS_STATS instead of ANALYZE. DBMS_STATS offers some useful features not available with ANALYZE, such as parallel statistics collection, gathering statistics only when existing statistics are stale, and gathering partition-level and subpartition-level statistics. One particularly valuable feature, available starting in Oracle 9i, is the GATHER_SYSTEM_STATS procedure, which enables the optimizer to include CPU costs in its calculations. This feature is best described in Jonathan Lewis’s article Understanding System Statistics.
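As a sketch, system statistics can be captured around a representative workload window like this (Oracle 9i or later, run as a suitably privileged user):

```sql
-- Begin capturing CPU and I/O performance figures while a typical workload runs
EXECUTE dbms_stats.gather_system_stats('START');

-- ... let the representative workload run for a while ...

-- Stop the capture; the optimizer can now factor CPU cost into its plan choices
EXECUTE dbms_stats.gather_system_stats('STOP');
```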
But in addition to the new features, DBMS_STATS can sometimes produce better information. The statistics gathered by DBMS_STATS can be more accurate than those collected by ANALYZE, as we found recently at a client site. The following query:
SELECT ... FROM file_history WHERE fname = :b1
was performing full table scans of a table with 1.7 million rows, even though there was an index on the FNAME column and the dba_indexes view showed that the index was believed to have 1.7 million distinct values. But when we looked at the dba_tab_columns view, we saw:
COLUMN_NAME  NUM_DISTINCT LAST_ANALYZED      SAMPLE_SIZE
------------ ------------ ------------------ -----------
FNAME                1458 18-DEC-04 04:09:49        9050
A query of the FILE_HISTORY table showed 1,741,405 rows and 1,741,405 distinct values in the FNAME column. So the optimizer was choosing an execution plan for the query on the basis that there were only 1458 distinct values for the column in a 1.7 million row table. From this the optimizer determined that a full scan was the appropriate execution path. The client had been gathering statistics using the ANALYZE command with a sample size of 10,000 rows. Inaccurate statistics led to an inferior execution plan.
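To see how far off the dictionary is for yourself, you can compare its estimate against a full count. Note that this scans the entire table, so on large tables it is best run off-hours:

```sql
-- True row count and distinct-value count; this is what the
-- dictionary statistics should approximate
SELECT COUNT(*)              AS total_rows,
       COUNT(DISTINCT fname) AS distinct_fnames
  FROM file_history;
```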
Using ANALYZE with a larger sample size helped:
ANALYZE TABLE file_history ESTIMATE STATISTICS SAMPLE 5 PERCENT;
The column statistics in dba_tab_columns now were:
COLUMN_NAME  NUM_DISTINCT LAST_ANALYZED      SAMPLE_SIZE
------------ ------------ ------------------ -----------
FNAME               10827 18-DEC-04 16:40:22       83096
An even larger sample size helped more:
ANALYZE TABLE file_history ESTIMATE STATISTICS SAMPLE 10 PERCENT;

COLUMN_NAME  NUM_DISTINCT LAST_ANALYZED      SAMPLE_SIZE
------------ ------------ ------------------ -----------
FNAME               17002 18-DEC-04 16:53:36      159672
But even using the ANALYZE command with a 10% sample gave us a fairly inaccurate count of 17,002 distinct values. So next we tried collecting statistics with the DBMS_STATS package:
EXECUTE dbms_stats.gather_table_stats (ownname=>'PROD', -
        tabname=>'FILE_HISTORY', estimate_percent=>5, cascade=>true)

COLUMN_NAME  NUM_DISTINCT LAST_ANALYZED      SAMPLE_SIZE
------------ ------------ ------------------ -----------
FNAME             1737740 18-DEC-04 17:01:53       86887
We see that collecting statistics with the DBMS_STATS package and a sample size of 5% in this case yielded an estimate of distinct values within 0.2% of the actual figure. This is certainly good enough for the optimizer, while the statistics collected by the ANALYZE command were off by roughly two orders of magnitude.
Please note that there are a few statistics that the DBMS_STATS package does not gather. The values in the dba_tables view for CHAIN_CNT, AVG_SPACE, and EMPTY_BLOCKS are not updated by DBMS_STATS, so if you want these statistics to be updated you’ll still have to use ANALYZE. But generally these statistics do not need to be as current as those used by the optimizer, so an occasional ANALYZE can be performed if you want these statistics, while DBMS_STATS can be used more frequently to provide the best information for the query optimizer.
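One workable pattern, sketched below with the table from the example above: refresh the space-usage figures occasionally with ANALYZE, then re-gather with DBMS_STATS so the ANALYZE run does not leave less accurate column statistics behind for the optimizer:

```sql
-- Occasional run to refresh CHAIN_CNT, AVG_SPACE and EMPTY_BLOCKS in dba_tables
ANALYZE TABLE file_history ESTIMATE STATISTICS SAMPLE 5 PERCENT;

-- Re-gather with DBMS_STATS so the optimizer sees the more accurate figures
EXECUTE dbms_stats.gather_table_stats (ownname=>'PROD', -
        tabname=>'FILE_HISTORY', estimate_percent=>5, cascade=>true)

-- The space-usage statistics that only ANALYZE maintains
SELECT chain_cnt, avg_space, empty_blocks
  FROM dba_tables
 WHERE owner = 'PROD'
   AND table_name = 'FILE_HISTORY';
```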
In summary, the DBMS_STATS package offers many features not available with the ANALYZE command. Furthermore, there are times when DBMS_STATS has been seen to collect far more accurate statistics. For these reasons, moving to DBMS_STATS seems like a good idea. However, if you are interested in statistics on chained rows, average space within each block below the high water mark, or number of blocks in allocated extents above the high water mark, you’ll need to use the ANALYZE command to collect this information.
When is it better to use an IN clause than an EXISTS clause in your query? Can Oracle use an index on a NOT EXISTS clause? How do you get Oracle to do a hash join when evaluating a NOT IN clause?

These are just a few of the questions answered in a definitive white paper published recently by Roger Schrag of Database Specialists entitled, Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN.

This paper uses examples from real applications, as well as examples from the scott/tiger schema that you can reproduce on your own development database in order to master the fundamentals. Understanding how Oracle evaluates these SQL constructs and how to leverage semi-join and anti-join access paths will empower you to write more efficient SQL and make a whole class of queries run orders of magnitude faster. Read the paper at http://dbspecialists.wpengine.com/presentations.html#semijoins.

NoCOUG Winter Conference

Roger Schrag of Database Specialists will speak at the Winter Conference of the Northern California Oracle Users Group (NoCOUG), sharing his knowledge of performance tuning and speeding up queries. The educational event takes place on Tuesday, February 8 at the Oracle Conference Center in Redwood Shores, California.

Roger will be joined by Gaja Vaidyanatha, Jeffrey Jacobs, David Austin, and many others sharing their Oracle expertise at this one-day educational event hosted by Oracle Corporation. Click here for more information. If you can’t make it to the conference, stay posted here for published speaker presentations.

Database Specialists at RMOUG Training Days

Two members of the Database Specialists team have been invited to share their expertise at the Rocky Mountain Oracle Users Group Training Days 2005, taking place in Denver, Colorado February 9-10, 2005.

Senior Staff Consultant Terry Sutton returns for his second year at the conference. He’ll be discussing a real-life database recovery challenge he encountered and the problem-solving techniques and creative solutions used to meet the challenge. Roger Schrag will discuss Oracle performance optimization and outline specific how-tos for speeding up SQL statements to improve database performance.

RMOUG Training Days packs over 90 technical sessions into a two-day event for all levels.

Emergency Patch #68: Oracle Security
In September 2004, we included an article about emergency security patch #68 from Oracle Corp. This is a severity 1 alert, and a revision of the alert was issued on December 27, 2004. You’ll find current security alerts on Oracle’s website at http://www.oracle.com/technology/deploy/security/alerts.htm.
If you were wondering whether or not patch #68 was important, Pete Finnigan, an Oracle security specialist, has given links to ten postings that identify some of the vulnerabilities addressed by emergency patch #68. If you are interested in reviewing them, go to his website at http://www.petefinnigan.com/weblog/archives/00000183.htm. Suffice it to say that, after you finish reading, you’ll probably want to rush out and apply the security patch to all of your servers if you haven’t already!