July 2005

The Specialist
Database News You Can Use
A monthly newsletter for Oracle users
From Database Specialists
Expert onsite and remote DBA services
for your Oracle databases
Tel: 415-344-0500 • Toll-Free: 1-888-648-0500

Ask DB Specialists: Multi-table Inserts

You’re invited to submit your Oracle-related questions to us at askdbspecialists07@dbspecialists.com. Include your name and telephone number in your email. If your question is answered in The Specialist next month, you will receive a free copy of our Oracle Wait Events reference poster (US shipping addresses only).

This month’s question comes to us from Joseph in Northern California: We have a batch job that reads a large table X, and for each row found, two rows are inserted into a second table Y. The two rows are similar, but some of the columns differ. Right now our batch job works by scanning table X twice. Since the table is very large, it would be great if we could get Oracle to scan the table just once. Is there a way to do this?

Terry Sutton of the Database Specialists team responds: We posed this question to a couple of clever SQL specialists and came up with two possible solutions. Suppose the original INSERT statement looked like:

INSERT INTO temp_table (msg_id, msg_type, other_id, other_type)
SELECT msg_id, msg_type, src_id, src_type
FROM   main_table
UNION ALL
SELECT msg_id, msg_type, dst_id, dst_type
FROM   main_table;

One approach would be to rewrite the statement with a Cartesian join instead of a UNION ALL:

INSERT INTO temp_table (msg_id, msg_type, other_id, other_type)
SELECT A.msg_id, A.msg_type,
       DECODE (B.dummy, 1, src_id, 2, dst_id),
       DECODE (B.dummy, 1, src_type, 2, dst_type)
FROM   main_table A,
       (
       SELECT 1 dummy FROM SYS.dual
       UNION ALL
       SELECT 2 dummy FROM SYS.dual
       ) B;

This approach eliminates the need to scan MAIN_TABLE twice, and can be used in either INSERT or SELECT statements. Another approach would be to use the multi-table insert functionality available beginning in Oracle 9i:

INSERT ALL
INTO temp_table (msg_id, msg_type, other_id, other_type)
VALUES (msg_id, msg_type, src_id, src_type)
INTO temp_table (msg_id, msg_type, other_id, other_type)
VALUES (msg_id, msg_type, dst_id, dst_type)
SELECT msg_id, msg_type, src_id, src_type, dst_id, dst_type
FROM   main_table;

This approach also eliminates the need to scan MAIN_TABLE twice. However, it can only be used in an INSERT statement—this technique will not work for SELECT statements.
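Multi-table inserts can also be conditional. As a rough sketch (the ERROR_TABLE and the 'ERR' message type below are hypothetical and not part of Joseph's schema), an INSERT FIRST statement routes each row to the first WHEN clause it satisfies:

```sql
-- Hypothetical sketch of a conditional multi-table insert.
-- ERROR_TABLE and the 'ERR' msg_type value are invented for illustration.
INSERT FIRST
  WHEN msg_type = 'ERR' THEN
    INTO error_table (msg_id, other_id)
    VALUES (msg_id, src_id)
  ELSE
    INTO temp_table (msg_id, msg_type, other_id, other_type)
    VALUES (msg_id, msg_type, src_id, src_type)
SELECT msg_id, msg_type, src_id, src_type
FROM   main_table;
```

INSERT ALL evaluates every WHEN clause for each row returned by the subquery, while INSERT FIRST stops at the first clause that matches, so a row is inserted at most once.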

We wanted to know how each of these options performs, so we ran a SQL trace while executing each of them. We used Oracle 9.2.0.4 Enterprise Edition, and created TEMP_TABLE as a standard table (not a global temporary table). We ran each of the statements once before tracing to minimize disk reads and to preallocate space in TEMP_TABLE.
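If you would like to reproduce a test like this in your own environment, one approach (sketched here for a dedicated server session; parameters and file locations vary by platform and Oracle version) is to enable SQL trace for your session:

```sql
-- Enable timing data and SQL trace for the current session.
ALTER SESSION SET timed_statistics = TRUE;
ALTER SESSION SET sql_trace = TRUE;

-- ... run the INSERT statement being tested here ...

-- Turn tracing back off.
ALTER SESSION SET sql_trace = FALSE;
```

The trace file is written to the directory named by the user_dump_dest initialization parameter and can be formatted with TKPROF, for example: tkprof mydb_ora_1234.trc report.txt (the trace file name here is hypothetical).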

Looking at the first version:

INSERT /* Version 1 */ INTO temp_table (msg_id, msg_type, other_id, other_type)
SELECT msg_id, msg_type, src_id, src_type
FROM   main_table
UNION ALL
SELECT msg_id, msg_type, dst_id, dst_type
FROM   main_table

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      3.38       3.76          0       2990       7658      200000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      3.39       3.77          0       2990       7658      200000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95  

Rows     Row Source Operation
-------  ---------------------------------------------------
 200000  UNION-ALL  (cr=1202 r=0 w=0 time=1349223 us)
 100000   TABLE ACCESS FULL MAIN_TABLE (cr=601 r=0 w=0 time=363892 us)
 100000   TABLE ACCESS FULL MAIN_TABLE (cr=601 r=0 w=0 time=337393 us)

We see that, as expected, two full table scans are performed on MAIN_TABLE, for a total of 1,202 consistent reads. The CPU time used for execution was 3.38 seconds, and the total number of logical reads was 10,648 (2,990 consistent reads plus 7,658 current mode reads). The difference between these figures and those shown in the Row Source Operation section is accounted for by the inserts into TEMP_TABLE.

Here’s the second version:

INSERT /* Version 2 */ INTO temp_table (msg_id, msg_type, other_id, other_type)
SELECT A.msg_id, A.msg_type,
       DECODE (B.dummy, 1, src_id, 2, dst_id),
       DECODE (B.dummy, 1, src_type, 2, dst_type)
FROM   main_table A,
       (
       SELECT 1 dummy FROM SYS.dual
       UNION ALL
       SELECT 2 dummy FROM SYS.dual
       ) B

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          2          0           0
Execute      1      3.95       4.38          0       2329       7466      200000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      3.96       4.39          0       2331       7466      200000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95  

Rows     Row Source Operation
-------  ---------------------------------------------------
 200000  MERGE JOIN CARTESIAN (cr=607 r=0 w=0 time=1619807 us)
      2   VIEW  (cr=6 r=0 w=0 time=328 us)
      2    UNION-ALL  (cr=6 r=0 w=0 time=314 us)
      1     TABLE ACCESS FULL DUAL (cr=3 r=0 w=0 time=141 us)
      1     TABLE ACCESS FULL DUAL (cr=3 r=0 w=0 time=129 us)
 200000   BUFFER SORT (cr=601 r=0 w=0 time=1348514 us)
 100000    TABLE ACCESS FULL MAIN_TABLE (cr=601 r=0 w=0 time=155193 us)

In this version MAIN_TABLE is scanned only once. Even after adding the small number of logical reads from the inline view, retrieving the data requires only 607 consistent reads. The CPU time used for execution is 3.95 seconds. The total number of logical reads is 9,795, a reduction of 853 from version 1. But it seems that the addition of the buffer sort and merge join adds to the CPU time required.

The results of the third version:

INSERT /* Version 3 */ ALL
INTO temp_table (msg_id, msg_type, other_id, other_type)
VALUES (msg_id, msg_type, src_id, src_type)
INTO temp_table (msg_id, msg_type, other_id, other_type)
VALUES (msg_id, msg_type, dst_id, dst_type)
SELECT msg_id, msg_type, src_id, src_type, dst_id, dst_type
FROM   main_table

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      1.93       2.23          0       3132       9898      200000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      1.93       2.23          0       3132       9898      200000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  MULTI-TABLE INSERT  (cr=3131 r=0 w=0 time=2229706 us)
 100000   TABLE ACCESS FULL MAIN_TABLE (cr=601 r=0 w=0 time=225990 us)

Again, only one scan of MAIN_TABLE is required. The total number of logical reads is 13,030, higher than version 2 by 3,235 reads; the multi-table insert step itself appears to add about 2,530 consistent reads in the row source operation. But the consolidation of activity into fewer steps drops the CPU time used to 1.93 seconds, a reduction of more than 40% from version 1.

So it appears that in this example the multi-table insert method is the most efficient. The results may vary depending on the size of the table being scanned relative to the amount of data being inserted into the output table. For instance, a WHERE clause significantly limiting the number of inserted rows would probably widen the gap in response time between the original statement and versions 2 and 3. But it appears that Oracle’s multi-table insert functionality may offer substantial performance benefits. You are encouraged to perform your own tests to determine applicability in your environment.
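For illustration, a restricted variant of version 3 might look like the following (the msg_type predicate is hypothetical):

```sql
INSERT ALL
INTO temp_table (msg_id, msg_type, other_id, other_type)
VALUES (msg_id, msg_type, src_id, src_type)
INTO temp_table (msg_id, msg_type, other_id, other_type)
VALUES (msg_id, msg_type, dst_id, dst_type)
SELECT msg_id, msg_type, src_id, src_type, dst_id, dst_type
FROM   main_table
WHERE  msg_type = 'URGENT';  -- hypothetical predicate
```

MAIN_TABLE is still scanned only once, but far fewer rows reach TEMP_TABLE, so the insert overhead shrinks while the cost of the scan stays fixed.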


Capacity Planning for Oracle Systems

Capacity planning is not a topic that is written about very often. If you’re looking for resources on the topic of capacity planning, we’ve found both a case study and a discussion of theory and fundamentals that you may wish to review. The first resource, a case study published by Sun Microsystems entitled, Capacity Planning as a Performance Tuning Tool—Case Study for a Very Large Database Environment, discusses the performance and scalability impact caused by severe CPU and I/O bottlenecks in a database over 20 terabytes in size. Using Oracle and Statspack in a Sun environment, the case study is organized into the following sections:

  • Analysis and High-Level Observations
  • Resolving CPU and I/O Bottlenecks Through Modeling and Capacity Planning
  • Conclusions
  • Recommendations
  • I/O Infrastructure Performance Improvement Methodology

You’ll find this resource at http://www.phptr.com/articles/article.asp?p=101160&rl=1.

A second resource is a white paper entitled, An Operational Analysis Approach to Capacity Planning. This paper is not Oracle-specific, but it presents information on the theory and fundamental concepts of capacity planning. Mathematical formulas and concepts abound, and the paper states that “Operational analysis is based on directly measurable quantities and a few basic assumptions. The main benefits of this approach are that the quantities are directly measurable, the assumptions are easily verified, and the results are not derived using advanced queuing theory. This makes the results easier to understand and apply.” You’ll find it at http://www.teamquest.com/pdfs/whitepaper/tqwp13.pdf.

Free Two-Day Oracle Self-Study Course

Oracle Corporation is offering a free, no-strings-attached Oracle 10g DBA course for individuals who are new to Oracle. Billed as a “free DBA self-study tutorial,” it is a two-day course designed for those new to database administration. Through a series of ten web pages, the course walks participants through basic database administration activities ranging from installing Oracle software to creating a new tablespace. Various tasks are demonstrated using Enterprise Manager and other GUI tools. The course includes topics such as:

  • Installing Oracle and Building the Database
  • Getting Started with Oracle Enterprise Manager
  • Configuring the Network Environment
  • Managing the Oracle Instance
  • Administering Users and Security

You’ll find it at http://www.oracle.com/technology/obe/2day_dba/index.html.

Upcoming Oracle User Group Meetings

Mark your calendar for some upcoming educational opportunities. Chris Date, a well-known founding father of relational database theory, will be the keynote speaker at the summer conference of the Northern California Oracle Users Group (NoCOUG). Roger Schrag and Terry Sutton of Database Specialists, Inc. will offer a session on Wait Event Enhancements in 10g. The full-day conference takes place at Chevron in San Ramon, California on Thursday, August 18. To stay posted, check out http://www.nocoug.org/next.html.

In addition, the next meeting of the Los Angeles Oracle Users Group (LAOUG) will take place on Wednesday, August 17. LAOUG meetings run a full day and take place at the Norwalk Marriott Hotel in Norwalk, California.
