Practical Space Management in Data Warehouse Environments

by Hamid Minoui
Database Specialists, Inc.

About Database Specialists, Inc.
Database Specialists, Inc. provides remote DBA services and onsite database support for your mission critical Oracle systems. Since 1995, we have been providing Oracle database consulting in Solaris, HP-UX, Linux, AIX, and Windows environments. We are DBAs, speakers, educators, and authors. Our team is continually recognized by Oracle, at national conferences and by leading trade publications. Learn more about our remote DBA, database tuning, and consulting services. Or, call us at 415-344-0500 or 888-648-0500.

Abstract

Managing space in a data warehouse environment is often one of the most challenging tasks for data warehouse database administrators. Not only are there many data files and tablespaces to manage, but the size of individual segments and partitions tends to be very large. This white paper addresses best practices in effectively managing very large databases and data warehouse environments with a focus on efficiently using the existing disk space. Applying these practices will help save space, reduce or eliminate space-related issues, and enhance query performance. Of course, as with any technical paper, it is imperative that you thoroughly test any of these methods before applying them to your own specific production environment.

Introduction

I have had several years of practical experience working as a DBA in both OLTP and data warehouse environments. This experience has allowed me to develop and implement various techniques and methodologies that are relevant for each of the two environments. Each technique is not necessarily appropriate for—or applicable to—both environments, because of the subtle differences between the characteristics of the two.

Objectives

In writing this paper and preparing the associated presentation, my objective is to point out space management issues that are encountered in a data warehouse environment and offer resolutions and proactive strategies that I have learned and successfully put into practice while managing and administrating data warehouse databases.

Since the Oracle RDBMS offers many features for effective space management practices, this paper is not limited to features and techniques that I personally have implemented. Therefore, I will mention other space management features that I have explored but have not yet implemented—including those that are now available in Oracle 10g.

Some of the methodologies discussed here might be applicable to both OLTP and data warehouse environments, but the focus of this paper is on the latter.

Characteristics of Data Warehouses

Loading large amounts of data and running a variety of queries to get responses to business questions dominate the activities of a typical data warehouse database. Most data warehouses range from hundreds of gigabytes to several terabytes in size. Managing databases with that much data is very challenging and takes experience. Working with terabytes of data is an order of magnitude different from the gigabytes of data in most OLTP-based systems.

The business activities associated with using a data warehouse mostly involve reading, punctuated by periods of huge loads, sometimes accompanied by data extraction and transformation. This data manipulation is collectively referred to as ETL (extraction, transformation, and load). After the data is loaded (usually into a staging area or an operational data store), it is massaged and manipulated. Data massaging, aggregation, validation, and pre-processing are some of the functions applied to the data before the results are stored in a target area of the data warehouse. To the data warehouse users, the target area is the data warehouse against which queries are generated and data mining operations are performed.

Some maintenance activities commonly performed in data warehouse environments include frequent table reorganizations, index rebuilds, and various partition maintenance operations.

The data to be loaded into a data warehouse can come from literally hundreds of sources running on a variety of platforms. There are often data integrity issues to deal with, such as data with inconsistent conventions or formats, or duplicate data generated because of missing primary keys. Further complicating matters, the structure of the data coming from source systems can change at any time without prior notice.

Since new data is loaded into tables on a daily basis and new indexes are built on these tables, statistics on the new objects need to be collected so that the cost-based optimizer will have valid and up-to-date statistics to work with.

Database features such as materialized views, bitmap indexes, bitmap-join indexes, index-organized tables, parallel query execution, star schemas, table compression, index key compression, and dimensions are among those that are either specific to data warehouse databases or at least very common in data warehouse environments. Data loads are often performed using Oracle’s direct path and parallel options with data pre-sorted whenever possible.

Business questions asked by means of queries can change frequently. The queries themselves are usually generated by third party front-end products such as Business Objects. These queries are dynamically generated and run against the data warehouse at any time and at any interval without following any specific pattern. The questions of the day need answers, and the users often want them quickly.

Once the data is loaded into the warehouse, SQL tuning and sometimes re-writing of large queries becomes the next challenging task. Star transformation, materialized views, bitmap indexing, and the usage of database resource management might be among methods used to resolve query performance problems.

These—and many other techniques and features—are less commonly seen in OLTP environments.

Space as a Coveted Resource in Data Warehouses

As stated previously, a data warehouse consumes a lot of disk space. The amount of data is generally large. Consequently, there are many segments containing various types of data and indexes—often of various types—to help provide efficient access to the data.

The incoming data for data warehouses generally originates from multiple sources. This includes other databases in the enterprise from which old data might be purged once it is successfully stored in the data warehouse. The data warehouse often contains years of historical data online and is associated with many years of data that has been archived to tape and retained offline to be brought back online when needed.

It is typical in a data warehouse to duplicate data or de-normalize tables to reduce the number of joins in order to improve query performance. There might be many versions of a group of records stored at various levels of dimensions or for different time periods (daily, weekly, monthly, and so on). Materialized views, which are often used, require additional storage and space management.

The DBA needs to plan for and provide enough available space to accommodate the aforementioned space-intensive activities as well as new data coming in. A valid capacity plan should be in place to closely estimate the amount of space needed for the next few months, taking into consideration the time it would take to purchase and add new disk space. There should also be enough reserved disk space to allow for emergency space requirements and for circumstances under which particular historical data is recalled. There should be enough disk space readily available within the file systems so that either the DBA or the system (in the case of auto-extensible data files) can extend the size of existing database files when needed.

From an IT budget standpoint, obtaining the approval to purchase a huge amount of disk to pre-allocate for the provision of data growth, future expansion, and “just in case” requirements has its own political and justification problems which are beyond the scope of this paper.

Because of all of these issues and the enormous demand for disk space, space might be considered a coveted resource in a data warehouse environment.

Monitoring Tablespace and Disk Capacity

Monitoring the growth of segments beyond the capacity of their tablespaces, and reacting to that growth in time, can be very tedious and challenging in a data warehouse. Unlike in OLTP environments, a single load can easily fill up an entire tablespace without giving the DBA enough time to react to warning alerts and remedy the situation before the problem impacts the load operation. If a space problem is encountered during a long load process, the entire process might have to be started over again. This can lead to the data not being available when it is expected, as well as wasted time and resources reloading the data from scratch once the space issue is resolved.

Fortunately, in data warehouse environments, loads are usually scheduled. Therefore we have some idea of when data will arrive, how much data there will be, and where the data will reside. This knowledge allows us to prepare. We also know that once the data is loaded, most of it will not change. This allows us to do some post-load space planning.

Database Backups in a Data Warehouse

Even if a large amount of disk space is at hand, pre-allocating large amounts of space for tablespaces can have a significant impact on database backups. The more online tablespaces there are in use, the more data there is to back up every night. And the more there is to back up, the longer database and system resources are tied up backing up the database. Furthermore, if third-party backup software is used, there may be licensing issues related to the amount of data being backed up. Restoring and recovering a large amount of data in the event of a media failure is not a trivial matter either.

Fortunately, in data warehouse environments tables (or the partitions of the tables) holding historical information can often be made read-only (by making their tablespaces read-only). This also applies to the associated indexes (or locally partitioned indexes). By having this strategy in place, as part of the monthly maintenance process, such tablespaces can be made read-only. However, there is a caveat. Even in Oracle 10g, a tablespace cannot be made read-only until all transactions in the database are committed. (Metalink bug 3752361 discusses this.) On a busy database it can be hard to find a time when there are no uncommitted transactions, and you may need to schedule a downtime for this purpose.

After tablespaces are made read-only, a special one-time database backup can be run to back up only the read-only tablespaces. The regular nightly hot backup can back up just the writable (not read-only) tablespaces.
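As an illustration, the monthly maintenance step might look like the statements below (the tablespace names are hypothetical). If RMAN is used for backups (an assumption, since this paper does not specify the backup tool), its SKIP READONLY option lets the regular backup bypass tablespaces that were already captured in the one-time backup:

ALTER TABLESPACE sales_2004_data READ ONLY;
ALTER TABLESPACE sales_2004_indx READ ONLY;

RMAN> BACKUP TABLESPACE sales_2004_data, sales_2004_indx;
RMAN> BACKUP DATABASE SKIP READONLY;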

Avoiding Unnecessary Redo Generation

It will also help to minimize the generation of redo entries. Oracle generates redo entries as part of its recovery mechanism. This can be a waste of system resources for operations that do not need to be recoverable. There are a number of database operations that can be performed in a data warehouse for which redo log generation is not needed. Assuming log generation is not enforced at the database or tablespace level (with the FORCE LOGGING keywords), these operations should be re-visited and appropriately modified. In this section we will look at some specific areas to consider.

Creating Tables with the NOLOGGING Keyword and Performing a Direct Path Load

Data is often loaded into a staging area in the database from flat files using SQL*Loader, Informatica, or other utilities. Once the data is loaded into the staging area, it will be transformed and stored in the operational data store or the target area of the data warehouse. Often there is no need to generate redo entries for the loading of data into the staging area—assuming the data source is not discarded and the load can be repeated if necessary.

The load can be done with the direct path option and the tables to which the loaded data will reside can have the NOLOGGING keyword set. All indexes on these tables can be created with the NOLOGGING keyword as well.

Should there be any media failure, the data load process or the building of the indexes can simply be repeated as long as the source data and the index creation scripts are still available.
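As a sketch (the table, index, tablespace, and file names are hypothetical), the staging table and its index can be created with the NOLOGGING attribute and then loaded with the SQL*Loader direct path option:

CREATE TABLE stage_sales
( sale_id NUMBER, sale_date DATE, amount NUMBER )
TABLESPACE stage_data
NOLOGGING;

CREATE INDEX stage_sales_i1 ON stage_sales (sale_date)
TABLESPACE stage_indx
NOLOGGING;

The load itself might then be run with an operating system command such as sqlldr userid=dw_owner control=stage_sales.ctl direct=true.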

Creating Tables with the NOLOGGING Keyword and Performing Other Bulk Load Operations

The same is true for data that is generated, perhaps from an external table, with bulk load operations into transient tables. These are insert operations with the APPEND hint or CREATE TABLE AS SELECT statements. Examples are shown below:

INSERT /*+ APPEND */ INTO transient_table
SELECT * FROM source_table;

or

CREATE TABLE transient_table NOLOGGING
AS
SELECT * FROM source_table;

One should consider the fact that the starting point for bulk loading into an existing table is the block above the high-water mark even if there are blocks with available space below the high-water mark. This typically will not be an issue, however, because the tables are usually empty before a bulk load.

Using Global Temporary Tables

The use of global temporary tables is recommended when possible. Global temporary tables can be utilized during a transaction or within a single session, but not across session boundaries, because the data in a global temporary table cannot persist beyond a session. How long the data is retained depends on which option was used with the ON COMMIT clause when the table was created; the two available options are PRESERVE ROWS and DELETE ROWS. Global temporary tables are very efficient because very little redo is generated and the space allocated in the user's temporary tablespace is automatically released when the temporary data expires. Since row changes in global temporary tables do not directly generate redo entries, the FORCE LOGGING setting does not affect them.
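For illustration (the table name and columns are hypothetical), a global temporary table that keeps its rows for the duration of the session could be created like this:

CREATE GLOBAL TEMPORARY TABLE etl_work_gtt
( sale_id NUMBER, sale_date DATE, amount NUMBER )
ON COMMIT PRESERVE ROWS;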

Creating Indexes with the NOLOGGING Keyword

Indexes are usually created and maintained via scripts. There is little reason to make the indexes recoverable. In case of media failure, impacted indexes can simply be re-created as part of the database backup and recovery strategy.

The same argument might apply to certain tables, such as tables that can be regenerated from their source. It might be worth the time to make a list of candidate tables along with the data-regeneration methods.

For the above cases, it is advisable to maintain an up-to-date structure and dependencies list for tables and indexes. I also recommend maintaining a list and recovery instructions, including the DDL scripts, for segments that are not recoverable by means of database recovery.

Improving the Performance of Bulk Load Operations

To improve the performance of bulk load operations, designate all the indexes on the target table unusable before loading. In addition to the APPEND hint, you can also use the PARALLEL hint with an appropriate degree of parallelism when bulk loading into the table.

To designate the indexes unusable, use the UNUSABLE keyword for all the indexes on the target table as shown below:

ALTER INDEX target_table_index1 UNUSABLE;

With Oracle’s default behavior, inserting into a table with unusable indexes will fail. In order to proceed with a bulk insert after designating indexes as unusable, you must alter your session to cause Oracle to skip the unusable indexes. The following statement accomplishes this:

ALTER SESSION SET skip_unusable_indexes = TRUE;

At this point you can perform the bulk load operation without receiving an error regarding unusable indexes. However, if there are any triggers associated with the target tables, you may wish to disable them first to speed up the load further. Triggers are not common in data warehouse tables since the tables are not typically subject to DML operations. Triggers can be disabled as follows:

ALTER TABLE target_table DISABLE ALL TRIGGERS;

Note that in the case of indexes associated with unique or primary key constraints, you should elect not to make them unusable unless you can trust the source data and are convinced that there will be no violation to the uniqueness of the keys and there will be no null values in the case of primary key constraints. You can simply disable the constraints associated with the indexes and enable them after the load.
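For example (the constraint name is hypothetical), the primary key constraint might be disabled before the load and re-enabled afterward; enabling it re-validates the data and rebuilds the supporting index:

ALTER TABLE target_table DISABLE CONSTRAINT target_table_pk;

-- perform the bulk load here

ALTER TABLE target_table ENABLE CONSTRAINT target_table_pk;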

Once the load operation is completed, commit your transaction, rebuild the indexes, and enable the constraints and triggers, if any. To make the indexes usable again you can generate the rebuild statements as shown below:

SELECT 'ALTER INDEX ' || index_name || ' REBUILD;'
FROM   user_indexes
WHERE  status = 'UNUSABLE';

If the indexes are partitioned, the statements to mark them unusable and rebuild them will have the following structure instead:

ALTER INDEX target_table_index1 MODIFY PARTITION part1 UNUSABLE;
ALTER INDEX target_table_index1 REBUILD PARTITION part1;

If you are using SQL*Loader to perform a direct path load, you can still skip unusable indexes by setting the skip_unusable_indexes parameter in the SQL*Loader control file.

After the indexes have been rebuilt, you can enable triggers on the target table with a statement like the following:

ALTER TABLE target_table ENABLE ALL TRIGGERS;

Space Issues in Data Warehouses

Depending on what activities are being performed at the time, different types of space issues can surface in a data warehouse. Space issues involving temporary or undo tablespaces are quite different from those involving permanent segments, so we’ll look at each in turn.

Space Issues Involving Permanent Tablespaces

These space issues are the result of poor extent sizing, setting a maximum for the number of extents, or the associated data files being too small. Another type of error that can take place with permanent tablespaces is when a user's quota is exceeded.

Some methods to reduce the possibility of these types of issues from occurring—and the historical reasons for the faulty configurations—will be covered in more detail later in this paper.

Space Issues Involving Temporary Tablespaces

Space issues in temporary tablespaces typically happen when a sort segment created by a query performing a disk sort cannot allocate a new extent because there is not enough free space in the tablespace. This situation can also be caused by the growth of other temporary segments, such as global temporary tables, in the user's assigned temporary tablespace.

Lack of sufficient temporary space is especially a problem in data warehouse environments, where concurrent users frequently run queries requiring very large sorts. Sort operations are very common in data warehouse environments because of frequent use of ORDER BY, GROUP BY, and DISTINCT functionality in queries.

You might be able to reduce the frequency of disk sorts by using the PGA dynamic memory allocation feature introduced in Oracle 9i. This feature is activated by setting the instance parameter workarea_size_policy to AUTO and setting the pga_aggregate_target parameter to an appropriate value. However, when sorts are too big to be performed entirely in memory, the issue of sufficient space in the temporary tablespace remains.
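For example, these parameters might be set as follows (the target value is an assumption and should be sized for your workload and available memory):

ALTER SYSTEM SET workarea_size_policy = AUTO;
ALTER SYSTEM SET pga_aggregate_target = 2048M;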

The new Oracle 10g tablespace group feature allows DBAs to assign a group of temporary tablespaces to a user instead of a single temporary tablespace. Configuring a tablespace group should somewhat reduce the occurrence of this type of space issue.
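A minimal sketch of this feature follows (the tablespace, file, group, and user names are hypothetical):

CREATE TEMPORARY TABLESPACE temp1
  TEMPFILE '/u02/oradata/dwh/temp1_01.dbf' SIZE 10240M
  TABLESPACE GROUP temp_grp;

CREATE TEMPORARY TABLESPACE temp2
  TEMPFILE '/u03/oradata/dwh/temp2_01.dbf' SIZE 10240M
  TABLESPACE GROUP temp_grp;

ALTER USER dw_user TEMPORARY TABLESPACE temp_grp;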

Space Issues Involving Undo Segments and Undo Tablespaces

One such issue is the failure of a long-running query or transaction due to the ORA-1555 (snapshot too old) error. One cause of this error is that undo records needed by a query that started before a change was committed are overwritten after the change commits, while the query is still running. Other space-related issues that may cause a transaction to fail are the ORA-1562 and ORA-1547 errors, which relate to the failure of a rollback segment to extend.

Using a Large Database Block Size

When setting up a data warehouse for the first time, take a serious look at the default database block size (specified by the db_block_size instance parameter), since this attribute can only be specified at database creation time. This is one of the most important decisions that you can make with your new data warehouse project.

The effect of setting an inappropriate database block size in a data warehouse can be disastrous. It can negatively impact the overall performance of the database by causing inefficient I/O for the majority of queries. It can make the difference between the failure and the success of your project, which is why it needs to be taken seriously.

The db_block_size parameter should be set to a multiple of the operating system block size. Set it to the largest value supported by Oracle on your platform that your I/O subsystem can handle in a single read. A database block size of 16 Kb or 32 Kb is not uncommon for a data warehouse database. Ideally a row should be contained within a single block. You get the most benefit from a larger block size if your database is configured on raw devices or if direct I/O is available to you.

The most significant benefit you get from a larger block size is more efficient index scans. This is because a larger block size reduces the number of reads required to probe an index and scan a range of values from its leaf blocks.

There are other advantages as well. One of them is that memory requirements for the default buffer cache are reduced due to fewer buffers needed for index branch blocks. Also, table and index key compression will offer a more efficient compression ratio as these compressions are performed on repeated values within a single block. Since the percentage of total block space occupied by the fixed portion of the block header is reduced as block size increases, there will be an improvement in the overall data density as larger block sizes are used.

Larger blocks are less likely to encounter row chaining, as blocks can accommodate longer rows. And as a result of the increase in the size of the transaction table in undo segment header blocks, the occurrence of the “snapshot too old” error is also reduced.

Data loads can also benefit from a larger block size: when blocks are modified sequentially, fewer writes are required and less redo is generated as a result of the reduced block-level overhead.

If your existing data warehouse is already configured with a small default block size you might consider using multiple buffer caches in order to add new tablespaces to the database with a larger block size. However, additional management and administration is necessary for databases with multiple block sizes.
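As a sketch (the cache size, tablespace name, and file name are assumptions), a 32 Kb buffer cache must be configured before a 32 Kb tablespace can be added to a database whose default block size is smaller:

ALTER SYSTEM SET db_32k_cache_size = 512M;

CREATE TABLESPACE facts_32k
  DATAFILE '/u04/oradata/dwh/facts_32k_01.dbf' SIZE 8192M
  BLOCKSIZE 32K
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 128M;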

Disks, I/O, and Database File Configuration

Nothing can impair a data warehouse more effectively than a poorly configured I/O subsystem. When planning a data warehouse, a major consideration should be on the read and write efficiency of the data, which is highly dependent on how the disks are configured and how data is distributed on them. If at all possible, use raw I/O. Otherwise, make sure your file system supports and uses direct I/O. Also take advantage of asynchronous I/O and parallel read and write. Stripe and mirror using RAID 0+1 or mirror and stripe using RAID 1+0 if these are available configurations from your disk vendor.

Evenly spread your data files and use the Stripe and Mirror Everything (SAME) methodology on as many disk devices as you can. If you are using a file system instead of raw devices, leave enough unallocated space in each file system to be able to make data files auto extendable and monitor your Unix file systems constantly.

Managing and Monitoring Undo Segments

In versions prior to Oracle 9i, the rollback segments and their sizing were managed manually by the DBA. Beginning in Oracle 9i, Automatic Undo Management (AUM) became available as an alternative way to let Oracle manage these segments. The new term “undo segments” replaces the previous term “rollback segments.”

With manual undo management, undo information for active transactions is stored in an available, possibly session-specified rollback segment until the transaction is committed or rolled back. At that point undo entries from the completed transaction become inactive. The oldest inactive entries will be overwritten by Oracle if the space is needed by a new active transaction. The DBA has to size the rollback segments accordingly. I consider this method of undo management a thing of the past so I am not covering it in this paper. If you are using Oracle 9i or later, you should take advantage of AUM.

One of the advantages of automatic undo management is that you can set the undo_retention parameter to specify an amount of time (in seconds) during which inactive (committed) undo entries should be retained in the undo segment as unexpired.

The immediate benefit of using AUM is that when you properly set the undo_retention parameter, you will not encounter the ORA-1555 error as frequently, since inactive undo entries do not expire until the length of time specified by undo_retention has passed. However, unexpired undo information can still be overwritten if the undo tablespace is too small and cannot be auto-extended.

Once the time specified by this parameter is reached for an unexpired undo, it is marked as expired, and the space it occupies becomes available to be reused by other new transactions. The expired undo is still retained unless the space is needed for upcoming transactions.

An undo tablespace is a special type of locally managed tablespace that may contain only undo segments. More than one undo tablespace can be created in a database, but only the one specified by the undo_tablespace parameter is used when the database is opened.

If the undo tablespace is not large enough to accommodate all unexpired undo entries, transactions will fail with an ORA-1562 error message. Using an improved algorithm, Oracle will attempt to acquire the needed space by overwriting the oldest unexpired undo rather than failing active transactions.

In the event that the undo tablespace is too small or the undo_retention setting is too large, either unexpired undo in the same undo segment will be reused (undo reuse), or unexpired undo in a different undo segment will be reused (undo stealing) with the former being more common. SMON periodically performs space management within the undo tablespace and shrinks the size of idle undo segments. It also performs space management when transactions need to steal extents.

The number of times unexpired undo extents are reused or stolen is tracked internally by Oracle in ten-minute intervals. The v$undostat view can be queried to display this information. Information for the last seven days is stored by Oracle. Since the current cycle is incomplete, the statistics for the current cycle reflect partial information. Some columns of interest in v$undostat are:

Column Name    Description
-------------  -------------------------------------------------------------------
begin_time     Beginning time for this interval
end_time       Ending time for this interval
undotsn        Tablespace number of the last active undo tablespace within the interval
undoblks       Number of undo blocks consumed within the period
unxpstealcnt   Number of attempts to obtain undo space by stealing unexpired extents from other undo segments
unxpblkrelcnt  Number of unexpired blocks released from undo segments to be used by other transactions
unxpblkreucnt  Number of unexpired blocks reused by transactions from the same segment
maxquerylen    Length of time (in seconds) taken by the longest query to complete within this period
txncount       Total number of transactions executed during the period
ssolderrcnt    Number of times ORA-1555 occurred during the period
nospaceerrcnt  Number of times space was unavailable in the undo tablespace when requested, resulting in a failure

The information in v$undostat will assist you in properly setting the undo_retention parameter and in sizing the undo tablespace. The default retention is 900 seconds. For example, the value of the maxquerylen column can be used in the query below to determine the length of the longest-running query in the last seven days:

SELECT MAX (maxquerylen)
FROM v$undostat;

The value returned by this query is a recommended starting point for the undo_retention parameter in Oracle 9i.

Beginning in Oracle 10g Release 1, undo retention is automatically tuned by Oracle. Oracle tracks all active queries and automatically adjusts the undo retention according to the value of the maximum query length. Note that the undo_retention parameter is still available in Oracle 10g, but now it defines the lower limit for how long committed undo entries are kept. Manual configuration of undo_retention is still needed to support the flashback query feature of Oracle 10g.

Managing the Undo Tablespace

As we have seen, the setting of the undo_retention parameter impacts the size of the undo tablespace. This is because the size requirement of the tablespace increases as entries in the undo segments are to be kept longer. In the event that the amount of space in the undo tablespace is lower than required, transactions will fail or undo entries will not be retained as long as specified by the undo_retention parameter.

If a retention guarantee clause is specified when creating the undo tablespace, then unexpired undo entries are treated like active entries and thus will never be overwritten during the retention period. This guarantees that undo from committed transactions will be retained for the time period specified by the undo_retention parameter. However, this also increases the possibility of the tablespace running out of space.

Whether you create the undo tablespace as part of the database creation or via the CREATE UNDO TABLESPACE command, I recommend enabling the auto-extend feature. The undo tablespace must be sized appropriately based on the setting of the undo_retention parameter. If you don’t need to use the flashback query feature in Oracle 10g, sizing your undo tablespace is relatively easy. The following formula can be used to determine how much space undo segments will consume:

Undo Segment Space Required (Mb) = (undo_retention * undo blocks per second * db_block_size) / 1024 / 1024

For example, for a 16 Kb block size database, if the value of undo_retention is 1200 seconds (20 minutes) and the rate of undo blocks consumption is 80 blocks per second, the above formula generates:

Undo Segment Space Required (Mb) = 1200 * 80 * 16384 / 1024 / 1024 = 1500 Mb
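Based on an estimate like this, an auto-extending undo tablespace might be created and activated as follows (the file name and autoextend limits are assumptions):

CREATE UNDO TABLESPACE undotbs2
  DATAFILE '/u05/oradata/dwh/undotbs2_01.dbf' SIZE 1500M
  AUTOEXTEND ON NEXT 256M MAXSIZE 8192M;

ALTER SYSTEM SET undo_tablespace = undotbs2;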

By querying the ssolderrcnt and nospaceerrcnt columns of v$undostat, you can monitor the occurrences of “snapshot too old” errors or space issues in the undo tablespace. The undoblks column will help you determine the average number of undo blocks consumed per second.
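For example, a query along these lines summarizes the error counts and the average rate of undo block consumption over the retained history (each v$undostat row covers a ten-minute interval):

SELECT SUM (ssolderrcnt) snapshot_too_old_errors,
       SUM (nospaceerrcnt) out_of_space_errors,
       ROUND (SUM (undoblks) / SUM ((end_time - begin_time) * 86400)) avg_undo_blocks_per_sec
FROM   v$undostat;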

Database Fragmentation Issues

Database fragmentation is another issue that causes inefficient space utilization and greatly impacts performance. Oracle DBAs deal with three levels of fragmentation.

Fragmentation within a Tablespace or File

This refers to unused space within a tablespace that is not large enough for the allocation of an additional extent to a segment (bubble fragmentation), or free un-coalesced extents next to each other that are not being considered as one larger free space (honeycomb fragmentation).

Fragmentation within a Segment

This occurs when the space allocated to a segment is not completely utilized and is considered wasted, such as all the blocks above a table’s high-water mark. These blocks are referred to as unused blocks.

Fragmentation within a Block

This refers to the space within a block that is not being used as a result of the PCTFREE and PCTUSED settings of the segment using the block. It can also be caused by the deletion of rows from a table or by row migration.

To make better use of space and to improve performance, fragmentation should be avoided, and existing fragmentation should be identified and reduced or eliminated. Some of the recommendations covered in this paper address identifying different fragmentation levels along with strategies to avoid, reduce, or eliminate them.

If a staging table is to be emptied before new data is loaded, avoid using DELETE to remove all the rows from the table; use TRUNCATE instead. In addition to the fact that DELETE is significantly less efficient and generates transactional undo information, it does not reset the table’s high-water mark as TRUNCATE does.

Tablespace Planning

Locally managed tablespaces have been available since Oracle 8i, and I concur with the frequent recommendations from Oracle and various publications and presentations to use them instead of dictionary managed tablespaces.

From a performance perspective, locally managed tablespaces are more efficient because they do not require DML operations on dictionary tables such as fet$, uet$, seg$, and file$ as is the case with dictionary managed tablespaces. Updating the data dictionary tables is a serialized operation involving the use of the ST enqueue. This can impact the performance of DDL and DML operations in dictionary managed tablespaces. It should be pointed out, however, that even with locally managed tablespaces there will still be some rare activities against the dictionary table tsq$ for updating the user’s tablespace quota.

When extents are allocated or released in locally managed tablespaces, Oracle simply modifies bitmap values to reflect the new status. Consequently, the amount of undo and redo generation is reduced and concurrency is greatly increased.

From a space management perspective, uniformity of the size of extents in a locally managed tablespace can be strictly enforced with the UNIFORM keyword. Therefore, tablespace-level fragmentation (both honeycomb and bubble) can be eradicated with locally managed tablespaces. As another benefit, coalescing of free space is no longer required with locally managed tablespaces since the bitmap is automatically updated.

Objects created in locally managed tablespaces automatically have their maximum number of extents set to unlimited. Therefore, concerns about a large number of extents, or about a segment reaching its maximum number of extents, evaporate.

Locally Managed Tablespace Considerations

There are some things to keep in mind when using locally managed tablespaces, such as the size of the extent allocation bitmap and the use of storage parameters on individual segments.

Extent Allocation Bitmap

The header of each database file belonging to a locally managed tablespace is reserved for the bitmap that is used for the management of the extents residing in that data file. The size of this bitmap is 64 Kb.

When you create a tablespace or add data files to an existing tablespace, you should size data files as a multiple of the uniform extent size plus 64 Kb so that you don’t waste space. This may seem like a trivial amount of space to be concerned about, but in a data warehouse it can add up—especially if the uniform extent size is large. Multiply a large extent size by a large number of data files, and you may find a significant amount of wasted space.
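For example, for a tablespace with 128 Mb uniform extents, a data file intended to hold exactly 32 extents would be sized at 32 x 128 Mb plus 64 Kb (the names and sizes are illustrative):

-- 32 * 128 Mb = 4194304 Kb, plus 64 Kb for the extent allocation bitmap
CREATE TABLESPACE data_l01
  DATAFILE '/u06/oradata/dwh/data_l01_01.dbf' SIZE 4194368K
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 128M;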

Storage Parameters

Avoid setting the INITIAL and NEXT storage parameters when creating a segment in a locally managed tablespace. Also, take care when moving a segment that has storage parameters set to values larger than the uniform extent size of the locally managed tablespace. The reason is that Oracle will pre-allocate extents based on the settings of the INITIAL, NEXT, and MINEXTENTS storage parameters. By specifying storage parameters, you may cause more space to be pre-allocated for the segment than needed.

In the past there was a belief that segments with fewer extents led to better query performance, so there was a tendency to create segments with large INITIAL and NEXT storage parameters. Proper extent sizing and the number of extents still make some difference today, but not much. For example, it is best to make the size of the extents a multiple of db_file_multiblock_read_count blocks so that reads don't span extent boundaries during a full table scan or index range scan.

Automatic Segment Space Management (ASSM)

If you are using locally managed tablespaces, you can take advantage of the Automatic Segment Space Management (ASSM) feature introduced in Oracle 9i. This should be counted as another advantage of locally managed tablespaces. This feature is enabled at the tablespace level and applies to all segments contained in the tablespace.

ASSM manages free space within a segment by using bitmaps instead of free lists. It offers a more efficient way of reducing segment-level and block-level fragmentation. The bitmap identifies the space utilization for each block in the segment.

This feature automates the management of free blocks and replaces the use of the FREELISTS, FREELIST GROUPS, and PCTUSED storage parameters, and at the same time it substantially reduces buffer busy waits, improving database performance.

ASSM can also lead to more effective use of space. While free lists simply indicate whether a block is available for inserts, ASSM bitmaps provide a more granular and accurate picture of space utilization within blocks. This allows better space utilization of the data blocks, helping to reduce block-level fragmentation. Using locally managed tablespaces in conjunction with ASSM is one of the best ways to minimize tablespace-level, segment-level, and block-level fragmentation.

To enable ASSM for a tablespace, specify the SEGMENT SPACE MANAGEMENT AUTO clause in the CREATE TABLESPACE statement.
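For example (the names and sizes are illustrative; the data file is sized as 256 uniform extents plus 64 Kb for the bitmap, per the earlier guideline):

CREATE TABLESPACE data_m01
  DATAFILE '/u07/oradata/dwh/data_m01_01.dbf' SIZE 1048640K
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 4M
  SEGMENT SPACE MANAGEMENT AUTO;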

ASSM also enables the use of the online segment shrink feature introduced in Oracle 10g. This feature will be discussed later.

Configuring Multiple Tablespace Size Models

Obviously not all the segments in a database are of the same size, and the data warehouse is no exception. In order to avoid tablespace-level fragmentation and at the same time reduce wasted space, try to group your database segments based on their estimated size and configure tablespaces with different uniform extent sizes for storing the segments of each group. This methodology was the subject of a well-known white paper by Himatsingka and Loaiza titled “How to Stop Defragmenting and Start Living: The Definitive Word on Fragmentation.”

Three groups should be sufficient. I have personally developed and used the following three groups in a data warehouse based on segment size.

Segment Size               Extent Size    Size Model
-------------------------  -------------  ----------
Less than 128 Mb           128 Kb         Small
Between 128 Mb and 4 Gb    4 Mb           Medium
Over 4 Gb                  128 Mb         Large

Once you have defined the tablespace size models, develop a naming convention for your tablespaces so that you can identify them easily by name. Also include, as part of your naming convention, characters for distinguishing between tablespaces for indexes, tables and partitions of tables and indexes. You could include S, M and L for Small, Medium and Large based on the above size model.

Tablespaces for Different Types of Segments

Fortunately it is now common practice to store each type of segment in a different tablespace. However, some argue that since tables and indexes are not accessed in the same I/O operation, there is no need to do so. This is not entirely true, because from the manageability and usage perspective you would still want them to be separated. Indexes in a data warehouse are often made unusable before a load and rebuilt after the load, so keeping them separate makes managing their extents more practical.

Setting the PCTFREE and PCTUSED Storage Parameters

The lower the setting of a segment’s PCTFREE storage parameter, the more table rows or index entries can be held in each block. The default value of 10 for PCTFREE may not be appropriate for most of the tables and indexes in a data warehouse. The PCTUSED storage parameter, meanwhile, has a default value of 40 but is not used for segments in ASSM tablespaces.

The PCTFREE setting for a table sets an upper limit on inserting rows into each of the table's blocks, reserving room for future updates. For an index, PCTFREE reserves space in each index block for a different purpose: the reserved space is used only when new rows are inserted into the table and the corresponding index entries must be placed in the correct index block (that is, between the preceding and following index entries). If no more space is available in the appropriate index block, the indexed value is placed in another index block.

For a data warehouse, the space reserved in data blocks by PCTFREE is usually not needed. Most of the tables and indexes are static once they are initially populated. A setting of 0 for PCTFREE for both tables and indexes (and possibly 99 for the PCTUSED for the tables) might be more appropriate in most cases. Setting PCTFREE to 0 will reduce the amount of unused space in the segment, therefore reducing or eliminating segment-level fragmentation. This will also reduce the amount of I/O, because a smaller number of blocks will need to be accessed.
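For example (the table, index, and tablespace names are hypothetical):

CREATE TABLE sales_fact
( sale_id NUMBER, sale_date DATE, amount NUMBER )
PCTFREE 0
TABLESPACE data_l01;

CREATE INDEX sales_fact_i1 ON sales_fact (sale_date)
PCTFREE 0
TABLESPACE indx_l01;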

Index-Organized Tables

For certain data warehouse tables, all or most of the columns of the table belong to the primary key. In order to minimize the space used by such tables and their indexes—and to reduce the amount of I/O required to scan a primary key index and fetch additional columns from the table—you might want to consider using index-organized tables. An index-organized table forces a physical sort order on the table segment.
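A minimal sketch of an index-organized table follows (the names are hypothetical); the rows are stored in the B-tree itself, physically ordered by the primary key:

CREATE TABLE sales_by_day
( sale_date    DATE,
  product_id   NUMBER,
  store_id     NUMBER,
  total_amount NUMBER,
  CONSTRAINT sales_by_day_pk PRIMARY KEY (sale_date, product_id, store_id)
)
ORGANIZATION INDEX
TABLESPACE data_m01;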

Compression

The table compression feature introduced in Oracle 9i Release 2 and the index key compression feature introduced in Oracle 8i can significantly reduce the amount of disk storage and buffer cache used, which leads to more efficient read operations and better query performance. Table compression and index key compression differ in some ways, but they are similar in that both factor out repeated values within a block.

Table Compression

Table compression works by eliminating duplicate values found within the same block. This feature, which is applicable to tables, individual table partitions, and materialized views, is very useful in data warehouse environments where data is mostly being read.

When rows of data are bulk loaded into a compressed table, a single copy of duplicate data found within the block is stored in a special data structure in the beginning of the block called a symbol table. All occurrences of the same value in the block are replaced with a pointer to the relevant spot in the symbol table holding that value.

Since all the information needed to reconstruct the data is available in the block, relatively small CPU overhead is involved. This CPU overhead is associated with compressing the data in the block as row data is being added to the block and uncompressing the data within the block once the compressed block is read into memory.

A potentially significant performance gain is achieved by performing less I/O, because the compressed segment fits into less space and fewer blocks need to be read and written.

Table compression offers a lot of flexibility. Not all blocks of a compressed table are necessarily compressed. Only data loaded into the table in a bulk load (direct path load) or block insert is eligible for compression. This means that if you alter a table to enable compression, the existing data will remain unchanged and compression applies to new data loaded via direct path or block insert.

To compress the data in an existing table, you can move it using the COMPRESS keyword. The following examples move an existing table and an existing partition in the same tablespace, compressing all existing data in the segment at the same time:

ALTER TABLE t1 MOVE COMPRESS;
ALTER TABLE t1 MOVE PARTITION p1 COMPRESS;

It should be noted that a table being moved in this way may not be updated during the operation. If this is not acceptable, the online redefinition package (dbms_redefinition) can be used instead.

An alternative way to compress the data of an existing table is to copy the data into a new table or partition using the CREATE TABLE AS SELECT statement with the COMPRESS keyword. This method offers a way to achieve a better compression factor, as rows can be bulk inserted into the compression-enabled table pre-sorted on specific column values by specifying an ORDER BY clause in the SELECT statement.

To get the best results from compression, order the rows in a CREATE TABLE AS SELECT statement on a wide column that contains a lot of repetition of non-null values. To identify good candidate columns, collect optimizer statistics for the table and look at the column_name, num_distinct, num_nulls, and avg_col_len columns of the dictionary view dba_tab_columns.

The best candidate columns for sorting, in order to achieve maximum compression, are those with the lowest number of distinct values relative to the number of rows (representing a relatively large number of repetitions) and a large average column length.
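As a sketch (the table names and the sort column are hypothetical), a compressed history table might be loaded pre-sorted as follows:

CREATE TABLE sales_hist_2004
NOLOGGING COMPRESS PARALLEL
AS
SELECT *
FROM   stage_sales
ORDER  BY sale_date;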

Compression should be avoided for tables with frequent updates, as there is some overhead in updating these tables, and a lot of update activities can cause space to be wasted. This defeats the purpose of compression.

There are some limitations with the table compression feature. Table compression cannot be used on LOB columns or index-organized tables. Existing uncompressed tables cannot have compression enabled if bitmap indexes are defined on them; the indexes must first be dropped and then re-created after compression is enabled.

Also, it is not possible to add or drop a column from a compressed table in Oracle 9i. (This limitation is lifted in Oracle 10g.) Attempting to add or drop a column from a table for which compression is enabled causes Oracle to generate a misleading error message. (Refer to MetaLink note 281472.1 for more information.) The workaround is to uncompress the table, add or drop the columns, and compress the table again.

Some of the best candidate tables for compression in a data warehouse environment are the historical tables and partitions storing a sliding window of time-specific data. These tables are populated from the tables in the staging area in which data is pre-loaded.

Index Key Compression

Beginning in Oracle 8i, indexes with concatenated keys can be compressed on a specified number of leading key columns. This type of compression, known as index key compression, reduces the storage overhead of repeated values and can result in a significant amount of space savings. Leading columns of index-organized tables can also be compressed, since they are a special type of index.

Index keys are divided into two parts: a suffix entry made up of the unique piece, and a prefix entry consisting of the grouping piece. As an over-simplified example, the entries of an index defined on three columns (Type, Color, and Model) of a car dealer's current-year inventory table would typically be stored like this:

<SUV><Black><Rock Climber>
<SUV><Black><Jungle Cruiser>
<SUV><Black><Mountaineer>
<Sedan><Blue><Charisma>
<Sedan><Blue><Fantasy>
<Sedan><Blue><Starlet>
...

Key compression is achieved by sharing the prefix entries among the suffix entries within the same leaf block. Using key compression, the index values shown above would be stored like this:

<SUV><Black> <Rock Climber> <Jungle Cruiser> <Mountaineer>
<Sedan><Blue> <Charisma> <Fantasy> <Starlet>
...

When keys of a composite index with many repeated values are compressed, a larger number of entries can be stored in each index leaf block. Only non-partitioned indexes may be compressed, and bitmap indexes cannot be compressed.

As with table compression, there is a slight CPU overhead for reconstructing the key column values during an index scan. However, the advantages in compressing indexes are that they consume less space and can reduce the amount of physical I/O. They also increase the buffer cache efficiency simply because there are fewer index blocks to cache.

Index key compression is also ideal for a data warehouse for both space savings and improved query performance.

You can compress an index at creation time or during an index rebuild by using the COMPRESS keyword. For index key compression, the COMPRESS keyword takes an integer argument to specify the number of prefix columns to compress. With no argument, the default is used: the number of key columns for a nonunique index, or the number of key columns minus one for a unique index.
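For example (the index and table names are hypothetical), the first two columns of a three-column index can be compressed at creation time or during a rebuild:

CREATE INDEX cars_i1 ON cars (type, color, model) COMPRESS 2;

ALTER INDEX cars_i1 REBUILD COMPRESS 2;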

Statistics regarding the potential amount of space saved, based on the number of compressible prefix columns, are available in the opt_cmpr_count and opt_cmpr_pctsave columns of the index_stats view. This information is made available after validating an index with the ANALYZE INDEX ... VALIDATE STRUCTURE statement. For example, after validating an index named indx1, index_stats might look like this:

SQL> SELECT name, opt_cmpr_count, opt_cmpr_pctsave
  2  FROM   index_stats;

NAME       OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- -------------- ----------------
INDX1                   2               57

The information in this example indicates that if the first two columns of the index were compressed, a 57% space savings would be achieved.

Since the information in index_stats is overwritten by the next ANALYZE INDEX ... VALIDATE STRUCTURE statement, its contents can be saved to a temporary table before the next index is validated. Code to do this might look like:

ANALYZE INDEX indx2 VALIDATE STRUCTURE;

INSERT INTO temp_table
SELECT name, opt_cmpr_count, opt_cmpr_pctsave 
FROM index_stats;

Reclaiming Unused Space

There is always some unused space in a data warehouse database that is allocated in such a way that it might never be used again. That is because in many cases, once tables are populated and indexes are created, they may never grow again. The same is true for certain tablespaces that receive data only during a limited period of time: when that period has lapsed, the tablespace is no longer required for writing and will soon be designated read-only.

Some of the techniques described above, such as rebuilding or reorganizing segments into a more compact form or moving them, can be effective at reclaiming unused space but are not always practical. However, a lot can be done during maintenance windows to de-allocate and reclaim unused space. Opportunities to reclaim unused space can be found at the segment level, the tablespace level, and the data file level.

One desirable time to reclaim space is just before changing the status of a tablespace to read-only. Once a tablespace is made read-only, none of the space de-allocation operations discussed in this section can be performed on it.

De-allocating Space at the Segment Level

The unused_space procedure in the dbms_space package returns information about the amount of unused space in a segment and the position of the high-water mark. All the space above the high-water mark of a segment is considered to be unused. Even if there are empty blocks below the high-water mark, they cannot be de-allocated without moving or rebuilding. But the space above the segment’s high-water mark can be de-allocated by using one of the following statements:

ALTER TABLE table DEALLOCATE UNUSED;
ALTER INDEX index DEALLOCATE UNUSED;
ALTER CLUSTER cluster DEALLOCATE UNUSED;

You can keep a specified amount of space above the high-water mark by using the KEEP keyword with the above statements. The KEEP keyword takes an argument that specifies the amount of space to be retained.
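For example, to release everything above the high-water mark except 100 Mb on a hypothetical table:

ALTER TABLE sales_fact DEALLOCATE UNUSED KEEP 100M;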

The other procedures in the dbms_space package return space usage information within a segment not limited to the space above the high-water mark. For example, the free_blocks procedure returns information about the number of blocks on the freelist groups for a segment, and the space_usage procedure returns information about the space usage of blocks under the high-water mark of the segment.

Beginning in Oracle 10g, you can shrink segments online. Oracle shrinks segments by moving rows in order to more densely pack data and lower the high-water mark. This is quite different from simply de-allocating unused space above the high-water mark. Online segment shrink is performed with the following statement:

ALTER TABLE table SHRINK SPACE [COMPACT];

To shrink a segment, Oracle first performs a series of INSERT and DELETE statements to move the segment's data as close to the beginning of the segment as possible. This compacts the data. DML-compatible locks are held on individual rows and blocks of the segment during this phase. When compacting is complete, Oracle resets the high-water mark of the segment appropriately; only at this point are exclusive locks briefly held on the data. Unused blocks (blocks above the new high-water mark) are then de-allocated from the segment.

Row movement must be enabled and all table triggers based on ROWIDs must be disabled before shrinking a table. In a data warehouse environment, a number of tables for which table locking is not a problem can be identified as candidates for online shrinkage.

Since the high-water mark adjustment operation requires a brief exclusive lock, the optional COMPACT keyword can be used to compact the rows into fewer blocks without adjusting the high-water mark. This avoids the need for an exclusive lock on the table. At a later time, the statement can be re-issued without the COMPACT keyword in order to quickly reset the high-water mark.
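A sketch of the two-step approach on a hypothetical table:

ALTER TABLE sales_fact ENABLE ROW MOVEMENT;

-- Compact the rows without adjusting the high-water mark (no exclusive lock needed).
ALTER TABLE sales_fact SHRINK SPACE COMPACT;

-- Later, during a quiet period, adjust the high-water mark and release the space.
ALTER TABLE sales_fact SHRINK SPACE;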

De-allocating Space at the Tablespace Level

A tablespace can become fragmented a few different ways. The existence of segments with different extent sizes in one tablespace (not using the uniform extent sizing model) will cause fragmentation in the tablespace. Moving and rebuilding segments in and out of a tablespace also causes fragmentation by leaving empty extents in the middle of the tablespace.

Both of these situations represent wasted space that can be reclaimed. The most effective way to reclaim the space is to temporarily move the segments that have extents above some of the free extents to another tablespace (designated for this purpose). The Tablespace Map feature in Oracle Enterprise Manager produces a nice visual view of this situation. Once all the affected tables are relocated, they can be moved back to the original tablespace later. Once again, many tables can be identified as candidates for this operation.

De-allocating Space at the Data File Level

For tablespaces that will no longer have new segments—and their existing segments will not be subject to any growth—the free space in the tablespace can be forever wasted. However, some actions can be taken to reclaim their free but unusable space.

Oracle does not provide a statement for shrinking the size of a database file down to the last used block (the data file’s high-water mark). However, once you determine a data file’s high-water mark you can resize the data file to that or any larger size.

Resizing a data file can safely be done while the database is open by using the ALTER DATABASE DATAFILE RESIZE statement. Oracle will not allow you to shrink a data file below its high-water mark. By querying dba_extents you can easily identify a data file’s high-water mark. Here is a SQL*Plus script that queries dba_data_files and dba_extents and generates statements to shrink data files:

--
-- Create a table listing tablespaces whose data files are to
-- be considered for resizing.
--
CREATE GLOBAL TEMPORARY TABLE shrinking_tbs_gtt
ON COMMIT PRESERVE ROWS
AS
SELECT tablespace_name, block_size
FROM   dba_tablespaces
WHERE  tablespace_name IN ('TBS1', 'TBS2', 'TBS3');
--
-- Create a table to list data files to be resized.
--
CREATE GLOBAL TEMPORARY TABLE space_admin_gtt
ON COMMIT PRESERVE ROWS
AS
SELECT file_name, tablespace_name,
       bytes smallest, bytes currsize, bytes savings
FROM   dba_data_files
WHERE  1=0;
--
-- Determine the high-water mark for candidate data files.
--
INSERT INTO space_admin_gtt
(
file_name, tablespace_name, smallest, currsize, savings
)
SELECT a.file_name, a.tablespace_name,
       CEIL ((NVL (hwm, 1) * c.block_size) / 1024 / 1024 ) smallest,
       CEIL (a.blocks * c.block_size / 1024 / 1024) currsize,
       CEIL (a.blocks * c.block_size / 1024 / 1024) -
       CEIL ((NVL (hwm, 1) * c.block_size) / 1024 / 1024 ) savings
FROM   dba_data_files a,
       (
       SELECT   file_id, MAX (block_id + blocks - 1) hwm
       FROM     dba_extents
       GROUP BY file_id
       ) b,
       shrinking_tbs_gtt c
WHERE  a.file_id = b.file_id (+)
AND    a.tablespace_name = c.tablespace_name;
--
COMMIT;
--
-- Generate the ALTER DATABASE DATAFILE RESIZE commands to resize only those
-- data files having savings of 5 Mb or more.
--
COLUMN cmd FORMAT a95 WORD_WRAPPED
SET TRIMSPOOL ON
--
SELECT 'ALTER DATABASE DATAFILE ''' || file_name || ''' RESIZE ' ||
       TO_CHAR (smallest) || 'm;' cmd
FROM   space_admin_gtt
WHERE  savings >= 5;

Suspending Activity When Space Issues Arise

Beginning in Oracle 9i it is possible to direct Oracle to suspend a session when a space issue arises instead of rolling back the statement and giving an error. The session will resume automatically when the problem is resolved, or will fail with an error if the problem is not resolved within a certain amount of time. This feature is very beneficial for data warehouse environments, where a failed large load job would otherwise have to be rolled back and restarted from the beginning.

A user must have the RESUMABLE system privilege in order to use this feature. A user can then make their current session resumable by issuing the ALTER SESSION ENABLE RESUMABLE statement. If the session encounters a space problem, the session will be suspended and no error will be issued. Oracle will then execute an AFTER SUSPEND trigger if one is defined and enabled. One of the tasks this trigger might do is to automatically attempt to remedy the space problem.

If the space issue is resolved (by the AFTER SUSPEND trigger or otherwise) then the suspended session will resume normal operation and pick up where it left off. An error will be issued if the problem is not resolved within a specifiable timeout period.

The default timeout period is two hours (7,200 seconds). It can be modified at the system level by setting the resumable_timeout instance parameter in the parameter file or via the ALTER SYSTEM statement. The timeout period can also be modified at the session level via the ALTER SESSION statement or a call to dbms_resumable.set_timeout.
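For example (the timeout value and operation name are illustrative), a load session might enable the feature with a four-hour timeout, or adjust the timeout through the supplied package:

ALTER SESSION ENABLE RESUMABLE TIMEOUT 14400 NAME 'nightly sales load';

EXECUTE dbms_resumable.set_timeout(14400);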

Resumable operations include DML statements, DDL statements (such as index creation), queries that run out of temporary space, and load operations. Both the SQL*Loader and Import utilities have new directives to enable resumable sessions.

One caveat of the suspend feature is that a session does not issue any message to the user when it gets suspended. However, the suspension is logged in the alert log file. Also, the dba_resumable view indicates details of sessions that are suspended. The dbms_resumable package also provides functions for gathering more information about suspended sessions.

An AFTER SUSPEND trigger might be coded to automatically form an email with all obtained information and notify the user and page the DBA about the space problem. This will allow the DBA to take corrective action.

One restriction with this feature is that it does not support rollback segments that are located in dictionary managed tablespaces. Some restrictions also exist with remote operations and parallel executions.

Other Helpful Space-Related Features

Oracle provides a number of other features and facilities, beyond those covered in this paper, that can also prove beneficial for managing and monitoring space usage in data warehouse environments.

Conclusion

With the introduction of each new release, Oracle has been consistently offering new features to help with monitoring space-related issues and managing space usage.

Getting familiar with these new features and taking advantage of the benefits they provide should be part of each database administrator’s list of tasks. Managing space effectively enhances I/O and overall database performance, and it reduces space-related failures. Data warehouses, which rely heavily on space, can immediately benefit from space management and the many related feature offerings of the Oracle database.

About the Author

Hamid Minoui has been an Oracle DBA for over ten years. He has worked with Oracle technology in many industries including transportation and logistics, e-commerce, and insurance. Hamid has also been a volunteer with the Northern California Oracle Users Group for the last four years; he is currently NoCOUG's Training Day Coordinator. Hamid has given technical presentations at NoCOUG conferences, and frequently contributes material to the organization's quarterly publication. Hamid joined the Database Specialists team in 2005.



Copyright © 2005 Database Specialists, Inc. http://www.dbspecialists.com