Document Type | Technical Information

Field | Administration

Applicable Product Version | 7FS02PS

Document Number | TADTI176

Overview

The Tibero Optimizer is the core engine inside the DBMS that finds the optimal (lowest cost) processing path to execute SQL as quickly and efficiently as possible, and it establishes the execution plan based on the plan with the lowest resource usage cost among executable plans. This method is called the Cost Based Optimizer (CBO) approach.

A lower cost means a more efficient execution plan can be established, and to accurately estimate the cost of an execution plan, information about all objects used by the SQL statement and performance data of the system's I/O, CPU, and memory are required. All object information, including tables and indexes, as well as system information, are called Optimizer statistics. To maintain appropriate performance, it is necessary to decide on an efficient time and method for collecting statistics. Understanding the characteristics of optimizer statistics and methods for collecting them is the most important factor in optimizing SQL execution.

Method

1. Understanding Statistics

1.1 Efficient Statistics Collection Policy

The primary method used to optimize execution plans is the collection of statistics information. However, the most important thing is the accuracy of the collected statistics information.

Even if the statistics information is recent, if it is inaccurate, incorrect execution plans may be established, which can negatively impact database performance. Therefore, database administrators (DBAs) must establish and manage efficient statistics collection policies to ensure the statistics information is accurate and stable.

The following are items that DBAs must manage to maintain the accuracy and stability of statistics information.

Check for the existence of old statistics collection dates

If the statistics collection date is old, performance issues may occur. This problem often occurs especially with partitions when values are inserted into newly added partitions but the statistics information is not updated.

Accuracy of collected statistics information

Even recent statistics information, if inaccurate, can negatively affect database performance.

Management of statistics collection policy

Statistics information must be managed accurately and stably based on an efficient statistics collection policy.

1.2 Considerations for Statistics Collection

Statistics information should be collected periodically while simultaneously improving accuracy and stability. The following factors should be considered for an efficient statistics collection policy.

Select a time with no load for faster collection.
Minimize the sample size collected to reduce impact. However, if accuracy and stability cannot be ensured, a full scan is recommended.
When performing a full scan, accuracy must be high to approximate the statistics values.
If data changes are minimal, consider stability to prevent statistics values from changing due to frequent collections.

1.3 Causes of Performance Degradation

To implement an efficient statistics collection policy, DBAs must understand the considerations for statistics collection to collect accurate and stable statistics. However, they must recognize that statistics collection can be a cause of performance degradation.

In most operational systems, after statistics collection, a sudden surge in hard parses occurs, causing library cache pin contention and resulting in system performance degradation.

System performance degradation often occurs in the following situations:

Statistics information is changed due to statistics collection.
When statistics change, related SQL cursors are invalidated.
When SQL cursors are invalidated, a hard parse occurs the next time the SQL is accessed.
During hard parse, an exclusive mode PP cache pin is acquired for the PP cache object.
Other sessions trying to access the PP cache object under hard parse wait for the PP cache pin event.

Since performance issues may arise due to cursor invalidation caused by statistics updates, control is applied using the no_invalidate option. Performance degradation may be caused by statistics collection, and there may also be other reasons such as incorrect statistics collection options.

1.4 Statistics Collection Strategy

While statistics collection is a means to improve performance, it can also cause performance degradation. Therefore, various considerations when collecting statistics must be based on policies suited to the system operated for each business.

In other words, DBAs must establish a statistics collection strategy, and Tibero proposes the following statistics collection strategies.

1) Establishing a Statistics Collection Policy is Essential

For objects where statistics collection is unnecessary, lock options are used to fix statistics information to prevent changes.
For some critical programs that should not be affected by statistics information, hints are applied to fix execution plans.
Always back up the most stable recent statistics information in case serious application performance issues occur due to statistics changes.

2) Column Histogram Collection

Histograms incur significant costs to collect and manage. Therefore, histograms should only be collected for necessary columns, primarily those frequently used in WHERE clauses with skewed data distributions.
When calculating the final selectivity after accessing a table, the selectivity of WHERE clause columns without indexes is also used as an input, and based on this selectivity, join order and join method with other sets are determined.

3) Balance Between Accuracy and Performance of Statistics

To collect accurate statistics, a large amount of data must be collected and analyzed, which can negatively impact database performance during collection. Conversely, collecting less data for performance considerations may result in inaccurate statistics.

4) Statistics Collection Frequency

Statistics collection frequency should be chosen based on data characteristics. For tables with frequently updated data or where partitions are added frequently at short intervals, set a short collection frequency or select appropriate collection options. Conversely, if data updates are infrequent, set a longer collection frequency.

5) Statistics Collection Level

Statistics collection levels are divided into all objects and partition-level. Global statistics for all objects target the entire table, so collecting global statistics in large-scale databases takes a lot of time and load. To shorten this, partition-level statistics collection ('PARTITION' granularity) can be considered. However, while partition-level collection shortens collection time, it may create execution plans with inaccurate Number of Distinct Values (NDV), potentially causing performance issues, so the collection level must be decided carefully.

When aggregating global statistics using partition-level statistics ('APPROX_GLOBAL AND PARTITION' granularity), approximate global-level statistics are derived from partition-level data, securing collection performance and statistics accuracy. However, it should be noted that this is still less accurate than statistics collected at the global level.

6) Sampling Rate

Increasing the sampling rate improves the accuracy of statistics but takes more time to collect. Conversely, lowering the sampling rate reduces accuracy somewhat but allows faster collection.

7) Column Data Distribution

If the data distribution of index column values is not uniform, it is important to generate histogram information for the column. If no histogram info exists, the optimizer decides whether to generate histograms and collects statistics for all columns. Usually, the optimizer assigns appropriate histogram buckets, but unnecessary histograms may be generated.

8) System Statistics

Periodic statistics collection and column histograms alone are insufficient for optimization, so system statistics features are used.

9) Automatic Statistics Collection

Automatic statistics collection can be efficient for most DB objects that are updated at regular intervals. However, it is inefficient for tables or objects that are deleted or truncated during operations and recreated, or when more than 10% of the total size is loaded in bulk.

Alternative Statistics Collection Strategies

- Statistics Setting

- Dynamic Sampling

- Incremental Global Statistics

1.5 Cost Based Optimizer (CBO)

Tibero adopts a cost-based optimizer that quantifies the expected time to execute a plan and optimizes based on the plan with the lowest cost.

When an SQL statement is executed in Tibero, the query optimizer considers various factors to select the most efficient execution plan. The execution plan greatly affects the SQL execution time, making it a very important factor.

In CBO, even the same query can have different execution plans depending on data characteristics, and it involves complex cost calculation principles, so precise knowledge is required.

Cost measurement includes I/O cost, which evaluates the number of I/O requests, and CPU cost, which converts CPU operations into time. To perform this evaluation, the optimizer estimates the cost at each execution step based on various statistics previously collected for tables, indexes, etc., and selects the execution plan with the lowest total cost. Therefore, maintaining statistics information that accurately reflects data status is essential to produce the optimal execution plan.

CBO Execution Steps
The optimizer creates plans for the SQL statement based on possible access paths and hints.
The optimizer estimates the cost of the plan based on data dictionary statistics about the storage characteristics and data distribution of tables, indexes, and partitions accessed by the SQL. The cost estimates the proportion of resources needed to execute the SQL, and the optimizer calculates the cost of access paths and join orders based on expected system resources including I/O, CPU, and memory.
The optimizer compares the costs of plans and selects the plan with the lowest cost.
Three Functions of Cost Measurement
When measuring the cost of a plan, the following methods are used to pass the SQL with the minimum cost to the plan generator for execution plan creation.

Search

Welcome to Tibero GTS!

Tibero Large-scale System Statistics Collection Guide - 1. Tibero Optimizer Statistics

Overview

Method

1. Understanding Statistics

1.1 Efficient Statistics Collection Policy

1.2 Considerations for Statistics Collection

1.3 Causes of Performance Degradation

1.4 Statistics Collection Strategy

1) Establishing a Statistics Collection Policy is Essential

2) Column Histogram Collection

3) Balance Between Accuracy and Performance of Statistics

4) Statistics Collection Frequency

5) Statistics Collection Level

6) Sampling Rate

7) Column Data Distribution

8) System Statistics

9) Automatic Statistics Collection

1.5 Cost Based Optimizer (CBO)

업무 외 시간 안내

Search

Welcome to Tibero GTS!

Overview

Method

1. Understanding Statistics

1.1 Efficient Statistics Collection Policy

1.2 Considerations for Statistics Collection

1.3 Causes of Performance Degradation

1.4 Statistics Collection Strategy

1) Establishing a Statistics Collection Policy is Essential

2) Column Histogram Collection

3) Balance Between Accuracy and Performance of Statistics

4) Statistics Collection Frequency

5) Statistics Collection Level

6) Sampling Rate

7) Column Data Distribution

8) System Statistics

9) Automatic Statistics Collection

1.5 Cost Based Optimizer (CBO)