Document Type | Technical Information

Category | Administration

Applicable Product Version | 7FS02PS

Document Number | TADTI179

Overview

This document explains the method for collecting Partition Table statistics.

Optimizer statistics represent detailed information about the Database and Objects, and the query Optimizer uses statistics to select the best execution plan for SQL statements.

Method

1. Partition Table Statistics

In an operating system where numerous log-type and statistical tables have partitions added or split daily or monthly, it is difficult to determine the timing for collecting statistics, and in large-scale systems, the time burden for statistics collection is significant.

This section explains the level of statistics collection as a method to reduce the statistics collection load in large-scale systems. It also describes strategies and utilization of statistics information in systems where partitions are added or split periodically, such as daily or monthly.

1.1 Statistics Level by Query Condition

In PARTITION, either Global statistics or partition statistics are used depending on the SQL condition. Except when the partition key is queried as a constant, Global statistics are used; therefore, collecting Global statistics is preferable for accurate execution plans.

Create Table Schema

SQL> CREATE TABLE TEST_TABLE (A NUMBER, B NUMBER, C CHAR(2000))
 PARTITION BY RANGE (A)
 (PARTITION P1 VALUES LESS THAN (10000),
 PARTITION P2 VALUES LESS THAN (40000),
 PARTITION P3 VALUES LESS THAN (MAXVALUE));
 
SQL> INSERT INTO TEST_TABLE SELECT LEVEL,MOD(LEVEL,10),LPAD(LEVEL,10,'0')
 FROM DUAL CONNECT BY LEVEL<=100000;

SQL> commit;

SQL>  CREATE INDEX TEST_INDEX ON TEST_TABLE(A,B) LOCAL;

SQL>  BEGIN
 DBMS_STATS.GATHER_TABLE_STATS ('TEST_USER','TEST_TABLE',GRANULARITY='ALL');
 END;
 /

Select Global or partition statistics differently depending on the query condition.

Statistics Level - GLOBAL Statistics

When no condition clause is used

SQL> SELECT * FROM TEST_TABLE;

Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (ALL PART) (Cost:14037, %CPU:0, Rows:100090) (PS:1, PE:3)
2    TABLE ACCESS (FULL): TEST_TABLE (Cost:14037, %CPU:0, Rows:100090)

When partition key condition is not used

SQL> SELECT * FROM TEST_TABLE WHERE B=3;

Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (ALL PART) (Cost:14037, %CPU:0, Rows:100090) (PS:1, PE:3)
2   
 TABLE ACCESS (FULL): TEST_TABLE (Cost:14037, %CPU:0, Rows:100090

When two or more partition key conditions are used

SQL> SELECT * FROM TEST_TABLE WHERE A<10 OR A>50000;

Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (ALL PART) (Cost:14037, %CPU:0, Rows:50049) (PS:1, PE:3)
2   
 TABLE ACCESS (FULL): TEST_TABLE (Cost:14037, %CPU:0, Rows:50049)

When a single partition condition is used but with bind variables

SQL> SELECT * FROM TEST_TABLE WHERE A=:x;
   
Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (SINGLE PART) (Cost:3, %CPU:0, Rows:1) (PS:KEY, PE:KEY)
2    TABLE ACCESS (ROWID) LOCAL: TEST_TABLE (Cost:3, %CPU:0, Rows:1)
3         INDEX (RANGE SCAN): TEST_INDEX (Cost:2, %CPU:0, Rows:1)

Check statistics information after collecting Global statistics

SQL> BEGIN
 DBMS_STATS.DELETE_TABLE_STATS ('TEST_USER','TEST_TABLE');
 DBMS_STATS.GATHER_TABLE_STATS ('TEST_USER','TEST_TABLE',GRANULARITY='GLOBAL');
 END;
 /
 
SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED FROM USER_TABLES WHERE 
TABLE_NAME='TEST_TABLE';

 NUM_ROWS  BLOCKS    AVG_ROW_LEN  LAST_ANALYZED
 --------- --------- ------------ --------------------------------
 990030    33920     2005.787=279 2025/12/05
 
 SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED FROM USER_TAB_PARTITIONS WHERE 
TABLE_NAME='TEST_TABLE';

 NUM_ROWS   BLOCKS   LAST_ANALYZED
 --------- --------- --------------
 0          0
 0          0 
 0          0

Statistics Level - PARTITION Statistics

Single partition key condition

SQL> SELECT * FROM TEST_TABLE WHERE A=100;
 
Execution Plan
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): TEST_TABLE (Cost:3, %CPU:0, Rows:1)
2   PARTITION RANGE (SINGLE PART) (Cost:2, %CPU:0, Rows:1) (PS:1, PE:1)
3     INDEX (RANGE SCAN): TEST_INDEX (Cost:2, %CPU:0, Rows:1)

Using multiple partition key conditions expressing a closed range

SQL> SELECT * FROM TEST_TABLE WHERE A>100 AND A<35000;

Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (SUBSET PART) (Cost:5669, %CPU:0, Rows:34993) (PS:1, PE:2)
2   
TABLE ACCESS (FULL): TEST_TABLE (Cost:5669, %CPU:0, Rows:34993

Global statistics are used except when the partition key is queried as a constant, so collecting Global statistics is preferable for accurate execution plans.
Example of checking statistics information after collecting partition statistics

SQL> BEGIN
 DBMS_STATS.DELETE_TABLE_STATS ('TEST_USER','TEST_TABLE');
 DBMS_STATS.GATHER_TABLE_STATS ('TEST_USER','TEST_TABLE',GRANULARITY='PARTITION');
 END;
 /
 
SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED FROM USER_TABLES WHERE 
TABLE_NAME='TEST_TABLE';

NUM_ROWS   BLOCKS     AVG_ROW_LEN LAST_ANALYZED
---------- ---------- ----------- -------------------------------
0          0           0

SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED FROM USER_TAB_PARTITIONS WHERE 
TABLE_NAME='TEST_TABLE';

NUM_ROWS   BLOCKS   LAST_ANALYZED
--------- --------- ------------- 
10040      3456      2025/12/05
30320      10240     2025/12/05
59950      20224     2025/12/05

1.2 Whether to Collect Global Statistics

For tables with large volumes of data, collecting Global statistics can be time-consuming and resource-intensive, so sometimes only partition statistics are collected.

Tibero provides a feature to reduce the load of collecting Global statistics by deriving Global statistics values from Partition-level statistics. This means that instead of scanning the entire table for Global statistics, the Global statistics are updated using information collected at the Partition-level.

However, since this is an estimate, if the date of the last Global statistics collection is old, Global statistics should be collected again.

1.3 Collecting Global and Partition Level Statistics

The following explains the principles of collecting Global and Partition statistics. Choose the appropriate method when collecting statistics.

Collect all statistics at Global and Partition-levels

By using the granularity option 'ALL', statistics at both Partition-level and Global level are collected simultaneously.

Because the entire PARTITION is FULL SCANned twice, it takes a lot of time, but this method is chosen when initially generating statistics information.

Collect only Partition-level statistics and update Global statistics based on Partition-level statistics

When the target table for statistics collection is very large and collecting Global statistics takes a long time, collect Partition-level statistics. At this time, min, max, distinct counts, and histogram information for partition key columns are updated, improving the accuracy of collected statistics.

Although this may cause the execution plan to be somewhat less accurate than when collecting Global statistics, it has the advantage of not requiring a long time for statistics collection.

Collection time for Global and Partition statistics

The following is an example of collection time for Global and Partition statistics.

-- When collecting all statistics at Global and Partition levels

SQL> /* Collect Partition-level statistics
(granularity='PARTITION', no_invalidate= FALSE) */
BEGIN
 DBMS_STATS.DELETE_TABLE_STATS (OWNNAME=USER,TABNAME='TEST_TABLE' );
 DBMS_STATS.GATHER_TABLE_STATS
 (OWNNAME='TEST_USER',TABNAME='TEST_TABLE',
 PARTNAME=NULL, NO_INVALIDATE=FALSE,
 GRANULARITY='PARTITION');
 END;
 /

-- Check Global and Partition statistics collection
SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED
  FROM USER_TABLES WHERE TABLE_NAME='TEST_TABLE';

SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED
 FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='TEST_TABLE';
 
SQL>  /* Collect Global Partition-level statistics
(granularity='GLOBAL', estimate_percent = 10, degree = 4,no_invalidate= FALSE)
Using sampling 10% and Parallel=4 options to reduce time */
 BEGIN
 DBMS_STATS.DELETE_TABLE_STATS (OWNNAME=USER,TABNAME='TEST_TABLE' );
 DBMS_STATS.GATHER_TABLE_STATS
 (OWNNAME='TEST_USER',TABNAME='TEST_TABLE',
 PARTNAME=NULL,ESTIMATE_PERCENT=10,
 DEGREE=4,NO_INVALIDATE=FALSE,
 GRANULARITY='GLOBAL');
 END;
 /
 
 -- Check Global and Partition statistics collection
SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED
 FROM USER_TABLES WHERE TABLE_NAME='TEST_TABLE';

 SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED
 FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='TEST_TABLE';

1.4 Tables with Partitions Added Periodically

Since days or months are used as Range Partition Key Column values and partitions continuously increase at regular intervals, the range of key column values changes daily, causing statistics information to fail to follow the actual data distribution. This results in inaccurate selectivity and cardinality estimates for query conditions.

The following is an example explaining statistics collection in systems where partitions are added or split periodically, such as daily or monthly, using schema formats.

Create a Partition table.

SQL> @create_table_partition.sql

Insert test data.

SQL> BEGIN
 FOR i IN 1..20 LOOP
   INSERT INTO DAILY_LOG
     SELECT '202501'||LPAD(i,2,'0'), LEVEL, LEVEL, LEVEL, LEVEL, LEVEL
     FROM DUAL CONNECT BY LEVEL <= 10000;
     COMMIT;
   END LOOP;
 END;

Collect statistics information.

SQL> CALL DBMS_STATS.GATHER_TABLE_STATS (USER, 'DAILY_LOG', granularity=>'ALL');

Add a new partition and insert data. (add_partition.sql)

The table uses a column representing date values as a string for the partition key, and since a new partition is added daily with the current date value, the maximum value of the key column increases daily.

SQL> ALTER TABLE DAILY_LOG ADD PARITTION P20250121 VALUES LESS TAHN ('20250122');
SQL> INSERT INTO DAILY_LOG SELECT '20250121',1,1,1,1, FROM DUAL CONNECT BY LEVEL <=10000;
SQL> COMMIT;

In the above example, each partition contains 10,000 rows. The table has partitions from P20250101 to P20250120, statistics collection has been completed, and then the P20250121 partition was added with data inserted. Therefore, since statistics collection has not been done for P20250121, the statistics information for the LOG_DAY column will have a maximum value of '20250120'.

Check the Plan.
Result of checking the plan for 20250120 with statistics information (view_partition_plan.sql)

SQL> @view_partition_plan.sql

SQL> set autotrace traceonly exp planstat
SQL> SELECT * FROM DAILY_LOG WHERE LOG_DAY='20250120';
 
Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (SINGLE PART) (Cost:4240, %CPU:0, Rows:9944) (PS:20, PE:20)
 2   
TABLE ACCESS (FULL): DAILY_LOG (Cost:4240, %CPU:0, Rows:9944)

Predicate Information
---------------------------------------------------------------------------
2 - filter: ("DAILY_LOG"."LOG_DAY" = '20250120') (0.050)

Execution Stat
--------------------------------------------------------------------------------
1 PARTITION RANGE (SINGLE PART) (Time:.02 ms, Rows:10000, CR:0, CU:0, Starts:1)
 2   
TABLE ACCESS (FULL): DAILY_LOG (Time:108.34 ms, Rows:10000, CR:10145, CU:203, 
Starts:1

SQL> set autotrace off

Since the maximum value in the histogram for the LOG_DAY column is '20230120', the cardinality is predicted similarly, resulting in a correct (FULL) plan.

Result of checking the plan for 20250121 without statistics information (view_partition_plan.sql)

SQL>  @view_partition_plan.sql
SQL> set autotrace traceonly exp planstat
SQL> SELECT * FROM DAILY_LOG WHERE LOG_DAY='20250121';
 
Execution Plan
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): DAILY_LOG (Cost:3, %CPU:0, Rows:1)
2   
PARTITION RANGE (SINGLE PART) (Cost:2, %CPU:0, Rows:1) (PS:21, PE:21)
3     
  
INDEX (RANGE SCAN): DAILY_LOG_IDX (Cost:2, %CPU:0, Rows:1)

Predicate Information
---------------------------------------------------------------------------
2 - filter: ("DAILY_LOG"."LOG_DAY" = '20250121') (0.000)

Execution Stat
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): DAILY_LOG (Time:2582.04 ms, Rows:10000, CR:10000, CU:0, 
Starts:1)
2   
PARTITION RANGE (SINGLE PART) (Time:0. ms, Rows:10000, CR:0, CU:0, Starts:1)
3       INDEX (RANGE SCAN): DAILY_LOG_IDX (Time:12.49 ms, Rows:10000, CR:369, CU:0, 
Starts:1)

SQL> set autotrace off

For the date '20230121' which exceeds the maximum value in the histogram, an incorrect plan (RANGE SCAN) is generated, causing slower query performance. In this table with daily partitions, the query speed difference may not be noticeable, but with hourly partitions, the statistics information will deviate more quickly.

1.5 Large-Scale Partition Statistics Collection

In large-scale partition environments, the automatic statistics collection feature provided by Tibero is not recommended. Instead, strategic statistics collection and utilization tailored to the operational system environment are required.

The following explains the impact of the Granularity value on partition statistics collection performance.

Large-Scale Partitions and Granularity

In large-scale systems, some tables may have thousands of partitions and sizes of several terabytes. Even sampling less than 1% results in sampling several gigabytes of data, and collecting statistics for each partition means thousands of samplings, which cannot be ignored in terms of time. If there are several local indexes, even more work is generated.

The following explains how many samplings and collection processes occur when creating local indexes on the LOG_DAY partition from the example in “2.2.4. Tables with Partitions Added Periodically” and setting Granularity to the default ALL during collection.

Create Indexes (create_index.sql)

SQL> CREATE INDEX DAILY_LOG_IDX_01 ON DAILY_LOG (LOG_DAY) LOCAL;
SQL> CREATE INDEX DAILY_LOG_IDX_02 ON DAILY_LOG (LOG_DAY, C1) LOCAL;
SQL> CREATE INDEX DAILY_LOG_IDX_03 ON DAILY_LOG (LOG_DAY, C1, C2) LOCAL;
SQL> CREATE INDEX DAILY_LOG_IDX_04 ON DAILY_LOG (LOG_DAY, C3) LOCAL;

Considering leap years 2024 and 2028, the number of partitions is 365 x 10 + 2 = 3562.

Statistics Collection

The order of statistics collection is as follows. If the number of partitions is 3562 as above, the number of samplings can be calculated.

a. Collect for Table: 1 time

b. Collect for Table partitions: 3,562 times

c. Collect for each Index: 4 times

d. Collect for each Index partition: 3,562 * 4 times

In total, 17,815 samplings and collections are performed. If options like 'FOR ALL COLUMNS SIZE 200' are used to collect histograms precisely, sampling is done for each column.

Even if ESTIMATE_PERCENT is low, performing 17,815 samplings can take hours or days. For partitions increasing by unit, spending this much time on one table is a very inefficient collection method.

Note
A strategy considering operational environment characteristics is necessary when collecting statistics in large-scale systems. For features usable in such environments, refer to "Advanced Features of Statistics".

Related to

Search

Welcome to Tibero GTS!

Tibero Large-Scale System Statistics Collection Guide - 4. Partition Table Statistics Collection

Overview

Method

1. Partition Table Statistics

1.1 Statistics Level by Query Condition

Statistics Level - GLOBAL Statistics

Statistics Level - PARTITION Statistics

1.2 Whether to Collect Global Statistics

1.3 Collecting Global and Partition Level Statistics

1.4 Tables with Partitions Added Periodically

1.5 Large-Scale Partition Statistics Collection

Large-Scale Partitions and Granularity

업무 외 시간 안내

Search

Welcome to Tibero GTS!

Overview

Method

1. Partition Table Statistics

1.1 Statistics Level by Query Condition

Statistics Level - GLOBAL Statistics

Statistics Level - PARTITION Statistics

1.2 Whether to Collect Global Statistics

1.3 Collecting Global and Partition Level Statistics

1.4 Tables with Partitions Added Periodically

1.5 Large-Scale Partition Statistics Collection

Large-Scale Partitions and Granularity