Document Type | Technical Information
Category | Administration
Applicable Product Version | 7FS02PS
Document Number | TADTI179
Overview
This document explains the method for collecting Partition Table statistics.
Optimizer statistics represent detailed information about the Database and Objects, and the query Optimizer uses statistics to select the best execution plan for SQL statements.
Method
1. Partition Table Statistics
In an operating system where numerous log-type and statistical tables have partitions added or split daily or monthly, it is difficult to determine the timing for collecting statistics, and in large-scale systems, the time burden for statistics collection is significant.
This section explains the level of statistics collection as a method to reduce the statistics collection load in large-scale systems. It also describes strategies and utilization of statistics information in systems where partitions are added or split periodically, such as daily or monthly.
1.1 Statistics Level by Query Condition
In PARTITION, either Global statistics or partition statistics are used depending on the SQL condition. Except when the partition key is queried as a constant, Global statistics are used; therefore, collecting Global statistics is preferable for accurate execution plans.
- Create Table Schema
SQL> CREATE TABLE TEST_TABLE (A NUMBER, B NUMBER, C CHAR(2000))
PARTITION BY RANGE (A)
(PARTITION P1 VALUES LESS THAN (10000),
PARTITION P2 VALUES LESS THAN (40000),
PARTITION P3 VALUES LESS THAN (MAXVALUE));
SQL> INSERT INTO TEST_TABLE SELECT LEVEL,MOD(LEVEL,10),LPAD(LEVEL,10,'0')
FROM DUAL CONNECT BY LEVEL<=100000;
SQL> commit;
SQL> CREATE INDEX TEST_INDEX ON TEST_TABLE(A,B) LOCAL;
SQL> BEGIN
DBMS_STATS.GATHER_TABLE_STATS ('TEST_USER','TEST_TABLE',GRANULARITY='ALL');
END;
/- Select Global or partition statistics differently depending on the query condition.
Statistics Level - GLOBAL Statistics
- When no condition clause is used
SQL> SELECT * FROM TEST_TABLE; Execution Plan -------------------------------------------------------------------------------- 1 PARTITION RANGE (ALL PART) (Cost:14037, %CPU:0, Rows:100090) (PS:1, PE:3) 2 TABLE ACCESS (FULL): TEST_TABLE (Cost:14037, %CPU:0, Rows:100090)
- When partition key condition is not used
SQL> SELECT * FROM TEST_TABLE WHERE B=3; Execution Plan -------------------------------------------------------------------------------- 1 PARTITION RANGE (ALL PART) (Cost:14037, %CPU:0, Rows:100090) (PS:1, PE:3) 2 TABLE ACCESS (FULL): TEST_TABLE (Cost:14037, %CPU:0, Rows:100090
- When two or more partition key conditions are used
SQL> SELECT * FROM TEST_TABLE WHERE A<10 OR A>50000; Execution Plan -------------------------------------------------------------------------------- 1 PARTITION RANGE (ALL PART) (Cost:14037, %CPU:0, Rows:50049) (PS:1, PE:3) 2 TABLE ACCESS (FULL): TEST_TABLE (Cost:14037, %CPU:0, Rows:50049)
- When a single partition condition is used but with bind variables
SQL> SELECT * FROM TEST_TABLE WHERE A=:x; Execution Plan -------------------------------------------------------------------------------- 1 PARTITION RANGE (SINGLE PART) (Cost:3, %CPU:0, Rows:1) (PS:KEY, PE:KEY) 2 TABLE ACCESS (ROWID) LOCAL: TEST_TABLE (Cost:3, %CPU:0, Rows:1) 3 INDEX (RANGE SCAN): TEST_INDEX (Cost:2, %CPU:0, Rows:1)
- Check statistics information after collecting Global statistics
SQL> BEGIN
DBMS_STATS.DELETE_TABLE_STATS ('TEST_USER','TEST_TABLE');
DBMS_STATS.GATHER_TABLE_STATS ('TEST_USER','TEST_TABLE',GRANULARITY='GLOBAL');
END;
/
SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED FROM USER_TABLES WHERE
TABLE_NAME='TEST_TABLE';
NUM_ROWS BLOCKS AVG_ROW_LEN LAST_ANALYZED
--------- --------- ------------ --------------------------------
990030 33920 2005.787=279 2025/12/05
SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED FROM USER_TAB_PARTITIONS WHERE
TABLE_NAME='TEST_TABLE';
NUM_ROWS BLOCKS LAST_ANALYZED
--------- --------- --------------
0 0
0 0
0 0Statistics Level - PARTITION Statistics
- Single partition key condition
SQL> SELECT * FROM TEST_TABLE WHERE A=100; Execution Plan -------------------------------------------------------------------------------- 1 TABLE ACCESS (ROWID): TEST_TABLE (Cost:3, %CPU:0, Rows:1) 2 PARTITION RANGE (SINGLE PART) (Cost:2, %CPU:0, Rows:1) (PS:1, PE:1) 3 INDEX (RANGE SCAN): TEST_INDEX (Cost:2, %CPU:0, Rows:1)
- Using multiple partition key conditions expressing a closed range
SQL> SELECT * FROM TEST_TABLE WHERE A>100 AND A<35000; Execution Plan -------------------------------------------------------------------------------- 1 PARTITION RANGE (SUBSET PART) (Cost:5669, %CPU:0, Rows:34993) (PS:1, PE:2) 2 TABLE ACCESS (FULL): TEST_TABLE (Cost:5669, %CPU:0, Rows:34993
- Global statistics are used except when the partition key is queried as a constant, so collecting Global statistics is preferable for accurate execution plans.
- Example of checking statistics information after collecting partition statistics
SQL> BEGIN
DBMS_STATS.DELETE_TABLE_STATS ('TEST_USER','TEST_TABLE');
DBMS_STATS.GATHER_TABLE_STATS ('TEST_USER','TEST_TABLE',GRANULARITY='PARTITION');
END;
/
SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED FROM USER_TABLES WHERE
TABLE_NAME='TEST_TABLE';
NUM_ROWS BLOCKS AVG_ROW_LEN LAST_ANALYZED
---------- ---------- ----------- -------------------------------
0 0 0
SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED FROM USER_TAB_PARTITIONS WHERE
TABLE_NAME='TEST_TABLE';
NUM_ROWS BLOCKS LAST_ANALYZED
--------- --------- -------------
10040 3456 2025/12/05
30320 10240 2025/12/05
59950 20224 2025/12/05
1.2 Whether to Collect Global Statistics
For tables with large volumes of data, collecting Global statistics can be time-consuming and resource-intensive, so sometimes only partition statistics are collected.
Tibero provides a feature to reduce the load of collecting Global statistics by deriving Global statistics values from Partition-level statistics. This means that instead of scanning the entire table for Global statistics, the Global statistics are updated using information collected at the Partition-level.
However, since this is an estimate, if the date of the last Global statistics collection is old, Global statistics should be collected again.
1.3 Collecting Global and Partition Level Statistics
The following explains the principles of collecting Global and Partition statistics. Choose the appropriate method when collecting statistics.
- Collect all statistics at Global and Partition-levels
By using the granularity option 'ALL', statistics at both Partition-level and Global level are collected simultaneously.
Because the entire PARTITION is FULL SCANned twice, it takes a lot of time, but this method is chosen when initially generating statistics information.
- Collect only Partition-level statistics and update Global statistics based on Partition-level statistics
When the target table for statistics collection is very large and collecting Global statistics takes a long time, collect Partition-level statistics. At this time, min, max, distinct counts, and histogram information for partition key columns are updated, improving the accuracy of collected statistics.
Although this may cause the execution plan to be somewhat less accurate than when collecting Global statistics, it has the advantage of not requiring a long time for statistics collection.
- Collection time for Global and Partition statistics
The following is an example of collection time for Global and Partition statistics.
-- When collecting all statistics at Global and Partition levels SQL> /* Collect Partition-level statistics (granularity='PARTITION', no_invalidate= FALSE) */ BEGIN DBMS_STATS.DELETE_TABLE_STATS (OWNNAME=USER,TABNAME='TEST_TABLE' ); DBMS_STATS.GATHER_TABLE_STATS (OWNNAME='TEST_USER',TABNAME='TEST_TABLE', PARTNAME=NULL, NO_INVALIDATE=FALSE, GRANULARITY='PARTITION'); END; / -- Check Global and Partition statistics collection SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED FROM USER_TABLES WHERE TABLE_NAME='TEST_TABLE'; SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='TEST_TABLE'; SQL> /* Collect Global Partition-level statistics (granularity='GLOBAL', estimate_percent = 10, degree = 4,no_invalidate= FALSE) Using sampling 10% and Parallel=4 options to reduce time */ BEGIN DBMS_STATS.DELETE_TABLE_STATS (OWNNAME=USER,TABNAME='TEST_TABLE' ); DBMS_STATS.GATHER_TABLE_STATS (OWNNAME='TEST_USER',TABNAME='TEST_TABLE', PARTNAME=NULL,ESTIMATE_PERCENT=10, DEGREE=4,NO_INVALIDATE=FALSE, GRANULARITY='GLOBAL'); END; / -- Check Global and Partition statistics collection SQL> SELECT NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED FROM USER_TABLES WHERE TABLE_NAME='TEST_TABLE'; SQL> SELECT NUM_ROWS,BLOCKS,LAST_ANALYZED FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='TEST_TABLE';
1.4 Tables with Partitions Added Periodically
Since days or months are used as Range Partition Key Column values and partitions continuously increase at regular intervals, the range of key column values changes daily, causing statistics information to fail to follow the actual data distribution. This results in inaccurate selectivity and cardinality estimates for query conditions.
The following is an example explaining statistics collection in systems where partitions are added or split periodically, such as daily or monthly, using schema formats.
- Create a Partition table.
SQL> @create_table_partition.sql
- Insert test data.
SQL> BEGIN
FOR i IN 1..20 LOOP
INSERT INTO DAILY_LOG
SELECT '202501'||LPAD(i,2,'0'), LEVEL, LEVEL, LEVEL, LEVEL, LEVEL
FROM DUAL CONNECT BY LEVEL <= 10000;
COMMIT;
END LOOP;
END;- Collect statistics information.
SQL> CALL DBMS_STATS.GATHER_TABLE_STATS (USER, 'DAILY_LOG', granularity=>'ALL');
- Add a new partition and insert data. (add_partition.sql)
The table uses a column representing date values as a string for the partition key, and since a new partition is added daily with the current date value, the maximum value of the key column increases daily.
SQL> ALTER TABLE DAILY_LOG ADD PARITTION P20250121 VALUES LESS TAHN ('20250122');
SQL> INSERT INTO DAILY_LOG SELECT '20250121',1,1,1,1, FROM DUAL CONNECT BY LEVEL <=10000;
SQL> COMMIT;In the above example, each partition contains 10,000 rows. The table has partitions from P20250101 to P20250120, statistics collection has been completed, and then the P20250121 partition was added with data inserted. Therefore, since statistics collection has not been done for P20250121, the statistics information for the LOG_DAY column will have a maximum value of '20250120'.
- Check the Plan.
- Result of checking the plan for 20250120 with statistics information (view_partition_plan.sql)
SQL> @view_partition_plan.sql
SQL> set autotrace traceonly exp planstat
SQL> SELECT * FROM DAILY_LOG WHERE LOG_DAY='20250120';
Execution Plan
--------------------------------------------------------------------------------
1 PARTITION RANGE (SINGLE PART) (Cost:4240, %CPU:0, Rows:9944) (PS:20, PE:20)
2
TABLE ACCESS (FULL): DAILY_LOG (Cost:4240, %CPU:0, Rows:9944)
Predicate Information
---------------------------------------------------------------------------
2 - filter: ("DAILY_LOG"."LOG_DAY" = '20250120') (0.050)
Execution Stat
--------------------------------------------------------------------------------
1 PARTITION RANGE (SINGLE PART) (Time:.02 ms, Rows:10000, CR:0, CU:0, Starts:1)
2
TABLE ACCESS (FULL): DAILY_LOG (Time:108.34 ms, Rows:10000, CR:10145, CU:203,
Starts:1
SQL> set autotrace off
Since the maximum value in the histogram for the LOG_DAY column is '20230120', the cardinality is predicted similarly, resulting in a correct (FULL) plan.
- Result of checking the plan for 20250121 without statistics information (view_partition_plan.sql)
SQL> @view_partition_plan.sql
SQL> set autotrace traceonly exp planstat
SQL> SELECT * FROM DAILY_LOG WHERE LOG_DAY='20250121';
Execution Plan
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): DAILY_LOG (Cost:3, %CPU:0, Rows:1)
2
PARTITION RANGE (SINGLE PART) (Cost:2, %CPU:0, Rows:1) (PS:21, PE:21)
3
INDEX (RANGE SCAN): DAILY_LOG_IDX (Cost:2, %CPU:0, Rows:1)
Predicate Information
---------------------------------------------------------------------------
2 - filter: ("DAILY_LOG"."LOG_DAY" = '20250121') (0.000)
Execution Stat
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): DAILY_LOG (Time:2582.04 ms, Rows:10000, CR:10000, CU:0,
Starts:1)
2
PARTITION RANGE (SINGLE PART) (Time:0. ms, Rows:10000, CR:0, CU:0, Starts:1)
3 INDEX (RANGE SCAN): DAILY_LOG_IDX (Time:12.49 ms, Rows:10000, CR:369, CU:0,
Starts:1)
SQL> set autotrace offFor the date '20230121' which exceeds the maximum value in the histogram, an incorrect plan (RANGE SCAN) is generated, causing slower query performance. In this table with daily partitions, the query speed difference may not be noticeable, but with hourly partitions, the statistics information will deviate more quickly.
1.5 Large-Scale Partition Statistics Collection
In large-scale partition environments, the automatic statistics collection feature provided by Tibero is not recommended. Instead, strategic statistics collection and utilization tailored to the operational system environment are required.
The following explains the impact of the Granularity value on partition statistics collection performance.
Large-Scale Partitions and Granularity
In large-scale systems, some tables may have thousands of partitions and sizes of several terabytes. Even sampling less than 1% results in sampling several gigabytes of data, and collecting statistics for each partition means thousands of samplings, which cannot be ignored in terms of time. If there are several local indexes, even more work is generated.
The following explains how many samplings and collection processes occur when creating local indexes on the LOG_DAY partition from the example in โ2.2.4. Tables with Partitions Added Periodicallyโ and setting Granularity to the default ALL during collection.
- Create Indexes (create_index.sql)
SQL> CREATE INDEX DAILY_LOG_IDX_01 ON DAILY_LOG (LOG_DAY) LOCAL; SQL> CREATE INDEX DAILY_LOG_IDX_02 ON DAILY_LOG (LOG_DAY, C1) LOCAL; SQL> CREATE INDEX DAILY_LOG_IDX_03 ON DAILY_LOG (LOG_DAY, C1, C2) LOCAL; SQL> CREATE INDEX DAILY_LOG_IDX_04 ON DAILY_LOG (LOG_DAY, C3) LOCAL;
Considering leap years 2024 and 2028, the number of partitions is 365 x 10 + 2 = 3562.
- Statistics Collection
The order of statistics collection is as follows. If the number of partitions is 3562 as above, the number of samplings can be calculated.
a. Collect for Table: 1 time
b. Collect for Table partitions: 3,562 times
c. Collect for each Index: 4 times
d. Collect for each Index partition: 3,562 * 4 times
In total, 17,815 samplings and collections are performed. If options like 'FOR ALL COLUMNS SIZE 200' are used to collect histograms precisely, sampling is done for each column.
Even if ESTIMATE_PERCENT is low, performing 17,815 samplings can take hours or days. For partitions increasing by unit, spending this much time on one table is a very inefficient collection method.
Note
A strategy considering operational environment characteristics is necessary when collecting statistics in large-scale systems. For features usable in such environments, refer to "Advanced Features of Statistics".