Document Type | Technical Information

Category | Administration

Applicable Product Version | 7FS02PS

Document Number | TADTI184

Overview

This chapter explains the method for detailed collection of histograms.

Method

1. Detailed Histogram Collection

The following is an example of collecting histograms in detail.

Specify the number of histogram buckets as 10 for collection. (set_histogram.sql)

SQL> CALL DBMS_STATS.GATHER_TABLE_STATS(USER,'ARMY',
 method_opt='FOR ALL COLUMNS SIZE 10', estimate_percent=100);

Check the stored histograms. (view_histogram.sql)

SQL> @view_histogram.sql

 NAME BUCKET_CNT NULL_CNT   DISTINCT_CNT LOW_VAL                                     
HIGH_VAL                                 DENSITY AVG_COL_SIZE
---- ---------- ---------- ------------ ------------------------------------------- 
------------------------------------------ --------- ------
ID            1          0        10000 1.0000000000000000000000000000000000000E+00 
1.0000000000000000000000000000000000000E+04     .0001 2.9801
NAME          1          0         9996 AAAST ZZXVC                                       
                                      .00010004      5
JOB           1          0            5 ARCHER WIZARD                                     
                                             .2   7.67
SAL           1          0           99 1.0000000000000000000000000000000000000E+02 
9.9000000000000000000000000000000000000E+03 .01010101      2
DEPT          1          0            5 DETAIL RETHINKING .2 7.25

NAME       HIST_HEAD_ID BUCKET     END_POINT  END_POINT_ACTUAL                           
        DISTINCT_CNT
---------- ------------ ---------- ---------- --------------------------------------------
------ ------------
 ID                   26 1          10000      1.0000000000000000000000000000000000000E+04 
              10000
 NAME                 27 1          2.5432E+16 ZZXVC                                       
               9996
 JOB                  28 1          2.4569E+16 WIZARD                                     
                   5
 SAL                  29 1          9900       9.9000000000000000000000000000000000000E+03 
                 99
 DEPT                 30 1          2.3157E+16 RETHINKING                                 
                   5

Looking at the above results, since the number of buckets was not specified when calling GATHER_TABLE_STATS, the default value of 1 was applied, so each column has only one bucket.

The first query shows the histogram header (_DD_HIST_HEAD) result, which displays information such as the minimum value, maximum value, and distinct count within the entire column.

The second query shows the histogram bucket (_DD_HISTOGRAM) result, which contains information such as the maximum value and distinct count within one histogram bucket. (The minimum value can be inferred from the header and the maximum value of the previous bucket)

Analyzing the results, the ID column values are unique, so the distinct count matches the total number of rows, 10,000. For JOB and DEPT, which have a fixed number of unique values, the counts appear as expected.

Use set autotrace traceonly exp plan stat to compare estimated values with actual results.

Run the query to check the plan. (view_plan_detail_for_histogram.sql)

SQL> set lines 150
SQL> set autot traceonly exp plans
SQL> SELECT * FROM ARMY WHERE ID <= 100;

set autot off
SQL ID: 1434
Plan Hash Value: 2324995388

Execution Plan
--------------------------------------------------------------------------------
 1 TABLE ACCESS (ROWID): ARMY (Cost:7, %CPU:0, Rows:100)
 2   
INDEX (RANGE SCAN): ARMY_PK (Cost:2, %CPU:0, Rows:100)

Predicate Information
--------------------------------------------------------------------------------
2 - access: ("ARMY"."ID" <= 100) (0.010)

Execution Stat
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Time:.26 ms, Rows:100, CR:5, CU:0, Starts:1)
2   INDEX (RANGE SCAN): ARMY_PK (Time:.06 ms, Rows:100, CR:2, CU:0, Starts:1)

The Rows shown in the execution plan are estimates, and the Rows in the Execution Stat are the actual number of rows returned by the plan nodes. Because ID values are evenly distributed, the estimate in this case was very accurate.

Statistics Error and Actual Plan

The following is a query result with a condition on the DEPT column. (view_plan_detail_for_histogram.sql)

SQL> set lines 150
SQL> set autot traceonly exp plans
SQL> SELECT * FROM ARMY WHERE DEPT = 'PROMISE';

SQL ID: 1446
Plan Hash Value: 1631011846
Execution Plan
----------------------------------------------------------------------------------
 1  TABLE ACCESS (FULL): ARMY (Cost:211, %CPU:0, Rows:2000)

Predicate Information
----------------------------------------------------------------------------------
1 - filter: ("ARMY"."DEPT" = 'PROMISE') (0.200)

Execution Stat
-----------------------------------------------------------------------------------
1 TABLE ACCESS (FULL): ARMY (Time:5.54 ms, Rows:2500, CR:481, CU:27, Starts:1)

When there is no histogram, the optimizer assumes that values within one bucket are uniformly distributed, so the estimate for PROMISE and RETHINKING will be the same. Since there are 5 unique values out of 10,000 rows, it estimates 10,000/5 = 2,000 rows.

However, in reality, PROMISE has 2,500 rows and RETHINKING has 1,250 rows, causing an error. This example does not significantly affect the optimizer's execution plan based on statistics, but the next example can lead to performance degradation.

SQL> SELECT * FROM ARMY WHERE JOB = 'INFANTRY';
SQL ID: 1452
Plan Hash Value: 749881390

Execution Plan
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Cost:197, %CPU:0, Rows:2000)
2   
INDEX (RANGE SCAN): ARMY_IDX_JOB (Cost:6, %CPU:0, Rows:2000)

Predicate Information
--------------------------------------------------------------------------------
2 - access: ("ARMY"."JOB" = 'INFANTRY') (0.200)

SQL ID: 1452
Plan Hash Value: 749881390
Execution Stat
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Time:23.1 ms, Rows:8000, CR:472, CU:0, Starts:1)
2   
INDEX (RANGE SCAN): ARMY_IDX_JOB (Time:.68 ms, Rows:8000, CR:26, CU:0, Starts:1)

According to the histogram for the JOB column, there are 5 unique values and a total of 10,000 rows, so estimating 2,000 rows for the INFANTRY condition means the optimizer's estimate was correct.

However, in reality, there are 8,000 rows, and because it estimated 2,000, the plan used INDEX RANGE SCAN + ROWID SCAN, resulting in 8,000 ROWID SCANs actually occurring.

This is an example where incorrect statistics negatively affected the execution plan, which could lead to severe performance degradation in larger scale situations.

2. Method for Selecting Histogram Targets

The following is an example of selecting histogram targets. The CLAN table is created with the same schema and indexes as the ARMY table before performing this.

SQL> SELECT ARMY.ID, ARMY.NAME, CLAN.SAL, CLAN.DEPT FROM ARMY, CLAN
 WHERE ARMY.JOB=CLAN.JOB
 AND ARMY.JOB='INFANTRY';
 
stage  count   cpu   elapsed  current    query    disk      rows
------ ------ ------ -------- -------- -------- -------- ----------
 parse      1   0.00      0.00      0         0        0          0
 exec       1   0.00      0.00      0         0        0          0
 fetch  25744  15.09     18.44      0       996        0   64000000
 ------ ------ ------ -------- -------- -------- -------- ----------
 sum    25746  15.09     18.44      0       996        0   64000000
 
rows     excution plan
-------- --------------------------------------------------
 64000000 hash join (et=18103446, cr=0, cu=0, co=408, cpu=3, ro=4000000)
    8000   table access (rowid) ARMY(1769) (et=9511, cr=472, cu=0, co=197, cpu=0, 
ro=2000)
    8000     index (range scan) ARMY_IDX_JOB(1772) (et=677, cr=26, cu=0, co=6, cpu=0, ro=2000)
    8000   table access (rowid) CLAN(1775) (et=6843, cr=472, cu=0, co=195, cpu=0, ro=2000)
    8000     index (range scan) CLAN_IDX_JOB(1778) (et=283, cr=26, cu=0, co=6, cpu=0, ro=2000)

In the SQL trace above, rows is the actual number of output rows, and the ro= item in the execution plan is the estimated cardinality.

For both ARMY and CLAN tables, the index range scan estimate is 2,000 each, and the join result is estimated at 4,000,000, but the actual number is much larger at 64,000,000.

If such a case occurs, check using autotrace in tbsql.

SQL> set autot traceonly exp
SQL> SELECT ARMY.ID, ARMY.NAME, CLAN.SAL, CLAN.DEPT FROM ARMY, ARMY CLAN
 WHERE ARMY.JOB=CLAN.JOB
   AND ARMY.JOB='INFANTRY';
SQL> set autot off

SQL ID: 117
Plan Hash Value: 1730287211

Execution Plan
--------------------------------------------------------------------
  1 HASH JOIN (Cost:408, %CPU:3, Rows:4000000)
  2   TABLE ACCESS (ROWID): ARMY (Cost:197, %CPU:0, Rows:2000)
  3     INDEX (RANGE SCAN): ARMY_IDX_JOB (Cost:6, %CPU:0, Rows:2000)
  4   TABLE ACCESS (ROWID): CLAN (Cost:195, %CPU:0, Rows:2000)
  5     INDEX (RANGE SCAN): CLAN_IDX_JOB (Cost:6, %CPU:0, Rows:2000)
  
Predicate Information
------------------------------------------------------------------
  1 - access: ("ARMY"."JOB" = "CLAN"."JOB") (1.000)
  3 - access: ("ARMY"."JOB" = 'INFANTRY') (0.200)
  5 - access: ("ARMY"."JOB" = 'INFANTRY') (0.200)

As shown above, you can check how selectivity is determined for each condition through the Predicate Information section.

The following is an example to check how many actual rows satisfy the conditions. It shows that the statistics for the JOB column are very inaccurate.

SQL> SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME='ARMY';

  NUM_ROWS
  ----------
     10000
     
1 row selected.

SQL> SELECT COUNT(*) FROM ARMY WHERE JOB='INFANTRY';
 
 COUNT(*)
 ----------
 8000
 
1 row selected.

The above results likely stem from an insufficient number of histogram buckets, so increase the number of buckets and recollect statistics.

There is a significant performance difference between having 1 bucket and 2 or more buckets during statistics collection, but once 2 or more buckets are used, the difference is negligible regardless of the exact number. Therefore, if you cannot determine the exact number of buckets, use the maximum value (256).

Using the FOR ALL COLUMNS SIZE 256 option collects statistics with 256 buckets for all columns in the table.

SQL> CALL DBMS_STATS.GATHER_TABLE_STATS(USER,'ARMY',method_opt='FOR ALL COLUMNS SIZE 256');

If the table has too many columns, use the FOR ALL INDEXED COLUMNS option. This collects statistics only on columns with indexes, which generally control selectivity and thus is usually sufficient.

Related to

Search

Welcome to Tibero GTS!

Tibero Large-Scale System Statistics Collection Guide - 9. Histogram Detailed Collection

Overview

Method

1. Detailed Histogram Collection

2. Method for Selecting Histogram Targets

업무 외 시간 안내

Search

Welcome to Tibero GTS!

Overview

Method

1. Detailed Histogram Collection

2. Method for Selecting Histogram Targets