Document Type | Technical Information
Category | Administration
Applicable Product Version | 7FS02PS
Document Number | TADTI184
Overview
This chapter explains the method for detailed collection of histograms.
Method
1. Detailed Histogram Collection
The following is an example of collecting histograms in detail.
- Specify the number of histogram buckets as 10 for collection. (set_histogram.sql)
SQL> CALL DBMS_STATS.GATHER_TABLE_STATS(USER,'ARMY', method_opt='FOR ALL COLUMNS SIZE 10', estimate_percent=100);
- Check the stored histograms. (view_histogram.sql)
SQL> @view_histogram.sql
NAME BUCKET_CNT NULL_CNT DISTINCT_CNT LOW_VAL
HIGH_VAL DENSITY AVG_COL_SIZE
---- ---------- ---------- ------------ -------------------------------------------
------------------------------------------ --------- ------
ID 1 0 10000 1.0000000000000000000000000000000000000E+00
1.0000000000000000000000000000000000000E+04 .0001 2.9801
NAME 1 0 9996 AAAST ZZXVC
.00010004 5
JOB 1 0 5 ARCHER WIZARD
.2 7.67
SAL 1 0 99 1.0000000000000000000000000000000000000E+02
9.9000000000000000000000000000000000000E+03 .01010101 2
DEPT 1 0 5 DETAIL RETHINKING .2 7.25
NAME HIST_HEAD_ID BUCKET END_POINT END_POINT_ACTUAL
DISTINCT_CNT
---------- ------------ ---------- ---------- --------------------------------------------
------ ------------
ID 26 1 10000 1.0000000000000000000000000000000000000E+04
10000
NAME 27 1 2.5432E+16 ZZXVC
9996
JOB 28 1 2.4569E+16 WIZARD
5
SAL 29 1 9900 9.9000000000000000000000000000000000000E+03
99
DEPT 30 1 2.3157E+16 RETHINKING
5Looking at the above results, since the number of buckets was not specified when calling GATHER_TABLE_STATS, the default value of 1 was applied, so each column has only one bucket.
The first query shows the histogram header (_DD_HIST_HEAD) result, which displays information such as the minimum value, maximum value, and distinct count within the entire column.
The second query shows the histogram bucket (_DD_HISTOGRAM) result, which contains information such as the maximum value and distinct count within one histogram bucket. (The minimum value can be inferred from the header and the maximum value of the previous bucket)
Analyzing the results, the ID column values are unique, so the distinct count matches the total number of rows, 10,000. For JOB and DEPT, which have a fixed number of unique values, the counts appear as expected.
- Use set autotrace traceonly exp plan stat to compare estimated values with actual results.
Run the query to check the plan. (view_plan_detail_for_histogram.sql)
SQL> set lines 150
SQL> set autot traceonly exp plans
SQL> SELECT * FROM ARMY WHERE ID <= 100;
set autot off
SQL ID: 1434
Plan Hash Value: 2324995388
Execution Plan
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Cost:7, %CPU:0, Rows:100)
2
INDEX (RANGE SCAN): ARMY_PK (Cost:2, %CPU:0, Rows:100)
Predicate Information
--------------------------------------------------------------------------------
2 - access: ("ARMY"."ID" <= 100) (0.010)
Execution Stat
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Time:.26 ms, Rows:100, CR:5, CU:0, Starts:1)
2 INDEX (RANGE SCAN): ARMY_PK (Time:.06 ms, Rows:100, CR:2, CU:0, Starts:1)The Rows shown in the execution plan are estimates, and the Rows in the Execution Stat are the actual number of rows returned by the plan nodes. Because ID values are evenly distributed, the estimate in this case was very accurate.
- Statistics Error and Actual Plan
The following is a query result with a condition on the DEPT column. (view_plan_detail_for_histogram.sql)
SQL> set lines 150
SQL> set autot traceonly exp plans
SQL> SELECT * FROM ARMY WHERE DEPT = 'PROMISE';
SQL ID: 1446
Plan Hash Value: 1631011846
Execution Plan
----------------------------------------------------------------------------------
1 TABLE ACCESS (FULL): ARMY (Cost:211, %CPU:0, Rows:2000)
Predicate Information
----------------------------------------------------------------------------------
1 - filter: ("ARMY"."DEPT" = 'PROMISE') (0.200)
Execution Stat
-----------------------------------------------------------------------------------
1 TABLE ACCESS (FULL): ARMY (Time:5.54 ms, Rows:2500, CR:481, CU:27, Starts:1)When there is no histogram, the optimizer assumes that values within one bucket are uniformly distributed, so the estimate for PROMISE and RETHINKING will be the same. Since there are 5 unique values out of 10,000 rows, it estimates 10,000/5 = 2,000 rows.
However, in reality, PROMISE has 2,500 rows and RETHINKING has 1,250 rows, causing an error. This example does not significantly affect the optimizer's execution plan based on statistics, but the next example can lead to performance degradation.
SQL> SELECT * FROM ARMY WHERE JOB = 'INFANTRY';
SQL ID: 1452
Plan Hash Value: 749881390
Execution Plan
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Cost:197, %CPU:0, Rows:2000)
2
INDEX (RANGE SCAN): ARMY_IDX_JOB (Cost:6, %CPU:0, Rows:2000)
Predicate Information
--------------------------------------------------------------------------------
2 - access: ("ARMY"."JOB" = 'INFANTRY') (0.200)
SQL ID: 1452
Plan Hash Value: 749881390
Execution Stat
--------------------------------------------------------------------------------
1 TABLE ACCESS (ROWID): ARMY (Time:23.1 ms, Rows:8000, CR:472, CU:0, Starts:1)
2
INDEX (RANGE SCAN): ARMY_IDX_JOB (Time:.68 ms, Rows:8000, CR:26, CU:0, Starts:1)According to the histogram for the JOB column, there are 5 unique values and a total of 10,000 rows, so estimating 2,000 rows for the INFANTRY condition means the optimizer's estimate was correct.
However, in reality, there are 8,000 rows, and because it estimated 2,000, the plan used INDEX RANGE SCAN + ROWID SCAN, resulting in 8,000 ROWID SCANs actually occurring.
This is an example where incorrect statistics negatively affected the execution plan, which could lead to severe performance degradation in larger scale situations.
2. Method for Selecting Histogram Targets
The following is an example of selecting histogram targets. The CLAN table is created with the same schema and indexes as the ARMY table before performing this.
SQL> SELECT ARMY.ID, ARMY.NAME, CLAN.SAL, CLAN.DEPT FROM ARMY, CLAN
WHERE ARMY.JOB=CLAN.JOB
AND ARMY.JOB='INFANTRY';
stage count cpu elapsed current query disk rows
------ ------ ------ -------- -------- -------- -------- ----------
parse 1 0.00 0.00 0 0 0 0
exec 1 0.00 0.00 0 0 0 0
fetch 25744 15.09 18.44 0 996 0 64000000
------ ------ ------ -------- -------- -------- -------- ----------
sum 25746 15.09 18.44 0 996 0 64000000
rows excution plan
-------- --------------------------------------------------
64000000 hash join (et=18103446, cr=0, cu=0, co=408, cpu=3, ro=4000000)
8000 table access (rowid) ARMY(1769) (et=9511, cr=472, cu=0, co=197, cpu=0,
ro=2000)
8000 index (range scan) ARMY_IDX_JOB(1772) (et=677, cr=26, cu=0, co=6, cpu=0, ro=2000)
8000 table access (rowid) CLAN(1775) (et=6843, cr=472, cu=0, co=195, cpu=0, ro=2000)
8000 index (range scan) CLAN_IDX_JOB(1778) (et=283, cr=26, cu=0, co=6, cpu=0, ro=2000)In the SQL trace above, rows is the actual number of output rows, and the ro= item in the execution plan is the estimated cardinality.
For both ARMY and CLAN tables, the index range scan estimate is 2,000 each, and the join result is estimated at 4,000,000, but the actual number is much larger at 64,000,000.
If such a case occurs, check using autotrace in tbsql.
SQL> set autot traceonly exp
SQL> SELECT ARMY.ID, ARMY.NAME, CLAN.SAL, CLAN.DEPT FROM ARMY, ARMY CLAN
WHERE ARMY.JOB=CLAN.JOB
AND ARMY.JOB='INFANTRY';
SQL> set autot off
SQL ID: 117
Plan Hash Value: 1730287211
Execution Plan
--------------------------------------------------------------------
1 HASH JOIN (Cost:408, %CPU:3, Rows:4000000)
2 TABLE ACCESS (ROWID): ARMY (Cost:197, %CPU:0, Rows:2000)
3 INDEX (RANGE SCAN): ARMY_IDX_JOB (Cost:6, %CPU:0, Rows:2000)
4 TABLE ACCESS (ROWID): CLAN (Cost:195, %CPU:0, Rows:2000)
5 INDEX (RANGE SCAN): CLAN_IDX_JOB (Cost:6, %CPU:0, Rows:2000)
Predicate Information
------------------------------------------------------------------
1 - access: ("ARMY"."JOB" = "CLAN"."JOB") (1.000)
3 - access: ("ARMY"."JOB" = 'INFANTRY') (0.200)
5 - access: ("ARMY"."JOB" = 'INFANTRY') (0.200) As shown above, you can check how selectivity is determined for each condition through the Predicate Information section.
The following is an example to check how many actual rows satisfy the conditions. It shows that the statistics for the JOB column are very inaccurate.
SQL> SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME='ARMY';
NUM_ROWS
----------
10000
1 row selected.
SQL> SELECT COUNT(*) FROM ARMY WHERE JOB='INFANTRY';
COUNT(*)
----------
8000
1 row selected.The above results likely stem from an insufficient number of histogram buckets, so increase the number of buckets and recollect statistics.
There is a significant performance difference between having 1 bucket and 2 or more buckets during statistics collection, but once 2 or more buckets are used, the difference is negligible regardless of the exact number. Therefore, if you cannot determine the exact number of buckets, use the maximum value (256).
Using the FOR ALL COLUMNS SIZE 256 option collects statistics with 256 buckets for all columns in the table.
SQL> CALL DBMS_STATS.GATHER_TABLE_STATS(USER,'ARMY',method_opt='FOR ALL COLUMNS SIZE 256');
If the table has too many columns, use the FOR ALL INDEXED COLUMNS option. This collects statistics only on columns with indexes, which generally control selectivity and thus is usually sufficient.