Document Type | Technical Information
Category | Interface/Integration
Applicable Product Version | OpenSQL 3
Document Number | OIITI001
This document is an opensql technical document based on postgresql.
Overview
pg_repack helps perform Vacuum Full operations online without creating an exclusive lock, making it suitable for services in operation.
This document guides you through using pg_repack.
Since PostgreSQL does not have a separate reorg operation, this functionality is replaced by an extension.
Note
Due to the unique system called Vacuum, there is no separate reorg function.
You can check detailed information at the following link.
https://reorg.github.io/pg_repack/
- Online Cluster (ordered by cluster index)
- Sorted by specified column
- Online Vacuum Full
- Rebuild or rearrange only the indexes of the table
Method
Prerequisites
- Must be Super User
- The table must have a Primary Key OR a Unique Index on a NOT NULL column
- PostgreSQL development package installed (PostgreSQL-Devel)
Download
https://pgxn.org/dist/pg_repack/ is available for download.
Installation
$ cd pg_repack $ make $ make install $ psql -c "create extension pg_repack" -d [Database_name]
Usage
pg_repack [options] ... [DB_Name]
Options
The following options can be specified.
Common Options
| Option | Description |
|---|---|
| -a, --all | Run on all databases |
| -t, --table=TABLE | Run on a specific table only |
| -I, --parent-table=TABLE | Run on a specific parent table only |
| -c, --schema=SCHEMA | Run on tables in a specific schema only |
| -s, --tablespace=TBLSPC | Move processed tables to a new tablespace |
| -S, --moveidx | Move processed indexes to a new tablespace |
| -o, --order-by=COLUMNS | Sort by columns instead of cluster key |
| -n, --no-order | Run full vacuum instead of cluster |
| -N, --dry-run | Print what would be done and exit |
| -j, --jobs=NUM | Run this many parallel jobs per table |
| -i, --index=INDEX | Run on a specified index only |
| -x , - -only-indexes | Run only on indexes of the specified table |
| -T, --wait-timeout=SECS | Terminate other backends on conflict |
| -D, --no-kill-backend | Do not terminate other backends on timeout |
| -Z, --no-analyze | Do not run analyze |
| -k, --no-superuser-check | Skip superuser check |
| -C, --exclude-extension | Do not repack tables belonging to specific extensions |
Connection Options
| Option | Description |
|---|---|
| -d, --dbname=DBNAME | Name of the database to connect to |
| -h, --host=HOSTNAME | Database server host or socket directory |
| -p, --port=PORT | Database server port |
| -U, --username=USERNAME | Username to connect as |
| -w, --no-password | Do not prompt for password |
| -W, --password | Prompt for password |
Generic Options
| Option | Description |
|---|---|
| -e, --echo | Echo queries |
| -E, --elevel=LEVEL | Set output message level |
| --help | Show help |
| --version | Show version information |
Examples
Perform online CLUSTER on all clustered tables and online VACUUM FULL on all non-clustered tables in the database.
$ pg_repack test
Perform online VACUUM FULL on tables foo and bar in the database.
$ pg_repack --no-order --table foo --table bar test
Move all indexes of table foo to tablespace tbs.
$ pg_repack -d test --table foo --only-indexes --tablespace tbs
Move specified index to tablespace tbs.
$ pg_repack -d test --index idx --tablespace tbs
Limitations
- Temporary tables cannot be processed by pg_repack.
- Tables using GiST indexes cannot be processed by pg_repack.
- While pg_repack is working, DDL commands on the target table cannot be executed except VACUUM or ANALYZE. During full table repacking, ACCESS SHARE lock is held on the target table to enforce this restriction, and for versions prior to 1.1.8, no DDL commands should be attempted on the target table while pg_repack is running. In most cases, pg_repack will fail correctly and roll back, but earlier versions may cause data corruption in some cases.
Operation Details
Code level (1170 lines):
To perform full table repacking, pg_repack follows these steps:
- Create a log table to record changes made to the original table.
- Add triggers to the original table to log INSERT, UPDATE, and DELETE operations to the log table.
- Create a new table containing all rows from the original table.
- Build indexes on this new table.
- Apply all changes recorded in the log table to the new table.
- Use system catalogs to swap the tables, including indexes and toast tables.
- Drop the original table.
pg_repack holds ACCESS EXCLUSIVE locks only briefly during the initial setup phase (steps 1 and 2) and the final swap and drop phase (steps 6 and 7). For the rest of the time, pg_repack holds only ACCESS SHARE locks on the original table, allowing INSERT, UPDATE, and DELETE operations to continue.
Testing
Vacuum Full
- Vacuum Full
- After performing vacuum full in photo 2, running a select statement in photo 1 shows the select statement is waiting for a response
- Check
- Lock, dead_tuple
-- Check Lock
SELECT
locktype,
virtualtransaction,
transactionid,
nspname,
relname,
mode,
granted,
cast(date_trunc('second',query_start) AS timestamp) AS query_start,
substr(query,1,25) AS query
FROM
pg_locks
LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
LEFT OUTER JOIN pg_namespace ON (pg_namespace.oid = pg_class.
relnamespace),
pg_stat_activity
WHERE
NOT pg_locks.pid=pg_backend_pid() AND
pg_locks.pid=pg_stat_activity.pid ;
-- Check dead_tuple
SELECT
schemaname || '.' || relname AS table_name,
n_dead_tup,
n_live_tup,
CASE
WHEN (n_dead_tup + n_live_tup) = 0 THEN -1
ELSE ROUND((n_dead_tup::decimal / (n_dead_tup + n_live_tup)) * 100, 2)
END AS dead_tuple_ratio
FROM
pg_stat_all_tables
WHERE
schemaname NOT LIKE 'pg_%' AND
schemaname != 'information_schema'
ORDER BY
dead_tuple_ratio DESC;
pg_repack
- Select statement during pg_repack
- Since pg_repack does not take accessExclusive lock, Select statements can be executed.
- Select statements executed after pg_repack
- Update during pg_repack
- Update statements must finish before pg_repack finishes.
Otherwise, pg will execute a cancel command on the update statement.
- Update statements must finish before pg_repack finishes.
- Update statements on the table after pg_repack
- If the update statement does not finish before pg_repack finishes, pg cancels the update transaction.
-
Perform analyze after cancel
- No problem if UPDATE finishes before pg_repack finishes.
Failover
Test environment preparation
- pg_repack configured
- autovacuum = off setting
- Generate dead tuples
- DB down during pg_repack
-- Primary (192.168.1.222) nohup pg_repack -t bmsql_stock -d benchmarksql -U benchmarksql &
- Before pg_ctl stop
- After pg_ctl stop and failover
- After failover, statistics need to be updated when querying on Primary (192.168.1.223)
-- Primary (192.168.1.222) psql benchmarksql # To check dead_tuple status, perform analyze analyze public.bmsql_stock ; analyze public.bmsql_item ; analyze public.bmsql_order_line; analyze public.bmsql_new_order ; analyze public.bmsql_config ; analyze public.bmsql_warehouse ; analyze public.bmsql_history ; analyze public.bmsql_district ; analyze public.bmsql_oorder ; analyze public.bmsql_customer ;
- Compare tuples after updating statistics
- Confirmed that tables created during pg_repack remain intact
- Errors occur on Primary (192.168.1.223) after the issue occurs
- Because failover occurred during pg_repack, objects created for pg_repack remain, so all related objects must be dropped.
- List of objects to drop
- tables
- triggers
- types
- It is safe to drop the extension and recreate it.
- Running pg_repack with the -e option allows you to check detailed operations.