Document Type | Technical Information
Category | Administration
Applicable Product Version | Tibero 7 (DB 7.2.4)
Document Number | TADTI232
Overview
This article explains the basic components of Tibero Standby Cluster (TSC), the conditions under which Auto Failover (AFO) occurs, and the related parameters.
1. What is TSC?
• It is an Active-Standby configuration.
• Used for database high availability, data protection, disaster recovery, and similar purposes.
• The Standby server is configured at a physically independent location.
• There are two ways to switch roles between Active and Standby: Switchover and Failover.
• Role switching requires manual command execution.
• As a result, the response to failures can be delayed.
2. What is an Observer?
• Detects failures in TSC servers and enables automatic failover (AFO: Auto FailOver).
• Enables a rapid response to failures.
• The conditions for AFO vary depending on the Observer's operating mode.
3. Observer Operating Modes
| Category | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Purpose | Prioritizes primary stability (default mode). | Prepares for disaster situations (AFO is executed in as many situations as possible). |
| Advantages | Minimizes the cases in which the primary instance shuts itself down for AFO. | Performs AFO even if all CMs in the primary cluster are down (suitable for disasters such as power outages and earthquakes). |
| Disadvantages | At least one CM in the primary cluster must be connected normally to the Observer for AFO to occur. (If the primary node server goes down entirely due to a disaster such as an earthquake or power outage, AFO does not occur.) | The primary instance may shut itself down in various cases. |
| How to Set | `cmrctl set --tscid <tsc_id> --mode protective` | `cmrctl set --tscid <tsc_id> --mode disaster_plan` |
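To compare the two modes at a glance, the AFO outcomes recorded in the scenario tables under "2. Observer Mode & AFO Occurrence by Scenario" can be captured as a small lookup table. This is a minimal Python sketch; the scenario keys and function names are illustrative only and are not part of any Tibero tool.

```python
# AFO outcomes transcribed from the scenario tables in this article.
# True = AFO occurs (O), False = no AFO (X).
AFO_OUTCOMES = {
    "[1] cm abnormal": {"PROTECTIVE": False, "DISASTER_PLAN": True},
    "[2] cm down": {"PROTECTIVE": False, "DISASTER_PLAN": False},
    "[3] db abnormal": {"PROTECTIVE": True, "DISASTER_PLAN": True},
    "[4] db down": {"PROTECTIVE": True, "DISASTER_PLAN": True},
    "[5] network failure": {"PROTECTIVE": False, "DISASTER_PLAN": True},
    "[6] cm file failure": {"PROTECTIVE": False, "DISASTER_PLAN": True},
}


def afo_occurs(scenario: str, mode: str) -> bool:
    """Look up whether AFO occurred for a scenario under an Observer mode."""
    return AFO_OUTCOMES[scenario][mode.upper()]


def mode_dependent_scenarios() -> list:
    """Scenarios in which the choice of Observer mode changes the AFO outcome."""
    return [name for name, by_mode in AFO_OUTCOMES.items()
            if by_mode["PROTECTIVE"] != by_mode["DISASTER_PLAN"]]


print(mode_dependent_scenarios())
# ['[1] cm abnormal', '[5] network failure', '[6] cm file failure']
```

The three mode-dependent scenarios are exactly the cases in which the primary cluster's CMs stop responding to the Observer, which matches the PROTECTIVE-mode disadvantage described in the table.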
4. How to Check Observer Connection Status
Run `cmrctl show --tscid <tsc_id>` and check the following columns in its output:
• MODE: The operating mode of the Observer.
• CONN: Connection status between the Observer and each CM.
  - N: The Observer is not connected to the CM.
  - Y(M): The Observer is connected to the CM, and that CM is the master CM of its cluster.
  - Y: The Observer is connected to the CM, but that CM is not the master CM of its cluster.
• LOG: Sync status between the Primary and Standby nodes.
  - N: The CM is not connected to the DB.
  - `-`: Displayed on the primary node (the source of data transmission).
  - 1: Displayed on the standby node; indicates the number of redo threads from which the standby instance is receiving logs.
• Heartbeat: The time interval for heartbeat checks between each CM and the Observer, corresponding to the _CM_OBSERVER_EXPIRE parameter (default: 60 seconds).
• RCVD.TSN: The synchronized TSN value, meaningful only on the Standby node.
* Primary/Target CMs receive heartbeat messages from the Observer once every second.
(Figure: Normal TSC synchronization status)
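The CONN and LOG codes described above can be translated into readable descriptions, for example in a monitoring script. This minimal Python sketch assumes you have already extracted the column values from the `cmrctl show` output yourself; the exact output layout is not covered in this article, so no parser is shown. The function names are hypothetical, and treating any integer LOG value as a redo thread count is an assumption generalized from the description of `1`.

```python
CONN_CODES = {
    "N": "Observer is not connected to this CM",
    "Y(M)": "Observer is connected; this CM is the master CM of its cluster",
    "Y": "Observer is connected; this CM is not the master CM of its cluster",
}

LOG_CODES = {
    "N": "CM is not connected to the DB",
    "-": "primary node (source of data transmission)",
}


def describe_conn(value: str) -> str:
    return CONN_CODES.get(value.strip(), f"unknown CONN value: {value!r}")


def describe_log(value: str) -> str:
    value = value.strip()
    if value in LOG_CODES:
        return LOG_CODES[value]
    if value.isdigit():
        # Assumption: any integer is a redo thread count (generalizes the "1" case).
        return f"standby node receiving logs from {int(value)} redo thread(s)"
    return f"unknown LOG value: {value!r}"


def protective_afo_possible(primary_conn_values: list) -> bool:
    """Necessary (not sufficient) PROTECTIVE-mode condition: at least one
    primary-cluster CM is connected to the Observer (CONN is Y or Y(M))."""
    return any(v.strip() in ("Y", "Y(M)") for v in primary_conn_values)


print(describe_conn("Y(M)"))
print(describe_log("1"))
print(protective_afo_possible(["N", "Y"]))
```

The last helper reflects the PROTECTIVE-mode requirement from the mode comparison table; a `True` result only means AFO is not ruled out, not that it will happen.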
Methods
1. TSC AFO Related Parameters
_OBSERVER_EXPIRE_TIME (Default: 60 seconds)
The Observer communicates with each CM to detect its status. If no communication occurs within the set time, AFO proceeds.
* Condition: _OBSERVER_EXPIRE_TIME < CM_HEARTBEAT_EXPIRE + CM_NET_MARGIN
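The condition above can be checked before changing any of the three values. This is a minimal Python sketch that assumes all three values are expressed in seconds; the CM_HEARTBEAT_EXPIRE and CM_NET_MARGIN values in the example are hypothetical, not Tibero defaults, and the function name is illustrative.

```python
def observer_expire_margin(observer_expire_time: float,
                           cm_heartbeat_expire: float,
                           cm_net_margin: float) -> tuple:
    """Check _OBSERVER_EXPIRE_TIME < CM_HEARTBEAT_EXPIRE + CM_NET_MARGIN.

    Returns (condition_holds, slack), where slack is how far the Observer
    timeout sits below the limit. All values are assumed to be in seconds.
    """
    limit = cm_heartbeat_expire + cm_net_margin
    return observer_expire_time < limit, limit - observer_expire_time


# 60 is the documented _OBSERVER_EXPIRE_TIME default; 300/10 and 40/10 are
# hypothetical CM_HEARTBEAT_EXPIRE / CM_NET_MARGIN values.
print(observer_expire_margin(60, 300, 10))  # (True, 250)
print(observer_expire_margin(60, 40, 10))   # (False, -10)
```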
_CM_MAX_IO_RETRY (Default: 10 seconds)
Time counted from the moment access to the CM file fails. After this time elapses, CM shuts down the DB.
* The CM file must be located in the data area.
_STANDBY_NETWORK_TIMEOUT (Default: 60 seconds)
Time to detect disconnection between Primary and Standby clusters.
* Related log: INST_TSC_INFO msg. Primary connection count is changed. 1 -> 0.
_CM_AUTOFAILOVER_WAIT_TIME (Default: 10 seconds)
Wait time before the Observer performs AFO.
* Related log: Start count down auto failover.
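The two related log messages quoted above can be watched for in a log-scanning script. This is a minimal Python sketch: the article does not state which log file contains these messages or their full line format, so the function takes any iterable of lines and matches only the quoted fragments (case-insensitively). The function name and the sample lines other than the quoted fragments are hypothetical.

```python
import re

# Fragments quoted in the parameter descriptions above; only substrings are matched.
CONN_CHANGE = re.compile(
    r"Primary connection count is changed\.\s*(\d+)\s*->\s*(\d+)", re.IGNORECASE)
AFO_COUNTDOWN = re.compile(r"Start count down auto failover", re.IGNORECASE)


def scan_tsc_events(lines):
    """Return (line_number, event) pairs for disconnection and AFO countdown messages."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        change = CONN_CHANGE.search(line)
        if change and int(change.group(2)) == 0:
            # Connection count dropped to 0: _STANDBY_NETWORK_TIMEOUT detection.
            events.append((lineno, "primary/standby disconnected"))
        elif AFO_COUNTDOWN.search(line):
            # Start of the _CM_AUTOFAILOVER_WAIT_TIME countdown.
            events.append((lineno, "AFO countdown started"))
    return events


sample = [
    "INST_TSC_INFO msg. Primary connection count is changed. 1 -> 0.",
    "unrelated line",
    "Start count down auto failover",
]
print(scan_tsc_events(sample))
```

In practice you would pass an open log file object instead of `sample`, since iterating a file yields its lines.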
2. Observer Mode & AFO Occurrence by Scenario
1. CM Abnormal

| [1] CM abnormal (CM process kill) | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Failover (AFO) | X | O |
| CM process (Primary / Target) | Down / Up | Down / Up |
| DB process (Primary / Target) | Down / Up (RECO) | Down / Up (FAILOVER) |
| Notes | The Tibero MTHR detects the CM abnormality and immediately terminates the DB instance. | The Tibero MTHR detects the CM abnormality and immediately terminates the DB instance. The Observer waits for ① + ② before performing AFO. |
| Related Parameters | None | ① _CM_OBSERVER_EXPIRE + extra 10 seconds; ② _CM_AUTOFAILOVER_WAIT_TIME |
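As a worked example for scenario [1] in DISASTER_PLAN mode, the expected delay from the CM failure to AFO is ① (_CM_OBSERVER_EXPIRE + extra 10 seconds) plus ② (_CM_AUTOFAILOVER_WAIT_TIME). This Python sketch uses the defaults quoted in this article (60 and 10 seconds); it is an estimate under those defaults, not a measured value, and the function name is illustrative.

```python
# "extra 10 seconds" added on top of _CM_OBSERVER_EXPIRE, per the Related Parameters row.
OBSERVER_EXTRA_SECONDS = 10


def cm_abnormal_afo_delay(cm_observer_expire: int = 60,
                          cm_autofailover_wait_time: int = 10) -> int:
    """Approximate seconds from CM failure to AFO in DISASTER_PLAN mode."""
    step1 = cm_observer_expire + OBSERVER_EXTRA_SECONDS  # ①
    step2 = cm_autofailover_wait_time                     # ②
    return step1 + step2


print(cm_abnormal_afo_delay())  # 80
```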
2. CM Down

| [2] CM down (tbcm -d) | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Failover (AFO) | X | X |
| CM process (Primary / Target) | Down / Up | Down / Up |
| DB process (Primary / Target) | Down / Up (RECO) | Down / Up (RECO) |
| Notes | tbdown abnormal is executed before the CM is brought down. | tbdown abnormal is executed before the CM is brought down. |
| Related Parameters | None | None |
3. DB Abnormal

| [3] DB abnormal (DB process kill / tbdown abnormal) | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Failover (AFO) | O | O |
| CM process (Primary / Target) | Up / Up | Up / Up |
| DB process (Primary / Target) | Down / Up (FAILOVER) | Down / Up (FAILOVER) |
| Notes | The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO. | The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO. |
| Related Parameters | ① _CM_AUTOFAILOVER_WAIT_TIME | ① _CM_AUTOFAILOVER_WAIT_TIME |
4. DB Down

| [4] DB down (tbdown immediate) | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Failover (AFO) | O | O |
| CM process (Primary / Target) | Up / Up | Up / Up |
| DB process (Primary / Target) | Down / Up (FAILOVER) | Down / Up (FAILOVER) |
| Notes | The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO. | The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO. |
| Related Parameters | ① _CM_AUTOFAILOVER_WAIT_TIME | ① _CM_AUTOFAILOVER_WAIT_TIME |
5. Network Failure

| [5] Network failure (ifdown) | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Failover (AFO) | X | O |
| CM process (Primary / Target) | Down / Up | Down / Up |
| DB process (Primary / Target) | Down / Up (RECO) | Down / Up (FAILOVER) |
| Notes | After the network is checked from the Observer/primary CM for the ① duration, the primary DB and CM go down. The standby recognizes the primary cluster failure after ②. | After the network is checked from the Observer/primary CM for the ① duration, the primary DB and CM go down. The standby recognizes the primary cluster failure after ②, and the Observer then waits for ③ + ④ before performing AFO. |
| Related Parameters | ① _CM_OBSERVER_EXPIRE; ② _STANDBY_NETWORK_TIMEOUT + extra 1 to 5 seconds | ① _CM_OBSERVER_EXPIRE; ② _STANDBY_NETWORK_TIMEOUT + extra 1 to 5 seconds; ③ _CM_OBSERVER_EXPIRE + extra 10 seconds; ④ _CM_AUTOFAILOVER_WAIT_TIME |
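A rough DISASTER_PLAN timeline for scenario [5] can be worked out from the defaults quoted in this article (_CM_OBSERVER_EXPIRE 60 seconds, _STANDBY_NETWORK_TIMEOUT 60 seconds, _CM_AUTOFAILOVER_WAIT_TIME 10 seconds). This Python sketch reads the Notes literally, treating ③ + ④ as starting only after the standby recognizes the failure in ②; if the Observer's countdown overlaps with ②, the actual delay will differ. The function name is illustrative.

```python
OBSERVER_EXTRA_SECONDS = 10   # "extra 10 seconds" in ③
STANDBY_EXTRA_RANGE = (1, 5)  # "extra 1 to 5 seconds" in ②


def network_failure_timeline(cm_observer_expire: int = 60,
                             standby_network_timeout: int = 60,
                             cm_autofailover_wait_time: int = 10) -> dict:
    """Seconds after the network failure at which each stage is reached."""
    primary_down = cm_observer_expire  # ①
    recognized = tuple(standby_network_timeout + extra
                       for extra in STANDBY_EXTRA_RANGE)  # ② (min, max)
    afo_wait = (cm_observer_expire + OBSERVER_EXTRA_SECONDS
                + cm_autofailover_wait_time)  # ③ + ④
    afo = tuple(t + afo_wait for t in recognized)
    return {"primary_down": primary_down,
            "standby_recognizes": recognized,
            "afo": afo}


print(network_failure_timeline())
# {'primary_down': 60, 'standby_recognizes': (61, 65), 'afo': (141, 145)}
```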
6. CM File Failure

| [6] CM file failure (rm cmfile / chmod 000 cmfile) | PROTECTIVE | DISASTER_PLAN |
|---|---|---|
| Failover (AFO) | X | O |
| CM process (Primary / Target) | Up (cluster down) / Up | Up (cluster down) / Up |
| DB process (Primary / Target) | Down / Up (RECO) | Down / Up (FAILOVER) |
| Notes | If ① < ②, CM shuts down the DB cluster once ① is exceeded. If ① > ②, CM shuts down the DB cluster once ② is exceeded. | If ① < ②, CM shuts down the DB cluster once ① is exceeded. If ① > ②, CM shuts down the DB cluster once ② is exceeded. Then ③ + ④ elapse before the AFO message is sent. |
| Related Parameters | ① _CM_MAX_IO_RETRY; ② CM_HEARTBEAT_EXPIRE | ① _CM_MAX_IO_RETRY; ② CM_HEARTBEAT_EXPIRE; ③ _CM_OBSERVER_EXPIRE + extra 10 seconds; ④ _CM_AUTOFAILOVER_WAIT_TIME |
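For scenario [6] in DISASTER_PLAN mode, the primary DB cluster is shut down at whichever of ① (_CM_MAX_IO_RETRY) and ② (CM_HEARTBEAT_EXPIRE) is shorter, and the AFO message follows after ③ + ④ (_CM_OBSERVER_EXPIRE + extra 10 seconds, plus the AFO wait time, _CM_AUTOFAILOVER_WAIT_TIME). This Python sketch assumes the stages run back to back, as the Notes describe. The article does not give a CM_HEARTBEAT_EXPIRE default, so it is a required argument, and the value 300 in the example is hypothetical.

```python
OBSERVER_EXTRA_SECONDS = 10  # "extra 10 seconds" in ③


def cmfile_failure_timeline(cm_heartbeat_expire: int,
                            cm_max_io_retry: int = 10,
                            cm_observer_expire: int = 60,
                            cm_autofailover_wait_time: int = 10) -> tuple:
    """Return (cluster_shutdown_s, afo_message_s) after the CM file becomes inaccessible."""
    shutdown = min(cm_max_io_retry, cm_heartbeat_expire)  # shorter of ① and ②
    afo = (shutdown + cm_observer_expire + OBSERVER_EXTRA_SECONDS
           + cm_autofailover_wait_time)  # + ③ + ④
    return shutdown, afo


print(cmfile_failure_timeline(cm_heartbeat_expire=300))  # (10, 90)
```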