Document Type | Technical Information

Category | Administration

Applicable Product Version | Tibero 7 (DB 7.2.4)

Document Number | TADTI232

Overview

This article explains the basic components of Tibero Standby Cluster (TSC) and the conditions for Auto Failover along with related parameters.

1. What is TSC?

• It is an Active-Standby configuration.

• Used for database high availability, data protection, disaster recovery, and similar purposes.

• The Standby server is configured at a physically independent location.

• There are two ways to switch roles between Active and Standby. (Switchover, Failover)

• Manual command execution is required.

• Response to failures can be delayed.

2. What is an Observer?

• Helps detect failure situations in TSC servers and enables automatic Failover (AFO: Auto FailOver).

• Allows rapid failure response.

• The conditions for AFO vary depending on the Observer's operating mode.

3. Observer Operating Modes

Category	PROTECTIVE	DISASTER_PLAN
Purpose	Prioritizes Primary stability. (default mode)	Preparedness for disaster situations (AFO is executed in as many situations as possible.)
Advantages	Minimizes cases where the primary instance shuts itself down for AFO under any circumstances.	Performs AFO even if all CMs in the primary cluster are down. (Suitable for disasters like power outages and earthquakes.)
Disadvantages	At least one CM in the primary cluster must be connected normally to the Observer for AFO to occur. (If the primary node server goes down entirely due to disasters like earthquakes or power outages, AFO will not occur.)	The primary instance may shut itself down in various cases.
How to Set	cmrctl set --tscid <tsc_id> --mode protective	cmrctl set --tscid <tsc_id> --mode disaster_plan

4. How to Check Observer Connection Status

$ cmrctl show --tscid <tsc_id>

MODE: The operating mode of the observer.

CONN: Connection status with the respective CM.

N: Observer is not connected with the CM.

Y(M): Observer is connected with the CM, and that CM is the master CM in its cluster.

Y: Observer is connected with the CM, but that CM is not the master CM in its cluster.

LOG: Sync status between Primary and Standby nodes.

N: CM is not connected to the DB.

-: Displayed on the primary node. (Data transmission source)

1: Displayed on the standby node. (Indicates how many redo threads the standby instance is receiving logs from.)

Heartbeat: The time interval for heartbeat checks between each CM and the Observer, corresponding to the _CM_OBSERVER_EXPIRE parameter (default: 60 seconds).

RCVD.TSN: The synchronized TSN value, meaningful only on the Standby node.

Primary/Target CMs receive heartbeat messages from the Observer once every second.
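The CONN and LOG codes listed above can be turned into readable text with a small lookup. The following is a minimal sketch: the function names are hypothetical, and it decodes single field values only, because the exact layout of the `cmrctl show` output is not reproduced in this article.

```python
# Meanings copied from the CONN list above.
CONN_MEANINGS = {
    "N": "Observer is not connected with the CM",
    "Y(M)": "Observer is connected; this CM is the master CM in its cluster",
    "Y": "Observer is connected; this CM is not the master CM in its cluster",
}


def describe_conn(value: str) -> str:
    """Translate a CONN field value into a description."""
    value = value.strip()
    return CONN_MEANINGS.get(value, f"unknown CONN value: {value!r}")


def describe_log(value: str) -> str:
    """Translate a LOG field value into a description."""
    value = value.strip()
    if value == "N":
        return "CM is not connected to the DB"
    if value == "-":
        return "primary node (data transmission source)"
    if value.isdigit():
        # A number is the count of redo threads the standby receives logs from.
        return f"standby node receiving logs from {value} redo thread(s)"
    return f"unknown LOG value: {value!r}"


print(describe_conn("Y(M)"))
print(describe_log("1"))
```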

[Figure: Normal TSC synchronization status]

Methods

1. TSC AFO Related Parameters

_OBSERVER_EXPIRE_TIME (Default: 60 seconds)

The Observer communicates with each CM to detect its status. If no communication occurs within the set time, AFO proceeds.

* _OBSERVER_EXPIRE_TIME < CM_HEARTBEAT_EXPIRE + CM_NET_MARGIN
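The constraint above can be checked before changing any of the three values. This is a minimal sketch: the function name is hypothetical, and the CM_HEARTBEAT_EXPIRE and CM_NET_MARGIN values in the example calls are illustrative placeholders, not defaults stated in this article.

```python
def observer_expire_is_valid(
    observer_expire: int,
    cm_heartbeat_expire: int,
    cm_net_margin: int,
) -> bool:
    """Return True if _OBSERVER_EXPIRE_TIME < CM_HEARTBEAT_EXPIRE + CM_NET_MARGIN.

    All values are in seconds.
    """
    return observer_expire < cm_heartbeat_expire + cm_net_margin


# 60 is the default given above; 300 and 5 are hypothetical example values.
print(observer_expire_is_valid(60, 300, 5))   # 60 < 305
# A hypothetical setting that violates the constraint: 60 is not < 55.
print(observer_expire_is_valid(60, 50, 5))
```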

_CM_MAX_IO_RETRY (Default: 10 seconds)

The time counted from the moment access to the CM file fails. Once this time has elapsed, CM shuts down the DB.

* CM file must be located in the data area.

_STANDBY_NETWORK_TIMEOUT (Default: 60 seconds)

Time to detect disconnection between Primary and Standby clusters.

* Related log: INST_TSC_INFO msg. Primary connection count is changed. 1 -> 0.

_CM_AUTOFAILOVER_WAIT_TIME (Default: 10 seconds)

Wait time before the Observer performs AFO.

* Related log: Start count down auto failover.

2. Observer Mode & AFO Occurrence by Scenario

1. CM abnormal

[1] cm abnormal (cm process kill)
Category	PROTECTIVE	DISASTER_PLAN
Failover	X (no AFO)	O (AFO occurs)
CM process (Primary / Target)	Down / Up	Down / Up
DB process (Primary / Target)	Down / Up(RECO)	Down / Up(FAILOVER)
Notes	The Tibero MTHR detects the CM abnormality and immediately terminates the DB instance.	The Tibero MTHR detects the CM abnormality and immediately terminates the DB instance. The Observer waits for ①+② before performing AFO.
Related Parameters	None	① CM_OBSERVER_EXPIRE + extra 10 seconds ② _CM_AUTOFAILOVER_WAIT_TIME
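As a worked example of the ①+② wait in DISASTER_PLAN mode, the arithmetic can be sketched as follows. The function name is hypothetical. The default arguments are the 60-second and 10-second defaults given earlier in this article, plus the extra 10 seconds from the table.

```python
def cm_abnormal_afo_wait(
    observer_expire: int = 60,      # (1) observer expire time, seconds
    autofailover_wait: int = 10,    # (2) _CM_AUTOFAILOVER_WAIT_TIME, seconds
    extra: int = 10,                # the "extra 10 seconds" added to (1)
) -> int:
    """Seconds the Observer waits before AFO after the primary CM is killed."""
    return (observer_expire + extra) + autofailover_wait


print(cm_abnormal_afo_wait())  # 80
```

With the defaults, the Observer therefore waits roughly 80 seconds after the CM process is killed before it performs AFO.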

2. CM down

[2] cm down (tbcm -d)
Category	PROTECTIVE	DISASTER_PLAN
Failover	X (no AFO)	X (no AFO)
CM process (Primary / Target)	Down / Up	Down / Up
DB process (Primary / Target)	Down / Up(RECO)	Down / Up(RECO)
Notes	tbdown abnormal is executed before the CM is brought down.	tbdown abnormal is executed before the CM is brought down.
Related Parameters	None	None

3. db abnormal

[3] db abnormal (db process kill / tbdown abnormal)
Category	PROTECTIVE	DISASTER_PLAN
Failover	O (AFO occurs)	O (AFO occurs)
CM process (Primary / Target)	Up / Up	Up / Up
DB process (Primary / Target)	Down / Up(FAILOVER)	Down / Up(FAILOVER)
Notes	The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO.	The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO.
Related Parameters	① _CM_AUTOFAILOVER_WAIT_TIME	① _CM_AUTOFAILOVER_WAIT_TIME

4. db down

[4] db down (tbdown immediate)
Category	PROTECTIVE	DISASTER_PLAN
Failover	O (AFO occurs)	O (AFO occurs)
CM process (Primary / Target)	Up / Up	Up / Up
DB process (Primary / Target)	Down / Up(FAILOVER)	Down / Up(FAILOVER)
Notes	The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO.	The CM immediately detects that the DB is down, and the Observer waits for ① before performing AFO.
Related Parameters	① _CM_AUTOFAILOVER_WAIT_TIME	① _CM_AUTOFAILOVER_WAIT_TIME

5. Network Failure

[5] network failure (ifdown)
Category	PROTECTIVE	DISASTER_PLAN
Failover	X (no AFO)	O (AFO occurs)
CM process (Primary / Target)	Down / Up	Down / Up
DB process (Primary / Target)	Down / Up(RECO)	Down / Up(FAILOVER)
Notes	After the network is checked from the Observer/primary CM for the duration of ①, the primary DB and CM go down. The Standby recognizes the primary cluster failure after ②.	After the network is checked from the Observer/primary CM for the duration of ①, the primary DB and CM go down. The Standby recognizes the primary cluster failure after ②, and the Observer then waits for ③+④ before performing AFO.
Related Parameters	① _CM_OBSERVER_EXPIRE ② _STANDBY_NETWORK_TIMEOUT + extra 1~5 seconds	① _CM_OBSERVER_EXPIRE ② _STANDBY_NETWORK_TIMEOUT + extra 1~5 seconds ③ _CM_OBSERVER_EXPIRE + extra 10 seconds ④ _CM_AUTOFAILOVER_WAIT_TIME
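The four timers in the DISASTER_PLAN column can be combined into a rough time window for AFO after a network failure. This is only a sketch under stated assumptions. The table does not say whether the ① and ② timers run concurrently or one after the other, so the lower bound assumes they overlap (②+③+④) and the upper bound assumes all four stages run back to back (①+②+③+④). The function name is hypothetical, and the defaults are the values given in this article.

```python
def network_failure_afo_window(
    observer_expire: int = 60,            # (1) and the base of (3), seconds
    standby_network_timeout: int = 60,    # base of (2), seconds
    autofailover_wait: int = 10,          # (4), seconds
    detect_extra: tuple = (1, 5),         # the "extra 1~5 seconds" on (2)
    observer_extra: int = 10,             # the "extra 10 seconds" on (3)
) -> tuple:
    """Return (low, high) estimates in seconds until AFO in DISASTER_PLAN mode."""
    stage1 = observer_expire
    stage2_min = standby_network_timeout + detect_extra[0]
    stage2_max = standby_network_timeout + detect_extra[1]
    stage3 = observer_expire + observer_extra
    stage4 = autofailover_wait
    # Assumption: (1) and (2) overlap for the lower bound.
    low = stage2_min + stage3 + stage4
    # Assumption: every stage runs sequentially for the upper bound.
    high = stage1 + stage2_max + stage3 + stage4
    return low, high


print(network_failure_afo_window())  # (141, 205)
```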

6. cmfile Failure

[6] cm file failure (rm cmfile / chmod 000 cmfile)
Category	PROTECTIVE	DISASTER_PLAN
Failover	X (no AFO)	O (AFO occurs)
CM process (Primary / Target)	Up (cluster down) / Up	Up (cluster down) / Up
DB process (Primary / Target)	Down / Up(RECO)	Down / Up(FAILOVER)
Notes	If ① < ②, the CM shuts down the DB cluster once ① is exceeded. If ① > ②, the CM shuts down the DB cluster once ② is exceeded.	If ① < ②, the CM shuts down the DB cluster once ① is exceeded. If ① > ②, the CM shuts down the DB cluster once ② is exceeded. The Observer then waits for ③+④ before sending the AFO message.
Related Parameters	① _CM_MAX_IO_RETRY ② CM_HEARTBEAT_EXPIRE	① _CM_MAX_IO_RETRY ② CM_HEARTBEAT_EXPIRE ③ _CM_OBSERVER_EXPIRE + extra 10 seconds ④ _CM_AUTOFAILOVER_WAIT_TIME
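The six scenarios above can be summarized as a lookup table, and the cmfile failure timing can be worked through in the same sketch. All names here are hypothetical. The CM_HEARTBEAT_EXPIRE value of 300 in the example call is an illustrative placeholder, since this article does not state its default.

```python
# AFO outcome per scenario and Observer mode, taken from the six tables above.
AFO_BY_SCENARIO = {
    "cm abnormal":     {"PROTECTIVE": False, "DISASTER_PLAN": True},
    "cm down":         {"PROTECTIVE": False, "DISASTER_PLAN": False},
    "db abnormal":     {"PROTECTIVE": True,  "DISASTER_PLAN": True},
    "db down":         {"PROTECTIVE": True,  "DISASTER_PLAN": True},
    "network failure": {"PROTECTIVE": False, "DISASTER_PLAN": True},
    "cmfile failure":  {"PROTECTIVE": False, "DISASTER_PLAN": True},
}


def afo_occurs(scenario: str, mode: str) -> bool:
    """Return True if the tables above report AFO for this scenario and mode."""
    return AFO_BY_SCENARIO[scenario.lower()][mode.upper()]


def cmfile_afo_wait(
    cm_heartbeat_expire: int,       # (2) no default is given in this article
    max_io_retry: int = 10,         # (1) _CM_MAX_IO_RETRY, seconds
    observer_expire: int = 60,      # base of (3), seconds
    autofailover_wait: int = 10,    # (4), seconds
    observer_extra: int = 10,       # the "extra 10 seconds" on (3)
) -> int:
    """Seconds from cmfile failure to the AFO message in DISASTER_PLAN mode."""
    # The CM shuts the DB cluster down when the smaller of (1) and (2) expires.
    shutdown_after = min(max_io_retry, cm_heartbeat_expire)
    return shutdown_after + (observer_expire + observer_extra) + autofailover_wait


print(afo_occurs("network failure", "protective"))   # False
print(cmfile_afo_wait(cm_heartbeat_expire=300))      # 10 + 70 + 10 = 90
```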


Auto Failover Conditions and Related Parameters in TSC
