Document Type | TroubleShooting
Category | Administration
Applicable Product Version | 6FS06
Document Number | TADTS043
Issue
After performing drop partition, a large number of DDD Timeout logs occurred in sys.log, causing delay in the drop partition operation and inability to perform select.
To verify the above phenomenon, at least the following logs and information should be collected.
- diag.log / diag dump
- ilog
- sys.log
- callstack / .out
- lock & session related views : v$lock, v$session, v$transaction, v$waiter_session, v$blocker_session
If it is a TAC environment, logs must be extracted from each node and viewed through global views.
The process to identify the cause of the delay through the collected logs is as follows.
sys.log / gv$lock
DDL is being performed in session 1705 on node 2, but it is waiting for session 2381 on node 3.
-- #sys.log
[2024-02-28T16:07:32.959719] [DDL-1705] [I] Executing DDL: ALTER TABLE TMUSER.TBR_TM_SAME_5GRU_CHATBOT_TREND DROP PARTITION PT_2024022511
...
[2024-02-28T16:38:44.660656] [ACF-214] [I] DDD timeouted! tid 1705, lkb 00007f9c28e5ba50, rsb 00007f9c78dec0a0, 1:16-82-1357311467
[2024-02-28T16:39:14.605657] [ACF-214] [I] DDD timeouted! tid 1705, lkb 00007f9c28e5ba50, rsb 00007f9c78dec0a0, 1:16-82-1357311467
[2024-02-28T16:39:45.319655] [ACF-214] [I] DDD timeouted! tid 1705, lkb 00007f9c28e5ba50, rsb 00007f9c78dec0a0, 1:16-82-1357311467
-- #select * from gv$lock
INST_ID TYPE ID1 ID2 LMODE REQUESTED SESS_ID CTIME
3 WLOCK_DD_OBJ 82 1357311467 1 0 2381 101347
2 WLOCK_DD_OBJ 82 1357311467 1 5 1705 56
diag dump / sys.log / gv$session
Checking the diag dump shows that thread 231 on node 3 is in a dead state.
Checking sys.log reveals a history of forcibly terminating the thread by session kill force.
For reference, the session information of 2413, which performs cleanup on behalf of the dead thread, shows that the cleanup work is not yet completed and the state is running.
-- #diag dump
Trace session. ID: 2381 (WTHR)
----------------------------------
Status : RUNNING
User name : TMUSER
V.sessid : 2381
SQL E.T. : 70604100(ms)
SQL ID : 9558962701554134066
SQL ppid : 0
Program : JOB_SCHEDULER
Wait event: NONE
Trace attached thread: 2381
----------------------------------
PID: 9572
Physical TID: 2
Thread status: DEAD
----------------------------------
Callstack dump for sessid[2381] (filename: tbsvr.callstack.9572)
----------------------------------------
-- #sys.log
[2024-02-27T12:47:07.050098] [DDL-1790] [I] Executing DDL: alter system kill session '2381,399323' force
[2024-02-27T12:47:07.050357] [FRM-1790] [I] get SESS_KILL_FORCE req 2381
[2024-02-27T12:47:07.050369] [FRM-1790] [I] SEND SESS_KILL FORCE SIGNAL to 2381
[2024-02-27T12:47:07.050413] [FRM-198] [I] SEND SESS_KILL FORCE SIGNAL to 2381
[2024-02-27T12:47:07.050515] [DDL-1790] [I] DDL execution succeeded
[2024-02-27T12:47:07.051128] [FRM-2413] [I] WTHR(2381) recovery
[2024-02-27T12:47:07.051147] [FRM-2413] [I] thr cleanup WTHR(2381) sess(2381)
-- # select * from gv$session;
INST_ID SID SERIAL# AUDSID USER# USERNAME IPADDR COMMAND STATUS SCHEMA#
SCHEMANAME TYPE SQL_ID SQL_CHILD_NUMBER PREV_SQL_ID PREV_CHILD_NUMBER SQL_ET LOGON_TIME
STATE WLOCK_WAIT WAIT_EVENT WAIT_TIME PGA_USED_MEM SQL_TRACE PROG_NAME CLIENT_PID PID WTHR_ID
OS_THR_ID OSUSER MACHINE TERMINAL MODULE
ACTION CLIENT_INFO CLIENT_IDENTIFIER PDML_ENABLED PDML_STATUS PDDL_STATUS
PQ_STATUS ROW_WAIT_OBJ_ID ROW_WAIT_FILE_NO ROW_WAIT_BLOCK_NO ROW_WAIT_ROW_NO CONSUMER_GROUP CONSUMED_CPU_TIME
3 2413 418817 4294967295 0 SYS 0 READY 0
SYS WTHR 0 2024/02/27 12:47:07
RUNNING -1 0 11384 DISABLED SO RECOVERY(WTHR) 0 9577 4
11518
NO DISABLED ENABLED
ENABLED -1 1835296
callstack
- Node 2, session 1705: This is the DDL execution session trying to acquire an exclusive lock on the object, but it is waiting because session 2381 on node 3 is holding the lock.
- Node 3, session 2381: The thread was terminated due to session kill force during the operation.
- Node 3, session 2413: Performing cleanup of the dead session 2381, and this cleanup must complete to release the lock held by 2381.
Thread 8 (Thread 0x7f8fbbff7700 (LWP 11518)): #0 0x00007fbbc2b021ad in nanosleep () from /lib64/libc.so.6 #1 0x00007fbbc2b32ec4 in usleep () from /lib64/libc.so.6 #2 0x0000000001030395 in gca_cancel (lksb=0x7fac18616270, ccc_flags=0) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_gca.c:559 #3 0x0000000001031668 in gca_recover_lock (lksb=0x7fac18616270) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_gca.c:734 #4 0x00000000010675d8 in ccc_cleanup_lksb (so=0x7fac18616270, tid=<optimized out>, flag=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_lksb.c:211 #5 0x0000000000e1b48c in sess_so_cleanup (sessid=2381) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_cleanup.c:251 #6 0x000000000057c884 in wthr_so_cleanup_thr () at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:3514 #7 0x00000000005a49e2 in thread_main_chk_bitmask (my_tid=2413, bitmask=...) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:516 #8 0x00000000005ab4cb in svr_wthr_main_internal (my_tid=2413) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:919 #9 0x0000000000e31de0 in wthr_init (args=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/tbsvr_cthr.c:3288 #10 0x00007fbbc4d2be25 in start_thread () from /lib64/libpthread.so.0 #11 0x00007fbbc2b3b34d in clone () from /lib64/libc.so.6
diag.log
2024/02/27 12:05:36.502 211 diag file '/data1/NMSMNDB5G3/dump/diag/tb_[CCC_WAIT]_1113_140524-12582964.s1176.n3.diag'(5444 bytes) deleted due to limit 2024/02/27 12:06:21.027 2381 trace request! sessid: 2381, trc: 1 args: 2 2024/02/27 12:06:21.027 211 trace started. id 12583232 type 1 2024/02/27 12:06:21.027 211 [nid=3][LOCAL]CCC WAIT trace rsb [49|511_00339724] for trace id 12583232 from sid 2381 2024/02/27 12:06:21.027 211 [1][CCC MASTER][49|511_00339724] shadow in grq (0): SHR-lock granted (rq=-1, bl=5) 2024/02/27 12:06:21.028 211 trace finished. id 12583232 2024/02/27 12:06:36.028 2381 trace request! sessid: 2381, trc: 1 args: 2 2024/02/27 12:06:36.028 211 trace started. id 12583233 type 1 2024/02/27 12:06:36.028 211 [nid=3][LOCAL]CCC WAIT trace rsb [49|511_00339724] for trace id 12583233 from sid 2381
diag dump
===== CCC_WAIT, TraceID: 12583270 =====
*** 2024/02/27 12:42:06.040 ***
[nid=2] trace start with rsb id [49|511_00339724] from nid: 3 (timeout cnt=144)
Trace CCC RSB(00007fa0d30ea4c0) timeout count=144
----------------------------------------
rsb<00007fa0d30ea4c0>49|511_00339724:nid 3,f 20000,rf 2,mgr 3,mbl 5,lw 0,wf 5,bf 0,rt(0), cf 0,cgr 3,cbl 5,chi -1,ht(0,0),wt(0,0),dt(0,0), ogt(0,0) wr_nid -1 lw 0 snd 1,rcv 2,bh<0000000000000000>
[0/0] msg type: 2
[1/7] msg type: 0
[2/6] msg type: 0
[3/5] msg type: 0
[4/4] msg type: 0
[5/3] msg type: 0
[6/2] msg type: 0
[7/1] msg type: 0
lkb(00007f9fcc25d9e0):st 1,gr 3,rq -1,bl 5,mt 0,nid 3,exf 10000,f 8000000,sbf 0,rcv 1,snd 1,htsn(0,0),rsb<00007fa0d30ea4c0>[49|511_00339724]
lksb(00007fa141dbf720): owner WTHR(tid 1306, node 2, pid 4136), status -22,flags 0x0, lkb 00007f9fcc25d9e0, priv 00007f9be5219da8
No bh in this lcb(00007fa0d30ea670), rsb(00007fa0d30ea4c0)
Trace session. ID: 1306 (WTHR)
----------------------------------
Status : SESS_CLEANUP
User name : TMUSER
V.sessid : 1306
Program : 5GNMS_PCCLIENT.EXE
Wait event: NONE
Trace attached thread: 1306
----------------------------------
PID: 4136
Physical TID: 7
Thread status: DEAD
----------------------------------
Callstack dump for sessid[1306] (filename: tbsvr.callstack.4136)
----------------------------------
...
sys.log
It can be confirmed that a SIGSEGV occurred during so recovery for 1306.
[2024-02-23T17:02:30.025691] [FRM-1306] [I] Signal 'SIGSEGV'(11) caught (TID = 1306, OS_THR_ID = 7367, PID = 4136): si_addr = 0000000000001fd0 : the cause of the signal can be found in a process callstack file. ... [2024-02-23T17:03:55.995824] [FRM-2408] [I] WTHR(1306) recovery [2024-02-23T17:03:55.995870] [FRM-2408] [I] thr cleanup WTHR(1306) sess(1306) [2024-02-23T17:03:55.995939] [FRM-2408] [I] Signal 'SIGSEGV'(11) caught (TID = 2408, OS_THR_ID = 7474, PID = 4297): si_addr = 0000000000000190 : the cause of the signal can be found in a process callstack file.
callstack
Checking the tbsvr.callstack.4136 file reveals that SIGSEGV occurred and that so recovery for 1306 was attempted by session 2408 but failed again with SIGSEGV, as confirmed by the callstack.
Thread 5 (Thread 0x7f7f83ff7700 (LWP 7367)): #0 0x00007fab9c4781ad in nanosleep () from /lib64/libc.so.6 #1 0x00007fab9c4a8ec4 in usleep () from /lib64/libc.so.6 #2 0x0000000000e4b812 in sync_sig_handler (signo=11, siginfo=0x7f7f7800fd70, context=0x7f7f7800fc40) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/tbsvr_signal.c:1587 #3 <signal handler called> #4 0x00007fab9c509339 in _memmove_ssse3_back () from /lib64/libc.so.6 #5 0x0000000001cfec06 in sm_stat_new (csr=0x7f9bd978b390, is_pe=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ex/sql_monitor.c:315 #6 0x0000000000e12c8b in refresh_sql_monitor_stats () at /data/autodist/TLeft_263087/tibero6/src/tbsvr/include/chk_sesskill.h:32 #7 bitq_read_internal (bitmask=..., timeout=0, event_type=WE_ACF_MTX_RW, id1=-1, id2=-1) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/bitqueue.c:453 #8 0x0000000000f33060 in wait_mtx (l=<optimized out>, status=MTX_STS_READ_WAITING, use_so=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/acf/acf_mtx.c:158 #9 0x0000000000f3412a in mtx_rdlock_ex (l=0x7f9897642118, nowait=0 '\000', use_so=224 '\340') at /data/autodist/TLeft_263087/tibero6/src/tbsvr/acf/acf_mtx.c:222 #10 0x00000000010433e1 in ccc_unlock (lksb=0x7fa141dbf720, flags=0) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_lock.c:4735 #11 0x0000000001031a4f in gca_unlock_internal (buf=0x7f9be5219da8, ccc_flags=8160) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_gca.c:442 #12 0x00000000018c2760 in tcbuf_pin_cleanup (so=<optimized out>, sessid=<optimized out>, flag=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/tc/tc_buf.c:1369 #13 0x0000000000e19a20 in sess_csr_so_cleanup (so_list=0x7f9bd978b3a8, sessid=1306) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_cleanup.c:158 #14 0x0000000001cd99c0 in csr_close (csr=0x7f9bd978b390, is_recovery=1 '\001') at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ex/csr.c:1802 #15 0x0000000000e1f1ce in so_csr_marker_cleanup (so=0x7f994f482a30, sessid=<optimized out>, flag=224 '\340') at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_registry.c:163 #16 0x0000000000e18010 in so_cleanup_after (id=1306, so_list=0x7f7fb4d54ad0, marker=0x7f9bd97843e0) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_cleanup.c:59 #17 0x00000000005ac84f in svr_wthr_main_internal (my_tid=1306) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:955 #18 0x0000000000e31de0 in wthr_init (args=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/tbsvr_cthr.c:3288 #19 0x00007fab9e6a1e25 in start_thread () from /lib64/libpthread.so.0 #20 0x00007fab9c4b134d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f7f8a7c7700 (LWP 7474)): #0 0x00007fab9c4781ad in nanosleep () from /lib64/libc.so.6 #1 0x00007fab9c4a8ec4 in usleep () from /lib64/libc.so.6 #2 0x0000000000e4b812 in sync_sig_handler (signo=11, siginfo=0x7f7f7400fd70, context=0x7f7f7400fc40) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/tbsvr_signal.c:1587 #3 <signal handler called> #4 clone_exstat (stat=0x7fab88c75018, exn=0x7fab88c75018) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ex/sql_monitor.c:204 #5 0x0000000001cfeb7d in sm_stat_new (csr=0x7f9bd978b390, is_pe=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ex/sql_monitor.c:309 #6 0x0000000000e12c8b in refresh_sql_monitor_stats () at /data/autodist/TLeft_263087/tibero6/src/tbsvr/include/chk_sesskill.h:32 #7 bitq_read_internal (bitmask=..., timeout=0, event_type=WE_ACF_MTX_RW, id1=-1, id2=-1) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/bitqueue.c:453 #8 0x0000000000f33060 in wait_mtx (l=<optimized out>, status=MTX_STS_READ_WAITING, use_so=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/acf/acf_mtx.c:158 #9 0x0000000000f3412a in mtx_rdlock_ex (l=0x7f9897642118, nowait=0 '\000', use_so=95 '') at /data/autodist/TLeft_263087/tibero6/src/tbsvr/acf/acf_mtx.c:222 #10 0x0000000001041e71 in ccc_cancel (lksb=0x7fa141dbf720, flags=0) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_lock.c:5063 #11 0x00000000010303a0 in gca_cancel (lksb=0x7fa141dbf720, ccc_flags=0) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_gca.c:553 #12 0x0000000001031668 in gca_recover_lock (lksb=0x7fa141dbf720) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_gca.c:734 #13 0x00000000010675d8 in ccc_cleanup_lksb (so=0x7fa141dbf720, tid=<optimized out>, flag=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ccc/ccc_lksb.c:211 #14 0x0000000000e19a20 in sess_csr_so_cleanup (so_list=0x7f9bd978b3a8, sessid=1306) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_cleanup.c:158 #15 0x0000000001cd94be in csr_close (csr=0x7f9bd978b390, is_recovery=1 '\001') at /data/autodist/TLeft_263087/tibero6/src/tbsvr/ex/csr.c:1566 #16 0x0000000000e1f1ce in so_csr_marker_cleanup (so=0x7fab88c75018, sessid=<optimized out>, flag=95 '_') at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_registry.c:163 #17 0x0000000000e1b48c in sess_so_cleanup (sessid=1306) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/so_cleanup.c:251 #18 0x000000000057c884 in wthr_so_cleanup_thr () at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:3514 #19 0x00000000005a49e2 in thread_main_chk_bitmask (my_tid=2408, bitmask=...) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:516 #20 0x00000000005ab4cb in svr_wthr_main_internal (my_tid=2408) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/body/tbsvr_wthr.c:919 #21 0x0000000000e31de0 in wthr_init (args=<optimized out>) at /data/autodist/TLeft_263087/tibero6/src/tbsvr/frame/tbsvr_cthr.c:3288 #22 0x00007fab9e6a1e25 in start_thread () from /lib64/libpthread.so.0 #23 0x00007fab9c4b134d in clone () from /lib64/libc.so.6
Cause
When a
session kill request (signal) is delivered to the target session for termination, that session performs so recovery (= so cleanup) to cancel ongoing work or release owned resources.For example, when session A receives a
session kill signal and thread B attempts so recovery to handle it, a SIGSEGV error occurs causing failure.Then thread C tries so recovery again for session A, but similarly fails due to
SIGSEGV error.Due to these repeated failures, the resources held by session A were not properly released, leaving the locks held by that session unreleased.
In other words, the drop partition operation delay and inability to perform select occurred because resource release was not completed properly due to SIGSEGV errors during so recovery.
Solutions
Apply the patch to resolve the issue. (Applied patch: 264981c)
CautionApply the patch through technical support provided by TmaxTibero.