Sudden Performance Degradation of Linux Server

Document Type | Troubleshooting

Category | Monitoring/Inspection

Applicable Product Versions | 5SP1FS01, 5SP1FS02, 5SP1FS03, 5SP1FS04, 5SP1FS06, 6FS01, 6FS02, 6FS03, 6FS04, 6FS05, 6FS06, 6FS07, 6FS07PS, 7FS01, 7FS02, 7FS02PS

Document Number | TMOTS011

Issue

A server that had been operating normally becomes inaccessible or, if it is still reachable, suffers a sudden drop in performance. The database running on the server also becomes significantly slower than usual.

Cause

When the Linux kernel detects an abnormal memory access by a process, it writes a memory dump (core dump) of that process. The dump causes a sudden spike in resource usage that can degrade performance across the whole system.

The Linux kernel monitors memory accesses in real time to enforce memory protection. When a violation occurs, it logs a segmentation fault or general protection message in /var/log/messages and terminates the process with SIGSEGV (segmentation violation), writing a memory dump (dumping core) as it does so.

Processes generally use the following two types of memory areas:

VSS (Virtual Set Size): The entire memory area mapped by the process

RSS (Resident Set Size): The portion of that area actually resident in physical memory

When the Linux kernel forcibly terminates a process that attempted an abnormal memory access, it does not know which memory region caused the problem, so it generates a dump for the entire VSS area.

Writing the memory dump consumes CPU, memory, and disk I/O in proportion to the size of the VSS. The larger the VSS, the larger the spike in resource usage.
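
To compare the two sizes for a running process, the sketch below reads the VmSize field (the virtual size, called VSS in this article) and the VmRSS field (the resident size) from a /proc/<PID>/status file. The function name vm_sizes is a hypothetical helper and is not part of any product.

```shell
# vm_sizes FILE: print the VmSize and VmRSS lines of a /proc/<PID>/status file.
# (vm_sizes is a hypothetical helper name used only for this example.)
vm_sizes() {
    awk '/^VmSize:|^VmRSS:/ { print $1, $2, $3 }' "$1"
}

# Usage: vm_sizes /proc/<PID>/status
```

A VmSize that is far larger than VmRSS suggests that a full dump of the process would be much larger than the memory it actually uses.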

Checking CPU Usage

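Use the sar command with the -q option to check the run queue length and load averages.

The ldavg-1, ldavg-5, and ldavg-15 values are the load averages over the last 1, 5, and 15 minutes.
A load average is the average number of tasks that are running or waiting to run; on Linux it also counts tasks in uninterruptible sleep, such as those waiting on disk I/O. A value that stays above the number of cores means more work is queued than the server can process.

On this 24-core server, the load average at the problem time is between roughly 60 and 170, far above the number of cores, which is clearly abnormal.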

$ sar -q
15:20:01     runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
15:20:01        25      7690     20.82     17.63     15.00         0
15:30:01        23      7684     19.23     18.55     16.61         0
15:40:01        13      7694     14.25     14.82     15.41         0
16:26:37         8      7782    148.43    165.40    171.44         0 <- Problem time
16:30:01        10      7789    127.00    146.79    162.90         0 <- Problem time
16:40:01         6      7785     92.69    104.66    133.27         1 <- Problem time
16:50:01        16      7816     62.64     75.12    105.34         0 <- Problem time
17:00:01        15      7815     11.63     27.52     67.23         1
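
The comparison against the core count can be automated. The following is a minimal sketch, assuming sar prints 24-hour timestamps so that ldavg-1 is the fourth column (running sar with LC_ALL=C typically gives this layout). The function name flag_overload is hypothetical.

```shell
# flag_overload CORES: read `sar -q` output on stdin and print the timestamp
# and ldavg-1 of every sample whose 1-minute load average exceeds CORES.
# Assumption: 24-hour timestamps, so ldavg-1 is the 4th column.
flag_overload() {
    awk -v cores="$1" '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && ($4 + 0) > (cores + 0) {
            print $1, $4
        }'
}

# Usage: sar -q | flag_overload "$(nproc)"
```

The header row is skipped automatically because its fourth column is not a number.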

Checking Memory (Swap Out/In)

Use the sar command with the -W option to check swap activity.

When memory usage increases sharply, the kernel starts using the swap area.

Moving pages from main memory to the swap area is called swap out, and bringing them back from the swap area to main memory is called swap in.

Both swap out (pswpout/s) and swap in (pswpin/s) show very high values at the problem time.

$ sar -W
15:30:01    pswpin/s pswpout/s
15:30:01      0.00      0.00
15:40:01      0.00      0.00
16:26:37    276.53   1627.42 <- Problem time
16:30:01     75.30   2330.37 <- Problem time
16:40:01    764.02   2450.31 <- Problem time
16:50:01    599.91   1903.95 <- Problem time
17:00:01    573.69   1108.47 <- Problem time
17:10:01     98.03      0.00
17:20:01    102.21      0.00
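
To locate the worst interval in a long sar -W report, the peak swap-out sample can be extracted with a short filter. This is a sketch under the same assumption of 24-hour timestamps (pswpout/s in the third column). The function name peak_swap_out is hypothetical.

```shell
# peak_swap_out: read `sar -W` output on stdin and print the timestamp and
# value of the sample with the highest pswpout/s. Prints nothing if no sample
# shows any swap-out activity.
# Assumption: 24-hour timestamps, so pswpout/s is the 3rd column.
peak_swap_out() {
    awk '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && ($3 + 0) > max {
            max = $3 + 0
            at = $1
        }
        END { if (at != "") printf "%s %.2f\n", at, max }'
}

# Usage: sar -W | peak_swap_out
```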

Checking Linux Kernel Detection (/var/log/messages)

The mxg_tib (PID: 70809) process receives the SIGSEGV signal and terminates abnormally.

Apr 30 15:33:08 BPOTDB01 abrt-hook-ccpp: Process 70809 (mxg_tib) of user 1002 killed by SIGSEGV - dumping core
Apr 30 15:33:09 BPOTDB01 abrt-server: Executable '/home/maxgauge/semas241/bin/mxg_tib' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Apr 30 15:33:09 BPOTDB01 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2025-04-30-15:33:08-70809' exited with 1
Apr 30 15:33:09 BPOTDB01 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2025-04-30-15:33:08-70809'

Afterward, the kernel logs show that tbsvr, the database process that performs the server's key work, was blocked for more than 120 seconds.

Apr 30 15:52:40 BPOTDB01 kernel: INFO: task tbsvr:22059 blocked for more than 120 seconds.
Apr 30 15:52:40 BPOTDB01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 30 15:52:40 BPOTDB01 kernel: tbsvr           D ffff8d91c015acc0     0 22059  21530 0x00000080
Apr 30 15:52:40 BPOTDB01 kernel: Call Trace:
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83f87169>] schedule+0x29/0x70
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83f88b55>] rwsem_down_read_failed+0x105/0x1c0
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83b97528>] call_rwsem_down_read_failed+0x18/0x30
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83f86450>] down_read+0x20/0x40
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83f8e8fd>] __do_page_fault+0x4bd/0x500
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83f8e975>] do_page_fault+0x35/0x90
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffff83f8a778>] page_fault+0x28/0x30
Apr 30 15:52:40 BPOTDB01 kernel: INFO: task tbsvr:25337 blocked for more than 120 seconds.
Apr 30 15:52:40 BPOTDB01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 30 15:52:40 BPOTDB01 kernel: tbsvr           D ffff8d91c005acc0     0 25337  21530 0x00000080
Apr 30 15:52:40 BPOTDB01 kernel: Call Trace:
Apr 30 15:52:40 BPOTDB01 kernel: [<ffffffffc065ce4e>] ? bond_start_xmit+0x1be/0x420 [bonding]
... omitted ...

This confirms that the mxg_tib process attempted an abnormal memory access and was forcibly terminated by the Linux kernel, which then performed a memory dump.
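
In a large log file, the two kinds of entries shown above (the "dumping core" line and the kernel hung-task warning) can be pulled out with a pattern search. This is a minimal sketch. The function name dump_and_hang_events is hypothetical, and the patterns are taken only from the sample log lines in this article, so other distributions or abrt versions may word these messages differently.

```shell
# dump_and_hang_events: filter syslog lines on stdin down to
#   - "killed by SIG... - dumping core" entries, and
#   - kernel "blocked for more than N seconds" hung-task warnings.
# Patterns are based on the sample log lines in this article only.
dump_and_hang_events() {
    grep -E 'killed by SIG[A-Z0-9]+ - dumping core|blocked for more than [0-9]+ seconds'
}

# Usage (root is usually required to read the file):
#   dump_and_hang_events < /var/log/messages
```

Comparing the timestamps of the two kinds of entries shows whether the blocked tasks appeared after a core dump started.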

The following process information was captured at the problem time.

The mxg_tib process had a very large VSS (VIRT 130.7g, against a RES of about 635 MiB), so the memory dump took a long time and resource usage rose sharply.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
70809 maxgauge  20   0  130.7g 650664 632136 S  18.2  0.2   9:20.34 mxg_tib -c semas241 -r -D

Solutions

The Linux kernel performs memory dumps on abnormal memory access, but in most cases, a full dump is not necessary.

Dumping the entire VSS causes excessive resource usage (CPU, memory, disk I/O), leading to server performance degradation and disk thrashing.

Therefore, setting a limit on the core dump size is recommended.

Processes Constituting the Database

When the Tibero database detects abnormal behavior internally, its own monitoring process recognizes this and automatically records backtrace (call stack) logs.

Log path: $TB_HOME/instance/$TB_SID

Log file: tbsvr.callstack.%PID

Note

How to Limit Core Dumps of Processes with Abnormal Memory Access in the Linux Kernel

Modify the /etc/security/limits.conf configuration file.

The changes will only take effect after the process is restarted.

Checking Memory Dump Size

Identify the PID of the process you want to check.

Check the memory dump size limit applied to that PID in /proc/<PID>/limits.

The "Max core file size" item is the memory dump size limit. A value of unlimited means that no limit is applied.

$ ps -ef |grep tbsvr 
tibero     1111133       1  0 May14 pts/3    00:00:26 tbsvr          -t NORMAL -SVR_SID psource1
tibero     1111134 1111133  0 May14 pts/3    00:00:00 /tibero/tibero_engine/bin/tblistener -n 11 -t NORMAL -SVR_SID psource1
tibero     1111135 1111133  0 May14 pts/3    00:00:00 tbsvr_MGWP     -t NORMAL -SVR_SID psource1
tibero     1111136 1111133  0 May14 pts/3    00:00:00 tbsvr_FGWP000  -t NORMAL -SVR_SID psource1
tibero     1111137 1111133  0 May14 pts/3    00:00:00 tbsvr_FGWP001  -t NORMAL -SVR_SID psource1
tibero     1111138 1111133  0 May14 pts/3    00:00:00 tbsvr_FGWP002  -t NORMAL -SVR_SID psource1
... omitted ...


$ cat /proc/1111133/limits 
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       159739               159739               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
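
To read only the relevant item instead of scanning the whole table, the soft and hard values can be extracted with awk. This is a minimal sketch, and the function name core_limit is hypothetical.

```shell
# core_limit FILE: print the soft and hard "Max core file size" values
# from a /proc/<PID>/limits file, separated by a space.
core_limit() {
    awk '/^Max core file size/ { print $5, $6 }' "$1"
}

# Usage: core_limit /proc/<PID>/limits
```

For the process shown above, this prints "unlimited unlimited", which means a full dump would be written.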

Applying Memory Dump Size Limit

The limit can be applied per user account on the server.

Since the database runs under the tibero user on this server, apply the memory dump size limit to the tibero account.

Changes to the configuration file do not affect processes that are already running. Because limits.conf is read when a login session is created, log in again as the tibero user and then restart the database for the new limit to take effect.

# cat /etc/security/limits.conf
tibero          soft    core           0
tibero          hard    core           0

!! Restart the database !! 


$ ps -ef|grep tbsvr
tibero     1583929       1 14 06:49 pts/1    00:00:00 tbsvr          -t NORMAL -SVR_SID psource1
tibero     1583936 1583929  0 06:50 pts/1    00:00:00 tbsvr_MGWP     -t NORMAL -SVR_SID psource1
tibero     1583937 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP000  -t NORMAL -SVR_SID psource1
tibero     1583938 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP001  -t NORMAL -SVR_SID psource1
tibero     1583939 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP002  -t NORMAL -SVR_SID psource1
tibero     1583940 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP003  -t NORMAL -SVR_SID psource1
tibero     1583941 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP004  -t NORMAL -SVR_SID psource1
tibero     1583942 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP005  -t NORMAL -SVR_SID psource1
tibero     1583943 1583929  0 06:50 pts/1    00:00:00 tbsvr_FGWP006  -t NORMAL -SVR_SID psource1


$ cat /proc/1583929/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    0                    bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       159739               159739               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
