Document Type | Troubleshooting
Category | Administration
Applicable Product Versions | 5SP1FS01, 5SP1FS02, 5SP1FS03, 5SP1FS04, 5SP1FS06, 6FS01, 6FS02, 6FS03, 6FS04, 6FS05, 6FS06, 6FS07, 6FS07PS, 7FS01, 7FS02, 7FS02PS
Document Number | TADTS015
Issue
Specific string data is not displayed correctly in query results.
- Case 1: Values look identical when queried but actually contain different data
- Case 2: When content copied from documents such as Hancom Office is inserted directly through the application (AP), certain special characters are not displayed correctly
-- Example of the phenomenon
drop table tibero.pk_test;
create table tibero.pk_test(id1 varchar2(10), id2 varchar2(10), data1 varchar2(10), data2 varchar2(10));
alter table tibero.pk_test add constraint pk primary key(id1, id2);
insert into tibero.pk_test values ('key1　','key2',1,1); -- Trailing full-width space (U+3000)
insert into tibero.pk_test values ('key1 ','key2',1,1); -- Trailing half-width space
insert into tibero.pk_test values ('key1	','key2',1,1); -- Trailing tab character
insert into tibero.pk_test values ('key1๓ฐฑ','key2',1,1); -- The broken character is a special character copied and pasted from Hangul (Hancom Office)
insert into tibero.pk_test values ('key1๓ฐฒ','key2',1,1); -- The broken character is a special character copied and pasted from Hangul (Hancom Office)
commit;
desc tibero.pk_test
COLUMN_NAME TYPE CONSTRAINT
ID1 VARCHAR(10) PRIMARY KEY
ID2 VARCHAR(10) PRIMARY KEY
D1 NUMBER
D2 NUMBER
INDEX_NAME TYPE COLUMN_NAME
PK1 NORMAL ID1 ID2
select id1,id2 from tibero.pk_test;
ID1 ID2
1 key1 key2
2 key1 key2
3 key1　 key2 -- Rows 1, 2, and 3 are difficult to distinguish visually
4 key1๓ฐฑ key2
5 key1๓ฐฒ key2 -- The last character of ID1 in rows 4 and 5 is broken
Cause
- Case 1: Full-width space characters, half-width space characters, and tab characters are all different characters, but it is difficult to recognize the difference when querying.
(Rows 1, 2, and 3 in the example above)
- Case 2: Hancom Office assigns characters specific to Hangul (its word processor) to Unicode code points that have no standard character assignment. The values in the example, U+F02B1 and U+F02B2, fall in the Supplementary Private Use Area-A. Hancom Office reads the byte values and displays its own glyphs, but UTF8 and other character sets define no character for these code points, so the characters are not displayed correctly. Because no suitable character exists for the value, it cannot be rendered properly, but the value itself is retained and inserted as is.
Note: The input is stored exactly as entered by the user. On output, the character set reads each code value and displays the character that corresponds to it.
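The byte-level differences described above can be reproduced outside the database. The following is a minimal Python sketch, assuming the database character set is UTF-8 (as the byte values in this document suggest). The `keys` dictionary and the `utf8_hex` helper are hypothetical names introduced only for illustration.

```python
# Trailing characters from the example, written as escapes so they stay visible.
keys = {
    "full-width space": "key1\u3000",
    "space": "key1 ",
    "tab": "key1\t",
    "private use U+F02B1": "key1\U000F02B1",
    "private use U+F02B2": "key1\U000F02B2",
}


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as uppercase hex, similar to RAWTOHEX output."""
    return text.encode("utf-8").hex().upper()


for label, key in keys.items():
    encoded = key.encode("utf-8")
    # chars corresponds to LENGTH(), bytes to LENGTHB() under the UTF-8 assumption.
    print(f"{label:22} chars={len(key)} bytes={len(encoded)} hex={utf8_hex(key)}")

# All five keys are distinct strings, so none of them collides on the primary key.
print(len(set(keys.values())))
```

Every key has five characters, but the byte lengths differ, which is the same distinction that LENGTH and LENGTHB expose in the database.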
Solutions
This is not an error but a characteristic of character set processing. If necessary, inspect the stored bytes with to_char(rawtohex(string)) to analyze the data.
The value "key1" has a byte value of 6B657931
1 : E38080 = Full-width space character used in Japanese character sets
2 : 20 = space
3 : 09 = tab character
4 : F3B08AB1 = Unused area in UTF8 / U+F02B1 in Unicode unused area / Used as a square box with number 1 in Hangul. Looks the same as 5 but actually a different character is inserted
5 : F3B08AB2 = Unused area in UTF8 / U+F02B2 in Unicode unused area / Used as a square box with number 2 in Hangul. Looks the same as 4 but actually a different character is inserted
select id1,id2,length(id1),lengthb(id1),to_char(rawtohex(id1)) from tibero.pk_test;
ID1 ID2 LENGTH(ID1) LENGTHB(ID1) TO_CHAR(RAWTOHEX(ID1))
1 key1　 key2 5 7 6B657931E38080
2 key1 key2 5 5 6B65793120
3 key1	 key2 5 5 6B65793109
4 key1๓ฐฑ key2 5 8 6B657931F3B08AB1
5 key1๓ฐฒ key2 5 8 6B657931F3B08AB2
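The hex strings returned by to_char(rawtohex(...)) can also be decoded offline to identify each character. The following is a rough Python sketch, assuming the stored bytes are UTF-8. The function name `describe_hex` is hypothetical, and the category codes come from Python's standard `unicodedata` module (`Zs` = space separator, `Cc` = control character, `Co` = private use).

```python
import unicodedata
from typing import List


def describe_hex(hex_string: str) -> List[str]:
    """Decode a RAWTOHEX-style string as UTF-8 and describe each character."""
    text = bytes.fromhex(hex_string).decode("utf-8")
    lines = []
    for ch in text:
        category = unicodedata.category(ch)
        # Control and private use characters have no Unicode name.
        name = unicodedata.name(ch, "<no name>")
        lines.append(f"U+{ord(ch):04X} {category} {name}")
    return lines


# Hex values taken from the query result in this document.
for value in ["6B657931E38080", "6B65793120", "6B65793109", "6B657931F3B08AB1"]:
    # Only the last character differs between the rows, so print that one.
    print(value, "->", describe_hex(value)[-1])
```

Printing only the last character keeps the output focused on the byte sequence that distinguishes the rows. Pass the full list to a logger or report if every character needs to be checked.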