- Open Access
- Authors : Le Hoang Thi My
- Paper ID : IJERTV12IS060030
- Volume & Issue : Volume 12, Issue 06 (June 2023)
- Published (First Online): 19-06-2023
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
The solution to sort entries in the Vietnamese-Ede bilingual vocabulary database
Le Hoang Thi My
University of Technology and Education, The University of Danang
Danang, Vietnam
Abstract: When querying a vocabulary database, presenting the data in ascending or descending order for each language is a requirement that should be considered when building the database. Sorting an English data table alphabetically with the Order by clause of an SQL statement is simple, because ASCII encoding and database management systems were designed around the English alphabet. For the ethnic minority languages of Vietnam in general, and the Ede language in particular, however, alphabetical ordering is not supported and has received little attention from researchers. As a result, a database program written for the Ede language has difficulty presenting a data table in alphabetical order. To solve this problem, this article proposes a solution for sorting Ede entries in the Vietnamese-Ede bilingual lexical database, contributing to searching, looking up, and managing data and to building Ede data tables in Ede alphabetical order.
Keywords: Ede language processing, Unicode encoding, entry sorting, vocabulary database, data query
INTRODUCTION
All information processing activities on computers are related to text editing. Sorting is the process of rearranging the elements of a set of objects in a given order, such as ascending or descending order for a sequence of numbers, or alphabetical order for words. Sorting is commonly applied in software for purposes such as ordering stored data for convenient searching and arranging processing results for printing in reports. To address sorting for the Vietnamese and Ede languages in the Vietnamese-Ede vocabulary database [3], [4], this paper proposes a solution for arranging entries in that database. The solution consists of the following steps:
- First, encode the Vietnamese and Ede letters into a continuous code range so that strings can be compared in programs.
- Next, move the encoded entries into an array and sort the array in alphabetical order.
- Finally, for each entry decoded from the sorted array, write its position in the array to the sort index attribute of the corresponding entry in the datastore.
Thus, when entries in the datastore are sorted later, they are sorted by the sort index attribute instead of by the entry attribute itself.
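The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the shortened letter order in `ALPHABET`, the helper names `encode` and `assign_sort_index`, and the dictionary standing in for the datastore's sort index attribute are all hypothetical.

```python
import unicodedata

# Assumption: a shortened Vietnamese letter order, used only for illustration.
ALPHABET = unicodedata.normalize("NFC", "aàảãáạăâbcdđeêghiklmnoôơpqrstuưvxy")
BASE = 0x1F00  # first code point of the continuous target region
TO_CODE = {ord(letter): BASE + rank for rank, letter in enumerate(ALPHABET)}


def encode(entry):
    """Step 1: replace every known letter with its code in the continuous region."""
    return unicodedata.normalize("NFC", entry.lower()).translate(TO_CODE)


def assign_sort_index(entries):
    """Steps 2 and 3: sort the encoded entries, then record each entry's position."""
    encoded = sorted((encode(entry), entry) for entry in entries)
    return {entry: position for position, (_, entry) in enumerate(encoded, start=1)}


entries = ["đi", "dê", "ăn", "an", "âm"]
sort_index = assign_sort_index(entries)  # hypothetical stand-in for the datastore column
for entry in sorted(entries, key=sort_index.get):
    print(sort_index[entry], entry)
```

Characters outside the letter table, such as spaces, are left unchanged in this sketch, so they compare by their ordinary code points.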
METHOD OF ENCODING VIETNAMESE AND EDE LETTERS
Encoding Vietnamese letters
Each Vietnamese letter is mapped into a continuous region of the Unicode encoding. The region selected for the mapping is 1F00:1F5E. This region was chosen because it is continuous and its characters do not appear in Vietnamese documents. Table I maps the Vietnamese letters, in alphabetical order, to the extended Greek character area of the Unicode encoding.
For example, the entry meaning "school" is encoded as [encoded string missing from the source].
TABLE I. MAPPING VIETNAMESE LETTERS INTO THE EXTENDED GREEK CHARACTER AREA
Vietnamese characters:
a à ã á â b c d e è é ê f g h i ì í j k l m n o ò õ ó ô p q r s t u ù ú v w x y ý z
Extended Greek character area:
1F00 1F01 1F02 1F03 1F04 1F05 1F06 1F07 1F08 1F09 1F0A 1F0B 1F0C 1F0D 1F0E 1F0F
1F10 1F11 1F12 1F13 1F14 1F15 1F16 1F17 1F18 1F19 1F1A 1F1B 1F1C 1F1D 1F1E 1F1F
1F20 1F21 1F22 1F23 1F24 1F25 1F26 1F27 1F28 1F29 1F2A 1F2B 1F2C 1F2D 1F2E 1F2F
1F30 1F31 1F32 1F33 1F34 1F35 1F36 1F37 1F38 1F39 1F3A 1F3B 1F3C 1F3D 1F3E 1F3F
1F40 1F41 1F42 1F43 1F44 1F45 1F46 1F47 1F48 1F49 1F4A 1F4B 1F4C 1F4D 1F4E 1F4F
1F50 1F51 1F52 1F53 1F54 1F55 1F56 1F57 1F58 1F59 1F5A 1F5B 1F5C
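As a rough illustration of this kind of mapping, the sketch below assigns consecutive code points starting at 1F00 to an ordered letter list and shows that the encoding can be reversed. The letter list `VIETNAMESE_LETTERS` is an assumed excerpt, so the code points it produces are not claimed to match Table I; the function names are likewise hypothetical.

```python
import unicodedata

# Assumption: an excerpt of the ordered Vietnamese letter list, for illustration only.
VIETNAMESE_LETTERS = unicodedata.normalize("NFC", "aàảãáạăằẳẵắặâầẩẫấậbcdđ")
BASE = 0x1F00  # start of the region used for the mapping

# str.translate accepts a dict from code point to code point.
TO_CODE = {ord(letter): BASE + rank for rank, letter in enumerate(VIETNAMESE_LETTERS)}
FROM_CODE = {code: letter for letter, code in TO_CODE.items()}


def encode_entry(entry):
    """Map each known letter to its code in the continuous region."""
    return unicodedata.normalize("NFC", entry).translate(TO_CODE)


def decode_entry(encoded):
    """Recover the original letters from an encoded entry."""
    return encoded.translate(FROM_CODE)


word = "bà"
encoded = encode_entry(word)
print([f"{ord(ch):04X}" for ch in encoded])  # code points of the encoded entry
print(decode_entry(encoded) == unicodedata.normalize("NFC", word))
```

Because the codes follow the letter order, ordinary string comparison of two encoded entries agrees with the alphabetical order of the letters in the list.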
Ede language encoding
TABLE IV. MAPPING THE LETTER EDE INTO THE EXTENDED GREEK CHARACTER AREA (continued)
Ede language characters:
w y
Extended Greek character area:
1F00 1F01 1F02 1F03 1F04 1F05 1F06 1F07 1F08 1F09 1F0A 1F0B 1F0C 1F0D 1F0E 1F0F
1F10 1F11 1F12 1F13 1F14 1F15 1F16 1F17 1F18 1F19 1F1A 1F1B 1F1C 1F1D 1F1E 1F1F
1F20 1F21 1F22 1F23 1F24 1F25
The Ede alphabet also belongs to the Latin family. It has 76 characters, counting uppercase and lowercase, as shown in Table II [6]. Of these, 68 characters are basic components of almost all Unicode f[...] included in the Unicode encoding.
TABLE II. EDE ALPHABET
Consonants
Uppercase: B D G H J K L M N Ñ P R S T W Y
Lowercase: b d g h j k l m n ñ p r s t w y
Vowels
Uppercase: A Â E Ê I O Ô U
Lowercase: a â e ê i o ô u
Each letter of the Ede language is also mapped into a continuous region of the Unicode encoding. The region selected for the mapping is 1F00:1F25. This region was chosen because it is continuous and its characters do not appear in Ede documents.
Unlike Vietnamese letters, some Ede letters are written as a combination code of two characters. These letters must be converted to a single character before being encoded, so that each is treated as one character when sorted. The conversion rules are shown in Table III.
TABLE III. RULES FOR CONVERTING EDE LETTERS WRITTEN AS A COMBINATION CODE INTO ONE CHARACTER
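The two-stage Ede encoding described in this section (first collapse each two-character letter into one substitute character, then map every letter into the 1F00:1F25 region) can be sketched as follows. Several things here are assumptions made for illustration: that the two-character letters are a base vowel followed by a combining breve, which substitute stands for which letter, and the letter order in `EDE_ORDER`. The source tables do not preserve these details.

```python
import unicodedata

BREVE = "\u0306"  # combining breve
# Assumption: the four two-character letters, paired with the substitute
# characters "!", "@", "#", "$" that appear in Tables III and IV.
SUBSTITUTES = {
    unicodedata.normalize("NFC", base + BREVE): mark
    for base, mark in (("ê", "!"), ("ô", "@"), ("ơ", "#"), ("ư", "$"))
}
# Assumption: Ede letter order, with each substitute standing in for its letter.
EDE_ORDER = unicodedata.normalize("NFC", "aăâbƀčdđeĕê!ghiĭjklmnñoŏô@ơ#prstuŭư$wy")
TO_CODE = {ord(letter): 0x1F00 + rank for rank, letter in enumerate(EDE_ORDER)}


def encode_ede(entry):
    """Collapse two-character letters, then map letters into the continuous region."""
    text = unicodedata.normalize("NFC", entry.lower())
    for combined, substitute in SUBSTITUTES.items():
        text = text.replace(combined, substitute)
    return text.translate(TO_CODE)


letters = ["w", "ư" + BREVE, "u", "ư", "ŭ"]
print(sorted(letters, key=encode_ede))
```

One consequence of this design is that an entry containing a literal "!", "@", "#", or "$" would be indistinguishable from the corresponding letter, so the sketch assumes entries never contain those characters.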
EXPERIMENTAL SORTING OF WORD ITEMS IN THE LEXICAL DATABASE
To arrange the entries in alphabetical order, we experimented with four basic sorting methods, bubble sort, insertion sort, selection sort, and quick sort [5], and chose among them based on execution time. Each method was run 10 times on each of 4 samples. The average results are shown in Table V, and the details of each run are shown in Table VI.
Based on the results for the Vietnamese and Ede samples in Table V, the paper chooses quick sort to sort the array containing the encoded entries.
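A comparison of this kind can be reproduced in outline with the sketch below. It is not the paper's test program: the four implementations are textbook versions (the quick sort is a simple out-of-place variant), the sample is randomly generated rather than taken from the lexicon, and `average_seconds` is a hypothetical helper.

```python
import random
import time


def bubble_sort(items):
    a = list(items)
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return a


def selection_sort(items):
    a = list(items)
    for i in range(len(a) - 1):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a


def quick_sort(items):
    a = list(items)
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    smaller = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    larger = [x for x in a if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)


def average_seconds(sort, data, runs=10):
    """Average wall-clock time of `sort(data)` over `runs` runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        sort(data)
        total += time.perf_counter() - start
    return total / runs


# Assumption: random 6-letter strings over the 1F00:1F25 region stand in for encoded entries.
random.seed(0)
sample = ["".join(chr(0x1F00 + random.randrange(38)) for _ in range(6)) for _ in range(500)]
for sort in (bubble_sort, selection_sort, insertion_sort, quick_sort):
    print(f"{sort.__name__}: {average_seconds(sort, sample):.4f} s")
```

Absolute timings depend on the machine and the language, so only the relative ranking of the methods is comparable with the tables.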
TABLE V. TEST RESULTS BY 4 SORTING METHODS
Execution time (h:m:s), average over the runs
Sample                      Runs  Bubble sort  Selection sort  Insertion sort  Quick sort
9,297 Ede entries           10    0:0:02.820   0:0:01.479      0:0:00.657      0:0:00.106
17,968 Ede entries          10    0:0:09.477   0:0:04.315      0:0:04.240      0:0:00.188
11,358 Vietnamese entries   10    0:0:02.290   0:0:02.286      0:0:00.268      0:0:00.265
34,375 Vietnamese entries   10    0:1:14.227   0:0:19.243      0:0:13.450      0:0:00.760
TABLE III (continued)
Alternative characters for the Ede letters with 2 characters: ! @ # $
The mapping of Ede letters and their conversion characters to the extended Greek character area is shown in Table IV.
TABLE IV. MAPPING THE LETTER EDE INTO THE EXTENDED GREEK CHARACTER AREA
Ede language characters:
a â b d e ê ! g h i j k l m n ñ o ô @ # p r s t u $
TABLE VI. DETAILS OF ATTEMPTS WITH 4 SORTING METHODS
Execution time (h:m:s)
Sample                      Run      Bubble sort  Selection sort  Insertion sort  Quick sort
9,297 Ede entries           1        0:0:02.952   0:0:01.492      0:0:00.603      0:0:00.100
                            2        0:0:02.961   0:0:01.510      0:0:00.664      0:0:00.099
                            3        0:0:02.783   0:0:01.500      0:0:00.595      0:0:00.103
                            4        0:0:02.901   0:0:01.479      0:0:00.624      0:0:00.111
                            5        0:0:02.696   0:0:01.495      0:0:00.631      0:0:00.110
                            6        0:0:02.705   0:0:01.450      0:0:00.587      0:0:00.104
                            7        0:0:02.670   0:0:01.540      0:0:00.715      0:0:00.099
                            8        0:0:03.008   0:0:01.483      0:0:00.703      0:0:00.111
                            9        0:0:02.725   0:0:01.423      0:0:00.723      0:0:00.110
                            10       0:0:02.804   0:0:01.414      0:0:00.730      0:0:00.117
                            Average  0:0:02.820   0:0:01.479      0:0:00.657      0:0:00.106
17,968 Ede entries          1        0:0:09.925   0:0:04.484      0:0:04.829      0:0:00.162
                            2        0:0:08.757   0:0:04.420      0:0:04.807      0:0:00.163
                            3        0:0:08.539   0:0:04.699      0:0:03.490      0:0:00.207
                            4        0:0:09.811   0:0:05.045      0:0:03.802      0:0:00.196
                            5        0:0:09.371   0:0:03.874      0:0:03.725      0:0:00.165
                            6        0:0:10.452   0:0:03.900      0:0:04.463      0:0:00.162
                            7        0:0:09.145   0:0:04.124      0:0:04.845      0:0:00.199
                            8        0:0:09.067   0:0:03.889      0:0:04.876      0:0:00.197
                            9        0:0:10.217   0:0:04.405      0:0:03.741      0:0:00.230
                            10       0:0:09.487   0:0:04.318      0:0:03.829      0:0:00.205
                            Average  0:0:09.477   0:0:04.315      0:0:04.240      0:0:00.188
11,358 Vietnamese entries   1        0:0:02.046   0:0:01.920      0:0:00.239      0:0:00.340
                            2        0:0:02.028   0:0:02.091      0:0:00.247      0:0:00.250
                            3        0:0:02.511   0:0:02.300      0:0:00.309      0:0:00.225
                            4        0:0:02.542   0:0:02.165      0:0:00.330      0:0:00.240
                            5        0:0:01.918   0:0:01.991      0:0:00.235      0:0:00.234
                            6        0:0:02.090   0:0:02.741      0:0:00.257      0:0:00.286
                            7        0:0:02.418   0:0:02.314      0:0:00.235      0:0:00.220
                            8        0:0:02.433   0:0:02.223      0:0:00.343      0:0:00.350
                            9        0:0:02.345   0:0:02.870      0:0:00.252      0:0:00.241
                            10       0:0:02.576   0:0:02.240      0:0:00.232      0:0:00.267
                            Average  0:0:02.290   0:0:02.286      0:0:00.268      0:0:00.265
34,375 Vietnamese entries   1        0:1:14.166   0:0:18.910      0:0:12.168      0:0:00.795
                            2        0:1:13.985   0:0:19.425      0:0:13.462      0:0:00.686
                            3        0:1:14.374   0:0:17.841      0:0:14.679      0:0:00.826
                            4        0:1:13.956   0:0:19.410      0:0:14.835      0:0:00.748
                            5        0:1:14.126   0:0:21.091      0:0:12.963      0:0:00.795
                            6        0:1:14.212   0:0:17.862      0:0:12.651      0:0:00.875
                            7        0:1:14.028   0:0:20.420      0:0:13.806      0:0:00.842
                            8        0:1:13.825   0:0:18.798      0:0:13.868      0:0:00.592
                            9        0:1:15.006   0:0:18.688      0:0:12.731      0:0:00.717
                            10       0:1:14.589   0:0:19.983      0:0:13.338      0:0:00.733
                            Average  0:1:14.227   0:0:19.243      0:0:13.450      0:0:00.760
EXPERIMENTAL RESULTS
Currently, with the Order by clause of a query statement, the results are sorted by the default ordering of the Vietnamese string attribute. For accented letters, this ordering does not follow the alphabetical arrangement of Vietnamese. The results of using the Order by clause in the SQL statement are shown in Figures 1 and 2. Figure 1 shows the result of the query Select Viet From VIET Order by Viet. Figure 2 shows the result of the query Select Viet From VIET Order by CS_SX. CS_SX is the sort index attribute added by the solution above for sorting entries in the Viet-Ede datastore.
Fig 1. Result of executing the sort command with Vietnamese entries
Fig 2. Result of executing the sort command on Vietnamese entries using the sort index of the encoded entries
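The difference between the two queries can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table and column names (VIET, Viet, CS_SX) follow the text, while the word list, the shortened letter order in `ALPHABET`, and the `encode` helper are assumptions for illustration, and the paper's actual database system is not specified here.

```python
import sqlite3
import unicodedata

# Assumption: a shortened Vietnamese letter order, used only for illustration.
ALPHABET = unicodedata.normalize("NFC", "aàảãáạăâbcdđeêghiklmnoôơpqrstuưvxy")
TO_CODE = {ord(letter): 0x1F00 + rank for rank, letter in enumerate(ALPHABET)}


def encode(entry):
    """Map letters into the continuous region so plain comparison is alphabetical."""
    return unicodedata.normalize("NFC", entry.lower()).translate(TO_CODE)


words = ["đá", "dê", "ăn", "an", "ba", "âm"]
ranked = sorted(words, key=encode)  # alphabetical order via the encoded form

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VIET (Viet TEXT, CS_SX INTEGER)")
conn.executemany(
    "INSERT INTO VIET (Viet, CS_SX) VALUES (?, ?)",
    [(word, position) for position, word in enumerate(ranked, start=1)],
)

by_text = [row[0] for row in conn.execute("SELECT Viet FROM VIET ORDER BY Viet")]
by_index = [row[0] for row in conn.execute("SELECT Viet FROM VIET ORDER BY CS_SX")]
print("ORDER BY Viet: ", by_text)
print("ORDER BY CS_SX:", by_index)
```

With SQLite's default binary collation, the first query compares the stored bytes, so its order differs from the alphabetical order returned by the second query.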
Ede letters encounter the same situation as Vietnamese letters. In addition, the Ede language requires handling of letters written as a combination code. The results of using the Order by clause in the SQL statement are shown in Figures 3 and 4. Figure 3 shows the result of the query Select Ede From EDE Order by Ede. Figure 4 shows the result of the query Select Ede From EDE Order by CS_SX. CS_SX is the sort index attribute added by the solution above for sorting entries in the Viet-Ede datastore.
Fig 3. Result of executing the sort command with Ede entries
Fig 4. Result of executing the sort command on Ede entries using the sort index of the encoded entries
CONCLUSION
The solution for sorting entries in the Vietnamese-Ede bilingual vocabulary database has been applied to the attributes containing Vietnamese entries and Ede entries. With the Order by clause of an SQL query on the Viet-Ede datastore, the results are sorted in Vietnamese and Ede alphabetical order.
This solution contributes to solving the problem of arranging Vietnamese entries and Ede entries in the Vietnamese-Ede bilingual vocabulary database in alphabetical order in data query statements that sort data.
As a next step, the solution will be integrated into table-editing applications such as Winword and Excel to arrange columns or rows in Ede data tables.
REFERENCES
[1] Doan Van Phuc: Ede phonetics, Social science, Ha Noi, 1996.
[2] Hoang Thi My Le, Vilavong Souksan, Phan Huy Khanh: Using Unicode in Encoding the Vietnamese Ethnic Minority Languages, Applying for the Ede Language, Proceeding of the International Conference on Knowledge and System Engineering, KSE 2013, Ha Noi, pp. 137-148, 2013.
[3] Hoang Thi My Le, Phan Huy Khanh: The solution to build a Vietnamese-Ede bilingual vocabulary database based on the Vietnamese-Ede interaction model, No 5 (2), pp. 3640, 2017.
[4] Le Hoang Thi My, Khanh Phan Huy: Deploying environment for processing Ede ethnic minority language in Vietnam, IEEE International Conference on System Science and Engineering (ICSSE), 2017.
[5] Robert Sedgewick: Algorithm, NXBKH & KT, 2003, https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-840.pdf
[6] Department of Education and Training DakLak: Ede Grammar, Education publisher, 2011.