Visualization algorithm based on FDR control testing for dimension reduction of textual data

Pyo, Sung Inn; Ahn, Soohyun; Kwon, Soon Sun

DC Field	Value	Language
dc.contributor.author	Pyo, Sung Inn	-
dc.contributor.author	Ahn, Soohyun	-
dc.contributor.author	Kwon, Soon Sun	-
dc.date.issued	2025-04-11	-
dc.identifier.issn	2514-9318	-
dc.identifier.uri	https://aurora.ajou.ac.kr/handle/2018.oak/38236	-
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105002314320&origin=inward	-
dc.description.abstract	Purpose: Visualizing relations in textual data requires dimension reduction to make the output interpretable. However, traditional dimension reduction methods have limitations, such as the loss of feature information during extraction or projection and ambiguous results caused by mixed word labels. In this study, we develop a textual data visualization algorithm that uses statistical methods to present statistical inferences on the data. We also construct the algorithm so that the user can analyze textual data easily. Design/methodology/approach: Unstructured data, such as textual data, is sensitive to the choice of analysis method. In addition, textual data is generally large and sparse. Considering these characteristics, we applied latent Dirichlet allocation to separate the data while minimizing the loss of information, and false discovery rate (FDR) control to reduce the dimension in a statistical way. Findings: The relations in textual data can be derived with a single click, and the output, presented as separated topics, can be interpreted without background information. Originality/value: The algorithm is constructed based on the Korean language. However, it can be used with any language without linguistic information. This study can serve as an example of the usage and workflow by which lesser-known dimension reduction methods can replace traditional ones.	-
dc.description.sponsorship	Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1A6A1A10044950 and NO.4299990414389, Ajou mathematical sciences team for future leaders).	-
dc.language.iso	eng	-
dc.publisher	Emerald Publishing	-
dc.subject.mesh	Dimension reduction	-
dc.subject.mesh	False discovery rate	-
dc.subject.mesh	False discovery rate control	-
dc.subject.mesh	Korean text data analyze	-
dc.subject.mesh	Rate controls	-
dc.subject.mesh	Semantics networks	-
dc.subject.mesh	Text data	-
dc.subject.mesh	Text-mining	-
dc.subject.mesh	Textual data	-
dc.subject.mesh	Visualization algorithms	-
dc.title	Visualization algorithm based on FDR control testing for dimension reduction of textual data	-
dc.type	Article	-
dc.citation.endPage	361	-
dc.citation.number	2	-
dc.citation.startPage	338	-
dc.citation.title	Data Technologies and Applications	-
dc.citation.volume	59	-
dc.identifier.bibliographicCitation	Data Technologies and Applications, Vol.59 No.2, pp.338-361	-
dc.identifier.doi	10.1108/dta-04-2024-0373	-
dc.identifier.scopusid	2-s2.0-105002314320	-
dc.identifier.url	https://www.emeraldinsight.com/loi/dta	-
dc.subject.keyword	Dimension reduction	-
dc.subject.keyword	False discovery rate control	-
dc.subject.keyword	Korean text data analysis	-
dc.subject.keyword	Semantic network	-
dc.subject.keyword	Text mining	-
dc.subject.keyword	Visualization	-
dc.type.other	Article	-
dc.identifier.pissn	25149288	-
dc.description.isoa	false	-
dc.subject.subarea	Information Systems	-
dc.subject.subarea	Library and Information Sciences	-
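
The abstract names FDR control as the dimension reduction step but does not say which procedure is used. As a minimal sketch only, assuming the widely used Benjamini-Hochberg step-up procedure and hypothetical p-values from word-pair association tests (the function name, the word pairs, and the p-values below are illustrative and do not come from the article), the idea of keeping only statistically supported relations can be written as:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding hypothesis
    is rejected while controlling the false discovery rate at `alpha`.
    Assumption: the article may use a different FDR procedure.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical word pairs with made-up p-values (illustration only).
word_pairs = [("data", "analysis"), ("text", "mining"),
              ("topic", "model"), ("network", "graph"), ("rate", "click")]
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]

keep = benjamini_hochberg(p_values, alpha=0.05)
# Only the retained pairs would be drawn as edges in the visualization.
retained_pairs = [pair for pair, k in zip(word_pairs, keep) if k]
```

In this sketch, the pairs whose tests are rejected are the ones kept as edges, so the number of displayed relations shrinks according to a stated error rate rather than an arbitrary frequency cutoff.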

Related Researcher

Ahn, Soohyun (안수현): Department of Mathematics
