Ajou University repository

Efficient imputation of missing data using the information of local space defined by the geometric one-class classifier
Citations (SCOPUS)
3


Publication Year
2024-05-15
Publisher
Elsevier Ltd
Citation
Expert Systems with Applications, Vol.242
Keyword
Composite fuzzy model; Hyper-rectangle; Imputation; Local space; Missing data; One-class classifier
Mesh Keyword
Actual system; Composite fuzzy model; Fuzzy modeling; Hyperrectangles; Imputation; Imputation methods; Local spaces; Missing data; One-class classifier; Overfitting
All Science Classification Codes (ASJC)
Engineering (all); Computer Science Applications; Artificial Intelligence
Abstract
Datasets gathered from actual systems may include missing data owing to unintentional faults, such as equipment breakdowns, as well as intentional reasons, such as sampling inspection. Because missing data can produce incorrect and distorted results when analyzed, they should be addressed before the analysis is performed. Imputation replaces missing entries with values calculated from the observed features, which is a more reasonable alternative than simple methods such as complete case analysis. Although various imputation methods exist, most ignore the local space around the missing data, which may be closely related to the missing values. Furthermore, imputation methods that can partially reflect local relationships are susceptible to overfitting and have parameter-tuning issues owing to the lack of a systematic definition of the local space. Thus, we propose a composite fuzzy hyper-rectangle (H-RTGL) imputation (CFHRI) method with the following characteristics: (i) it defines the local space using an H-RTGL-based one-class classifier to thoroughly describe the data of the target class, and (ii) it imputes the missing entries using a fuzzy model composed of imputation models calculated from the H-RTGLs. These features enable CFHRI to formulate the local space adjacent to missing data systematically and to alleviate the risk of overfitting to a particular region of the dataset. We validated our method through numerical experiments using data gathered from an actual system, comparing its imputation performance with that of other imputation methods. CFHRI showed a statistically significant improvement on 5 of the 7 datasets used, with an improvement of around 10% in terms of mean absolute error (MAE). Moreover, classification accuracy on the imputed datasets increased by 3–5%, which indicates that CFHRI can be a useful preprocessor for datasets intended for classification.
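The abstract describes CFHRI only at a high level, so the following is a loose, hypothetical Python sketch of the general idea of local-space imputation, not the authors' implementation. It assumes hand-specified axis-aligned hyper-rectangles (the paper builds them with an H-RTGL one-class classifier, whose construction is not given here), uses the per-box feature mean as a stand-in local imputation model, and blends the boxes with a simple membership weight 1/(1 + d), where d is the distance from the incomplete row to a box over its observed features. All function names (`distance_to_box`, `fit_local_means`, `impute`, `mae`) are illustrative; MAE is included because it is the metric named in the abstract.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a hyper-rectangle is (lower bounds, upper bounds) per feature.
Box = Tuple[List[float], List[float]]


def distance_to_box(x: Sequence[Optional[float]], box: Box) -> float:
    """Euclidean distance from x to an axis-aligned box over observed features only."""
    lo, hi = box
    total = 0.0
    for j, v in enumerate(x):
        if v is None:  # missing entries cannot locate the sample, so skip them
            continue
        gap = max(lo[j] - v, 0.0, v - hi[j])  # 0 when v lies inside [lo, hi]
        total += gap * gap
    return total ** 0.5


def fit_local_means(data: List[List[float]], boxes: List[Box]) -> List[List[float]]:
    """Stand-in local model per box: feature-wise mean of complete rows inside the box."""
    n_feat = len(data[0])
    global_mean = [sum(r[j] for r in data) / len(data) for j in range(n_feat)]
    models = []
    for box in boxes:
        inside = [r for r in data if distance_to_box(r, box) == 0.0]
        if not inside:  # empty box: fall back to the global mean
            models.append(global_mean)
        else:
            models.append([sum(r[j] for r in inside) / len(inside) for j in range(n_feat)])
    return models


def impute(x: List[Optional[float]], boxes: List[Box],
           models: List[List[float]]) -> List[float]:
    """Fill missing entries with a membership-weighted blend of the local models."""
    # Membership decays with distance to each box; it is 1.0 for a box containing x.
    weights = [1.0 / (1.0 + distance_to_box(x, b)) for b in boxes]
    total = sum(weights)
    out = []
    for j, v in enumerate(x):
        if v is None:
            out.append(sum(w * m[j] for w, m in zip(weights, models)) / total)
        else:
            out.append(v)
    return out


def mae(true_vals: Sequence[float], imputed_vals: Sequence[float]) -> float:
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - p) for t, p in zip(true_vals, imputed_vals)) / len(true_vals)


# Usage: two well-separated clusters, each covered by one hand-made box.
data = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0],
        [8.0, 9.0], [8.5, 9.5], [9.0, 10.0]]
boxes = [([0.0, 0.0], [3.0, 4.0]), ([7.0, 8.0], [10.0, 11.0])]
models = fit_local_means(data, boxes)
filled = impute([8.2, None], boxes, models)  # second feature is missing
print(filled, mae([9.5], [filled[1]]))
```

Here the incomplete row lies inside the second box, so that box's local mean (9.5) dominates and the missing entry is filled with roughly 8.53, whereas the global mean of that feature is 6.0. The weight 1/(1 + d) never reaches zero, so every box contributes a little; a real method would need a principled membership function and a systematic way to build the boxes, which is what the abstract attributes to CFHRI.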
ISSN
0957-4174
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/33835
DOI
https://doi.org/10.1016/j.eswa.2023.122775
Type
Article
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2017R1A2B4009841).


Related Researcher

Choi, Jin Young (최진영)
Department of Industrial Engineering

There are no files associated with this item.