missForest with feature selection using binary particle swarm optimization improves the imputation accuracy of continuous data

DC Field | Value | Language
dc.contributor.author | Jin, Heejin | -
dc.contributor.author | Jung, Surin | -
dc.contributor.author | Won, Sungho | -
dc.date.accessioned | 2022-09-30T05:58:41Z | -
dc.date.available | 2022-09-30T05:58:41Z | -
dc.date.created | 2022-06-07 | -
dc.date.issued | 2022-06 | -
dc.identifier.citation | Genes & Genomics, Vol.44 No.6, pp.651-658 | -
dc.identifier.issn | 1976-9571 | -
dc.identifier.uri | https://hdl.handle.net/10371/185068 | -
dc.description.abstract | Background: Missing data are a common problem in large-scale datasets, and their appropriate handling is crucial for data analyses. Missingness can be categorized as (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR). Different missingness mechanisms require different imputation strategies. Multiple imputation, an approach that averages outcomes across multiple imputed datasets, is more suitable than single imputation for dealing with various missingness mechanisms. missForest, a nonparametric missing-value imputation strategy using random forest, is one of the most prevalent multiple imputation methods for missing data because it can be applied to mixed-type data and does not require distributional assumptions. However, a recent study found that missForest can produce biased results for non-normal data. In addition, missForest is computationally expensive. Objective: Therefore, we aimed to further develop the missForest algorithm by combining it with a binary particle swarm optimization (BPSO)-based feature-selection strategy. Methods: BPSO is an evolutionary algorithm that is well known for global optimization and computational efficiency. By applying a BPSO-based feature-selection step before imputing missing values with missForest, the imputation accuracy for continuous variables could be increased by pruning redundant variables. Results: In this study, missForest with BPSO (BPSOmf), which performs feature selection before the imputation step, showed better imputation accuracy for continuous variables than missForest alone. Conclusions: BPSOmf is an appropriate and robust method when the imputation target data consist mainly of continuous variables. | -
dc.language | English | -
dc.publisher | 한국유전학회 (The Genetics Society of Korea) | -
dc.title | missForest with feature selection using binary particle swarm optimization improves the imputation accuracy of continuous data | -
dc.type | Article | -
dc.identifier.doi | 10.1007/s13258-022-01247-8 | -
dc.citation.journaltitle | Genes & Genomics | -
dc.identifier.wosid | 000781221300001 | -
dc.identifier.scopusid | 2-s2.0-85127655061 | -
dc.citation.endpage | 658 | -
dc.citation.number | 6 | -
dc.citation.startpage | 651 | -
dc.citation.volume | 44 | -
dc.description.isOpenAccess | N | -
dc.contributor.affiliatedAuthor | Won, Sungho | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
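
Illustrative sketch (not part of the record, and not the authors' implementation): the Methods part of the abstract describes a BPSO-based feature-selection step that runs before missForest imputation. The abstract does not state the fitness function, the swarm settings, or the transfer function, so everything below is an assumption chosen for illustration. The sketch uses the common sigmoid-transfer form of binary PSO, and `bpso_select`, `toy_fitness`, and the values of `w`, `c1`, `c2`, and `v_max` are hypothetical. In a real pipeline the fitness would presumably score imputation accuracy on the selected columns; here a toy score stands in for it.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=20, n_iter=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Search 0/1 feature masks for one that maximizes fitness(mask).

    Returns (best_mask, best_score). All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Random initial bit masks and zero velocities for every particle.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]

    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update, clamped to [-v_max, v_max].
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Sigmoid transfer: velocity sets the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


def toy_fitness(mask):
    """Hypothetical score: features 0-2 are useful, every other one is redundant."""
    useful = sum(mask[:3])      # +1.0 for each useful feature kept
    redundant = sum(mask[3:])   # -0.5 for each redundant feature kept
    return useful * 1.0 - redundant * 0.5


if __name__ == "__main__":
    mask, score = bpso_select(toy_fitness, n_features=8, seed=1)
    print("selected mask:", mask, "score:", score)
```

Under these assumptions, the columns flagged with 1 in the returned mask would be the ones passed on to the imputation step, and the pruned columns would be left out.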