RE-CLUSTERING DOCUMENTS TO ENHANCE SEARCH ACCURACY WITH IMBALANCED ABBREVIATION DATA

Abbreviation ambiguity poses significant challenges when searching academic literature. This study evaluated the accuracy of clustering algorithms on imbalanced datasets with varying ratios of target groups. The study drew on a corpus of 1052 papers focused on abbreviations, and the "MSA" dataset was clustered using TF-IDF weighting, cosine similarity, and k-means. Clustering performance declined as the proportion of the target group moved further away from a balanced ratio.
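The TF-IDF, cosine similarity, and k-means pipeline described above can be sketched in plain Python. This is a minimal illustration, not the study's code: the whitespace tokeniser, the smoothed IDF formula, the deterministic farthest-first seeding, and the four toy "MSA" snippets (two about multiple sequence alignment, two about multiple system atrophy) are all assumptions made for the example. Because every vector is L2-normalised, cosine similarity reduces to a dot product, and re-normalising the summed members of each cluster gives the spherical variant of k-means.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (term -> weight dict) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def tfidf_vectors(docs):
    """Build one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokeniser
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: a term found in every document (such as "msa") gets IDF 1, not 0.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(normalise({t: (c / len(tokens)) * idf[t] for t, c in tf.items()}))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def kmeans_cosine(vectors, k, max_iter=50):
    """Spherical k-means: assign by highest cosine, re-normalise summed centroids."""
    k = min(k, len(vectors))
    # Deterministic farthest-first seeding: start from document 0, then add the
    # document least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(k):
            summed = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    summed.update(v)  # Counter.update adds weights term by term
            if summed:  # an empty cluster keeps its previous centroid
                centroids[c] = normalise(summed)
    return labels


docs = [
    "msa multiple sequence alignment protein sequences",
    "msa multiple system atrophy patients clinical",
    "msa alignment of protein sequences with gaps",
    "msa atrophy neurodegenerative patients symptoms",
]
labels = kmeans_cosine(tfidf_vectors(docs), k=2)
print(labels)  # [0, 1, 0, 1]
```

Seeding from the least similar document, rather than at random, keeps the toy output reproducible. For a real corpus of 1052 papers, a library implementation (for example scikit-learn's `TfidfVectorizer` and `KMeans`) would be the more practical choice.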

A re-clustering method was introduced that selectively excludes non-target clusters and then clusters the remaining documents again. Re-clustering improved accuracy and F1 scores in most scenarios and was especially stable at higher cluster counts. In comparisons, re-clustering performed better than both standard k-means and self-adaptive methods. The study highlights the problems caused by data imbalance and presents an effective strategy for improving the efficiency of abbreviation search.
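The re-clustering idea can be sketched as a loop: cluster, drop the clusters judged to be non-target, and cluster the survivors again. The text does not say how a cluster is judged non-target or when the loop stops, so both rules below are hypothetical: a cluster is kept when at least `min_share` of its documents contain one of the user-supplied `target_terms`, and the loop ends once a round excludes nothing. The helper functions repeat a compact TF-IDF and cosine k-means so the block runs on its own, and the toy corpus adds a third "MSA" sense (metropolitan statistical area) so that the target sense is a minority. This is an illustration under those assumptions, not the study's implementation.

```python
import math
from collections import Counter


def normalise(vec):
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors (term -> weight dicts) with smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + len(docs)) / (1 + c)) + 1 for t, c in df.items()}
    return [normalise({t: c / len(toks) * idf[t] for t, c in Counter(toks).items()})
            for toks in tokenized]


def cosine(a, b):
    # Dot product of unit-length vectors, i.e. their cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, max_iter=50):
    """Spherical k-means with deterministic farthest-first seeding."""
    k = min(k, len(vectors))
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new == labels:
            break
        labels = new
        for c in range(k):
            summed = Counter()
            for v, lab in zip(vectors, labels):
                if lab == c:
                    summed.update(v)
            if summed:  # empty clusters keep their previous centroid
                centroids[c] = normalise(summed)
    return labels


def recluster(docs, target_terms, k=2, min_share=0.5, max_rounds=10):
    """Cluster, exclude non-target clusters, and re-cluster what remains.

    Hypothetical rule: a cluster counts as a target cluster when at least
    `min_share` of its documents contain one of `target_terms`.
    """
    vectors = tfidf_vectors(docs)  # IDF is fitted once, on the full corpus
    kept = list(range(len(docs)))
    for _ in range(max_rounds):
        if len(kept) < 2:
            break
        labels = kmeans_cosine([vectors[i] for i in kept], k)
        survivors = []
        for c in set(labels):
            members = [i for i, lab in zip(kept, labels) if lab == c]
            hits = sum(any(t in docs[i].lower().split() for t in target_terms)
                       for i in members)
            if hits / len(members) >= min_share:
                survivors.extend(members)
        if not survivors or len(survivors) == len(kept):
            break  # no cluster excluded (or every cluster would be): keep current set
        kept = sorted(survivors)
    return kept


docs = [
    "msa multiple sequence alignment protein",      # target sense
    "msa metropolitan statistical area census population",
    "msa multiple system atrophy patients",
    "msa sequence alignment protein gaps",           # target sense
    "msa metropolitan area census income",
    "msa system atrophy patients symptoms",
]
print(recluster(docs, target_terms={"alignment"}))  # [0, 3]
```

In this toy run, the first pass with k=2 places the sequence-alignment and system-atrophy snippets in the same cluster; only the second round, run on the survivors alone, separates them. A fixed single pass would therefore have kept two off-target documents, which is the kind of imbalance re-clustering is meant to address.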

Report this page