abstract |
The invention discloses an improved SMOTE resampling method for imbalanced data classification. The method first clusters the minority-class samples in the sample set with K-Means and deletes, as noise, the cluster whose centroid lies closest to the majority-class samples. It then applies KNN within each remaining cluster to divide that cluster's samples into three classes and removes the noise class. Finally, a random number is drawn for each cluster, and the sample subset to be oversampled with SMOTE is selected according to the relationship between the random number and the proportions of the sample classes in that cluster. Compared with the traditional SMOTE method, the improved K-Means-SMOTE method proposed by the invention significantly improves prediction performance on a complaint model for Internet TV set-top box users. |
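
The pipeline summarized in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not name the three KNN classes, so the sketch assumes a Borderline-SMOTE-style split into "safe", "borderline", and "noise" based on how many of a sample's `k` nearest neighbours belong to the majority class. It also assumes that exactly one cluster is dropped, and that the random number chooses between the safe and borderline subsets in proportion to their sizes. All function names and parameters (`improved_smote`, `n_new_per_cluster`, and so on) are hypothetical.

```python
import math
import random

def centroid(cluster):
    """Mean point of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-Means; returns the non-empty clusters as lists of points."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster happens to be empty.
        centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def drop_noisy_cluster(clusters, majority):
    """Assumption: drop the single cluster whose centroid is nearest a majority sample."""
    def gap(cluster):
        c = centroid(cluster)
        return min(math.dist(c, m) for m in majority)
    noisy = min(clusters, key=gap)
    return [c for c in clusters if c is not noisy]

def split_by_knn(cluster, minority, majority, k=5):
    """Assumed three-way split: returns (safe, borderline); noise samples are discarded."""
    labelled = [(m, 0) for m in minority] + [(m, 1) for m in majority]
    safe, borderline = [], []
    for p in cluster:
        others = (q for q in labelled if q[0] is not p)
        neighbours = sorted(others, key=lambda q: math.dist(p, q[0]))[:k]
        n_major = sum(flag for _, flag in neighbours)
        if n_major == k:
            continue  # every neighbour is majority: treat as noise
        (borderline if n_major >= k / 2 else safe).append(p)
    return safe, borderline

def oversample_cluster(safe, borderline, n_new, rng, k=5):
    """SMOTE-style interpolation; a random number picks the subset by proportion."""
    synthetic = []
    total = len(safe) + len(borderline)
    if total == 0:
        return synthetic
    for _ in range(n_new):
        # Assumption: r below the safe share selects the safe subset.
        pool = safe if rng.random() < len(safe) / total else borderline
        if len(pool) < 2:
            pool = safe + borderline
        if len(pool) < 2:
            break
        a = rng.choice(pool)
        nbrs = sorted((q for q in pool if q is not a), key=lambda q: math.dist(a, q))[:k]
        b = rng.choice(nbrs)
        t = rng.random()
        # New point on the segment between a and one of its nearest neighbours.
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

def improved_smote(minority, majority, n_clusters=3, n_new_per_cluster=5, k=5, seed=0):
    """Cluster, drop the noisy cluster, split by KNN, then oversample each cluster."""
    rng = random.Random(seed)
    clusters = kmeans(minority, n_clusters, rng)
    if len(clusters) > 1:
        clusters = drop_noisy_cluster(clusters, majority)
    synthetic = []
    for cluster in clusters:
        safe, borderline = split_by_knn(cluster, minority, majority, k)
        synthetic += oversample_cluster(safe, borderline, n_new_per_cluster, rng, k)
    return synthetic
```

Calling `improved_smote(minority, majority)` with two lists of equal-length coordinate tuples returns a list of synthetic minority samples. Because each synthetic point is a convex combination of two retained minority samples, it always lies on a segment between existing minority points. A production version would more likely build on an established library rather than these hand-rolled helpers.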