abstract |
The present invention provides a binary classification method for unbalanced data that integrates feature selection and cluster sampling. The steps include: deleting incomplete, noisy, and unusable data from the collected data set to obtain a training set D; performing feature selection with an improved RELIEF-F method to obtain the feature weight set W={w(1),...,w(j),...,w(J)}; clustering the data in the training set D to divide D into K clusters; constructing balanced training data to obtain K balanced sub-training sets D_1,...,D_K; training K base classifiers, one on each of D_1,...,D_K; and, when new data arrive, obtaining recognition results from the K trained base classifiers and determining the category of the test sample by majority vote. The method can effectively improve the classification accuracy on unbalanced data sets. |
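
The final voting step can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `majority_vote` and `make_threshold_classifier`, the threshold-based stand-in classifiers, and the tie-breaking rule (`tie_label`) are all assumptions introduced here for illustration, since the abstract does not state the base classifier type or how a tie is resolved when K is even.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A base classifier is modeled as any callable mapping a feature vector to 0 or 1.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: List[Classifier], x: Sequence[float], tie_label: int = 1) -> int:
    """Return the label predicted by most of the K base classifiers.

    tie_label is an assumption of this sketch: the abstract does not say
    how an even split between the two classes is resolved.
    """
    if not classifiers:
        raise ValueError("at least one base classifier is required")
    votes = Counter(clf(x) for clf in classifiers)
    ranked = votes.most_common()
    # An even split between the two classes falls back to tie_label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


def make_threshold_classifier(feature_index: int, threshold: float) -> Classifier:
    """Hypothetical stand-in for a base classifier trained on one sub-training set."""
    def classify(x: Sequence[float]) -> int:
        return 1 if x[feature_index] > threshold else 0
    return classify


# Three stand-in base classifiers, playing the role of the K trained models.
base_classifiers = [
    make_threshold_classifier(0, 0.5),
    make_threshold_classifier(1, 0.2),
    make_threshold_classifier(0, 0.9),
]

# Votes for this sample are 1, 0, 0, so the majority label is 0.
label = majority_vote(base_classifiers, [0.7, 0.1])
```

In practice each stand-in would be replaced by a model fitted on one of the balanced sub-training sets D_1,...,D_K; choosing an odd K avoids ties altogether.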