Affiliations: Department of Computer Science and Engineering,
University of Kalyani, Kalyani, India | Department of Computer Science and Engineering,
Jadavpur University, Kolkata, India | Machine Intelligence Unit, Indian Statistical
Institute, Kolkata, India
Note: Corresponding author: Anirban Mukhopadhyay, Department of
Computer Science and Engineering, University of Kalyani, Kalyani 741235, India.
E-mail: [email protected]
Abstract: Microarray technology facilitates the monitoring of the expression
levels of thousands of genes over different experimental conditions
simultaneously. Clustering is a popular data mining tool that can be applied
to microarray gene expression data to identify co-expressed genes. Most
traditional clustering methods optimize a single clustering goodness criterion
and thus may not perform well on all kinds of datasets.
Motivated by this, this article improves a multiobjective clustering technique,
which optimizes cluster compactness and separation simultaneously, through a
novel cluster ensemble method based on support vector machine classification.
The superiority of the resulting method, MOCSVMEN (MultiObjective Clustering
with Support Vector Machine based ENsemble), has been established by comparing
its performance with that of several well-known existing microarray data
clustering algorithms. Two real-life benchmark gene expression datasets have
been used to compare the performance of the different algorithms. A recently
developed metric, the Biological Homogeneity Index (BHI), which measures
clustering goodness with respect to functional annotation, has been used for
this comparison.
Keywords: Multiobjective clustering, support vector machine, cluster ensemble, microarray gene expression data