Abstract
The notion of defining a cluster as a component in a mixture model was put forth by Tiedeman in 1955; since then, the use of mixture models for clustering has grown into an important subfield of classification. Considering the volume of work within this field over the past decade, which seems equal to all of that which went before, a review of work to date is timely. First, the definition of a cluster is discussed and some historical context for model-based clustering is provided. Then, starting with Gaussian mixtures, the evolution of model-based clustering is traced, from the famous paper by Wolfe in 1965 to work that is currently available only in preprint form. This review ends with a look ahead to the next decade or so.
References
AITKEN, A.C. (1926), “A Series Formula for the Roots of Algebraic and Transcendental Equations”, Proceedings of the Royal Society of Edinburgh, 45, 14–22.
AITKIN, M., and WILSON, G.T. (1980), “Mixture Models, Outliers, and the EM Algorithm”, Technometrics, 22(3), 325–331.
ANDERLUCCI, L., and VIROLI, C. (2015), “Covariance Pattern Mixture Models for Multivariate Longitudinal Data”, The Annals of Applied Statistics, 9(2), 777–800.
ANDREWS, J.L., and MCNICHOLAS, P.D. (2011a), “Extending Mixtures of Multivariate t-Factor Analyzers”, Statistics and Computing, 21(3), 361–373.
ANDREWS, J.L., and MCNICHOLAS, P.D. (2011b), “Mixtures of Modified t-Factor Analyzers for Model-Based Clustering, Classification, and Discriminant Analysis”, Journal of Statistical Planning and Inference, 141(4), 1479–1486.
ANDREWS, J.L., and MCNICHOLAS, P.D. (2012), “Model-Based Clustering, Classification, and Discriminant Analysis Via Mixtures of Multivariate t-Distributions: The tEIGEN Family”, Statistics and Computing, 22(5), 1021–1029.
ANDREWS, J.L., and MCNICHOLAS, P.D. (2013), vscc: Variable Selection for Clustering and Classification, R Package Version 0.2.
ANDREWS, J.L., and MCNICHOLAS, P.D. (2014), “Variable Selection for Clustering and Classification”, Journal of Classification, 31(2), 136–153.
ANDREWS, J.L., MCNICHOLAS, P.D., and SUBEDI, S. (2011), “Model-Based Classification Via Mixtures of Multivariate t-Distributions”, Computational Statistics and Data Analysis, 55(1), 520–529.
ANDREWS, J.L., WICKINS, J.R., BOERS, N.M., and MCNICHOLAS, P.D. (2015), teigen: Model-Based Clustering and Classification with the Multivariate t Distribution, R Package Version 2.1.0.
ATTIAS, H. (2000), “A Variational Bayesian Framework for Graphical Models”, in Advances in Neural Information Processing Systems, Volume 12, MIT Press, pp. 209–215.
AZZALINI, A., BROWNE, R.P., GENTON, M.G., and MCNICHOLAS, P.D. (2016), “On Nomenclature for, and the Relative Merits of, Two Formulations of Skew Distributions”, Statistics and Probability Letters, 110, 201–206.
AZZALINI, A., and CAPITANIO, A. (1999), “Statistical Applications of the Multivariate Skew Normal Distribution”, Journal of the Royal Statistical Society: Series B, 61(3), 579–602.
AZZALINI, A., and CAPITANIO, A. (2003), “Distributions Generated by Perturbation of Symmetry with Emphasis on a Multivariate Skew t Distribution”, Journal of the Royal Statistical Society: Series B, 65(2), 367–389.
AZZALINI, A. (2014), The Skew-Normal and Related Families, with the collaboration of A. Capitanio, IMS monographs, Cambridge: Cambridge University Press.
AZZALINI, A., and VALLE, A.D. (1996), “The Multivariate Skew-Normal Distribution”, Biometrika, 83, 715–726.
BAEK, J., and MCLACHLAN, G.J. (2008), “Mixtures of Factor Analyzers with Common Factor Loadings for the Clustering and Visualisation of High-Dimensional Data”, Technical Report NI08018-SCH, Preprint Series of the Isaac Newton Institute for Mathematical Sciences, Cambridge.
BAEK, J., and MCLACHLAN, G.J. (2011), “Mixtures of Common t-Factor Analyzers for Clustering High-Dimensional Microarray Data”, Bioinformatics, 27, 1269–1276.
BAEK, J., MCLACHLAN, G.J., and FLACK, L.K. (2010), “Mixtures of Factor Analyzers with Common Factor Loadings: Applications to the Clustering and Visualization of High-Dimensional Data”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1298–1309.
BANFIELD, J.D., and RAFTERY, A.E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering”, Biometrics, 49(3), 803–821.
BARNDORFF-NIELSEN, O.E. (1997), “Normal Inverse Gaussian Distributions and Stochastic Volatility Modelling”, Scandinavian Journal of Statistics, 24(1), 1–13.
BARTLETT, M.S. (1953), “Factor Analysis in Psychology as a Statistician Sees It”, in Uppsala Symposium on Psychological Factor Analysis, Number 3 in Nordisk Psykologi’s Monograph Series, Copenhagen: Ejnar Mundsgaards, pp. 23–34.
BAUDRY, J.-P. (2015), “Estimation and Model Selection for Model-Based Clustering with the Conditional Classification Likelihood”, Electronic Journal of Statistics, 9, 1041–1077.
BAUM, L.E., PETRIE, T., SOULES, G., and WEISS, N. (1970), “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains”, Annals of Mathematical Statistics, 41, 164–171.
BDIRI, T., BOUGUILA, N., and ZIOU, D. (2016), “Variational Bayesian Inference for Infinite Generalized Inverted Dirichlet Mixtures with Feature Selection and Its Application to Clustering”, Applied Intelligence, 44(3), 507–525.
BENSMAIL, H., CELEUX, G., RAFTERY, A.E., and ROBERT, C.P. (1997), “Inference in Model-Based Cluster Analysis”, Statistics and Computing, 7(1), 1–10.
BHATTACHARYA, S., and MCNICHOLAS, P.D. (2014), “A LASSO-Penalized BIC for Mixture Model Selection”, Advances in Data Analysis and Classification, 8(1), 45–61.
BIERNACKI, C., CELEUX, G., and GOVAERT, G. (2000), “Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.
BIERNACKI, C., CELEUX, G., and GOVAERT, G. (2003), “Choosing Starting Values for the EM Algorithm for Getting the Highest Likelihood in Multivariate Gaussian Mixture Models”, Computational Statistics and Data Analysis, 41, 561–575.
BIERNACKI, C., CELEUX, G., and GOVAERT, G. (2010), “Exact and Monte Carlo Calculations of Integrated Likelihoods for the Latent Class Model”, Journal of Statistical Planning and Inference, 140(11), 2991–3002.
BIERNACKI, C., CELEUX, G., GOVAERT, G., and LANGROGNET, F. (2006), “Model-Based Cluster and Discriminant Analysis with the MIXMOD Software”, Computational Statistics and Data Analysis, 51(2), 587–600.
BOUVEYRON, C., and BRUNET-SAUMARD, C. (2014), “Model-Based Clustering of High-Dimensional Data: A Review”, Computational Statistics and Data Analysis, 71, 52–78.
BOUVEYRON, C., CELEUX, G., and GIRARD, S. (2011), “Intrinsic Dimension Estimation by Maximum Likelihood in Isotropic Probabilistic PCA”, Pattern Recognition Letters, 32(14), 1706–1713.
BOUVEYRON, C., GIRARD, S., and SCHMID, C. (2007a), “High-Dimensional Data Clustering”, Computational Statistics and Data Analysis, 52(1), 502–519.
BOUVEYRON, C., GIRARD, S., and SCHMID, C. (2007b), “High Dimensional Discriminant Analysis”, Communications in Statistics – Theory and Methods, 36(14), 2607–2623.
BRANCO, M.D., and DEY, D.K. (2001), “A General Class of Multivariate Skew-Elliptical Distributions”, Journal of Multivariate Analysis, 79, 99–113.
BROWNE, R.P., and MCNICHOLAS, P.D. (2012), “Model-Based Clustering and Classification of Data with Mixed Type”, Journal of Statistical Planning and Inference, 142(11), 2976–2984.
BROWNE, R.P., and MCNICHOLAS, P.D. (2014a), “Estimating Common Principal Components in High Dimensions”, Advances in Data Analysis and Classification, 8(2), 217–226.
BROWNE, R.P., and MCNICHOLAS, P.D. (2014b), mixture: Mixture Models for Clustering and Classification, R Package Version 1.1.
BROWNE, R.P., and MCNICHOLAS, P.D. (2014c), “Orthogonal Stiefel Manifold Optimization for Eigen-Decomposed Covariance Parameter Estimation in Mixture Models”, Statistics and Computing, 24(2), 203–210.
BROWNE, R.P., and MCNICHOLAS, P.D. (2015), “A Mixture of Generalized Hyperbolic Distributions”, Canadian Journal of Statistics, 43(2), 176–198.
BROWNE, R.P., MCNICHOLAS, P.D., and SPARLING, M.D. (2012), “Model-Based Learning Using a Mixture of Mixtures of Gaussian and Uniform Distributions”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 814–817.
CAGNONE, S., and VIROLI, C. (2012), “A Factor Mixture Analysis Model for Multivariate Binary Data”, Statistical Modelling, 12(3), 257–277.
CAMPBELL, N.A. (1984), “Mixture Models and Atypical Values”, Mathematical Geology, 16(5), 465–477.
CARVALHO, C., CHANG, J., LUCAS, J., NEVINS, J., WANG, Q., and WEST, M. (2008), “High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics”, Journal of the American Statistical Association, 103(484), 1438–1456.
CATTELL, R.B. (1949), “‘R’ and Other Coefficients of Pattern Similarity”, Psychometrika, 14, 279–298.
CELEUX, G., and GOVAERT, G. (1991), “Clustering Criteria for Discrete Data and Latent Class Models”, Journal of Classification, 8(2), 157–176.
CELEUX, G., and GOVAERT, G. (1995), “Gaussian Parsimonious Clustering Models”, Pattern Recognition, 28(5), 781–793.
CORDUNEANU, A., and BISHOP, C.M. (2001), “Variational Bayesian Model Selection for Mixture Distributions”, in Artificial Intelligence and Statistics, Los Altos, CA: Morgan Kaufmann, pp. 27–34.
CORETTO, P., and HENNIG, C. (2015), “Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison with Other Methods for Robust Gaussian Clustering”, arXiv preprint arXiv:1405.1299v3.
CORMACK, R.M. (1971), “A Review of Classification (With Discussion)”, Journal of the Royal Statistical Society: Series A, 134, 321–367.
DANG, U.J., BROWNE, R.P., and MCNICHOLAS, P.D. (2015), “Mixtures of Multivariate Power Exponential Distributions”, Biometrics, 71(4), 1081–1089.
DASGUPTA, A., and RAFTERY, A.E. (1998), “Detecting Features in Spatial Point Processes with Clutter Via Model-Based Clustering”, Journal of the American Statistical Association, 93, 294–302.
DAY, N.E. (1969), “Estimating the Components of a Mixture of Normal Distributions”, Biometrika, 56, 463–474.
DE LA CRUZ-MESÍA, R., QUINTANA, R.A., and MARSHALL, G. (2008), “Model-Based Clustering for Longitudinal Data”, Computational Statistics and Data Analysis, 52(3), 1441–1457.
DE VEAUX, R.D., and KRIEGER, A.M. (1990), “Robust Estimation of a Normal Mixture”, Statistics and Probability Letters, 10(1), 1–7.
DEAN, N., RAFTERY, A.E., and SCRUCCA, L. (2012), clustvarsel: Variable Selection for Model-Based Clustering, R package version 2.0.
DEMPSTER, A.P., LAIRD, N.M., and RUBIN, D.B. (1977), “Maximum Likelihood from Incomplete Data Via the EM Algorithm”, Journal of the Royal Statistical Society: Series B, 39(1), 1–38.
DI LASCIO, F.M.L., and GIANNERINI, S. (2012), “A Copula-Based Algorithm for Discovering Patterns of Dependent Observations”, Journal of Classification, 29(1), 50–75.
EDWARDS, A.W.F., and CAVALLI-SFORZA, L.L. (1965), “A Method for Cluster Analysis”, Biometrics, 21, 362–375.
EVERITT, B.S., and HAND, D.J. (1981), Finite Mixture Distributions, Monographs on Applied Probability and Statistics, London: Chapman and Hall.
EVERITT, B.S., LANDAU, S., LEESE, M., and STAHL, D. (2011), Cluster Analysis (5th ed.), Chichester: John Wiley & Sons.
FABRIGAR, L.R., WEGENER, D.T., MACCALLUM, R.C., and STRAHAN, E.J. (1999), “Evaluating the Use of Exploratory Factor Analysis in Psychological Research”, Psychological Methods, 4(3), 272–299.
FLURY, B. (1988), Common Principal Components and Related Multivariate Models, New York: Wiley.
FRALEY, C., and RAFTERY, A.E. (1998), “How Many Clusters? Which Clustering Methods? Answers Via Model-Based Cluster Analysis”, The Computer Journal, 41(8), 578–588.
FRALEY, C., and RAFTERY, A.E. (1999), “MCLUST: Software for Model-Based Cluster Analysis”, Journal of Classification, 16, 297–306.
FRALEY, C., and RAFTERY, A.E. (2002a), “MCLUST: Software for Model-Based Clustering, Density Estimation, and Discriminant Analysis”, Technical Report 415, University of Washington, Department of Statistics.
FRALEY, C., and RAFTERY, A.E. (2002b), “Model-Based Clustering, Discriminant Analysis, and Density Estimation”, Journal of the American Statistical Association, 97(458), 611–631.
FRANCZAK, B.C., BROWNE, R.P., and MCNICHOLAS, P.D. (2014), “Mixtures of Shifted Asymmetric Laplace Distributions”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149–1157.
FRIEDMAN, H.P., and RUBIN, J. (1967), “On Some Invariant Criteria for Grouping Data”, Journal of the American Statistical Association, 62, 1159–1178.
FRITZ, H., GARCÍA-ESCUDERO, L.A., and MAYO-ISCAR, A. (2012), “tclust: An R Package for a Trimming Approach to Cluster Analysis”, Journal of Statistical Software, 47(12), 1–26.
FRÜHWIRTH-SCHNATTER, S. (2006), Finite Mixture and Markov Switching Models, New York: Springer-Verlag.
GALIMBERTI, G., MONTANARI, A., and VIROLI, C. (2009), “Penalized Factor Mixture Analysis for Variable Selection in Clustered Data”, Computational Statistics and Data Analysis, 53, 4301–4310.
GARCÍA-ESCUDERO, L.A., GORDALIZA, A., MATRÁN, C., and MAYO-ISCAR, A. (2008), “A General Trimming Approach to Robust Cluster Analysis”, The Annals of Statistics, 36(3), 1324–1345.
GERSHENFELD, N. (1997), “Nonlinear Inference and Cluster-Weighted Modeling”, Annals of the New York Academy of Sciences, 808(1), 18–24.
GHAHRAMANI, Z., and HINTON, G.E. (1997), “The EM Algorithm for Factor Analyzers”, Technical Report CRG-TR-96-1, University of Toronto, Toronto, Canada.
GOLLINI, I., and MURPHY, T.B. (2014), “Mixture of Latent Trait Analyzers for Model-Based Clustering of Categorical Data”, Statistics and Computing, 24(4), 569–588.
GÓMEZ, E., GÓMEZ-VILLEGAS, M.A., and MARIN, J.M. (1998), “A Multivariate Generalization of the Power Exponential Family of Distributions”, Communications in Statistics – Theory and Methods, 27(3), 589–600.
GÓMEZ-SÁNCHEZ-MANZANO, E., GÓMEZ-VILLEGAS, M.A., and MARÍN, J.M. (2008), “Multivariate Exponential Power Distributions as Mixtures of Normal Distributions with Bayesian Applications”, Communications in Statistics – Theory and Methods, 37(6), 972–985.
GOODMAN, L. (1974), “Exploratory Latent Structure Analysis Using Both Identifiable and Unidentifiable Models”, Biometrika, 61(2), 215–231.
GORDON, A.D. (1981), Classification, London: Chapman and Hall.
GRESELIN, F., and INGRASSIA, S. (2010), “Constrained Monotone EM Algorithms for Mixtures of Multivariate t-Distributions”, Statistics and Computing, 20(1), 9–22.
HATHAWAY, R.J. (1985), “A Constrained Formulation of Maximum Likelihood Estimation for Normal Mixture Distributions”, The Annals of Statistics, 13(2), 795–800.
HEISER, W.J. (1995), “Recent Advances in Descriptive Multivariate Analysis”, in Convergent Computation by Iterative Majorization: Theory and Applications in Multidimensional Data Analysis, ed. W.J. Krzanowski, Oxford: Oxford University Press, pp. 157–189.
HENNIG, C. (2000), “Identifiablity of Models for Clusterwise Linear Regression”, Journal of Classification, 17(2), 273–296.
HENNIG, C. (2004), “Breakdown Points for Maximum Likelihood Estimators of Location-Scale Mixtures”, The Annals of Statistics, 32(4), 1313–1340.
HENNIG, C. (2015), “What are the True Clusters?”, Pattern Recognition Letters, 64, 53–62.
HORN, J.L. (1965), “A Rationale and Technique for Estimating the Number of Factors in Factor Analysis”, Psychometrika, 30, 179–185.
HU, W. (2005), Calibration of Multivariate Generalized Hyperbolic Distributions Using the EM Algorithm, with Applications in Risk Management, Portfolio Optimization and Portfolio Credit Risk, Ph. D. thesis, The Florida State University, Tallahassee.
HUBER, P.J. (1964), “Robust Estimation of a Location Parameter”, The Annals of Mathematical Statistics, 35, 73–101.
HUBER, P.J. (1981), Robust Statistics, New York: Wiley.
HUMBERT, S., SUBEDI, S., COHN, J., ZENG, B., BI, Y.-M., CHEN, X., ZHU, T., MCNICHOLAS, P.D., and ROTHSTEIN, S.J. (2013), “Genome-Wide Expression Profiling of Maize in Response to Individual and Combined Water and Nitrogen Stresses”, BMC Genetics, 14(3).
HUMPHREYS, L.G., and ILGEN, D.R. (1969), “Note on a Criterion for the Number of Common Factors”, Educational and Psychological Measurement, 29, 571–578.
HUMPHREYS, L.G., and MONTANELLI, R.G. JR. (1975), “An Investigation of the Parallel Analysis Criterion for Determining the Number of Common Factors”, Multivariate Behavioral Research, 10, 193–205.
INGRASSIA, S., MINOTTI, S.C., and PUNZO, A. (2014), “Model-Based Clustering Via Linear Cluster-Weighted Models”, Computational Statistics and Data Analysis, 71, 159–182.
INGRASSIA, S., MINOTTI, S.C., PUNZO, A., and VITTADINI, G. (2015), “The Generalized Linear Mixed Cluster-Weighted Model”, Journal of Classification, 32(1), 85–113.
INGRASSIA, S., MINOTTI, S.C., and VITTADINI, G. (2012), “Local Statistical Modeling Via the Cluster-Weighted Approach with Elliptical Distributions”, Journal of Classification, 29(3), 363–401.
INGRASSIA, S., and PUNZO, A. (2015), “Decision Boundaries for Mixtures of Regressions”, Journal of the Korean Statistical Society, 44(2), 295–306.
JAAKKOLA, T.S., and JORDAN, M.I. (2000), “Bayesian Parameter Estimation Via Variational Methods”, Statistics and Computing, 10(1), 25–37.
JAIN, S., and NEAL, R.M. (2004), “A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model”, Journal of Computational and Graphical Statistics, 13(1), 158–182.
JAJUGA, K., and PAPLA, D. (2006), “Copula Functions in Model Based Clustering”, in From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization, eds. M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, and W. Gaul, Berlin, Heidelberg: Springer, pp. 603–613.
JORDAN, M.I., GHAHRAMANI, Z., JAAKKOLA, T.S., and SAUL, L.K. (1999), “An Introduction to Variational Methods for Graphical Models”, Machine Learning, 37, 183–233.
JÖRESKOG, K.G. (1990), “New Developments in LISREL: Analysis of Ordinal Variables Using Polychoric Correlations and Weighted Least Squares”, Quality and Quantity, 24(4), 387–404.
KARLIS, D., and SANTOURIAN, A. (2009), “Model-Based Clustering with Non-Elliptically Contoured Distributions”, Statistics and Computing, 19(1), 73–83.
KASS, R.E., and RAFTERY, A.E. (1995), “Bayes Factors”, Journal of the American Statistical Association, 90(430), 773–795.
KERIBIN, C. (2000), “Consistent Estimation of the Order of Mixture Models”, Sankhyā. The Indian Journal of Statistics. Series A, 62(1), 49–66.
KHARIN, Y. (1996), Robustness in Statistical Pattern Recognition, Dordrecht: Kluwer.
KOSMIDIS, I., and KARLIS, D. (2015), “Model-Based Clustering Using Copulas with Applications”, arXiv preprint arXiv:1404.4077v5.
KOTZ, S., KOZUBOWSKI, T.J., and PODGORSKI, K. (2001), The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance (1st ed.), Boston: Birkhäuser.
LAWLEY, D.N., and MAXWELL, A.E. (1962), “Factor Analysis as a Statistical Method”, Journal of the Royal Statistical Society: Series D, 12(3), 209–229.
LEE, S., and MCLACHLAN, G.J. (2011), “On the Fitting of Mixtures of Multivariate Skew t-distributions Via the EM Algorithm”, arXiv:1109.4706.
LEE, S., and MCLACHLAN, G.J. (2014), “Finite Mixtures of Multivariate Skew t-Distributions: Some Recent and New Results”, Statistics and Computing, 24, 181–202.
LEE, S.X., and MCLACHLAN, G.J. (2013a), “Model-Based Clustering and Classification with Non-Normal Mixture Distributions”, Statistical Methods and Applications, 22(4), 427–454.
LEE, S.X., and MCLACHLAN, G.J. (2013b), “On Mixtures of Skew Normal and Skew t-Distributions”, Advances in Data Analysis and Classification, 7(3), 241–266.
LEISCH, F. (2004), “Flexmix: A General Framework For Finite Mixture Models And Latent Class Regression in R”, Journal of Statistical Software, 11(8), 1–18.
LEROUX, B.G. (1992), “Consistent Estimation of a Mixing Distribution”, The Annals of Statistics, 20(3), 1350–1360.
LI, J. (2005), “Clustering Based on a Multi-Layer Mixture Model”, Journal of Computational and Graphical Statistics, 14(3), 547–568.
LI, K.C. (1991), “Sliced Inverse Regression for Dimension Reduction (With Discussion)”, Journal of the American Statistical Association, 86, 316–342.
LI, K.C. (2000), “High Dimensional Data Analysis Via the SIR/PHD Approach”, Unpublished manuscript.
LIN, T.-I. (2009), “Maximum Likelihood Estimation for Multivariate Skew Normal Mixture Models”, Journal of Multivariate Analysis, 100, 257–265.
LIN, T.-I. (2010), “Robust Mixture Modeling Using Multivariate Skew t Distributions”, Statistics and Computing, 20(3), 343–356.
LIN, T.-I., MCLACHLAN, G.J., and LEE, S.X. (2016), “Extending Mixtures of Factor Models Using the Restricted Multivariate Skew-Normal Distribution”, Journal of Multivariate Analysis, 143, 398–413.
LIN, T.-I., MCNICHOLAS, P.D., and HSIU, J.H. (2014), “Capturing Patterns Via Parsimonious t Mixture Models”, Statistics and Probability Letters, 88, 80–87.
LOPES, H.F., and WEST, M. (2004), “Bayesian Model Assessment in Factor Analysis”, Statistica Sinica, 14, 41–67.
MARBAC, M., BIERNACKI, C., and VANDEWALLE, V. (2014), “Finite Mixture Model of Conditional Dependencies Modes to Cluster Categorical Data”, arXiv preprint arXiv:1402.5103.
MARBAC, M., BIERNACKI, C., and VANDEWALLE, V. (2015), “Model-Based Clustering of Gaussian Copulas for Mixed Data”, arXiv preprint arXiv:1405.1299v3.
MARKATOU, M. (2000), “Mixture Models, Robustness, and the Weighted Likelihood Methodology”, Biometrics, 56(2), 483–486.
MAUGIS, C. (2009), “The Selvarclust Software”, www.math.univ-toulouse.fr/~maugis/SelvarClustHomepage.html.
MAUGIS, C., CELEUX, G., and MARTIN-MAGNIETTE, M.-L. (2009a), “Variable Selection for Clustering with Gaussian Mixture Models”, Biometrics, 65(3), 701–709.
MAUGIS, C., CELEUX, G., and MARTIN-MAGNIETTE, M.-L. (2009b), “Variable Selection in Model-Based Clustering: A General Variable Role Modeling”, Computational Statistics and Data Analysis, 53(11), 3872–3882.
MCGRORY, C., and TITTERINGTON, D. (2007), “Variational Approximations in Bayesian Model Selection for Finite Mixture Distributions”, Computational Statistics and Data Analysis, 51(11), 5352–5367.
MCLACHLAN, G.J., and BASFORD, K.E. (1988), Mixture Models: Inference and Applications to Clustering, New York: Marcel Dekker Inc.
MCLACHLAN, G.J., BEAN, R.W., and JONES, L.B.-T. (2007), “Extension of the Mixture of Factor Analyzers Model to Incorporate the Multivariate t-Distribution”, Computational Statistics and Data Analysis, 51(11), 5327–5338.
MCLACHLAN, G.J., and KRISHNAN, T. (2008), The EM Algorithm and Extensions (2nd ed.), New York: Wiley.
MCLACHLAN, G.J., and PEEL, D. (1998), “Robust Cluster Analysis Via Mixtures of Multivariate t-Distributions”, in Lecture Notes in Computer Science, Volume 1451, Berlin: Springer-Verlag, pp. 658–666.
MCLACHLAN, G.J., and PEEL, D. (2000a), Finite Mixture Models, New York: John Wiley & Sons.
MCLACHLAN, G.J., and PEEL, D. (2000b), “Mixtures of Factor Analyzers”, in Proceedings of the Seventh International Conference on Machine Learning, San Francisco, Morgan Kaufmann, pp. 599–606.
MCNEIL, A.J., FREY, R., and EMBRECHTS, P. (2005), Quantitative Risk Management: Concepts, Techniques and Tools, Princeton: Princeton University Press.
MCNICHOLAS, P.D. (2013), “Model-Based Clustering and Classification Via Mixtures of Multivariate t-Distributions”, in Statistical Models for Data Analysis, Studies in Classification, Data Analysis, and Knowledge Organization, eds. P. Giudici, S. Ingrassia, and M. Vichi, Switzerland: Springer International Publishing.
MCNICHOLAS, P.D. (2016), Mixture Model-Based Classification, Boca Raton FL: Chapman & Hall/CRC Press.
MCNICHOLAS, P.D., and BROWNE, R.P. (2013), “Discussion of ‘How to Find an Appropriate Clustering for Mixed-Type Variables with Application to Socio-Economic Stratification’ by Hennig and Liao”, Journal of the Royal Statistical Society: Series C, 62(3), 352–353.
MCNICHOLAS, P.D., ELSHERBINY, A., MCDAID, A.F., and MURPHY, T.B. (2015), pgmm: Parsimonious Gaussian Mixture Models, R Package Version 1.2.
MCNICHOLAS, P.D., JAMPANI, K.R., and SUBEDI, S. (2015), longclust: Model-Based Clustering and Classification for Longitudinal Data, R Package Version 1.2.
MCNICHOLAS, P.D., and MURPHY, T.B. (2005), “Parsimonious Gaussian Mixture Models”, Technical Report 05/11, Department of Statistics, Trinity College Dublin, Dublin, Ireland.
MCNICHOLAS, P.D., and MURPHY, T.B. (2008), “Parsimonious Gaussian Mixture Models”, Statistics and Computing, 18(3), 285–296.
MCNICHOLAS, P.D., and MURPHY, T.B. (2010a), “Model-Based Clustering of Longitudinal Data”, Canadian Journal of Statistics, 38(1), 153–168.
MCNICHOLAS, P.D., and MURPHY, T.B. (2010b), “Model-Based Clustering of Microarray Expression Data Via Latent Gaussian Mixture Models”, Bioinformatics, 26(21), 2705–2712.
MCNICHOLAS, P.D., and SUBEDI, S. (2012), “Clustering Gene Expression Time Course Data Using Mixtures of Multivariate t-Distributions”, Journal of Statistical Planning and Inference, 142(5), 1114–1127.
MCNICHOLAS, S.M., MCNICHOLAS, P.D., and BROWNE, R.P. (2014), “Mixtures of Variance-Gamma Distributions”, arXiv preprint arXiv:1309.2695v2.
MCPARLAND, D., GORMLEY, I.C., MCCORMICK, T.H., CLARK, S.J., KABUDULA, C.W., and COLLINSON, M.A. (2014), “Clustering South African Households Based on Their Asset Status Using Latent Variable Models”, The Annals of Applied Statistics, 8(2), 747–776.
MCQUITTY, L.L. (1956), “Agreement Analysis: A Method of Classifying Subjects According to Their Patterns of Responses”, British Journal of Statistical Psychology, 9, 5–16.
MELNYKOV, V. (2016), “Model-Based Biclustering of Clickstream Data”, Computational Statistics and Data Analysis, 93, 31–45.
MENG, X.-L., and RUBIN, D.B. (1993), “Maximum Likelihood Estimation Via the ECM Algorithm: A General Framework”, Biometrika, 80, 267–278.
MENG, X.-L., and VAN DYK, D. (1997), “The EM Algorithm—An Old Folk Song Sung to a Fast New Tune (With Discussion)”, Journal of the Royal Statistical Society: Series B, 59(3), 511–567.
MONTANARI, A., and VIROLI, C. (2010a), “Heteroscedastic Factor Mixture Analysis”, Statistical Modelling, 10(4), 441–460.
MONTANARI, A., and VIROLI, C. (2010b), “A Skew-Normal Factor Model for the Analysis of Student Satisfaction Towards University Courses”, Journal of Applied Statistics, 43, 473–487.
MONTANARI, A., and VIROLI, C. (2011), “Maximum Likelihood Estimation of Mixture of Factor Analyzers”, Computational Statistics and Data Analysis, 55, 2712–2723.
MONTANELLI, R.G., JR., and HUMPHREYS, L.G. (1976), “Latent Roots of Random Data Correlation Matrices with Squared Multiple Correlations on the Diagonal: A Monte Carlo Study”, Psychometrika, 41, 341–348.
MORRIS, K., and MCNICHOLAS, P.D. (2013), “Dimension Reduction for Model-Based Clustering Via Mixtures of Shifted Asymmetric Laplace Distributions”, Statistics and Probability Letters, 83(9), 2088–2093, Erratum 2014, 85, 168.
MORRIS, K., and MCNICHOLAS, P.D. (2016), “Clustering, Classification, Discriminant Analysis, and Dimension Reduction Via Generalized Hyperbolic Mixtures”, Computational Statistics and Data Analysis, 97, 133–150.
MORRIS, K., MCNICHOLAS, P.D., and SCRUCCA, L. (2013), “Dimension Reduction for Model-Based Clustering Via Mixtures of Multivariate t-Distributions”, Advances in Data Analysis and Classification, 7(3), 321–338.
MURRAY, P.M., BROWNE, R.P., and MCNICHOLAS, P.D. (2014a), “Mixtures of Skew-t Factor Analyzers”, Computational Statistics and Data Analysis, 77, 326–335.
MURRAY, P.M., MCNICHOLAS, P.D., and BROWNE, R.P. (2014b), “A Mixture of Common Skew-t Factor Analyzers”, Stat, 3(1), 68–82.
MUTHEN, B., and ASPAROUHOV, T. (2006), “Item Response Mixture Modeling: Application to Tobacco Dependence Criteria”, Addictive Behaviors, 31, 1050–1066.
O’HAGAN, A., MURPHY, T.B., GORMLEY, I.C., MCNICHOLAS, P.D., and KARLIS, D. (2016), “Clustering with the Multivariate Normal Inverse Gaussian Distribution”, Computational Statistics and Data Analysis, 93, 18–30.
ORCHARD, T., and WOODBURY, M.A. (1972), “A Missing Information Principle: Theory and Applications”, in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Theory of Statistics, eds. L.M. Le Cam, J. Neyman, and E.L. Scott, Berkeley: University of California Press, pp. 697–715.
PAN, J., and MACKENZIE, G. (2003), “On Modelling Mean-Covariance Structures in Longitudinal Studies”, Biometrika, 90(1), 239–244.
PEARSON, K. (1894), “Contributions to the Mathematical Theory of Evolution”, Philosophical Transactions of the Royal Society, Part A, 185, 71–110.
PEEL, D., and MCLACHLAN, G.J. (2000), “Robust Mixture Modelling Using the t Distribution”, Statistics and Computing, 10(4), 339–348.
POURAHMADI, M. (1999), “Joint Mean-Covariance Models with Applications to Longitudinal Data: Unconstrained Parameterisation”, Biometrika, 86(3), 677–690.
POURAHMADI, M. (2000), “Maximum Likelihood Estimation of Generalised Linear Models for Multivariate Normal Covariance Matrix”, Biometrika, 87(2), 425–435.
POURAHMADI, M., DANIELS, M., and PARK, T. (2007), “Simultaneous Modelling of the Cholesky Decomposition of Several Covariance Matrices”, Journal of Multivariate Analysis, 98, 568–587.
PUNZO, A. (2014), “Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model”, Statistical Modelling, 14(3), 257–291.
PUNZO, A., and INGRASSIA, S. (2015a), “Clustering Bivariate Mixed-Type Data Via the Cluster-Weighted Model”, Computational Statistics. To appear.
PUNZO, A., and INGRASSIA, S. (2015b), “Parsimonious Generalized Linear Gaussian Cluster-Weighted Models”, in Advances in Statistical Models for Data Analysis, Studies in Classification, Data Analysis and Knowledge Organization, eds. I. Morlini, T. Minerva, and M. Vichi, Switzerland: Springer International Publishing, pp. 201–209.
PUNZO, A., and MCNICHOLAS, P.D. (2014a), “Robust Clustering in Regression Analysis Via the Contaminated Gaussian Cluster-Weighted Model”, arXiv preprint arXiv:1409.6019v1.
PUNZO, A., and MCNICHOLAS, P.D. (2014b), “Robust High-Dimensional Modeling with the Contaminated Gaussian Distribution”, arXiv preprint arXiv:1408.2128v1.
PUNZO, A., and MCNICHOLAS, P.D. (2016), “Parsimonious Mixtures of Multivariate Contaminated Normal Distributions”, Biometrical Journal. To appear.
R CORE TEAM (2015), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
RAFTERY, A.E. (1995), “Bayesian Model Selection in Social Research (With Discussion)”, Sociological Methodology, 25, 111–193.
RAFTERY, A.E., and DEAN, N. (2006), “Variable Selection for Model-Based Clustering”, Journal of the American Statistical Association, 101(473), 168–178.
RANALLI, M., and ROCCI, R. (2016), “Mixture Methods for Ordinal Data: A Pairwise Likelihood Approach”, Statistics and Computing, 26(1), 529–547.
RAO, C.R. (1952), Advanced Statistical Methods in Biometric Research, New York: John Wiley and Sons, Inc.
RAU, A., MAUGIS-RABUSSEAU, C., MARTIN-MAGNIETTE, M.-L., and CELEUX, G. (2015), “Co-expression Analysis of High-Throughput Transcriptome Sequencing Data with Poisson Mixture Models”, Bioinformatics, 31(9), 1420–1427.
SAHU, K., DEY, D.K., and BRANCO, M.D. (2003), “A New Class of Multivariate Skew Distributions with Applications to Bayesian Regression Models”, Canadian Journal of Statistics, 31(2), 129–150. Corrigendum: Vol. 37 (2009), 301–302.
SCHÖNER, B. (2000), Probabilistic Characterization and Synthesis of Complex Data Driven Systems, Ph. D. thesis, Cambridge MA: MIT.
SCHROETER, P., VESIN, J., LANGENBERGER, T., and MEULI, R. (1998), “Robust Parameter Estimation of Intensity Distributions for Brain Magnetic Resonance Images”, IEEE Transactions on Medical Imaging, 17(2), 172–186.
SCHWARZ, G. (1978), “Estimating the Dimension of a Model”, The Annals of Statistics, 6(2), 461–464.
SCOTT, A.J., and SYMONS, M.J. (1971), “Clustering Methods Based on Likelihood Ratio Criteria”, Biometrics, 27, 387–397.
SCRUCCA, L. (2010), “Dimension Reduction for Model-Based Clustering”, Statistics and Computing, 20(4), 471–484.
SCRUCCA, L. (2014), “Graphical Tools for Model-Based Mixture Discriminant Analysis”, Advances in Data Analysis and Classification, 8(2), 147–165.
SHIREMAN, E., STEINLEY, D., and BRUSCO, M.J. (2015), “Examining the Effect of Initialization Strategies on the Performance of Gaussian Mixture Modeling”, Behavior Research Methods.
SPEARMAN, C. (1904), “The Proof and Measurement of Association Between Two Things”, American Journal of Psychology, 15, 72–101.
SPEARMAN, C. (1927), The Abilities of Man: Their Nature and Measurement, London: MacMillan and Co., Limited.
STEANE, M.A., MCNICHOLAS, P.D., and YADA, R. (2012), “Model-Based Classification Via Mixtures of Multivariate t-Factor Analyzers”, Communications in Statistics – Simulation and Computation, 41(4), 510–523.
STEELE, R.J., and RAFTERY, A.E. (2010), “Performance of Bayesian Model Selection Criteria for Gaussian Mixture Models”, in Frontiers of Statistical Decision Making and Bayesian Analysis, Vol. 2, New York: Springer, pp. 113–130.
STEPHENSON, W. (1953), The Study of Behavior, Chicago: University of Chicago Press.
SUBEDI, S., and MCNICHOLAS, P.D. (2014), “Variational Bayes Approximations for Clustering Via Mixtures of Normal Inverse Gaussian Distributions”, Advances in Data Analysis and Classification, 8(2), 167–193.
SUBEDI, S., and MCNICHOLAS, P.D. (2016), “A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting”, arXiv preprint arXiv:1306.5368v2.
SUBEDI, S., PUNZO, A., INGRASSIA, S., and MCNICHOLAS, P.D. (2013), “Clustering and Classification Via Cluster-Weighted Factor Analyzers”, Advances in Data Analysis and Classification, 7(1), 5–40.
SUBEDI, S., PUNZO, A., INGRASSIA, S., and MCNICHOLAS, P.D. (2015), “Cluster-Weighted t-Factor Analyzers for Robust Model-Based Clustering and Dimension Reduction”, Statistical Methods and Applications, 24(4), 623–649.
SUNDBERG, R. (1974), “Maximum Likelihood Theory for Incomplete Data from an Exponential Family”, Scandinavian Journal of Statistics, 1(2), 49–58.
TANG, Y., BROWNE, R.P., and MCNICHOLAS, P.D. (2015), “Model-Based Clustering of High-Dimensional Binary Data”, Computational Statistics and Data Analysis, 87, 84–101.
TESCHENDORFF, A., WANG, Y., BARBOSA-MORAIS, J., BRENTON, N., and CALDAS, C. (2005), “A Variational Bayesian Mixture Modelling Framework for Cluster Analysis of Gene-Expression Data”, Bioinformatics, 21(13), 3025–3033.
TIEDEMAN, D.V. (1955), “On the Study of Types”, in Symposium on Pattern Analysis, ed. S.B. Sells, Randolph Field, Texas: Air University, U.S.A.F. School of Aviation Medicine, pp. 1–14.
TIPPING, M.E. (1999), “Probabilistic Visualization of High-Dimensional Binary Data”, Advances in Neural Information Processing Systems (11), 592–598.
TIPPING, M.E., and BISHOP, C.M. (1997), “Mixtures of Probabilistic Principal Component Analysers”, Technical Report NCRG/97/003, Aston University (Neural Computing Research Group), Birmingham, UK.
TIPPING, M.E., and BISHOP, C.M. (1999), “Mixtures of Probabilistic Principal Component Analysers”, Neural Computation, 11(2), 443–482.
TITTERINGTON, D.M., SMITH, A.F.M., and MAKOV, U.E. (1985), Statistical Analysis of Finite Mixture Distributions, Chichester: John Wiley & Sons.
TORTORA, C., MCNICHOLAS, P.D., and BROWNE, R.P. (2015), “A Mixture of Generalized Hyperbolic Factor Analyzers”, Advances in Data Analysis and Classification. To appear.
TRYON, R.C. (1939), Cluster Analysis, Ann Arbor: Edwards Brothers.
TRYON, R.C. (1955), “Identification of Social Areas by Cluster Analysis”, in University of California Publications in Psychology, Volume 8, Berkeley: University of California Press.
VERMUNT, J.K. (2003), “Multilevel Latent Class Models”, Sociological Methodology, 33(1), 213–239.
VERMUNT, J.K. (2007), “Multilevel Mixture Item Response Theory Models: An Application in Education Testing”, in Proceedings of the 56th Session of the International Statistical Institute, Lisbon, Portugal, pp. 22–28.
VIROLI, C. (2010), “Dimensionally Reduced Model-Based Clustering Through Mixtures of Factor Mixture Analyzers”, Journal of Classification, 27(3), 363–388.
VRAC, M., BILLARD, L., DIDAY, E., and CHEDIN, A. (2012), “Copula Analysis of Mixture Models”, Computational Statistics, 27(3), 427–457.
VRBIK, I., and MCNICHOLAS, P.D. (2012), “Analytic Calculations for the EM Algorithm for Multivariate Skew-t Mixture Models”, Statistics and Probability Letters, 82(6), 1169–1174.
VRBIK, I., and MCNICHOLAS, P.D. (2014), “Parsimonious Skew Mixture Models for Model-Based Clustering and Classification”, Computational Statistics and Data Analysis, 71, 196–210.
VRBIK, I., and MCNICHOLAS, P.D. (2015), “Fractionally-Supervised Classification”, Journal of Classification, 32(3), 359–381.
WANG, Q., CARVALHO, C., LUCAS, J., and WEST, M. (2007), “BFRM: Bayesian Factor Regression Modelling”, Bulletin of the International Society for Bayesian Analysis, 14(2), 4–5.
WATERHOUSE, S., MACKAY, D., and ROBINSON, T. (1996), “Bayesian Methods for Mixture of Experts”, in Advances in Neural Information Processing Systems, Vol. 8, Cambridge, MA: MIT Press.
WEI, Y., and MCNICHOLAS, P.D. (2015), “Mixture Model Averaging for Clustering”, Advances in Data Analysis and Classification, 9(2), 197–217.
WEST, M. (2003), “Bayesian Factor Regression Models in the ‘Large p, Small n’ Paradigm”, in Bayesian Statistics, Volume 7, eds. J.M. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, Oxford: Oxford University Press, pp. 723–732.
WOLFE, J.H. (1963), “Object Cluster Analysis of Social Areas”, Master’s thesis, University of California, Berkeley.
WOLFE, J.H. (1965), “A Computer Program for the Maximum Likelihood Analysis of Types”, Technical Bulletin 65–15, U.S. Naval Personnel Research Activity.
WOLFE, J.H. (1970), “Pattern Clustering by Multivariate Mixture Analysis”, Multivariate Behavioral Research, 5, 329–350.
YOSHIDA, R., HIGUCHI, T., and IMOTO, S. (2004), “A Mixed Factors Model for Dimension Reduction and Extraction of a Group Structure in Gene Expression Data”, in Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, pp. 161–172.
YOSHIDA, R., HIGUCHI, T., IMOTO, S., and MIYANO, S. (2006), “ArrayCluster: An Analytic Tool for Clustering, Data Visualization and Module Finder on Gene Expression Profiles”, Bioinformatics, 22, 1538–1539.
ZHOU, H., and LANGE, K.L. (2010), “On the Bumpy Road to the Dominant Mode”, Scandinavian Journal of Statistics, 37(4), 612–631.
Additional information
The author is grateful to Chapman & Hall/CRC Press for allowing some text and figures from his monograph (McNicholas 2016) to be used in this review paper. The author is thankful for the helpful comments of an anonymous reviewer and the Editor. The work is partly supported by the Canada Research Chairs program.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
McNicholas, P.D. Model-Based Clustering. J Classif 33, 331–373 (2016). https://doi.org/10.1007/s00357-016-9211-9
DOI: https://doi.org/10.1007/s00357-016-9211-9