Sparse and discriminative clustering for complex data. An application to cytology.

Abstract : The main topics of this manuscript are sparsity and discrimination for modeling complex data. In the first part, we focus on the Gaussian mixture model (GMM) context: we introduce a new family of probabilistic models which both cluster the data and find a discriminative subspace chosen to best discriminate the groups. This family of 12 Discriminative Latent Mixture (DLM) models is based on three ideas: first, the actual data live in a latent subspace whose intrinsic dimension is lower than the dimension of the observed space; second, a subspace of K-1 dimensions is theoretically sufficient to discriminate K groups; third, the observation space and the latent space are linked by a linear transformation. An estimation procedure, named Fisher-EM, is proposed and in most cases improves clustering performance owing to the use of a discriminative subspace. Because each axis spanning the discriminative subspace is a linear combination of all the original variables, we propose three methods based on a penalized criterion in order to ease the interpretation of the results. In particular, these methods introduce sparsity directly into the loadings of the projection matrix, which also enables variable selection for clustering. In the second part, we focus on the seriation context. We propose a dissimilarity measure based on a common neighborhood, which makes it possible to deal with noisy data and overlapping groups. A forward stepwise seriation algorithm, called the PB-Clus algorithm, is introduced and yields a block representation of the data. This tool reveals the intrinsic structure of the data even in the presence of noise, outliers, and overlapping or non-Gaussian groups. Both methods have been validated on a biological application to cancer cell detection.
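
The second idea above (K-1 dimensions suffice to discriminate K groups) can be checked numerically: the between-group scatter matrix built from K group means has rank at most K-1, whatever the dimension of the observed space. The following is a minimal sketch of that general fact only; it is not the Fisher-EM procedure or a DLM model from the thesis, and the function names, the simulated data, and the tolerance are all assumptions made for illustration.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def between_scatter(groups):
    """Between-group scatter: sum over groups of n_k * d_k * d_k^T,
    where d_k is the group mean minus the overall mean."""
    all_rows = [row for g in groups for row in g]
    overall = mean_vector(all_rows)
    p = len(overall)
    sb = [[0.0] * p for _ in range(p)]
    for g in groups:
        d = [m - o for m, o in zip(mean_vector(g), overall)]
        for i in range(p):
            for j in range(p):
                sb[i][j] += len(g) * d[i] * d[j]
    return sb


def matrix_rank(matrix, rel_tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting.
    Pivots below rel_tol * (largest absolute entry) count as zero."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    scale = max(abs(x) for row in m for x in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < rel_tol * scale:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def simulate_groups(k, p, n_per_group, seed=0):
    """Hypothetical toy data: k Gaussian groups in p dimensions."""
    rng = random.Random(seed)
    groups = []
    for _ in range(k):
        center = [rng.uniform(-5, 5) for _ in range(p)]
        groups.append(
            [[c + rng.gauss(0, 1) for c in center] for _ in range(n_per_group)]
        )
    return groups


# K = 3 groups observed in p = 5 dimensions: the rank is at most K - 1 = 2.
groups = simulate_groups(k=3, p=5, n_per_group=40)
print("rank of between-group scatter:", matrix_rank(between_scatter(groups)))
```

The bound holds because the K weighted deviations of the group means from the overall mean sum to zero, so at most K-1 of them are linearly independent.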
Contributor : Camille Brunet
Submitted on : Friday, February 17, 2012 - 11:21:49 AM
Last modification on : Friday, October 23, 2020 - 4:37:10 PM
Long-term archiving on : Thursday, November 22, 2012 - 1:00:08 PM


  • HAL Id : tel-00671333, version 1



Camille Brunet. Sparse and discriminative clustering for complex data. An application to cytology. Applications [stat.AP]. Université d'Evry-Val d'Essonne, 2011. English. ⟨tel-00671333⟩


