DataClustering module

Class for multiscale graph-based data clustering.

This class provides an interface for multiscale graph-based data clustering [1] with PyGenStability.

param metric:

The distance metric to use. If given as a string, it must be one of ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’. A distance function may be passed instead.

type metric:

str or function, default=’euclidean’
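The metric names listed above appear to be the ones accepted by SciPy's `scipy.spatial.distance.cdist`. Assuming the metric is forwarded to such a routine (an assumption, not confirmed by this page), the sketch below contrasts the two accepted forms: a metric given by name and a metric given as a function of two 1-D vectors. `manhattan` is a hypothetical helper used only for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Toy sample-by-feature matrix: 3 samples, 2 features (illustrative data only).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Metric given by name, as in the list above.
D_euclid = cdist(X, X, metric="euclidean")


# Metric given as a function (hypothetical helper): receives two 1-D vectors, returns a float.
def manhattan(u, v):
    return float(np.abs(u - v).sum())


D_custom = cdist(X, X, metric=manhattan)

print(D_euclid[0, 1])  # 5.0
print(D_custom[0, 1])  # 7.0
```

Because `manhattan` reproduces the built-in ‘cityblock’ metric, both routes give the same matrix here; a custom function is mainly useful for distances not in the list, and it is slower because SciPy calls it once per pair of samples.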

param graph_method:

Method used to construct a graph from the sample-by-feature matrix:

‘knn-mst’ uses a k-Nearest Neighbor graph combined with a Minimum Spanning Tree.
‘cknn-mst’ uses a Continuous k-Nearest Neighbor graph [2] combined with a
Minimum Spanning Tree.
‘precomputed’ assumes that the data are already provided as the adjacency matrix
of a sparse graph.

type graph_method:

{‘knn-mst’, ‘cknn-mst’, ‘precomputed’}, default=’cknn-mst’
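The ‘knn-mst’ idea can be sketched as the union of a k-NN graph and a minimum spanning tree (MST) of the full distance graph: the k-NN edges capture local structure, while the MST edges guarantee that the resulting graph is connected. This is a minimal SciPy sketch, not PyGenStability's implementation. `knn_mst_adjacency` is a hypothetical name, and the unweighted (0/1) edges are an assumption, since the library may weight edges differently.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist


def knn_mst_adjacency(X, k=5, metric="euclidean"):
    """Hypothetical helper: union of a symmetric k-NN graph and an MST (unweighted edges assumed)."""
    D = cdist(X, X, metric=metric)
    n = D.shape[0]
    # k nearest neighbours of each sample; column 0 of argsort is the sample itself
    # (assumes no duplicate samples).
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), nbrs.shape[1])
    cols = nbrs.ravel()
    knn = csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
    # MST of the complete distance graph; its edges keep the final graph connected.
    mst = (minimum_spanning_tree(csr_matrix(D)) > 0).astype(float)
    A = knn.maximum(mst)
    # Symmetrise: keep an edge if either endpoint selected it.
    return A.maximum(A.T).tocsr()


# Two well-separated pairs of points (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
A = knn_mst_adjacency(X, k=1)
n_components, _ = connected_components(A, directed=False)
print(n_components)  # 1
```

With `k=1` the k-NN edges alone form two disconnected pairs; the MST contributes the single bridging edge that joins them into one component.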

param k:

Number of neighbors considered in graph construction. This parameter is expected to be positive.

type k:

int, default=5

param delta:

Density parameter for the Continuous k-Nearest Neighbor graph. This parameter is expected to be positive.

type delta:

float, default=1.0
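In the Continuous k-Nearest Neighbor construction [2], samples i and j are linked when d(i, j) < delta * sqrt(d_k(i) * d_k(j)), where d_k(i) is the distance from i to its k-th nearest neighbor; delta therefore scales how far beyond the local density scale an edge may reach. A minimal SciPy sketch of ‘cknn-mst’ under that rule follows. `cknn_mst_adjacency` is a hypothetical helper, edges are left unweighted as an assumption, and PyGenStability's own code may differ in details such as tie handling.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def cknn_mst_adjacency(X, k=5, delta=1.0, metric="euclidean"):
    """Hypothetical helper: CkNN graph united with an MST (unweighted edges assumed)."""
    D = cdist(X, X, metric=metric)
    # d_k(i): distance to the k-th nearest neighbour. Column 0 of each sorted row is
    # the sample itself (distance 0), so column k is the k-th neighbour (requires k < n).
    dk = np.sort(D, axis=1)[:, k]
    # CkNN rule: link i and j when D[i, j] < delta * sqrt(d_k(i) * d_k(j)).
    edges = D < delta * np.sqrt(np.outer(dk, dk))
    np.fill_diagonal(edges, False)  # drop self-loops
    # Add MST edges in both directions so the graph is connected.
    mst = minimum_spanning_tree(csr_matrix(D)).toarray() > 0
    edges |= mst | mst.T
    return csr_matrix(edges.astype(float))


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
print(cknn_mst_adjacency(X, k=1, delta=1.5).nnz)   # 6
print(cknn_mst_adjacency(X, k=1, delta=20.0).nnz)  # 12
```

On this toy data, delta=1.5 keeps only the two within-pair edges plus the MST bridge, while delta=20.0 exceeds every pairwise distance and links all samples, illustrating how a larger delta yields a denser graph.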

param distance_threshold:

Optional threshold applied to the distance matrix.

type distance_threshold:

float, optional

param pgs_kwargs:

Keyword arguments for PyGenStability (see its documentation). Some possible arguments:

constructor (str/function): name of the generalized Markov Stability constructor,
or a custom constructor function, which must take two arguments: graph and scale.
min_scale (float): minimum Markov scale
max_scale (float): maximum Markov scale
n_scale (int): number of scale steps
with_spectral_gap (bool): normalise scale by spectral gap

type pgs_kwargs:

dict, optional

pygenstability.DataClustering.adjacency_

Sparse adjacency matrix of constructed graph.

Type: sparse matrix of shape (n_samples, n_samples)

pygenstability.DataClustering.results_

PyGenStability results dictionary; see the documentation for all keys.

Type: dict

pygenstability.DataClustering.labels_

List of robust partitions identified with optimal scale selection.

Type: list of ndarray
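As a sketch of how `labels_` might be inspected, assuming each array holds one integer community label per sample (in the row order of the input data), the helper below reports the number of clusters and their sizes for each partition. The label arrays are made-up values for illustration, not PyGenStability output, and `summarize_partitions` is a hypothetical helper.

```python
import numpy as np


def summarize_partitions(labels):
    """Hypothetical helper: (number of clusters, cluster sizes) for each partition."""
    summary = []
    for partition in labels:
        # np.unique returns the distinct labels and how often each one occurs.
        _, sizes = np.unique(partition, return_counts=True)
        summary.append((sizes.size, sizes.tolist()))
    return summary


# Illustrative stand-in for `labels_`: two partitions of six samples (made-up values).
labels = [np.array([0, 0, 0, 1, 1, 1]), np.array([0, 0, 1, 1, 2, 2])]
print(summarize_partitions(labels))  # [(2, [3, 3]), (3, [2, 2, 2])]
```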

References