PyGenStability module

PyGenStability code to solve generalized Markov Stability, which includes standard Markov Stability as a special case.

The generalized Markov Stability is of the form

\[Q_{gen}(t,H) = \mathrm{Tr} \left [H^T \left (F(t)-\sum_{k=1}^m v_{2k-1} v_{2k}^T\right)H\right]\]

where \(H\) is the partition indicator matrix, \(F(t)\) is the quality matrix and \(v_k\) are null model vectors. The choice of the quality matrix and null model vectors is arbitrary in the generalized Markov Stability setting: they can be parametrised via built-in constructors or specified by the user via the constructor module.
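As a minimal sketch of the formula above, the NumPy function below evaluates \(Q_{gen}\) for a dense quality matrix, a list of null model vectors \([v_1, \dots, v_{2m}]\) and a partition indicator matrix \(H\) (one row per node, one column per community, \(H_{ij} = 1\) if node \(i\) is in community \(j\)). The name `generalized_stability` is hypothetical and is not part of PyGenStability. The usage check relies on the standard fact that modularity is recovered with \(F = A/2m\) and \(v_1 = v_2 = d/2m\), where \(A\) is the adjacency matrix and \(d\) the degree vector.

```python
import numpy as np


def generalized_stability(F, null_vectors, H):
    """Evaluate Q_gen = Tr[H^T (F - sum_k v_{2k-1} v_{2k}^T) H].

    F            -- (N, N) quality matrix
    null_vectors -- sequence [v_1, v_2, ..., v_2m] of length-N vectors
    H            -- (N, c) indicator matrix, H[i, j] = 1 if node i is in community j
    """
    if len(null_vectors) % 2 != 0:
        raise ValueError("null model vectors must come in pairs")
    # Sum of rank-one terms v_{2k-1} v_{2k}^T (0-based pairs (0, 1), (2, 3), ...)
    null_model = np.zeros_like(F, dtype=float)
    for i in range(0, len(null_vectors), 2):
        null_model += np.outer(null_vectors[i], null_vectors[i + 1])
    return float(np.trace(H.T @ (F - null_model) @ H))


# Usage: two triangles joined by the edge (2, 3), split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)                    # degree vector
two_m = d.sum()                      # 2m = total degree
H = np.eye(2)[[0, 0, 0, 1, 1, 1]]    # one-hot community indicator, shape (6, 2)

Q = generalized_stability(A / two_m, [d / two_m, d / two_m], H)
print(Q)  # modularity of this split, 5/14 ≈ 0.357
```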

pygenstability.pygenstability.run(graph=None, constructor='linearized', min_scale=-2.0, max_scale=0.5, n_scale=20, log_scale=True, scales=None, n_tries=100, with_all_tries=False, with_NVI=True, n_NVI=20, with_postprocessing=True, with_ttprime=True, with_spectral_gap=False, exp_comp_mode='spectral', result_file='results.pkl', n_workers=4, tqdm_disable=False, with_optimal_scales=True, optimal_scales_kwargs=None, method='louvain', constructor_kwargs=None)

This is the main function to compute graph clustering across scales with Markov Stability.

This function needs a graph object as an adjacency matrix encoded with scipy.csgraph. The default settings provide a fast and generic run with linearized Markov Stability, which corresponds to modularity with a scale parameter. Other built-in constructors are available to perform Markov Stability with matrix exponential computations. Custom constructors can be added via the constructor module. Additional parameters set the range and number of scales, the number of trials for the generalized Markov Stability optimisation, and the optimisation algorithm (Louvain or Leiden).
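To make the link between linearized Markov Stability and modularity concrete, here is a minimal sketch for an undirected graph. It assumes the linearized quality matrix is \(F(t) = t\,A/2m\) with null model vectors \(v_1 = v_2 = \pi = d/2m\), where \(d\) is the degree vector and \(\pi\) the stationary distribution of the random walk. This form is an assumption for illustration, and the built-in 'linearized' constructor may differ in normalisation details. The function name `linearized_stability` is hypothetical. At \(t = 1\) the value reduces to standard modularity, and increasing \(t\) favours coarser partitions.

```python
import numpy as np


def linearized_stability(A, labels, t):
    """Sketch of linearized Markov Stability for an undirected graph.

    Assumes F(t) = t * A / 2m and null model vectors v_1 = v_2 = pi = d / 2m.
    """
    d = A.sum(axis=1)
    two_m = d.sum()
    pi = d / two_m                           # stationary distribution of the walk
    H = np.eye(labels.max() + 1)[labels]     # one-hot community indicator matrix
    F = t * A / two_m                        # scale-dependent quality matrix
    return float(np.trace(H.T @ (F - np.outer(pi, pi)) @ H))


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

two_triangles = np.array([0, 0, 0, 1, 1, 1])
one_community = np.zeros(6, dtype=int)

for t in (1.0, 5.0):
    print(t, linearized_stability(A, two_triangles, t), linearized_stability(A, one_community, t))
```

On this graph the split into triangles scores 5/14 against 0 for a single community at \(t = 1\), while at \(t = 5\) the single community scores higher (4.0 against about 3.79). A larger scale therefore selects a coarser partition.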

Parameters:
  • graph (scipy.csgraph) – graph to cluster; if None, the constructor cannot be a str

  • constructor (str/function) – name of the generalized Markov Stability constructor, or custom constructor function. It must have two arguments, graph and scale.

  • min_scale (float) – minimum Markov scale

  • max_scale (float) – maximum Markov scale

  • n_scale (int) – number of scale steps

  • log_scale (bool) – if True, use log-spaced scales, otherwise linearly spaced scales

  • scales (array) – custom scale vector, if provided, it will override the other scale arguments

  • n_tries (int) – number of generalized Markov Stability optimisation evaluations

  • with_all_tries (bool) – store all partitions with stability values found in different optimisation evaluations

  • with_NVI (bool) – compute NVI(t) between generalized Markov Stability optimisations at each scale t

  • n_NVI (int) – number of randomly chosen generalized Markov Stability optimisations to estimate NVI

  • with_postprocessing (bool) – apply the final postprocessing step

  • with_ttprime (bool) – compute the NVI(t,tprime) matrix to compare scales t and tprime

  • with_spectral_gap (bool) – normalise scale by spectral gap

  • exp_comp_mode (str) – mode to compute the matrix exponential, either 'expm' or 'spectral'

  • result_file (str) – path to the result file

  • n_workers (int) – number of workers for multiprocessing

  • tqdm_disable (bool) – disable progress bars

  • with_optimal_scales (bool) – apply optimal scale selection algorithm

  • optimal_scales_kwargs (dict) – kwargs to pass to optimal scale selection, see optimal_scale module.

  • method (str) – optimisation method, 'louvain' or 'leiden'

  • constructor_kwargs (dict) – additional kwargs to pass to constructor prepare method
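The scale arguments and exp_comp_mode can be illustrated with a short sketch. The helper `make_scales` is hypothetical. It assumes, as the negative default min_scale=-2.0 suggests, that with log_scale=True the bounds are base-10 exponents, so the defaults span about 0.01 to 3.16, and that a custom scales vector takes precedence. The two exponential functions show the generic techniques that the exp_comp_mode names point to: scipy.linalg.expm recomputed at every scale, versus a single eigendecomposition of a symmetric matrix, \(e^{-tL} = V e^{-t\Lambda} V^T\), reused across scales. The path-graph Laplacian \(L = D - A\) is an illustrative choice, not necessarily the operator used by PyGenStability's constructors.

```python
import numpy as np
from scipy.linalg import expm


def make_scales(min_scale=-2.0, max_scale=0.5, n_scale=20, log_scale=True, scales=None):
    """Hypothetical helper mirroring the scale arguments of run()."""
    if scales is not None:
        return np.asarray(scales, dtype=float)  # custom vector overrides everything else
    if log_scale:
        return np.logspace(min_scale, max_scale, n_scale)  # bounds read as log10 exponents
    return np.linspace(min_scale, max_scale, n_scale)


def exp_spectral(L, scales):
    """exp(-t L) for symmetric L, reusing one eigendecomposition for every scale."""
    eigvals, V = np.linalg.eigh(L)  # L = V diag(eigvals) V^T
    # V * w scales column k of V by w[k], i.e. V @ diag(w)
    return [(V * np.exp(-t * eigvals)) @ V.T for t in scales]


def exp_direct(L, scales):
    """exp(-t L) computed from scratch at each scale with scipy.linalg.expm."""
    return [expm(-t * L) for t in scales]


# Combinatorial Laplacian L = D - A of a 4-node path graph (illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

scales = make_scales()  # 20 log-spaced values from 10**-2 to 10**0.5
max_diff = max(
    np.abs(E_spec - E_expm).max()
    for E_spec, E_expm in zip(exp_spectral(L, scales), exp_direct(L, scales))
)
print(scales[0], scales[-1], max_diff)
```

In this sketch the spectral route pays for one eigendecomposition and then only matrix products per scale, which suits long scans over symmetric operators; expm makes no symmetry assumption.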

Returns:

Results dict with the following entries:
  • 'run_params': dict with parameters used to run the code

  • 'scales': scales of the scan

  • 'number_of_communities': number of communities at each scale

  • 'stability': value of stability cost function at each scale

  • 'community_id': community node labels at each scale

  • 'all_tries': all community node labels with stability values found in different optimisation evaluations at each scale (included if with_all_tries==True)

  • 'NVI': NVI(t) at each scale

  • 'ttprime': NVI(t,tprime) matrix

  • 'block_nvi': block NVI curve (included if with_optimal_scales==True)

  • 'selected_partitions': selected partitions (included if with_optimal_scales==True)
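The 'ttprime' entry compares the partitions found at different scales. A minimal sketch of such a matrix, assuming one label array per scale (as in 'community_id') and a self-contained NVI computed from the entropies in the formula given for evaluate_NVI. The helper names `nvi` and `ttprime_matrix` and the example labels are hypothetical.

```python
import numpy as np


def nvi(p1, p2):
    """Normalized variation of information between two label arrays."""
    def entropy(counts):
        prob = counts / counts.sum()
        return -np.sum(prob * np.log(prob))

    p1, p2 = np.asarray(p1), np.asarray(p2)
    e1 = entropy(np.unique(p1, return_counts=True)[1])
    e2 = entropy(np.unique(p2, return_counts=True)[1])
    # Joint entropy from the counts of each (label in p1, label in p2) pair.
    je = entropy(np.unique(np.stack([p1, p2]), axis=1, return_counts=True)[1])
    if je == 0.0:
        return 0.0  # both partitions put every node in one community
    mi = e1 + e2 - je  # mutual information
    return float((e1 + e2 - 2 * mi) / je)


def ttprime_matrix(community_ids):
    """Symmetric NVI(t, t') matrix over the partitions found at each scale."""
    n = len(community_ids)
    ttprime = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ttprime[i, j] = ttprime[j, i] = nvi(community_ids[i], community_ids[j])
    return ttprime


# Hypothetical labels at three increasing scales for a 6-node graph.
community_id = [
    np.array([0, 0, 1, 1, 2, 2]),
    np.array([0, 0, 0, 1, 1, 1]),
    np.zeros(6, dtype=int),
]
T = ttprime_matrix(community_id)
print(np.round(T, 3))
```

Low off-diagonal values mark ranges of scales where the partition barely changes.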

pygenstability.pygenstability.evaluate_NVI(index_pair, partitions)

Evaluate the Normalized Variation of Information (NVI).

NVI is defined for two partitions \(p1\) and \(p2\) as:

\[NVI = \frac{E(p1) + E(p2) - 2MI(p1, p2)}{JE(p1,p2)}\]

where \(E\) is the entropy, \(JE\) the joint entropy and \(MI\) the mutual information.
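A minimal NumPy sketch of the formula above follows. It computes the entropies from label counts with natural logarithms; the base cancels in the ratio. The helper `nvi` and the wrapper `nvi_from_pair`, which mimics the index_pair/partitions arguments documented below, are hypothetical and are not PyGenStability's implementation. The last lines show one plausible way to average NVI over pairs of randomly chosen optimisation runs at a single scale, in the spirit of the n_NVI parameter of run, using random stand-in partitions.

```python
import itertools

import numpy as np


def nvi(p1, p2):
    """NVI = (E(p1) + E(p2) - 2 MI(p1, p2)) / JE(p1, p2) for two label arrays."""
    def entropy(counts):
        prob = counts / counts.sum()
        return -np.sum(prob * np.log(prob))

    p1, p2 = np.asarray(p1), np.asarray(p2)
    e1 = entropy(np.unique(p1, return_counts=True)[1])
    e2 = entropy(np.unique(p2, return_counts=True)[1])
    je = entropy(np.unique(np.stack([p1, p2]), axis=1, return_counts=True)[1])
    if je == 0.0:
        return 0.0  # both partitions are a single community, hence identical
    mi = e1 + e2 - je  # mutual information
    return float((e1 + e2 - 2 * mi) / je)


def nvi_from_pair(index_pair, partitions):
    """Hypothetical wrapper taking the same arguments as evaluate_NVI."""
    i, j = index_pair
    return nvi(partitions[i], partitions[j])


print(nvi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled: 0.0
print(nvi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions: ≈ 1.0

# Average NVI over all pairs among a few randomly chosen runs at one scale.
rng = np.random.default_rng(0)
tries = [rng.integers(0, 3, size=10) for _ in range(8)]  # stand-in partitions
chosen = rng.choice(len(tries), size=4, replace=False)
mean_nvi = float(np.mean([nvi_from_pair(pair, tries) for pair in itertools.combinations(chosen, 2)]))
print(mean_nvi)
```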

Parameters:
  • index_pair (list) – list of two indices selecting the pair of partitions to compare

  • partitions (list) – list of partitions

Returns:

float, Normalized Variation of Information