pygpcca.GPCCA

class pygpcca.GPCCA(P, eta=None, z='LM', method='brandts')[source]

G-PCCA spectral clustering method with optimized memberships [Reuter18], [Reuter19].

Clusters the dominant m Schur vectors of a transition matrix.

This algorithm generates a fuzzy clustering such that the resulting membership functions are as crisp (characteristic) as possible.

Parameters:
  • P (Union[ndarray[Any, dtype[Any]], spmatrix]) – The transition matrix (row-stochastic).

  • eta (Optional[ndarray[Any, dtype[Any]]]) –

    The input probability distribution of the (micro)states. In theory eta can be an arbitrary distribution as long as it is a valid probability distribution (i.e., sums up to 1). A neutral and valid choice would be the uniform distribution (default).

    In case of a reversible transition matrix, the stationary distribution can (but don’t has to) be used here. In case of a non-reversible P, some initial or average distribution of the states might be chosen instead of the uniform distribution.

    Vector of shape (n,) which sums to 1. If None, uniform distribution is used.

  • z (Literal['LM', 'LR']) –

    Specifies which portion of the eigenvalue spectrum of P is to be sought. The returned invariant subspace of P will be associated with this part of the spectrum. Valid options are:

    • ’LM’: largest magnitude (default).

    • ’LR’: largest real part.

  • method (str) –

    Which method to use to determine the invariant subspace. Valid options are:

    • ’brandts’: perform a full Schur decomposition of P utilizing scipy.linalg.schur() (without the intrinsic sorting option, since it is flawed) and sort the returned Schur form R and Schur vector matrix Q afterwards using a routine published by Brandts [Brandts02]. This is well tested and thus the default method, although it is also the slowest choice.

    • ’krylov’: calculate an orthonormal basis of the subspace associated with the m dominant eigenvalues of P using the Krylov-Schur method as implemented in SLEPc. This is the fastest choice and especially suitable for very large P, but it is still experimental.

    See the installation instructions for more information.

References

If you use this code or parts of it, please cite [Reuter19].

Methods

minChi(m_min, m_max)

Calculate the minChi indicator (see [Reuter18]) for every \(m \in [m_{min},m_{max}]\).

optimize(m)

Full G-PCCA spectral clustering method with optimized memberships [Reuter18], [Reuter19].

Attributes

coarse_grained_input_distribution

Coarse grained input distribution of shape (n_m,).

coarse_grained_stationary_probability

Coarse grained stationary distribution of shape (n_m,).

coarse_grained_transition_matrix

Coarse grained transition matrix of shape (n_m, n_m).

crispness_values

Vector of crispness values for clustering into the requested cluster numbers.

dominant_eigenvalues

Dominant n_m eigenvalues of P.

input_distribution

Input probability distribution of the (micro)states.

macrostate_assignment

Crisp clustering using G-PCCA.

macrostate_sets

Crisp clustering using G-PCCA.

memberships

Array of shape (n, n_m) containing the membership \(\chi_{ij}\) (or probability) of each microstate \(i\) (to be assigned) to each macrostate or cluster \(j\).

n_m

Optimal number of clusters or macrostates to group the n microstates into.

optimal_crispness

Crispness for clustering into n_m clusters.

rotation_matrix

Optimized rotation matrix \(A\).

schur_matrix

Ordered top left part of shape (n_m, n_m) of the real Schur matrix of \(P\).

schur_vectors

Array \(Q\) of shape (n, n_m) with n_m sorted Schur vectors in the columns.

stationary_probability

Stationary probability distribution \(\pi\) of the microstates.

top_eigenvalues

Top m respective m_max eigenvalues of P.

transition_matrix

Row-stochastic transition matrix P.