pygpcca.gpcca_coarsegrain

pygpcca.gpcca_coarsegrain(P, m, eta=None, z='LM', method='brandts')[source]

Coarse-grain the transition matrix P into m sets using G-PCCA [Reuter18], [Reuter19].

Performs optimized spectral clustering via G-PCCA and coarse-grains P such that the dominant Perron eigenvalues are preserved using:

\[P_c = (\chi^T D \chi)^{-1} (\chi^T D P \chi)\]

with \(D\) being a diagonal matrix with eta on its diagonal [Reuter18], [Reuter19].

Parameters
  • P (Union[ndarray, spmatrix]) – The transition matrix (row-stochastic).

  • m (Union[int, Tuple[int, int], List[int], Dict[str, int]]) –

    The number of clusters or a range where a search for potentially optimal cluster numbers is performed. Valid options are:

    • int: number of clusters to group into.

    • tuple: minimal and maximal number of clusters.

    • dict: minimal and maximal number of clusters given as {'m_min': int, 'm_max': int}.

  • eta (Optional[ndarray]) –

    The input probability distribution of the (micro)states. In theory eta can be an arbitrary distribution as long as it is a valid probability distribution (i.e., sums up to 1). A neutral and valid choice would be the uniform distribution (default).

    In case of a reversible transition matrix, the stationary distribution can (but don’t has to) be used here. In case of a non-reversible P, some initial or average distribution of the states might be chosen instead of the uniform distribution.

    Vector of shape (n,) which sums to 1. If None (default), uniform distribution is used.

  • z (str) –

    Specifies which portion of the eigenvalue spectrum of P is to be sought. The returned invariant subspace of P will be associated with this part of the spectrum. Valid options are:

    • ’LM’: largest magnitude (default).

    • ’LR’: largest real part.

  • method (str) –

    Which method to use to determine the invariant subspace. Valid options are:

    • ’brandts’: perform a full Schur decomposition of P utilizing scipy.linalg.schur() (without the intrinsic sorting option, since it is flawed) and sort the returned Schur form R and Schur vector matrix Q afterwards using a routine published by Brandts [Brandts02]. This is well tested and thus the default method, although it is also the slowest choice.

    • ’krylov’: calculate an orthonormal basis of the subspace associated with the m dominant eigenvalues of P using the Krylov-Schur method as implemented in SLEPc. This is the fastest choice and especially suitable for very large P, but it is still experimental.

    See the installation instructions for more information.

Return type

ndarray

Returns

The coarse-grained row-stochastic transition matrix.

References

If you use this code or parts of it, please cite [Reuter19].