Models
Overview
tMAVEN provides many built-in models that you can run on your data. Generally, they are divided into several categories. On this page, we discuss each model, and then provide timings for how long they take to run on a typical dataset.
Description of Models
Mixture Models
threshold
A threshold is applied and datapoints on each side of the threshold are clustered together.
kmeans
The K-means clustering algorithm is use to cluster the datapoints into \(N_{states}\).
mlgmm
A maximum likelihood Gaussian mixture model (GMM) clustering algorithm is use to cluster the datapoints into \(N_{states}\).
vbgmm
A variational Bayes Gaussian mixture model (GMM) clustering algorithm is use to cluster the datapoints into \(N_{states}\).
Individual HMMs
mlhmm
This is a separate maximum likelihood HMM for each trajectory. No ensemble model is generated, therefore no statistics are provided and much of the plotting functionality cannot be performed.
vbhmm
This is a separate variational Bayes HMM (i.e., vbFRET) for each trajectory. No ensemble model is generated, therefore no statistics are provided and much of the plotting functionality cannot be performed.
Composite HMMs
Composite HMMs created by modeling each trajectory with its own HMM. We do this by using model selection (where appropriate; see below). Thus we run the one through \(N_{states}\) models for each trajectory, select the best (max evidence) of those models, and then do some version of clustering of the individual states found across all of the trajectories using \(N_{states}\) number of clusters. After assigning the individual states to the clustered global states, HMM parameters are assembled to create a composite HMM.
kmeans_mlhmm
A maximum likelihood HMM is run on each trajectory using \(N_{states}\), and then K-means with \(N_{states}\) is used to cluster the emission states. No model selection is performed, so the individual trajectory models often have spurious states – especially when kinetics are relatively slow.
kmeans_vbhmm
A variational Bayes HMM (i.e., vbFRET) is run on each trajectory using model selection from one to \(N_{states}\), and then K-means with \(N_{states}\) is used to cluster the emission states.
vbgmm_vbhmm
A variational Bayes HMM (i.e., vbFRET) is run on each trajectory using model selection from one to \(N_{states}\), and then a variational Bayes GMM is used with model selection from one to \(N_{states}\) to cluster the emission states.
threshold_vbhmm
A variational Bayes HMM (i.e., vbFRET) is run on each trajectory using model selection from one to \(N_{states}\), and then a threshold is applied to the emission states to cluster them into two groups.
threshold_vbconhmm
A global variational Bayes HMM is run on all trajectories using \(N_{states}\), and then a threshold is applied to the emission states to cluster them into two groups.
Global HMMs
vbconhmm
This is a global variational Bayes HMM. It is conceptually similar to vbFRET, but all of the trajectories and assumed to be idependent and identically distributed (IID). This means that they will all obey the same HMM.
ebhmm
This is an empirical Bayes HMM (i.e., ebFRET). The model provided in the empirical prior. This is a pseudo-global method in that it also models each trajectory individually. The idealized (Viterbi) paths in the plot are from the individual posteriors. Parameters are from the empirical prior.
Model Selection
These are variations of several models discussed above. Specifically for the Bayesian-based methods, we use the maximum evidence or evidence lower bound (ELBO) to identify the optimal number of states. This works by running the same type of model, each time using a different number of states. The variation with the largest evidence/ELBO is chosen as the best model. Generally, you want to run the one state through at least the six state model.
vbgmm_modelselection
This is a variational Bayes GMM (mixture model) with model selection from one to \(N_{states}\).
vbhmm_modelselection
This is a variational Bayes HMM (i.e., vbFRET) with model selection from one to \(N_{states}\).
vbconhmm_modelselection
This is a global variational Bayes HMM (i.e., global vbFRET) with model selection from one to \(N_{states}\).
vbgmm_vbhmm_modelselection
This is a composite variational Bayes HMM (i.e., composite vbFRET). Model selection with the vbhmm is performed from one to \(N_{states}\), and then model selection for the number of cluster states in the GMM is performed from one to \(N_{states}\)
Mixture Models
Timing
Use test_timing.py
to run all of the models on the test dataset (L1-tRNA; ribosomal complex with tRNA\(^{Phe}\) at 25 C) composed of 406 trajectories. All models are two states. All model selection is done with one through six states. Note: the first time you run a model, it must be ‘just-in-time’ (JIT) compiled, and this will take a few seconds. Any additional time you run the model, it will not take this long. These timings are the median of 5 runs, and so do NOT include the JIT compiling time.
Apple M2 Pro
Mixture | Time (s) |
---|---|
threshold | 0.107 |
kmeans | 0.167 |
mlgmm | 1.024 |
vbgmm | 0.287 |
HMM | Time (s) |
---|---|
mlhmm | 0.453 |
vbhmm | 0.508 |
Composite | Time (s) |
---|---|
vbconhmm | 8.230 |
ebhmm | 6.479 |
Global | Time (s) |
---|---|
kmeans_mlhmm | 0.529 |
kmeans_vbhmm | 1.783 |
vbgmm_vbhmm | 1.913 |
threshold_vbhmm | 1.671 |
threshold_vbconhmm | 14.339 |
w/ Model Selection(1-6) | Time (s) |
---|---|
vbgmm_modelselection | 3.663 |
vbhmm_modelselection | 13.550 |
vbconhmm_modelselection | 101.481 |
vbgmm_vbhmm_modelselection | 41.001 |