| Title: | R Wrapper for the Mixture-Models Python Library |
|---|---|
| Description: | First R package enabling mixture models for high-dimensional data through gradient-based optimization with Automatic Differentiation (AD). Provides an R interface to the 'Mixture-Models' Python package (Kasa et al., 2024) via reticulate. Unlike traditional EM-based approaches (e.g., mclust, flexmix), this package uses AD and gradient-based optimization (including second-order Newton-CG) to fit Gaussian Mixture Models (GMM), Mixture of Factor Analyzers (MFA), Parsimonious GMM (PGMM), MCLUST family, and t-mixture models without requiring stringent modeling constraints, making it suitable for high-dimensional settings where the number of parameters exceeds the sample size. Reference: Kasa, S. R., Yijie, H., Kasa, S. K., & Rajan, V. (2024). Mixture-Models: a one-stop Python Library for Model-based Clustering using various Mixture Models. arXiv preprint arXiv:2402.10229. |
| Authors: | Siva Rajesh Kasa [aut, cre, cph] |
| Maintainer: | Siva Rajesh Kasa <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.20 |
| Built: | 2026-05-31 10:33:11 UTC |
| Source: | https://github.com/kasakh/mixturemodelsr |
Compute AIC for fitted model
mm_aic(fit)
| fit | An mm_fit object |
|---|---|
Numeric AIC value
## Not run:
fit <- mm_gmm_fit(iris[, 1:4], k = 3)
mm_aic(fit)
## End(Not run)
Compute BIC for fitted model
mm_bic(fit)
| fit | An mm_fit object |
|---|---|
Numeric BIC value
## Not run:
fit <- mm_gmm_fit(iris[, 1:4], k = 3)
mm_bic(fit)
## End(Not run)
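The manual does not state which formulas mm_aic() and mm_bic() use. As a rough sketch, assuming the textbook definitions AIC = 2p - 2*logLik and BIC = p*log(n) - 2*logLik (p free parameters, n observations), the two criteria can be computed from a log-likelihood in base R. The helpers aic_from_loglik() and bic_from_loglik() are hypothetical names used for illustration only; they are not part of the package, and the package may count parameters differently.

```r
# Textbook information criteria (lower is better under this sign convention).
# Illustrative helpers, not part of the package API.
aic_from_loglik <- function(loglik, n_params) {
  2 * n_params - 2 * loglik
}
bic_from_loglik <- function(loglik, n_params, n_obs) {
  n_params * log(n_obs) - 2 * loglik
}

# Worked example: one univariate Gaussian fitted by maximum likelihood.
x <- iris$Sepal.Length
n <- length(x)
mu_hat <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))  # ML estimate (divides by n, not n - 1)
loglik <- sum(dnorm(x, mean = mu_hat, sd = sigma_hat, log = TRUE))

aic_val <- aic_from_loglik(loglik, n_params = 2)
bic_val <- bic_from_loglik(loglik, n_params = 2, n_obs = n)

# Cross-check against base R on an intercept-only linear model, which has
# the same Gaussian likelihood and the same two parameters (mean, variance).
ref <- lm(Sepal.Length ~ 1, data = iris)
stopifnot(isTRUE(all.equal(aic_val, AIC(ref))))
stopifnot(isTRUE(all.equal(bic_val, BIC(ref))))
```

Some packages (mclust, for example) report BIC with the opposite sign, so it is worth confirming the convention before comparing values across tools.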
Fits a GMM with a common covariance matrix across all components.
mm_gmm_constrained_fit(x, k, optimizer = "Newton-CG", scale = 1, use_kmeans = TRUE, ...)
| x | Numeric matrix or data.frame (rows = observations, columns = features) |
|---|---|
| k | Number of mixture components |
| optimizer | Optimizer name: "Newton-CG" (default), "grad_descent", "rms_prop", or "adam" |
| scale | Initialization scale parameter (default 1.0) |
| use_kmeans | Logical, whether to use k-means for initializing component means (default TRUE) |
| ... | Additional arguments passed to Python init_params() or fit() methods |
An mm_fit object
## Not run:
fit <- mm_gmm_constrained_fit(iris[, 1:4], k = 3)
mm_bic(fit)
## End(Not run)
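To see why sharing one covariance matrix matters, it can help to count free parameters. The sketch below uses the standard counting for a k-component Gaussian mixture in d dimensions; gmm_param_count() is a hypothetical helper written for this illustration, and the package's internal parameterization may differ.

```r
# Free-parameter count for a k-component Gaussian mixture in d dimensions,
# using the standard counting. Illustrative helper, not package API.
gmm_param_count <- function(k, d, shared_cov = FALSE) {
  weights <- k - 1              # mixing proportions sum to 1
  means <- k * d                # one mean vector per component
  cov_one <- d * (d + 1) / 2    # one symmetric d x d covariance matrix
  covs <- if (shared_cov) cov_one else k * cov_one
  weights + means + covs
}

# iris-sized problem: k = 3 components, d = 4 features
full_small <- gmm_param_count(k = 3, d = 4)                       # 44
shared_small <- gmm_param_count(k = 3, d = 4, shared_cov = TRUE)  # 24

# A higher-dimensional problem: the covariance terms dominate
full_large <- gmm_param_count(k = 3, d = 100)                       # 15452
shared_large <- gmm_param_count(k = 3, d = 100, shared_cov = TRUE)  # 5352
```

With d = 100, the unconstrained model has far more parameters than a sample of a few hundred observations can support, which is the setting where a shared covariance matrix reduces the parameter count most.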
Fits a Gaussian mixture model to data using gradient-based optimization. This is a wrapper around the Python Mixture-Models GMM implementation.
mm_gmm_fit(x, k, optimizer = "Newton-CG", scale = 1, use_kmeans = TRUE, ...)
| x | Numeric matrix or data.frame (rows = observations, columns = features) |
|---|---|
| k | Number of mixture components |
| optimizer | Optimizer name: "Newton-CG" (default), "grad_descent", "rms_prop", or "adam" |
| scale | Initialization scale parameter (default 1.0) |
| use_kmeans | Logical, whether to use k-means for initializing component means (default TRUE) |
| ... | Additional arguments passed to Python init_params() or fit() methods |
An mm_fit object containing:
| py_model | Python model object |
|---|---|
| params_store | Full optimization path |
| params | Final fitted parameters |
| model_name | Model family name |
| k | Number of components |
| call | Original function call |
| n | Sample size |
| d | Number of dimensions |
| optimizer | Optimizer used |
## Not run:
# Setup (first time only)
mm_setup()
# Fit a 3-component GMM on iris data
fit <- mm_gmm_fit(iris[, 1:4], k = 3)
print(fit)
# Get cluster labels
labels <- mm_predict(fit)
# Model selection
mm_bic(fit)
mm_aic(fit)
## End(Not run)
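The model-selection step in the example above is often repeated over several values of k. The following is a minimal sketch of such a loop; select_k() is a hypothetical helper, not a package function. It takes the fitting and scoring functions as arguments so that it also runs without the Python backend. The guarded call assumes the package is installed under the name mixturemodelsr, that mm_gmm_fit() and mm_bic() are exported, and that a lower BIC indicates a better model.

```r
# Generic model-selection loop: fit one model per candidate k and score it.
# `fit_fun` must accept (x, k = ...) and `score_fun` must return one number.
select_k <- function(x, ks, fit_fun, score_fun) {
  scores <- vapply(ks, function(k) score_fun(fit_fun(x, k = k)), numeric(1))
  data.frame(k = ks, score = scores)
}

# Run against the package only if it and its Python backend are available.
if (requireNamespace("mixturemodelsr", quietly = TRUE) &&
    mixturemodelsr::mm_python_available()) {
  res <- select_k(
    iris[, 1:4], ks = 1:5,
    fit_fun = mixturemodelsr::mm_gmm_fit,
    score_fun = mixturemodelsr::mm_bic
  )
  # Assumption: lower BIC is better; flip to which.max() if the sign differs.
  print(res[which.min(res$score), ])
}
```

Passing the functions in keeps the loop reusable for the other mm_*_fit() families and for mm_aic().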
Compute log-likelihood for fitted model
mm_likelihood(fit)
| fit | An mm_fit object |
|---|---|
Numeric log-likelihood value
## Not run:
fit <- mm_gmm_fit(iris[, 1:4], k = 3)
mm_likelihood(fit)
## End(Not run)
Fits a model from the MCLUST family of constrained Gaussian mixture models. MCLUST models specify different parameterizations of the covariance structure.
mm_mclust_fit(x, k, model_type = NULL, optimizer = "Newton-CG", scale = 1, use_kmeans = TRUE, ...)
| x | Numeric matrix or data.frame (rows = observations, columns = features) |
|---|---|
| k | Number of mixture components |
| model_type | Character string specifying the MCLUST model type. Common types include: "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "VVE", "EEV", "VEV", "EVV", "VVV". If NULL, uses the default from the Python implementation. |
| optimizer | Optimizer name: "Newton-CG" (default), "grad_descent", "rms_prop", or "adam" |
| scale | Initialization scale parameter (default 1.0) |
| use_kmeans | Logical, whether to use k-means for initializing component means (default TRUE) |
| ... | Additional arguments passed to Python init_params() or fit() methods |
MCLUST model types follow a three-letter naming convention:
First letter: Volume (E=equal across components, V=variable across components)
Second letter: Shape (E=equal, V=variable, I=spherical/identity)
Third letter: Orientation (E=equal, V=variable, I=axis-aligned/identity)
Common model types:
"EII": Spherical, equal volume
"VII": Spherical, variable volume
"EEE": Ellipsoidal, equal volume, shape, and orientation
"VVV": Ellipsoidal, variable volume, shape, and orientation (most flexible)
An mm_fit object
## Not run:
# Fit MCLUST with VVV (most flexible) model
fit <- mm_mclust_fit(iris[, 1:4], k = 3, model_type = "VVV")
mm_bic(fit)
# Fit MCLUST with spherical, equal volume model
fit_eii <- mm_mclust_fit(iris[, 1:4], k = 3, model_type = "EII")
mm_bic(fit_eii)
# Compare cluster labels from the VVV fit against the known species
labels <- mm_predict(fit)
table(labels, iris$Species)
## End(Not run)
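The three-letter naming convention described above can be decoded mechanically. The sketch below is a hypothetical base-R helper, decode_mclust(), written only to illustrate the convention; it is not part of the package and does not check whether a given combination is one the Python backend actually supports.

```r
# Decode a three-letter MCLUST model type into volume / shape / orientation.
# Illustrative helper only; letter meanings follow the convention above.
decode_mclust <- function(model_type) {
  stopifnot(is.character(model_type), length(model_type) == 1,
            nchar(model_type) == 3)
  code <- strsplit(model_type, "")[[1]]
  volume <- c(E = "equal", V = "variable")
  shape <- c(E = "equal", V = "variable", I = "spherical")
  orientation <- c(E = "equal", V = "variable", I = "axis-aligned")
  out <- c(
    volume = unname(volume[code[1]]),
    shape = unname(shape[code[2]]),
    orientation = unname(orientation[code[3]])
  )
  # An unknown letter indexes to NA, which we turn into an error.
  if (anyNA(out)) stop("Unrecognised model type: ", model_type)
  out
}

decode_mclust("VEI")  # variable volume, equal shape, axis-aligned orientation
decode_mclust("EII")  # equal volume, spherical shape
```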
Fits a mixture of factor analyzers model to data using gradient-based optimization.
mm_mfa_fit(x, k, q = NULL, optimizer = "Newton-CG", scale = 1, use_kmeans = TRUE, ...)
| x | Numeric matrix or data.frame (rows = observations, columns = features) |
|---|---|
| k | Number of mixture components |
| q | Number of latent factors (NULL for automatic selection) |
| optimizer | Optimizer name: "Newton-CG" (default), "grad_descent", "rms_prop", or "adam" |
| scale | Initialization scale parameter (default 1.0) |
| use_kmeans | Logical, whether to use k-means for initializing component means (default TRUE) |
| ... | Additional arguments passed to Python init_params() or fit() methods |
An mm_fit object
## Not run:
# Fit MFA with 3 components and 2 latent factors
fit <- mm_mfa_fit(iris[, 1:4], k = 3, q = 2)
mm_bic(fit)
# Get cluster labels
labels <- mm_predict(fit)
## End(Not run)
Extracts the parameter values from a fitted mixture model.
mm_params(fit, convert = TRUE)
| fit | An mm_fit object |
|---|---|
| convert | Logical, whether to convert Python objects to R (default TRUE) |
Parameter object (structure depends on model family)
## Not run:
fit <- mm_gmm_fit(iris[, 1:4], k = 3)
params <- mm_params(fit)
## End(Not run)
Fits a parsimonious GMM with constraints on the covariance structure.
mm_pgmm_fit(x, k, model_type = NULL, q = NULL, optimizer = "Newton-CG", scale = 1, use_kmeans = TRUE, ...)
| x | Numeric matrix or data.frame (rows = observations, columns = features) |
|---|---|
| k | Number of mixture components |
| model_type | Character string specifying the PGMM model type (e.g., "VVV", "EEE", "VEV"). If NULL, uses the default from the Python implementation. |
| q | Number of latent factors (NULL for automatic selection) |
| optimizer | Optimizer name: "Newton-CG" (default), "grad_descent", "rms_prop", or "adam" |
| scale | Initialization scale parameter (default 1.0) |
| use_kmeans | Logical, whether to use k-means for initializing component means (default TRUE) |
| ... | Additional arguments passed to Python init_params() or fit() methods |
PGMM model types follow the mclust naming convention:
First letter: Volume (E=equal, V=variable)
Second letter: Shape (E=equal, V=variable)
Third letter: Orientation (E=equal, V=variable, I=identity)
An mm_fit object
## Not run:
# Fit PGMM with variable volume, shape, and orientation
fit <- mm_pgmm_fit(iris[, 1:4], k = 3, model_type = "VVV")
mm_bic(fit)
## End(Not run)
Predicts cluster membership labels for observations using a fitted mixture model.
mm_predict(fit, newx = NULL)
| fit | An mm_fit object returned by mm_*_fit functions |
|---|---|
| newx | Optional matrix or data.frame of new observations. If NULL, predictions are made on the training data. |
Integer vector of cluster labels (0-indexed from Python, converted to 1-indexed for R)
## Not run:
fit <- mm_gmm_fit(iris[, 1:4], k = 3)
labels <- mm_predict(fit)
table(labels, iris$Species)
## End(Not run)
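The index shift mentioned in the return value, and the way a cross-tabulation is usually read, can be sketched in base R with made-up data. The label vectors below are hypothetical, not output from the package, and the variable names are illustrative. Cluster numbers are arbitrary, so the sketch maps each cluster to its most frequent true class rather than expecting cluster 1 to line up with the first class.

```r
# Hypothetical 0-based labels, as a Python backend would return them.
py_labels <- c(0L, 0L, 2L, 1L, 2L, 1L)

# Shift to R's 1-based convention.
r_labels <- py_labels + 1L

# Hypothetical ground-truth classes for the same six observations.
truth <- factor(c("a", "a", "b", "c", "b", "c"))

# Cross-tabulate clusters (rows) against classes (columns).
tab <- table(cluster = r_labels, truth = truth)

# For each cluster, pick the class it overlaps with most.
majority <- colnames(tab)[apply(tab, 1, which.max)]
names(majority) <- rownames(tab)
majority
```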
Prints detailed information about the Python environment configuration for mixturemodelsr. Useful for troubleshooting installation issues.
mm_py_info()
Invisibly returns a list with diagnostic information
## Not run:
mm_py_info()
## End(Not run)
Check if Python module is available
mm_python_available()
Logical indicating if either mixture_models or Mixture_Models is importable
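A typical use of this check is to guard code that needs the Python backend. The sketch below assumes the package is installed under the name mixturemodelsr and that the documented functions are exported; backend_ready is an illustrative variable name. It falls back to a message rather than failing when the backend is missing.

```r
# TRUE only if the R package is installed AND its Python module is importable.
backend_ready <- requireNamespace("mixturemodelsr", quietly = TRUE) &&
  mixturemodelsr::mm_python_available()

if (backend_ready) {
  fit <- mixturemodelsr::mm_gmm_fit(iris[, 1:4], k = 3)
  print(fit)
} else {
  message("Python backend not available; run mm_setup() once, then retry.")
}
```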
One-time setup for R users. Provisions a dedicated conda environment with Python 3.10 + NumPy 1.23.5 and installs Mixture-Models==0.0.8.
mm_setup(force = FALSE)
| force | Logical; if TRUE, forces reinstallation by deleting and recreating the conda environment |
|---|---|
Invisibly returns TRUE on success
Fits a mixture of multivariate t-distributions to data using gradient-based optimization. Because t-distributions have heavier tails than Gaussians, the fit is less sensitive to outliers.
mm_tmm_fit(x, k, optimizer = "Newton-CG", scale = 1, use_kmeans = TRUE, ...)
| x | Numeric matrix or data.frame (rows = observations, columns = features) |
|---|---|
| k | Number of mixture components |
| optimizer | Optimizer name: "Newton-CG" (default), "grad_descent", "rms_prop", or "adam" |
| scale | Initialization scale parameter (default 1.0) |
| use_kmeans | Logical, whether to use k-means for initializing component means (default TRUE) |
| ... | Additional arguments passed to Python init_params() or fit() methods |
t-mixture models replace the Gaussian components with multivariate t-distributions. Each component has its own degrees-of-freedom parameter, so tail behavior can differ between components.
An mm_fit object
## Not run:
# Fit TMM with 3 components (robust to outliers)
fit <- mm_tmm_fit(iris[, 1:4], k = 3)
mm_bic(fit)
# Get cluster labels
labels <- mm_predict(fit)
table(labels, iris$Species)
## End(Not run)
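The robustness claim comes down to tail weight. As a small univariate illustration in base R (not a use of the package itself), the density a t-distribution assigns to a point far from the centre can be compared with that of a standard normal. The choice of 4 units and 3 degrees of freedom is arbitrary, and the variable names are illustrative.

```r
# Density at a point 4 units from the centre under each model.
dens_normal <- dnorm(4)
dens_t3 <- dt(4, df = 3)

# The t-density is much larger, so an observation this extreme is far less
# "surprising" and pulls the fitted location and scale less strongly.
tail_ratio <- dens_t3 / dens_normal

# With very large degrees of freedom the t-density approaches the normal one.
dens_t_large <- dt(4, df = 1e6)
```

A component with few degrees of freedom therefore tolerates outliers, while a component whose estimated degrees of freedom is large behaves almost like a Gaussian.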
Print method for mm_fit objects
## S3 method for class 'mm_fit'
print(x, ...)
| x | An mm_fit object |
|---|---|
| ... | Additional arguments (ignored) |