| Title: | Unified Interface for Machine Learning Models |
|---|---|
| Description: | Provides a unified R6-based interface for various machine learning models with automatic interface detection, consistent cross-validation, model interpretations via numerical derivatives, and visualization. Supports both regression and classification tasks with any model function that follows R's standard modeling conventions (formula or matrix interface). |
| Authors: | T. Moudiki [aut, cre] |
| Maintainer: | T. Moudiki <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0 |
| Built: | 2026-05-11 02:36:27 UTC |
| Source: | https://github.com/Techtonique/unifiedml |
Index of help topics:
benchmark             Benchmark Multiple Models with Cross-Validation and Model-Specific Parameters
cross_val_score       Cross-Validation for Model Objects
extract_probabilities Extract probability predictions from any R model in a standardised format
formula_to_matrix     Convert a formula-based model to a matrix interface
matrix_to_formula     Convert a matrix-based model to a formula interface
Model                 Unified Machine Learning Interface using R6
train_test_split      Split data into training and test sets
unifiedml-package     Unified Interface for Machine Learning Models
Perform k-fold cross-validation on a list of models, using model-specific parameters. Supports verbose messages and a progress bar.
benchmark(models, X, y, cv = 5L, scoring = NULL, params = NULL, cl = NULL, show_progress = FALSE, verbose = TRUE)
| Argument | Description |
|---|---|
| models | A named list of Model objects. |
| X | A data frame or matrix of predictors. |
| y | A vector of outcomes (factor for classification, numeric for regression). |
| cv | Integer, number of cross-validation folds (default 5). |
| scoring | Scoring metric: "rmse", "mae", "accuracy", or "f1" (default: auto-detected from the task). |
| params | Optional named list of lists, each sublist containing extra arguments to pass to the corresponding model's fit() method. |
| cl | Optional number of clusters for parallel processing. |
| show_progress | Logical, whether to show a progress bar (default FALSE). |
| verbose | Logical, whether to print messages about each model (default TRUE). |
A list containing the CV scores for each model.
## Not run:
library(randomForest)
X <- iris[, 1:4]
y <- iris$Species
models <- list(
  glm = Model$new(caret::train),
  rf = Model$new(randomForest::randomForest),
  xgb = Model$new(caret::train)
)
params <- list(
  glm = list(method = "glmnet",
             tuneGrid = data.frame(alpha = 0, lambda = 0.01),
             trControl = caret::trainControl(method = "none")),
  rf = list(ntree = 150),
  xgb = list(method = "xgbTree",
             tuneGrid = data.frame(nrounds = 150, max_depth = 3, eta = 0.3,
                                   gamma = 0, colsample_bytree = 1,
                                   min_child_weight = 1, subsample = 1),
             trControl = caret::trainControl(method = "none"))
)
results <- benchmark(models, X, y, cv = 5, params = params,
                     show_progress = TRUE, verbose = TRUE)
print(results)
## End(Not run)
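The benchmarking loop can be pictured with a small base-R sketch. It assumes, without confirmation from this manual, that every model is scored on one shared fold assignment and that each model's entry in params is appended to its fit call with do.call(). The helper benchmark_sketch() and the two toy fit functions are hypothetical: they use plain lm() instead of Model objects and compute only RMSE.

```r
# Hypothetical sketch of benchmark-style cross-validation; not unifiedml code.
# fit_fns: named list of functions(X, y, ...) returning an object with a predict() method.
# params:  optional named list of extra-argument lists, matched to fit_fns by name.
benchmark_sketch <- function(fit_fns, X, y, cv = 5L, params = NULL, seed = 123) {
  set.seed(seed)
  # One fold assignment shared by all models, so they see identical splits
  folds <- sample(rep_len(seq_len(cv), NROW(X)))
  rmse <- function(true, pred) sqrt(mean((true - pred)^2))
  # Returns a cv x length(fit_fns) matrix of per-fold RMSE values
  sapply(names(fit_fns), function(nm) {
    extra <- if (is.null(params[[nm]])) list() else params[[nm]]
    vapply(seq_len(cv), function(k) {
      test <- folds == k
      # Model-specific arguments are appended after the positional X and y
      model <- do.call(fit_fns[[nm]],
                       c(list(X[!test, , drop = FALSE], y[!test]), extra))
      rmse(y[test], predict(model, X[test, , drop = FALSE]))
    }, numeric(1))
  })
}

# Usage with two toy regression models on mtcars
fit_fns <- list(
  linear  = function(X, y) lm(y ~ ., data = data.frame(X, y = y)),
  poly_wt = function(X, y, degree = 1) lm(y ~ poly(wt, degree), data = data.frame(X, y = y))
)
params <- list(poly_wt = list(degree = 2))
scores <- benchmark_sketch(fit_fns, mtcars[, c("wt", "hp")], mtcars$mpg,
                           cv = 4, params = params)
colMeans(scores)  # average RMSE per model
```

Sharing one fold vector across models keeps the comparison paired: differences in scores come from the models, not from different random splits.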
Perform k-fold cross-validation with consistent scoring metrics across different model types. The scoring metric is automatically selected based on the detected task type.
cross_val_score(model, X, y, cv = 5, scoring = NULL, show_progress = TRUE,
                verbose = TRUE, cl = NULL, seed = 123, fit_params = NULL,
                predict_params = NULL)
| Argument | Description |
|---|---|
| model | A Model object. |
| X | Feature matrix or data.frame. |
| y | Target vector (its type determines regression vs. classification). |
| cv | Number of cross-validation folds (default: 5). |
| scoring | Scoring metric: "rmse", "mae", "accuracy", "f1", or a custom function with signature function(true, pred). |
| show_progress | Whether to show a progress bar in sequential mode (default: TRUE). |
| verbose | Logical flag enabling verbose messages in parallel mode (default: TRUE). |
| cl | Optional number of clusters for parallel processing. If using … |
| seed | Random seed for reproducibility (default: 123). |
| fit_params | A list of additional arguments passed to model$fit(). |
| predict_params | A list of additional arguments passed to model$predict(). |
Vector of cross-validation scores for each fold
## Not run:
library(glmnet)
X <- matrix(rnorm(100), ncol = 4)
y <- 2*X[,1] - 1.5*X[,2] + rnorm(25)  # numeric -> regression
mod <- Model$new(glmnet::glmnet)
(cv_scores <- cross_val_score(mod, X, y, cv = 5))  # auto-uses RMSE
mean(cv_scores)  # Average RMSE
cross_val_score(mod, X, y, fit_params = list(alpha = 0, lambda = 0.1),
                predict_params = list(type = "response"))
cross_val_score(mod, X, y, fit_params = list(alpha = 0.5, lambda = 0.1),
                predict_params = list(type = "response"))

# Custom scoring: R-squared
r2 <- function(true, pred) {
  ss_res <- sum((true - pred)^2)
  ss_tot <- sum((true - mean(true))^2)
  1 - ss_res / ss_tot
}
(cv_scores4 <- cross_val_score(mod, X, y, cv = 5, scoring = r2))
mean(cv_scores4)  # Average R²

# Classification with accuracy scoring
data(iris)
X_class <- iris[, 1:4]
y_class <- iris$Species  # factor -> classification
mod2 <- Model$new(e1071::svm)
(cv_scores2 <- cross_val_score(mod2, X_class, y_class, cv = 5))  # auto-uses accuracy
mean(cv_scores2)  # Average accuracy

# Binary classification with F1 scoring
iris_bin <- iris[iris$Species != "virginica", ]
X_bin <- iris_bin[, 1:4]
y_bin <- droplevels(iris_bin$Species)
(cv_scores3 <- cross_val_score(mod2, X_bin, y_bin, cv = 3, scoring = "f1",
                               fit_params = list(kernel = "polynomial")))
mean(cv_scores3)  # Average F1
## End(Not run)
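Automatic metric selection can be illustrated with a hypothetical helper, get_scorer(), paired with a minimal k-fold loop. The mapping (factor response to accuracy, numeric response to RMSE, a custom function(true, pred) used as-is) follows the description of cross_val_score; the fold assignment, the "mae" formula and the omission of "f1" are assumptions of this sketch rather than the package's code.

```r
# Hypothetical sketch; not the unifiedml implementation.
# A user function is used as-is; otherwise the metric name (or the task
# implied by the type of y) selects a built-in scoring function.
get_scorer <- function(y, scoring = NULL) {
  if (is.function(scoring)) return(scoring)  # custom function(true, pred)
  if (is.null(scoring)) scoring <- if (is.factor(y)) "accuracy" else "rmse"
  switch(scoring,
    rmse     = function(true, pred) sqrt(mean((true - pred)^2)),
    mae      = function(true, pred) mean(abs(true - pred)),
    accuracy = function(true, pred) mean(as.character(true) == as.character(pred)),
    stop("unsupported scoring metric: ", scoring)
  )
}

# Minimal k-fold loop; fit_fn(X, y) must return an object with a predict() method.
# Note: set.seed() here changes the global random number stream.
cross_val_sketch <- function(fit_fn, X, y, cv = 5L, scoring = NULL, seed = 123) {
  score_fn <- get_scorer(y, scoring)
  set.seed(seed)
  folds <- sample(rep_len(seq_len(cv), NROW(X)))
  vapply(seq_len(cv), function(k) {
    test <- folds == k
    model <- fit_fn(X[!test, , drop = FALSE], y[!test])
    score_fn(y[test], predict(model, X[test, , drop = FALSE]))
  }, numeric(1))
}

lm_fit <- function(X, y) lm(y ~ ., data = data.frame(X, y = y))
scores <- cross_val_sketch(lm_fit, mtcars[, c("wt", "hp")], mtcars$mpg)
mean(scores)  # average RMSE over 5 folds
```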
Extract probability predictions from any R model in a standardised format
extract_probabilities(model, X, y_train = NULL, verbose = FALSE)
| Argument | Description |
|---|---|
| model | Fitted model object (any class). |
| X | Feature matrix or data.frame for predictions. |
| y_train | Optional training labels used to name output columns. |
| verbose | Print diagnostic information (default: FALSE). |
Numeric matrix of shape n_samples x n_classes with column names
as class labels. Attributes extraction_method and
model_class record how predictions were obtained.
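One way to picture the standardised output is a hypothetical helper, standardise_probs(), that turns either a binary-class probability vector or an n x k matrix into an n x k matrix whose rows sum to one and whose columns are named from the training labels. Reading a bare vector as P(second class), and the value "binary_vector" stored in the extraction_method attribute, are assumptions of this sketch, not documented unifiedml behaviour.

```r
# Hypothetical sketch; not unifiedml's extract_probabilities().
standardise_probs <- function(p, y_train = NULL) {
  if (is.null(dim(p))) {
    # A bare vector is read as P(second class) of a binary problem
    p <- cbind(1 - p, p, deparse.level = 0)
    method <- "binary_vector"
  } else {
    p <- as.matrix(p)
    method <- "matrix"
  }
  labels <- if (is.null(y_train)) NULL else levels(as.factor(y_train))
  if (length(labels) == ncol(p)) colnames(p) <- labels  # class labels as names
  p <- p / rowSums(p)  # each row now sums to 1
  attr(p, "extraction_method") <- method
  p
}

# Usage: logistic regression returns P(am == 1) as a plain vector
fit <- glm(am ~ wt, data = mtcars, family = binomial)
P <- standardise_probs(predict(fit, type = "response"), y_train = factor(mtcars$am))
head(P)
```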
Wraps a model function that expects formula + data so it can be called
with a plain X (data.frame or matrix) and y (vector).
Factors in X are preserved; special column names are safely backtick-quoted
in the generated formula so they survive the formula parser.
formula_to_matrix(fit_func, predict_func = stats::predict)
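| Argument | Description |
|---|---|
| fit_func | A model-fitting function whose first two arguments are formula and data. |
| predict_func | A prediction function with signature function(model, newdata, ...) (default: stats::predict). |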
A named list with two elements:
fit(X, y, weights, ...): Fits the model. X is a
data.frame (or coercible matrix), y is the response vector.
Extra arguments are forwarded to fit_func.
predict(model, newdata, ...): Generates predictions.
newdata must have the same columns as the X used in
fit. Extra arguments are forwarded to predict_func.
lm_matrix <- formula_to_matrix(lm)
X <- data.frame(wt = mtcars$wt, hp = mtcars$hp, cyl = factor(mtcars$cyl))
y <- mtcars$mpg
model <- lm_matrix$fit(X, y)
lm_matrix$predict(model, X[1:5, ])
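The wrapping idea can be sketched in a few lines of base R. formula_to_matrix_sketch() is a hypothetical stand-in, not the package source: it stores the response in an extra column named .response (a name chosen here for illustration, which must not clash with a predictor) and omits the weights argument. The backtick quoting mirrors the description of formula_to_matrix.

```r
# Hypothetical sketch; not the unifiedml source.
formula_to_matrix_sketch <- function(fit_func, predict_func = stats::predict) {
  fit <- function(X, y, ...) {
    df <- as.data.frame(X)  # factor columns of a data.frame are kept as-is
    # Backtick-quote every predictor so names like "car weight" parse
    rhs <- paste0("`", names(df), "`", collapse = " + ")
    df[[".response"]] <- y  # assumed helper column for the response
    f <- stats::as.formula(paste(".response ~", rhs))
    fit_func(f, data = df, ...)
  }
  predict <- function(model, newdata, ...) {
    predict_func(model, newdata = as.data.frame(newdata), ...)
  }
  list(fit = fit, predict = predict)
}

# Usage: a column name containing a space survives the formula parser
X <- data.frame(`car weight` = mtcars$wt, hp = mtcars$hp,
                cyl = factor(mtcars$cyl), check.names = FALSE)
lm_matrix <- formula_to_matrix_sketch(lm)
m <- lm_matrix$fit(X, mtcars$mpg)
lm_matrix$predict(m, X[1:5, ])
```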
Wraps a model function that expects a numeric matrix X and a response
vector y (like glmnet::glmnet) so it can be called with the
familiar formula + data interface. The formula is expanded via
model.matrix, which handles factor dummy-coding,
interactions, and inline transformations automatically.
matrix_to_formula(
  fit_func,
  predict_func = function(model, newX, ...) stats::predict(model, newdata = newX, ...)
)
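| Argument | Description |
|---|---|
| fit_func | A model-fitting function whose first two positional arguments are X (a numeric matrix) and y (a response vector). |
| predict_func | A prediction function with signature function(model, newX, ...). |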
A named list with two elements:
fit(formula, data, ...): Fits the model. The formula is
expanded with model.matrix; the intercept column is dropped
before passing to fit_func (add it back via ... if your
model needs it). Extra arguments are forwarded to fit_func.
predict(model, newdata, ...): Generates predictions.
newdata is expanded with the same model.matrix terms
captured at fit time. Extra arguments are forwarded to
predict_func.
## Not run:
glmnet_formula <- matrix_to_formula(
  fit_func = glmnet::glmnet,
  predict_func = function(model, newX, ...) {
    glmnet::predict.glmnet(model, newx = newX, s = 0.01, ...)
  }
)
model <- glmnet_formula$fit(mpg ~ wt + hp + factor(cyl), data = mtcars)
glmnet_formula$predict(model, newdata = mtcars[1:5, ])
## End(Not run)
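The reverse wrapper can be sketched with model.frame() and model.matrix(), keeping the terms and factor levels from fit time so that newdata is expanded identically, much as predict.lm() does. matrix_to_formula_sketch() is hypothetical and returns a small list bundling the fitted model with its terms; stats::lsfit() stands in for a matrix-interface fitter such as glmnet::glmnet so the example runs with base R only.

```r
# Hypothetical sketch; not the unifiedml source.
matrix_to_formula_sketch <- function(fit_func, predict_func) {
  fit <- function(formula, data, ...) {
    mf <- stats::model.frame(formula, data = data)
    tt <- stats::terms(mf)
    X <- stats::model.matrix(tt, mf)
    X <- X[, colnames(X) != "(Intercept)", drop = FALSE]  # drop intercept column
    list(model   = fit_func(X, stats::model.response(mf), ...),
         terms   = stats::delete.response(tt),   # reused to expand newdata
         xlevels = stats::.getXlevels(tt, mf))   # factor levels seen at fit time
  }
  predict <- function(object, newdata, ...) {
    mf <- stats::model.frame(object$terms, newdata, xlev = object$xlevels)
    X <- stats::model.matrix(object$terms, mf)
    X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
    predict_func(object$model, X, ...)
  }
  list(fit = fit, predict = predict)
}

# Usage: lsfit() takes a matrix x and a vector y and adds its own intercept
lsfit_formula <- matrix_to_formula_sketch(
  fit_func = function(x, y) stats::lsfit(x, y),
  predict_func = function(model, newX) drop(cbind(1, newX) %*% coef(model))
)
m <- lsfit_formula$fit(mpg ~ wt + hp + factor(cyl), data = mtcars)
lsfit_formula$predict(m, newdata = mtcars[1:5, ])
```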
Provides a consistent interface for various machine learning models in R, with automatic detection of formula vs matrix interfaces, built-in cross-validation, model interpretability, and visualization.
An R6 class that provides a unified interface for regression and classification models with automatic interface detection, cross-validation, and interpretability features. The task type (regression vs classification) is automatically detected from the response variable type.
model_fn: The modeling function (e.g., glmnet::glmnet, randomForest::randomForest)
fitted: The fitted model object
task: Type of task: "regression" or "classification" (automatically detected)
X_train: Training feature matrix
y_train: Training target vector
new()
Initialize a new Model
Model$new(model_fn)
model_fn: A modeling function (e.g., glmnet, randomForest, svm)
A new Model object
fit()
Fit the model to training data
Automatically detects task type (regression vs classification) based on the type of the response variable y. Numeric y -> regression, factor y -> classification.
Model$fit(X, y, ...)
X: Feature matrix or data.frame
y: Target vector (numeric for regression, factor for classification)
...: Additional arguments passed to the model function
self (invisible) for method chaining
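The detection rule stated for fit() (numeric y for regression, factor y for classification) amounts to a type check. detect_task() is a hypothetical illustration; raising an error for other types, such as character vectors, is this sketch's own choice, since the manual does not say how they are handled.

```r
# Hypothetical illustration of the task-detection rule described for Model$fit()
detect_task <- function(y) {
  if (is.factor(y)) return("classification")
  if (is.numeric(y)) return("regression")  # covers integer and double vectors
  stop("y must be numeric (regression) or a factor (classification)")
}

detect_task(mtcars$mpg)    # "regression"
detect_task(iris$Species)  # "classification"
```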
predict()
Generate predictions from fitted model
Model$predict(X, ...)
X: Feature matrix for prediction
...: Additional arguments passed to the predict function
Vector of predictions
predict_proba()
Predict probabilities from fitted model
Model$predict_proba(X)
X: Feature matrix for prediction
print()
Print model information
Model$print()
self (invisible) for method chaining
summary()
Compute numerical derivatives and statistical significance
Uses finite differences to compute approximate partial derivatives for each feature, providing model-agnostic interpretability.
Model$summary(h = 0.01, alpha = 0.05)
h: Step size for finite differences (default: 0.01)
alpha: Significance level for p-values (default: 0.05)
The method computes numerical derivatives using central differences.
Statistical significance is assessed using t-tests on the derivative estimates across samples.
A data.frame with derivative statistics (invisible)
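The central-difference idea can be written out directly: for feature j, each row's derivative is estimated as (f(x + h e_j) - f(x - h e_j)) / (2h), and a one-sample t-test asks whether the mean derivative differs from zero. derivative_summary() is a hypothetical sketch, not the package's method: it assumes numeric features and a caller-supplied predict function, and computes the t statistic by hand so that a constant derivative (as in a purely linear term) does not make stats::t.test() stop with an error. The package's exact statistics may differ.

```r
# Hypothetical sketch of derivative-based summaries; not unifiedml code.
# predict_fn: function(newX) returning numeric predictions for a data.frame.
derivative_summary <- function(predict_fn, X, h = 0.01, alpha = 0.05) {
  X <- as.data.frame(X)
  rows <- lapply(names(X), function(j) {
    X_plus <- X
    X_minus <- X
    X_plus[[j]] <- X[[j]] + h
    X_minus[[j]] <- X[[j]] - h
    # Central difference: one derivative estimate per observation
    d <- (predict_fn(X_plus) - predict_fn(X_minus)) / (2 * h)
    n <- length(d)
    # One-sample t-test of H0: mean derivative = 0, computed by hand
    t_stat <- mean(d) / (stats::sd(d) / sqrt(n))
    p_value <- 2 * stats::pt(-abs(t_stat), df = n - 1)
    data.frame(feature = j, mean_deriv = mean(d), sd_deriv = stats::sd(d),
               t_stat = t_stat, p_value = p_value, significant = p_value < alpha)
  })
  do.call(rbind, rows)
}

# Usage: the quadratic term makes the wt derivative vary across cars
fit <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
s <- derivative_summary(function(newX) predict(fit, newdata = newX),
                        mtcars[, c("wt", "hp")])
s
```

For a quadratic term the central difference is exact, so the mean wt derivative equals b_wt + 2 * b_wt2 * mean(wt), which makes the sketch easy to check.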
plot()
Create partial dependence plot for a feature
Visualizes the relationship between a feature and the predicted outcome while holding other features at their mean values.
Model$plot(feature = 1, n_points = 100)
feature: Index or name of the feature to plot
n_points: Number of grid points (default: 100)
self (invisible) for method chaining
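The curve behind such a plot can be computed without any plotting code: sweep one feature over an evenly spaced grid and hold every other feature at its mean, as the plot() description states. feature_curve() is a hypothetical sketch that assumes all features are numeric; it returns the grid and the predictions, which could then be drawn with plot(wt_curve$x, wt_curve$yhat, type = "l").

```r
# Hypothetical sketch of the computation behind Model$plot(); not unifiedml code.
feature_curve <- function(predict_fn, X, feature = 1, n_points = 100) {
  X <- as.data.frame(X)
  if (is.numeric(feature)) feature <- names(X)[feature]  # index -> column name
  grid <- seq(min(X[[feature]]), max(X[[feature]]), length.out = n_points)
  # n_points rows with every column fixed at its mean ...
  newX <- as.data.frame(lapply(X, function(col) rep(mean(col), n_points)))
  # ... except the feature of interest, which sweeps the grid
  newX[[feature]] <- grid
  data.frame(x = grid, yhat = as.numeric(predict_fn(newX)))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
wt_curve <- feature_curve(function(newX) predict(fit, newdata = newX),
                          mtcars[, c("wt", "hp")], feature = "wt", n_points = 10)
head(wt_curve)
```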
clone_model()
Create a deep copy of the model
Useful for cross-validation and parallel processing where multiple independent model instances are needed.
Model$clone_model()
A new Model object with same configuration
clone()
The objects of this class are cloneable with this method.
Model$clone(deep = FALSE)
deep: Whether to make a deep clone.
Your Name
# Regression example with glmnet
library(glmnet)
X <- matrix(rnorm(100), ncol = 4)
y <- 2*X[,1] - 1.5*X[,2] + rnorm(25)  # numeric -> regression
mod <- Model$new(glmnet::glmnet)
mod$fit(X, y, alpha = 0, lambda = 0.1)
mod$summary()
predictions <- mod$predict(X)

# Classification example
data(iris)
iris_binary <- iris[iris$Species %in% c("setosa", "versicolor"), ]
X_class <- as.matrix(iris_binary[, 1:4])
y_class <- droplevels(iris_binary$Species)  # factor -> classification
mod2 <- Model$new(e1071::svm)
mod2$fit(X_class, y_class, kernel = "radial")
predictions <- mod2$predict(X_class)
mod2$predict_proba(X_class)
Randomly splits a feature matrix or data.frame and its corresponding response vector into training and test subsets.
train_test_split(X, y, test_size = 0.2, seed = NULL)
| Argument | Description |
|---|---|
| X | A matrix or data.frame of features. |
| y | A vector of responses (numeric or factor). Its length must equal the number of rows of X. |
| test_size | Proportion of observations to use as the test set: a number in (0, 1). Default is 0.2. |
| seed | An optional integer random seed for reproducibility. If NULL (the default), no seed is set. |
A named list with four elements:
| Element | Description |
|---|---|
| X_train | Training features (same type as X). |
| X_test | Test features (same type as X). |
| y_train | Training response. |
| y_test | Test response. |
# matrix input
X <- as.matrix(iris[, 1:4])
y <- iris$Species
d <- unifiedml::train_test_split(X, y, test_size = 0.3, seed = 42)
dim(d$X_train)  # 105 x 4
dim(d$X_test)   # 45 x 4

# data.frame input
d2 <- unifiedml::train_test_split(iris[, 1:4], iris$Species, test_size = 0.2)
is.data.frame(d2$X_train)  # TRUE
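The splitting step comes down to sampling row indices once and applying them to both X and y, which keeps features and responses aligned. split_sketch() is a hypothetical stand-in for train_test_split(); rounding the test-set size with round() is an assumption, not documented behaviour.

```r
# Hypothetical sketch; not the unifiedml implementation.
split_sketch <- function(X, y, test_size = 0.2, seed = NULL) {
  stopifnot(test_size > 0, test_size < 1, NROW(X) == length(y))
  if (!is.null(seed)) set.seed(seed)
  n <- NROW(X)
  # The same index vector is applied to X and y, so rows stay paired
  test_idx <- sample.int(n, size = round(test_size * n))
  list(X_train = X[-test_idx, , drop = FALSE],
       X_test  = X[test_idx, , drop = FALSE],
       y_train = y[-test_idx],
       y_test  = y[test_idx])
}

d <- split_sketch(as.matrix(iris[, 1:4]), iris$Species, test_size = 0.3, seed = 42)
dim(d$X_train)  # 105 4
dim(d$X_test)   # 45 4
```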