Package 'unifiedml' reference manual

Title:	Unified Interface for Machine Learning Models
Description:	Provides a unified R6-based interface for various machine learning models with automatic interface detection, consistent cross-validation, model interpretations via numerical derivatives, and visualization. Supports both regression and classification tasks with any model function that follows R's standard modeling conventions (formula or matrix interface).
Authors:	T. Moudiki [aut, cre]
Maintainer:	T. Moudiki <[email protected]>
License:	MIT + file LICENSE
Version:	0.3.0
Built:	2026-07-10 09:43:13 UTC
Source:	https://github.com/Techtonique/unifiedml

Unified Interface for Machine Learning Models

Description

Provides a unified R6-based interface for various machine learning models with automatic interface detection, consistent cross-validation, model interpretations via numerical derivatives, and visualization. Supports both regression and classification tasks with any model function that follows R's standard modeling conventions (formula or matrix interface).

Package Content

Index of help topics:

benchmark               Benchmark Multiple Models with Cross-Validation
                        and Model-Specific Parameters
cross_val_score         Cross-Validation for Model Objects
extract_probabilities   Extract probability predictions from any R
                        model in a standardised format
formula_to_matrix       Convert a formula-based model to a matrix
                        interface
matrix_to_formula       Convert a matrix-based model to a formula
                        interface
Model                   Unified Machine Learning Interface using R6
train_test_split        Split data into training and test sets
unifiedml-package       Unified Interface for Machine Learning Models

Maintainer

T. Moudiki <[email protected]>

Author(s)

T. Moudiki [aut, cre]

Benchmark Multiple Models with Cross-Validation and Model-Specific Parameters

Description

Perform k-fold cross-validation on a list of models, using model-specific parameters. Supports verbose messages and a progress bar.

Usage

benchmark(
  models,
  X,
  y,
  cv = 5L,
  scoring = NULL,
  params = NULL,
  cl = NULL,
  show_progress = FALSE,
  verbose = TRUE
)

Arguments

models

A named list of Model$new(...) objects to benchmark.

X

A data frame or matrix of predictors.

y

A vector of outcomes (factor for classification, numeric for regression).

cv

Integer, number of cross-validation folds (default 5).

scoring

Scoring metric: "rmse", "mae", "accuracy", or "f1" (default: auto-detected based on task)

params

Optional named list of lists, each sublist containing extra arguments to pass to the corresponding model's fit() call. Names must match models.

cl

Optional number of clusters for parallel processing

show_progress

Logical, whether to show a progress bar (default FALSE).

verbose

Logical, whether to print messages about each model (default TRUE).

Value

A list containing the CV scores for each model.
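
The name-matching convention between models and params can be illustrated with a minimal base-R sketch. It does not use unifiedml and makes no claim about the package's internals: the helper fit_all() is hypothetical, and plain fitting functions stand in for Model objects.

```r
# Illustrative only: plain fitting functions stand in for Model objects.
models <- list(
  ols       = stats::lm,
  glm_gauss = stats::glm
)

# One sublist of extra arguments per model; names must match `models`.
params <- list(
  ols       = list(),
  glm_gauss = list(family = gaussian())
)

# Hypothetical helper: fit every model with its own extra arguments.
fit_all <- function(models, params, formula, data) {
  stopifnot(all(names(params) %in% names(models)))
  lapply(stats::setNames(names(models), names(models)), function(nm) {
    extra <- if (is.null(params[[nm]])) list() else params[[nm]]
    do.call(models[[nm]], c(list(formula = formula, data = data), extra))
  })
}

fits <- fit_all(models, params, mpg ~ wt + hp, mtcars)
sapply(fits, function(f) class(f)[1])
```

Looking up each sublist by name rather than by position means the order of params does not have to follow the order of models.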

Examples

## Not run: 
library(caret)         # provides train() and trainControl()
library(randomForest)

X <- iris[, 1:4]
y <- iris$Species

models <- list(
  glm  = Model$new(caret::train),
  rf   = Model$new(randomForest::randomForest),
  xgb  = Model$new(caret::train)
)

params <- list(
  glm = list(method = "glmnet",
             tuneGrid = data.frame(alpha = 0, lambda = 0.01),
             trControl = trainControl(method = "none")),
  rf  = list(ntree = 150),
  xgb = list(method = "xgbTree",
             tuneGrid = data.frame(nrounds = 150, max_depth = 3, eta = 0.3,
                                   gamma = 0, colsample_bytree = 1,
                                   min_child_weight = 1, subsample = 1),
             trControl = trainControl(method = "none"))
)

results <- benchmark(models, X, y, cv = 5, params = params,
                     show_progress = TRUE, verbose = TRUE)
print(results)

## End(Not run)

Cross-Validation for Model Objects

Description

Perform k-fold cross-validation with consistent scoring metrics across different model types. The scoring metric is automatically selected based on the detected task type.

Usage

cross_val_score(
  model,
  X,
  y,
  cv = 5,
  scoring = NULL,
  show_progress = TRUE,
  verbose = TRUE,
  cl = NULL,
  seed = 123,
  fit_params = NULL,
  predict_params = NULL
)

Arguments

model

A Model object

X

Feature matrix or data.frame

y

Target vector (type determines regression vs classification)

cv

Number of cross-validation folds (default: 5)

scoring

Scoring metric: "rmse", "mae", "accuracy", "f1", or a custom function with signature function(true, pred) returning a scalar. Default: auto-detected based on task type.

show_progress

Logical; whether to show a progress bar in sequential mode (default: TRUE).

verbose

Logical; whether to print verbose messages in parallel mode (default: TRUE).

cl

Optional number of clusters for parallel processing. When cl is used for parallel execution, custom scoring functions must be self-contained (no dependencies on the calling environment).

seed

Random seed for reproducibility (default: 123).

fit_params

A list of additional arguments passed to model$fit()

predict_params

A list of additional arguments passed to model$predict()

Value

Vector of cross-validation scores for each fold
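
As a rough picture of what one score per fold means, the following base-R sketch runs k-fold cross-validation by hand. It is not the package's implementation: the function kfold_rmse() is hypothetical, lm() stands in for an arbitrary model, and the way folds are assigned is an assumption.

```r
# Minimal k-fold cross-validation in base R (illustrative; not unifiedml's code).
kfold_rmse <- function(X, y, cv = 5, seed = 123) {
  set.seed(seed)
  n <- nrow(X)
  # Randomly assign each row to one of `cv` folds of near-equal size.
  fold_id <- sample(rep(seq_len(cv), length.out = n))
  dat <- data.frame(y = y, X)
  vapply(seq_len(cv), function(k) {
    test <- fold_id == k
    fit  <- stats::lm(y ~ ., data = dat[!test, , drop = FALSE])
    pred <- stats::predict(fit, newdata = dat[test, , drop = FALSE])
    sqrt(mean((dat$y[test] - pred)^2))  # RMSE on the held-out fold
  }, numeric(1))
}

scores <- kfold_rmse(mtcars[, c("wt", "hp")], mtcars$mpg, cv = 5)
length(scores)  # one score per fold
mean(scores)    # average RMSE across folds
```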

Examples

## Not run: 
library(glmnet)
X <- matrix(rnorm(100), ncol = 4)
y <- 2*X[,1] - 1.5*X[,2] + rnorm(25)  # numeric -> regression

mod <- Model$new(glmnet::glmnet)
(cv_scores <- cross_val_score(mod, X, y, cv = 5))  # auto-uses RMSE
mean(cv_scores)  # Average RMSE

cross_val_score(mod, X, y,
                fit_params     = list(alpha = 0, lambda = 0.1),
                predict_params = list(type = "response"))

cross_val_score(mod, X, y,
                fit_params     = list(alpha = 0.5, lambda = 0.1),
                predict_params = list(type = "response"))

# Custom scoring: R-squared
r2 <- function(true, pred) {
  ss_res <- sum((true - pred)^2)
  ss_tot <- sum((true - mean(true))^2)
  1 - ss_res / ss_tot
}

(cv_scores4 <- cross_val_score(mod, X, y, cv = 5, scoring = r2))
mean(cv_scores4)  # Average R²

# Classification with accuracy scoring
data(iris)
X_class <- iris[, 1:4]
y_class <- iris$Species  # factor -> classification
mod2 <- Model$new(e1071::svm)
(cv_scores2 <- cross_val_score(mod2, X_class, y_class, cv = 5))  # auto-uses accuracy
mean(cv_scores2)  # Average accuracy

iris_bin <- iris[iris$Species != "virginica", ]
X_bin <- iris_bin[, 1:4]
y_bin <- droplevels(iris_bin$Species)
(cv_scores3 <- cross_val_score(mod2, X_bin, y_bin, cv = 3,
                               scoring = "f1",
                               fit_params = list(kernel = "polynomial")))
mean(cv_scores3)  # Average F1

## End(Not run)
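
A custom scorer only needs the documented signature function(true, pred) and a scalar return value. The sketch below defines a hypothetical binary F1 scorer, f1_binary(), that depends on nothing outside its own body, which is what the note on cl asks for. Treating the second factor level as the positive class is an assumption made here; the manual does not say which convention the built-in "f1" metric follows.

```r
# Hypothetical custom scorer: F1 for a binary factor response.
# Assumption: the second factor level is treated as the positive class.
f1_binary <- function(true, pred) {
  positive <- levels(true)[2]
  tp <- sum(pred == positive & true == positive)  # true positives
  fp <- sum(pred == positive & true != positive)  # false positives
  fn <- sum(pred != positive & true == positive)  # false negatives
  if (tp == 0) return(0)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

true <- factor(c("a", "b", "b", "a", "b"), levels = c("a", "b"))
pred <- factor(c("a", "b", "a", "a", "b"), levels = c("a", "b"))
f1_binary(true, pred)  # 0.8 (precision 1, recall 2/3)
```

Such a function could then be supplied as scoring = f1_binary in place of a metric name.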


Extract probability predictions from any R model in a standardised format

Description

Returns class-probability predictions from a fitted model of any class as a numeric matrix with one column per class.

Usage

extract_probabilities(model, X, y_train = NULL, verbose = FALSE)

Arguments

model

Fitted model object (any class)

X

Feature matrix or data.frame for predictions

y_train

Optional training labels used to name output columns

verbose

Print diagnostic information (default: FALSE)

Value

Numeric matrix of shape n_samples x n_classes with column names as class labels. Attributes extraction_method and model_class record how predictions were obtained.
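
To make the output format concrete, the sketch below builds such a matrix by hand from a binomial glm. It only illustrates the documented shape, column names, and attributes; it does not call extract_probabilities(), and the attribute value "manual glm example" is a made-up placeholder.

```r
# Illustrative only: builds a matrix in the documented output format by hand.
iris_bin   <- iris[iris$Species != "virginica", ]
iris_bin$y <- droplevels(iris_bin$Species)   # two-level factor response
fit <- stats::glm(y ~ Sepal.Length, data = iris_bin, family = binomial())

# For a factor response, glm models the probability of the second level.
p <- stats::predict(fit, newdata = iris_bin, type = "response")

probs <- cbind(1 - p, p)
colnames(probs) <- levels(iris_bin$y)                      # class labels as column names
attr(probs, "extraction_method") <- "manual glm example"   # hypothetical value
attr(probs, "model_class") <- class(fit)[1]

dim(probs)        # n_samples x n_classes
head(probs, 3)
```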

Convert a formula-based model to a matrix interface

Description

Wraps a model function that expects formula + data so it can be called with a plain X (data.frame or matrix) and y (vector). Factors in X are preserved; special column names are safely backtick-quoted in the generated formula so they survive the formula parser.

Usage

formula_to_matrix(fit_func, predict_func = stats::predict)

Arguments

fit_func

A model-fitting function whose first two arguments are formula and data (e.g. lm, glm).

predict_func

A prediction function with signature function(model, newdata, ...). Defaults to stats::predict.

Value

A named list with two elements:

fit(X, y, weights, ...): Fits the model. X is a data.frame (or coercible matrix), y is the response vector. Extra arguments are forwarded to fit_func.
predict(model, newdata, ...): Generates predictions. newdata must have the same columns as the X used in fit. Extra arguments are forwarded to predict_func.
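
The backtick-quoting step mentioned in the Description can be sketched in base R as follows. This is an illustration of the idea under stated assumptions, not the wrapper's actual code: the response column name .response and the example column name "engine size" are invented for the demonstration.

```r
# Illustrative sketch of turning (X, y) into formula + data with safe quoting.
X <- data.frame(`engine size` = mtcars$disp, hp = mtcars$hp,
                check.names = FALSE)  # keep the non-syntactic column name
y <- mtcars$mpg

# Backtick-quote every column name so the formula parser accepts it.
rhs <- paste(sprintf("`%s`", names(X)), collapse = " + ")
fml <- stats::as.formula(paste(".response ~", rhs))

# Attach the response under a name that is unlikely to clash with a feature.
dat <- X
dat[[".response"]] <- y

fit  <- stats::lm(fml, data = dat)
pred <- stats::predict(fit, newdata = X[1:5, , drop = FALSE])
fml
```

Without the backticks, a column such as "engine size" would be parsed as two separate symbols and the formula would fail to build.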

Examples

lm_matrix <- formula_to_matrix(lm)
X <- data.frame(wt = mtcars$wt, hp = mtcars$hp, cyl = factor(mtcars$cyl))
y <- mtcars$mpg
model <- lm_matrix$fit(X, y)
lm_matrix$predict(model, X[1:5, ])


Convert a matrix-based model to a formula interface

Description

Wraps a model function that expects a numeric matrix X and a response vector y (like glmnet::glmnet) so it can be called with the familiar formula + data interface. The formula is expanded via model.matrix, which handles factor dummy-coding, interactions, and inline transformations automatically.

Usage

matrix_to_formula(
  fit_func,
  predict_func = function(model, newX, ...) stats::predict(model, newdata = newX, ...)
)

Arguments

fit_func

A model-fitting function whose first two positional arguments are x (numeric matrix) and y (response vector), e.g. glmnet::glmnet.

predict_func

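A prediction function with signature function(model, newX, ...) where newX is a numeric matrix. Defaults to a thin wrapper around stats::predict that passes newX as the newdata argument. Models whose predict method expects a different argument name (for example newx in glmnet) need a custom predict_func.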

Value

A named list with two elements:

fit(formula, data, ...): Fits the model. The formula is expanded with model.matrix; the intercept column is dropped before passing to fit_func (add it back via ... if your model needs it). Extra arguments are forwarded to fit_func.
predict(model, newdata, ...): Generates predictions. newdata is expanded with the same model.matrix terms captured at fit time. Extra arguments are forwarded to predict_func.
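
The fit-time expansion and the reuse of terms at predict time can be sketched with base R alone. The code below is an assumption about how such a wrapper might work, not a copy of the package's code; stats::lm.fit serves as a stand-in matrix-interface fitter, and the intercept is added back by hand.

```r
# Illustrative only: formula -> numeric matrix -> matrix-interface fit.
fml <- mpg ~ wt + hp + factor(cyl)

# Fit time: expand the formula and drop the intercept column.
mf   <- stats::model.frame(fml, data = mtcars)
tt   <- stats::terms(mf)
xlev <- stats::.getXlevels(tt, mf)            # factor levels seen in training
X    <- stats::model.matrix(tt, mf)[, -1, drop = FALSE]
y    <- stats::model.response(mf)
fit  <- stats::lm.fit(cbind(Intercept = 1, X), y)  # stand-in fitter

# Predict time: reuse the stored terms and levels on new data.
tt_x   <- stats::delete.response(tt)
mf_new <- stats::model.frame(tt_x, data = mtcars[1:5, ], xlev = xlev)
X_new  <- stats::model.matrix(tt_x, mf_new)[, -1, drop = FALSE]
pred   <- drop(cbind(1, X_new) %*% fit$coefficients)

colnames(X)  # dummy-coded columns produced by model.matrix
```

Storing the terms and factor levels at fit time keeps the columns of the new matrix aligned with the training matrix, even when the new data do not contain every factor level.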

Examples

## Not run: 
glmnet_formula <- matrix_to_formula(
  fit_func = glmnet::glmnet,
  predict_func = function(model, newX, ...) {
    glmnet::predict.glmnet(model, newx = newX, s = 0.01, ...)
  }
)
model <- glmnet_formula$fit(mpg ~ wt + hp + factor(cyl), data = mtcars)
glmnet_formula$predict(model, newdata = mtcars[1:5, ])

## End(Not run)


Unified Machine Learning Interface using R6

Description

An R6 class that provides a consistent interface for regression and classification models in R, with automatic detection of formula vs matrix interfaces, built-in cross-validation, model interpretability, and visualization. The task type (regression or classification) is detected automatically from the type of the response variable.

Public fields

model_fn: The modeling function (e.g., glmnet::glmnet, randomForest::randomForest)
fitted: The fitted model object
task: Type of task: "regression" or "classification" (automatically detected)
X_train: Training features matrix
y_train: Training target vector

Methods

Method `new()`

Initialize a new Model

Usage

Model$new(model_fn)

Arguments

model_fn: A modeling function (e.g., glmnet, randomForest, svm)

Returns

A new Model object

Method `fit()`

Fit the model to training data

Automatically detects task type (regression vs classification) based on the type of the response variable y. Numeric y -> regression, factor y -> classification.

Usage

Model$fit(X, y, ...)

Arguments

X: Feature matrix or data.frame
y: Target vector (numeric for regression, factor for classification)
...: Additional arguments passed to the model function

Returns

self (invisible) for method chaining
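
The task-detection rule stated for fit() amounts to a type check on y. A minimal sketch, using a hypothetical helper detect_task() that is not part of the package: how the package treats other types (for example character vectors) is not documented, so the sketch simply raises an error for them.

```r
# Hypothetical helper mirroring the documented rule; not the package's code.
detect_task <- function(y) {
  if (is.factor(y)) {
    "classification"
  } else if (is.numeric(y)) {
    "regression"
  } else {
    # Behaviour for other types is an assumption of this sketch.
    stop("y must be numeric (regression) or a factor (classification)")
  }
}

detect_task(iris$Species)  # "classification"
detect_task(mtcars$mpg)    # "regression"
```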

Method `predict()`

Generate predictions from fitted model

Usage

Model$predict(X, ...)

Arguments

X: Feature matrix for prediction
...: Additional arguments passed to predict function

Returns

Vector of predictions

Method `predict_proba()`

Predict probabilities from fitted model

Usage

Model$predict_proba(X)

Arguments

X: Feature matrix for prediction

Method `print()`

Print model information

Usage

Model$print()

Returns

self (invisible) for method chaining

Method `summary()`

Compute numerical derivatives and statistical significance

Uses finite differences to compute approximate partial derivatives for each feature, providing model-agnostic interpretability.

Usage

Model$summary(h = 0.01, alpha = 0.05)

Arguments

h: Step size for finite differences (default: 0.01)
alpha: Significance level for p-values (default: 0.05)

Details

The method computes numerical derivatives using central differences.

Statistical significance is assessed using t-tests on the derivative estimates across samples.

Returns

A data.frame with derivative statistics (invisible)
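
For a single feature j, the central difference is (f(x + h e_j) - f(x - h e_j)) / (2h), evaluated at every observation. The sketch below applies this to an lm fit with a quadratic term, so that the derivative varies across samples, and then runs a one-sample t-test on the estimates. The helper central_diff() is hypothetical, and the exact test used by summary() may differ in detail.

```r
# Illustrative central-difference derivative for one feature of a fitted model.
fit <- stats::lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
h   <- 0.01

central_diff <- function(fit, data, feature, h) {
  up <- data; up[[feature]] <- up[[feature]] + h   # shift feature up by h
  dn <- data; dn[[feature]] <- dn[[feature]] - h   # shift feature down by h
  (stats::predict(fit, up) - stats::predict(fit, dn)) / (2 * h)
}

d_wt <- central_diff(fit, mtcars, "wt", h)   # one derivative per observation
mean(d_wt)                                   # average marginal effect of wt
stats::t.test(d_wt, mu = 0)$p.value          # is the average effect nonzero?
```

For this model the analytic derivative with respect to wt is b_wt + 2 * b_wt2 * wt, which the central difference reproduces up to floating-point error.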

Method `plot()`

Create partial dependence plot for a feature

Visualizes the relationship between a feature and the predicted outcome while holding other features at their mean values.

Usage

Model$plot(feature = 1, n_points = 100)

Arguments

feature: Index or name of feature to plot
n_points: Number of points for the grid (default: 100)

Returns

self (invisible) for method chaining
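
The construction behind such a plot can be sketched in base R: build a grid over the chosen feature, hold every other feature at its mean, and predict. The helper partial_dependence() is hypothetical and assumes all features are numeric; it is an illustration, not the method's source.

```r
# Illustrative partial dependence curve for one numeric feature.
fit <- stats::lm(mpg ~ wt + hp + disp, data = mtcars)
X   <- mtcars[, c("wt", "hp", "disp")]

partial_dependence <- function(fit, X, feature, n_points = 100) {
  grid <- seq(min(X[[feature]]), max(X[[feature]]), length.out = n_points)
  # Start from a frame holding every feature at its mean, then vary one column.
  newdata <- as.data.frame(lapply(X, function(col) rep(mean(col), n_points)))
  newdata[[feature]] <- grid
  data.frame(x = grid, yhat = as.numeric(stats::predict(fit, newdata)))
}

pd <- partial_dependence(fit, X, "wt")
plot(pd$x, pd$yhat, type = "l", xlab = "wt", ylab = "Predicted mpg")
```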

Method `clone_model()`

Create a deep copy of the model

Useful for cross-validation and parallel processing where multiple independent model instances are needed.

Usage

Model$clone_model()

Returns

A new Model object with same configuration

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Model$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Author(s)

T. Moudiki

Examples


# Regression example with glmnet
library(glmnet)
X <- matrix(rnorm(100), ncol = 4)
y <- 2*X[,1] - 1.5*X[,2] + rnorm(25)  # numeric -> regression

mod <- Model$new(glmnet::glmnet)
mod$fit(X, y, alpha = 0, lambda = 0.1)
mod$summary()
predictions <- mod$predict(X)

# Classification example  
data(iris)
iris_binary <- iris[iris$Species %in% c("setosa", "versicolor"), ]
X_class <- as.matrix(iris_binary[, 1:4])
y_class <- droplevels(iris_binary$Species)  # factor -> classification

mod2 <- Model$new(e1071::svm)
mod2$fit(X_class, y_class, kernel = "radial")
predictions <- mod2$predict(X_class)
mod2$predict_proba(X_class)




Split data into training and test sets

Description

Randomly splits a feature matrix or data.frame and its corresponding response vector into training and test subsets.

Usage

train_test_split(X, y, test_size = 0.2, seed = NULL)

Arguments

X

A matrix or data.frame of features.

y

A vector of responses (numeric or factor). Must have the same number of rows as X.

test_size

Proportion of observations to use as the test set. A number in (0, 1). Default is 0.2 (80/20 split).

seed

An optional integer random seed for reproducibility. If NULL (default) the current RNG state is used.

Value

A named list with four elements:

X_train

Training features (same type as X).

X_test

Test features (same type as X).

y_train

Training response.

y_test

Test response.
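
A random split of this kind can be written in a few lines of base R. The function split_sketch() below is a hypothetical re-implementation for illustration only; in particular, rounding the test-set size with round() is an assumption, since the manual does not state the rounding rule.

```r
# Hypothetical re-implementation for illustration; rounding rule is an assumption.
split_sketch <- function(X, y, test_size = 0.2, seed = NULL) {
  stopifnot(nrow(X) == length(y), test_size > 0, test_size < 1)
  if (!is.null(seed)) set.seed(seed)
  n        <- nrow(X)
  test_idx <- sample.int(n, size = round(n * test_size))  # rows held out
  list(
    X_train = X[-test_idx, , drop = FALSE],
    X_test  = X[test_idx, , drop = FALSE],
    y_train = y[-test_idx],
    y_test  = y[test_idx]
  )
}

d <- split_sketch(iris[, 1:4], iris$Species, test_size = 0.3, seed = 42)
dim(d$X_train)  # 105 x 4
dim(d$X_test)   #  45 x 4
```

Indexing with drop = FALSE keeps X_train and X_test in the same type as X, whether that is a matrix or a data.frame.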

Examples

# matrix input
X <- as.matrix(iris[, 1:4])
y <- iris$Species
d <- unifiedml::train_test_split(X, y, test_size = 0.3, seed = 42)
dim(d$X_train)  # 105 x 4
dim(d$X_test)   #  45 x 4

# data.frame input
d2 <- unifiedml::train_test_split(iris[, 1:4], iris$Species, test_size = 0.2)
is.data.frame(d2$X_train)  # TRUE

