Introduction to tisthemachinelearner, S3 interface with calibration

Introduction

The tisthemachinelearner package provides a simple R interface to scikit-learn models through Python’s tisthemachinelearner package. This vignette demonstrates how to use the package, including calibrated prediction intervals, with the Boston housing dataset from the MASS package.

Setup

First, let’s load the required packages:

library(reticulate)
library(tisthemachinelearner)
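
Because the models run in Python, it can help to confirm that a virtual environment is in place before fitting anything. The sketch below assumes the environment lives at ../venv, the path passed to regressor() later in this vignette, and uses reticulate::virtualenv_exists(). How the package itself resolves or creates the environment is not covered here, so treat this as an optional check rather than a required step.

```r
# Assumption: "../venv" is the virtual environment path used later in this vignette
venv_path <- "../venv"

# Only query reticulate if it is installed; otherwise report NA
venv_found <- if (requireNamespace("reticulate", quietly = TRUE)) {
  reticulate::virtualenv_exists(venv_path)
} else {
  NA
}
cat("Virtual environment found at", venv_path, ":", venv_found, "\n")
```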

Data Preparation

We’ll use the Boston housing dataset from the MASS package to predict the median home value (medv) from the other variables:

# Load data: the Boston housing data ships with the MASS package

# Split features and target
X <- as.matrix(MASS::Boston[, -14])  # all columns except medv (column 14)
y <- MASS::Boston[, 14]              # medv: median home value

# Create train/test split
set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
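
As a quick check on the split above, the following sketch rebuilds it and reports which column serves as the target and how many rows land in each set. It uses only base R and MASS::Boston, and introduces no names beyond those in the split code.

```r
# Rebuild the split exactly as above so this check runs on its own
X <- as.matrix(MASS::Boston[, -14])
y <- MASS::Boston[, 14]
set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]

# Column 14 of MASS::Boston is the target used in this vignette
cat("Target column:", colnames(MASS::Boston)[14], "\n")
cat("Training rows:", nrow(X_train), " Test rows:", nrow(X_test), "\n")

# Every row lands in exactly one of the two sets
stopifnot(nrow(X_train) + nrow(X_test) == nrow(X))
```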

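Ridge Regression with Calibrated Prediction Intervals

We fit a Ridge regression with calibration = TRUE, then compute prediction intervals on the test set with three methods (split conformal, surrogate, and bootstrap), timing each step. The alphas grid is commented out, so no hyperparameter search is run here: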

# Fit ridge regression model
start <- proc.time()[3]
reg_ridge <- tisthemachinelearner::regressor(X_train, y_train, "Ridge", 
                      #alphas = c(0.01, 0.1, 1, 10),
                      calibration = TRUE, venv_path = "../venv")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

# Make predictions
start <- proc.time()[3]
predictions_ridge_splitconformal <- predict(reg_ridge, X_test, method = "splitconformal")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

start <- proc.time()[3]
predictions_ridge_surrogate <- predict(reg_ridge, X_test, method = "surrogate")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

start <- proc.time()[3]
predictions_ridge_bootstrap <- predict(reg_ridge, X_test, method = "bootstrap")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

# Calculate coverage: the share of test targets that fall inside each interval
coverage <- function(pred) mean(y_test >= pred[, "lwr"] & y_test <= pred[, "upr"])
coverage_ridge_splitconformal <- coverage(predictions_ridge_splitconformal)
coverage_ridge_surrogate <- coverage(predictions_ridge_surrogate)
coverage_ridge_bootstrap <- coverage(predictions_ridge_bootstrap)

cat("Ridge Regression Split Conformal Coverage:", coverage_ridge_splitconformal, "\n")
cat("Ridge Regression Surrogate Coverage:", coverage_ridge_surrogate, "\n")
cat("Ridge Regression Bootstrap Coverage:", coverage_ridge_bootstrap, "\n")
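
To make the idea behind split conformal intervals concrete, here is a minimal base R sketch of the general technique. It is not the package's implementation of method = "splitconformal": it uses lm() as a stand-in model, and the function name split_conformal_lm and the level = 0.95 default are assumptions made for illustration. The training data is split in half, the model is fit on one half, and a quantile of the absolute residuals on the other half sets the interval half-width.

```r
# Split conformal sketch with lm() as a stand-in base model.
# Assumption: illustrates the general technique only, not the package internals.
split_conformal_lm <- function(X_train, y_train, X_new, level = 0.95) {
  n <- nrow(X_train)
  # Split the training data into a fitting half and a calibration half
  fit_idx <- sample(n, size = floor(n / 2))
  df_fit <- data.frame(y = y_train[fit_idx], X_train[fit_idx, , drop = FALSE])
  df_cal <- data.frame(X_train[-fit_idx, , drop = FALSE])
  fit <- lm(y ~ ., data = df_fit)
  # Absolute residuals on the calibration half are the conformity scores
  scores <- abs(y_train[-fit_idx] - predict(fit, newdata = df_cal))
  n_cal <- length(scores)
  # Finite-sample-corrected rank, capped at n_cal so the width stays finite
  k <- min(ceiling((n_cal + 1) * level), n_cal)
  q_hat <- sort(scores)[k]
  pred <- predict(fit, newdata = data.frame(X_new))
  cbind(fit = pred, lwr = pred - q_hat, upr = pred + q_hat)
}

# Usage on the same data and split as in this vignette
X <- as.matrix(MASS::Boston[, -14])
y <- MASS::Boston[, 14]
set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))
pi_lm <- split_conformal_lm(X[train_idx, ], y[train_idx], X[-train_idx, ])

# Coverage and average interval width on the held-out rows
y_holdout <- y[-train_idx]
coverage_lm <- mean(y_holdout >= pi_lm[, "lwr"] & y_holdout <= pi_lm[, "upr"])
avg_width_lm <- mean(pi_lm[, "upr"] - pi_lm[, "lwr"])
cat("Coverage:", coverage_lm, " Average width:", avg_width_lm, "\n")
```

Reporting the average interval width next to coverage is a useful habit when comparing methods, since wider intervals reach a given coverage more easily than narrow ones.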

Session Info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tisthemachinelearner_0.10.0 Matrix_1.7-5               
#> [3] reticulate_1.46.0           rmarkdown_2.31             
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39    R6_2.6.1         fastmap_1.2.0    xfun_0.57       
#>  [5] lattice_0.22-9   maketools_1.3.2  cachem_1.1.0     knitr_1.51      
#>  [9] htmltools_0.5.9  png_0.1-9        buildtools_1.0.0 lifecycle_1.0.5 
#> [13] cli_3.6.6        grid_4.6.0       sass_0.4.10      jquerylib_0.1.4 
#> [17] compiler_4.6.0   sys_3.4.3        tools_4.6.0      evaluate_1.0.5  
#> [21] bslib_0.11.0     Rcpp_1.1.1-1.1   yaml_2.3.12      jsonlite_2.0.0  
#> [25] rlang_1.2.0