Introduction to tisthemachinelearner, S3 interface

Introduction

The tisthemachinelearner package provides a simple R interface to scikit-learn models through Python’s tisthemachinelearner package. This vignette walks through the S3 interface, regressor() and predict(), using R’s built-in mtcars dataset.

Setup

First, let’s load the required packages:

library(reticulate)
library(tisthemachinelearner)

Data Preparation

We’ll use the classic mtcars dataset to predict miles per gallon (mpg) based on other car characteristics:

# Load data
data(mtcars)
head(mtcars)

# Split features and target
X <- as.matrix(mtcars[, -1])  # all columns except mpg
y <- mtcars[, 1]              # mpg column

# Create train/test split
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
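
Before handing the data to a model, it can help to confirm that the split has the expected shape and contains only numeric, non-missing values. The sketch below rebuilds the same split in base R and checks it; the specific checks are a suggestion, not something the package requires.

```r
# Rebuild the split from this vignette so the block runs on its own
data(mtcars)
X <- as.matrix(mtcars[, -1])  # 10 predictor columns
y <- mtcars[, 1]              # mpg

set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))

X_train <- X[train_idx, ]
X_test  <- X[-train_idx, ]
y_train <- y[train_idx]
y_test  <- y[-train_idx]

# mtcars has 32 rows: floor(0.8 * 32) = 25 for training, 7 for testing
stopifnot(
  nrow(X_train) == 25, nrow(X_test) == 7,
  ncol(X_train) == 10, ncol(X_test) == 10,
  length(y_train) == nrow(X_train),
  length(y_test) == nrow(X_test),
  is.numeric(X_train), !anyNA(X_train), !anyNA(y_train)
)

dim(X_train)
dim(X_test)
```

With only 7 test rows, the RMSE values reported later are noisy estimates, so small differences between models should not be over-interpreted.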

Linear Regression

Let’s start with a simple linear regression model:

# Fit linear regression model
# proc.time()[3] is elapsed (wall-clock) time in seconds
start <- proc.time()[3]
reg_linear <- tisthemachinelearner::regressor(X_train, y_train, "LinearRegression", venv_path = "../venv")
end <- proc.time()[3]
cat("Fit time:", end - start, "seconds\n")

# Make predictions
start <- proc.time()[3]
predictions <- predict(reg_linear, X_test)
end <- proc.time()[3]
cat("Prediction time:", end - start, "seconds\n")

# Calculate RMSE
rmse <- sqrt(mean((predictions - y_test)^2))
cat("Linear Regression RMSE:", rmse, "\n")

# Compare actual vs predicted values
results <- data.frame(
  Actual = y_test,
  Predicted = predictions,
  Difference = y_test - predictions
)
print(results)
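
As a sanity check on the linear regression above, the same split can be fitted with base R’s lm(), which also performs ordinary least squares. The sketch below is a cross-check, not part of the package; it assumes the wrapper passes the features to scikit-learn unchanged and with default settings, in which case the two RMSE values should be very close.

```r
# Same seed and split as in the vignette, kept as data frames for lm()
data(mtcars)
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
train_df <- mtcars[train_idx, ]
test_df  <- mtcars[-train_idx, ]

# Ordinary least squares on all predictors
fit_lm  <- lm(mpg ~ ., data = train_df)
pred_lm <- predict(fit_lm, newdata = test_df)

# Test-set RMSE, computed the same way as in the vignette
rmse_lm <- sqrt(mean((pred_lm - test_df$mpg)^2))
cat("Base R lm() RMSE:", rmse_lm, "\n")
```

A large gap between this value and the RMSE reported by the wrapped model would point to a difference in preprocessing or in the rows used, rather than in the estimator itself.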

Ridge Regression with Cross-Validation

Now let’s try ridge regression using scikit-learn’s RidgeCV, which selects the regularization strength by cross-validation:

# Fit ridge regression model
start <- proc.time()[3]
reg_ridge <- tisthemachinelearner::regressor(X_train, y_train, "RidgeCV", venv_path = "../venv")
end <- proc.time()[3]
cat("Fit time:", end - start, "seconds\n")

# Make predictions
start <- proc.time()[3]
predictions_ridge <- predict(reg_ridge, X_test)
end <- proc.time()[3]
cat("Prediction time:", end - start, "seconds\n")

# Calculate RMSE
rmse_ridge <- sqrt(mean((predictions_ridge - y_test)^2))
cat("Ridge Regression RMSE:", rmse_ridge, "\n")
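
To make the role of the regularization strength concrete, the sketch below implements the closed-form ridge solution in base R for a fixed penalty and evaluates it for the three values that scikit-learn’s RidgeCV tries by default (0.1, 1 and 10). The helpers ridge_fit() and ridge_predict() are hypothetical names introduced here for illustration; they are not part of tisthemachinelearner. Note that RidgeCV chooses its penalty by cross-validation on the training data, whereas this sketch simply reports test RMSE for each value to show how the penalty changes the fit.

```r
# Same split as in the vignette
data(mtcars)
X <- as.matrix(mtcars[, -1])
y <- mtcars[, 1]
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
X_train <- X[train_idx, ]; X_test <- X[-train_idx, ]
y_train <- y[train_idx];   y_test <- y[-train_idx]

# Closed-form ridge with an unpenalized intercept (illustrative helper):
# center X and y, then solve (Xc'Xc + alpha * I) beta = Xc'yc
ridge_fit <- function(X, y, alpha) {
  x_means <- colMeans(X)
  y_mean  <- mean(y)
  Xc <- sweep(X, 2, x_means)
  yc <- y - y_mean
  beta <- drop(solve(crossprod(Xc) + alpha * diag(ncol(X)),
                     crossprod(Xc, yc)))
  list(beta = beta, intercept = y_mean - sum(x_means * beta))
}

# Predictions for new rows (illustrative helper)
ridge_predict <- function(fit, X) {
  drop(X %*% fit$beta) + fit$intercept
}

# Test RMSE for each candidate penalty
alphas <- c(0.1, 1, 10)
rmse_by_alpha <- sapply(alphas, function(a) {
  fit <- ridge_fit(X_train, y_train, a)
  sqrt(mean((ridge_predict(fit, X_test) - y_test)^2))
})
print(data.frame(alpha = alphas, test_RMSE = rmse_by_alpha))
```

Features are centered but not rescaled here, so predictors on large scales such as disp and hp are penalized less in relative terms than small-scale ones.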

Visualization

Let’s visualize how well our predictions match the actual values:

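# Create scatter plots of actual vs predicted values, side by side
# Save the previous graphics settings so they can be restored afterwards
old_par <- par(mfrow = c(1, 2))

# Linear Regression plot
plot(y_test, predictions,
     xlab = "Actual MPG",
     ylab = "Predicted MPG",
     main = "Linear Regression",
     pch = 16)
# Dashed red line marks perfect predictions (predicted == actual)
abline(a = 0, b = 1, col = "red", lty = 2)

# Ridge Regression plot
plot(y_test, predictions_ridge,
     xlab = "Actual MPG",
     ylab = "Predicted MPG",
     main = "Ridge Regression",
     pch = 16)
abline(a = 0, b = 1, col = "red", lty = 2)

# Restore the previous graphics settings
par(old_par)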
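
The two plotting calls above differ only in their data and title, so they could be wrapped in a small helper. The sketch below is one way to do that; plot_actual_vs_predicted() is a hypothetical name, and the demo uses a simple lm() fit on the full dataset purely so that the block runs without the Python backend.

```r
# Scatter plot of actual vs predicted values with a shared axis range,
# so that the identity line runs corner to corner (illustrative helper)
plot_actual_vs_predicted <- function(actual, predicted, title) {
  lims <- range(c(actual, predicted))
  plot(actual, predicted,
       xlim = lims, ylim = lims,
       xlab = "Actual MPG",
       ylab = "Predicted MPG",
       main = title,
       pch = 16)
  # Points on this line are predicted exactly
  abline(a = 0, b = 1, col = "red", lty = 2)
  invisible(lims)
}

# Demo only: in-sample fit of mpg on weight
data(mtcars)
fit_demo <- lm(mpg ~ wt, data = mtcars)
lims <- plot_actual_vs_predicted(mtcars$mpg, fitted(fit_demo),
                                 "lm(mpg ~ wt), demo only")
```

Using the same limits on both axes makes over- and under-prediction easier to judge by eye than the default, independently scaled axes.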

Model Comparison

Compare the test-set RMSE of both models (lower is better):

comparison <- data.frame(
  Model = c("Linear Regression", "Ridge Regression"),
  RMSE = c(rmse, rmse_ridge)
)
print(comparison)
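
RMSE alone can hide how errors are distributed, so a comparison table may also report mean absolute error and R-squared. The sketch below defines a small helper for this; regression_metrics() is a hypothetical name, not a function exported by the package, and the toy vectors are made up for illustration.

```r
# Common regression metrics from actual and predicted values
# (illustrative helper)
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- predicted - actual
  c(
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

# Toy example: two exact predictions and one that is off by 1
toy <- regression_metrics(actual = c(1, 2, 3), predicted = c(1, 2, 4))
print(toy)
```

In the vignette’s workflow, the helper could be applied as regression_metrics(y_test, predictions) and regression_metrics(y_test, predictions_ridge), with the two results bound into one table using rbind().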

Conclusion

This example demonstrates how to:

  1. Prepare R data for use with the regressor
  2. Fit different types of regression models
  3. Make predictions on new data
  4. Calculate and compare model performance
  5. Visualize results

The tisthemachinelearner package makes it easy to use scikit-learn models with R data, combining the familiarity of R data structures with the power of Python’s machine learning ecosystem.

Session Info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tisthemachinelearner_0.10.0 Matrix_1.7-5               
#> [3] reticulate_1.46.0           rmarkdown_2.31             
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39    R6_2.6.1         fastmap_1.2.0    xfun_0.57       
#>  [5] lattice_0.22-9   maketools_1.3.2  cachem_1.1.0     knitr_1.51      
#>  [9] htmltools_0.5.9  png_0.1-9        buildtools_1.0.0 lifecycle_1.0.5 
#> [13] cli_3.6.6        grid_4.6.0       sass_0.4.10      jquerylib_0.1.4 
#> [17] compiler_4.6.0   sys_3.4.3        tools_4.6.0      evaluate_1.0.5  
#> [21] bslib_0.11.0     Rcpp_1.1.1-1.1   yaml_2.3.12      jsonlite_2.0.0  
#> [25] rlang_1.2.0