The tisthemachinelearner package provides a simple R
interface to scikit-learn models through Python’s
tisthemachinelearner package. This vignette demonstrates
how to use the package with R’s built-in mtcars
dataset.
We’ll use the classic mtcars dataset to predict miles
per gallon (mpg) based on other car characteristics:
# Load data
data(mtcars)
head(mtcars)
# Split features and target
X <- as.matrix(mtcars[, -1]) # all columns except mpg
y <- mtcars[, 1] # mpg column
# Create train/test split
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
Let’s start with a simple linear regression model:
# Fit linear regression model
start <- proc.time()[3]
reg_linear <- tisthemachinelearner::regressor(X_train, y_train, "LinearRegression", venv_path = "../venv")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
# Make predictions
start <- proc.time()[3]
predictions <- predict(reg_linear, X_test)
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
# Calculate RMSE
rmse <- sqrt(mean((predictions - y_test)^2))
cat("Linear Regression RMSE:", rmse, "\n")
# Compare actual vs predicted values
results <- data.frame(
  Actual = y_test,
  Predicted = predictions,
  Difference = y_test - predictions
)
print(results)
Now let’s try Ridge regression with cross-validation for hyperparameter tuning:
# Fit ridge regression model
start <- proc.time()[3]
reg_ridge <- tisthemachinelearner::regressor(X_train, y_train, "RidgeCV", venv_path = "../venv")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
# Make predictions
start <- proc.time()[3]
predictions_ridge <- predict(reg_ridge, X_test)
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
# Calculate RMSE
rmse_ridge <- sqrt(mean((predictions_ridge - y_test)^2))
cat("Ridge Regression RMSE:", rmse_ridge, "\n")
Let’s visualize how well our predictions match the actual values:
# Create scatter plot of actual vs predicted values
par(mfrow = c(1, 2))
# Linear Regression plot
plot(y_test, predictions,
     xlab = "Actual MPG",
     ylab = "Predicted MPG",
     main = "Linear Regression",
     pch = 16)
abline(a = 0, b = 1, col = "red", lty = 2)
# Ridge Regression plot
plot(y_test, predictions_ridge,
     xlab = "Actual MPG",
     ylab = "Predicted MPG",
     main = "Ridge Regression",
     pch = 16)
abline(a = 0, b = 1, col = "red", lty = 2)
Compare the performance of both models:
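One way to lay out such a comparison is a small table of test-set RMSE values, one row per model. The sketch below is self-contained so it runs without a Python environment: it reuses the same train/test split, but the two prediction vectors come from base R `lm()` fits that serve only as stand-ins, and `rmse_of()` is a hypothetical helper, not part of tisthemachinelearner.

```r
# Hypothetical helper: root mean squared error between two numeric vectors
rmse_of <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Same 80/20 split as used earlier in this vignette
data(mtcars)
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Stand-in predictions from base R models (NOT the scikit-learn models above)
pred_a <- predict(lm(mpg ~ wt + hp + qsec, data = train), newdata = test)
pred_b <- predict(lm(mpg ~ wt, data = train), newdata = test)

# One row per model, sorted so the lowest test RMSE comes first
comparison <- data.frame(
  Model = c("lm: wt + hp + qsec", "lm: wt"),
  RMSE = c(rmse_of(test$mpg, pred_a), rmse_of(test$mpg, pred_b))
)
comparison <- comparison[order(comparison$RMSE), ]
print(comparison)
```

With the objects created in this vignette, the same table can be built directly from `rmse` and `rmse_ridge`, labelled "Linear Regression" and "Ridge Regression". Keep in mind that the test set holds only 7 cars, so small RMSE differences between models are not strong evidence either way.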
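This example demonstrates how to split R data into training and test sets, fit scikit-learn regression models from R, generate predictions, and evaluate them with RMSE.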
The tisthemachinelearner package makes it easy to use
scikit-learn models with R data, combining the familiarity of R data
structures with the power of Python’s machine learning ecosystem.
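As a side note on the cross-validated Ridge model fitted earlier, the idea of choosing a penalty by cross-validation can be sketched in base R alone. This is a conceptual illustration under stated assumptions, not a reproduction of what scikit-learn does internally: the helper names (`ridge_fit`, `ridge_predict`, `cv_mse`), the penalty grid, the 5-fold scheme, the choice of four predictors, and the standardization of predictors are all choices made for this example.

```r
# Closed-form ridge fit on standardized predictors; the intercept is not penalized
ridge_fit <- function(X, y, lambda) {
  x_mean <- colMeans(X)
  x_sd <- apply(X, 2, sd)
  Xs <- scale(X, center = x_mean, scale = x_sd)
  y_mean <- mean(y)
  beta <- solve(crossprod(Xs) + lambda * diag(ncol(Xs)),
                crossprod(Xs, y - y_mean))
  list(beta = beta, x_mean = x_mean, x_sd = x_sd, y_mean = y_mean)
}

# Predict by applying the training-set centering and scaling to new rows
ridge_predict <- function(fit, X_new) {
  Xs <- scale(X_new, center = fit$x_mean, scale = fit$x_sd)
  as.numeric(Xs %*% fit$beta) + fit$y_mean
}

# Mean squared error averaged over the held-out folds for one penalty value
cv_mse <- function(X, y, lambda, folds) {
  errs <- sapply(sort(unique(folds)), function(i) {
    hold <- folds == i
    fit <- ridge_fit(X[!hold, , drop = FALSE], y[!hold], lambda)
    mean((ridge_predict(fit, X[hold, , drop = FALSE]) - y[hold])^2)
  })
  mean(errs)
}

data(mtcars)
X <- as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")])
y <- mtcars$mpg

# Assign each row to one of 5 folds once, so every penalty sees the same folds
set.seed(42)
folds <- sample(rep(seq_len(5), length.out = nrow(X)))

lambdas <- c(0.1, 1, 10)  # illustrative penalty grid
cv_scores <- sapply(lambdas, function(l) cv_mse(X, y, l, folds))
best_lambda <- lambdas[which.min(cv_scores)]
print(data.frame(lambda = lambdas, cv_mse = cv_scores))
cat("Selected lambda:", best_lambda, "\n")
```

Reusing one fold assignment for every penalty keeps the comparison fair, since differences in the cross-validated error then come from the penalty and not from a different random partition.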
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] tisthemachinelearner_0.10.0 Matrix_1.7-5
#> [3] reticulate_1.46.0 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.57
#> [5] lattice_0.22-9 maketools_1.3.2 cachem_1.1.0 knitr_1.51
#> [9] htmltools_0.5.9 png_0.1-9 buildtools_1.0.0 lifecycle_1.0.5
#> [13] cli_3.6.6 grid_4.6.0 sass_0.4.10 jquerylib_0.1.4
#> [17] compiler_4.6.0 sys_3.4.3 tools_4.6.0 evaluate_1.0.5
#> [21] bslib_0.11.0 Rcpp_1.1.1-1.1 yaml_2.3.12 jsonlite_2.0.0
#> [25] rlang_1.2.0