---
title: "Introduction to tisthemachinelearner, S3 interface with calibration"
author: "T. Moudiki"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to tisthemachinelearner, S3 interface with calibration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `tisthemachinelearner` package provides a simple R interface to scikit-learn models through Python's `tisthemachinelearner` package. This vignette demonstrates how to use the package's S3 interface with calibration on the `Boston` housing dataset from the `MASS` package.

## Setup

First, let's load the required packages:

```{r}
library(reticulate)
library(tisthemachinelearner)
```

## Data Preparation

We'll use the classic `Boston` dataset to predict the median value of owner-occupied homes (`medv`) from the other neighborhood characteristics:

```{r, eval=FALSE}
# Split features and target (Boston housing data from MASS)
X <- as.matrix(MASS::Boston[, -14])  # all columns except medv
y <- MASS::Boston[, 14]              # medv column

# Create an 80/20 train/test split
set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
```

## Ridge Regression with Calibration

Now let's fit a Ridge regression with `calibration = TRUE`, then compute prediction intervals on the test set with three methods: `"splitconformal"`, `"surrogate"`, and `"bootstrap"`. Each step is timed, and the empirical coverage of each interval (the share of test responses falling between `lwr` and `upr`) is reported at the end:

```{r, eval=FALSE}
# Fit ridge regression model
start <- proc.time()[3]
reg_ridge <- tisthemachinelearner::regressor(X_train, y_train, "Ridge",
                                             #alphas = c(0.01, 0.1, 1, 10),
                                             calibration = TRUE,
                                             venv_path = "../venv")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

# Make predictions with each interval method, timing each call
start <- proc.time()[3]
predictions_ridge_splitconformal <- predict(reg_ridge, X_test, method = "splitconformal")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

start <- proc.time()[3]
predictions_ridge_surrogate <- predict(reg_ridge, X_test, method = "surrogate")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

start <- proc.time()[3]
predictions_ridge_bootstrap <- predict(reg_ridge, X_test, method = "bootstrap")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")

# Calculate coverage: fraction of test responses inside [lwr, upr]
coverage_ridge_splitconformal <- mean(y_test >= predictions_ridge_splitconformal[, "lwr"] &
                                        y_test <= predictions_ridge_splitconformal[, "upr"])
coverage_ridge_surrogate <- mean(y_test >= predictions_ridge_surrogate[, "lwr"] &
                                   y_test <= predictions_ridge_surrogate[, "upr"])
coverage_ridge_bootstrap <- mean(y_test >= predictions_ridge_bootstrap[, "lwr"] &
                                   y_test <= predictions_ridge_bootstrap[, "upr"])

cat("Ridge Regression Split Conformal Coverage:", coverage_ridge_splitconformal, "\n")
cat("Ridge Regression Surrogate Coverage:", coverage_ridge_surrogate, "\n")
cat("Ridge Regression Bootstrap Coverage:", coverage_ridge_bootstrap, "\n")
```

## Session Info

```{r}
sessionInfo()
```
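
## Appendix: Split Conformal Intervals in Base R

The Ridge example compares interval methods by their empirical coverage. To make the `lwr`/`upr` columns and the coverage calculation concrete, here is a minimal sketch of generic split conformal prediction in base R alone. It is not the package's implementation. The simulated data, the `lm()` model, the three-way split, and every variable name are assumptions chosen for illustration. The block uses a plain `r` fence so it is displayed but not executed when the vignette is built.

```r
# Generic split conformal sketch in base R (an illustration, not the
# internals of tisthemachinelearner). All names here are hypothetical.
set.seed(42)
n <- 500
x <- runif(n, -2, 2)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Three disjoint parts: proper training, calibration, and test
idx <- sample(n)
train <- dat[idx[1:200], ]
calib <- dat[idx[201:400], ]
test <- dat[idx[401:500], ]

# Fit the point predictor on the proper training set only
fit <- lm(y ~ x, data = train)

# Conformity scores: absolute residuals on the calibration set
scores <- abs(calib$y - predict(fit, newdata = calib))

# Finite-sample quantile: the ceiling((m + 1) * (1 - alpha))-th smallest score
alpha <- 0.05
m <- length(scores)
k <- ceiling((m + 1) * (1 - alpha))
q_hat <- if (k > m) Inf else sort(scores)[k]

# Symmetric intervals around the point predictions
pred <- predict(fit, newdata = test)
intervals <- cbind(fit = pred, lwr = pred - q_hat, upr = pred + q_hat)

# Empirical coverage, computed the same way as in the Ridge example
coverage <- mean(test$y >= intervals[, "lwr"] & test$y <= intervals[, "upr"])
cat("Empirical coverage:", coverage, "\n")
```

With exchangeable data, intervals built this way cover a new response with probability at least `1 - alpha` on average over the random split. That is why an empirical coverage close to the nominal level is the natural check for each method.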