---
title: "conformalize matrix interface"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{conformalize matrix interface}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, message=FALSE}
library(misc)
```

## Example: Conformal Prediction with Out-of-Sample Coverage

In this example, we demonstrate how to use the `conformalize` function to perform conformal prediction and calculate the out-of-sample coverage rate.

### Simulated Data

We will generate a simple dataset for demonstration purposes.

```{r simulate-data}
set.seed(123)
n <- 200
x <- matrix(runif(n * 2), ncol = 2)
y <- 3 * x[, 1] + 2 * x[, 2] + rnorm(n, sd = 0.5)
data <- data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
```

### Fit Conformal Model

We will use a random forest from the `ranger` package as the `fit_func` and a thin wrapper around its `predict` method as the `predict_func`.

```{r fit-conformal-model}
library(stats)

# Define fit and predict functions
fit_func <- function(x, y, ...)
{
  # An unnamed matrix `x` yields columns X1, X2, ...; `predict` matches on these names
  df <- data.frame(y = y, x)
  print(head(df)) # show the data the model is trained on
  ranger::ranger(y ~ ., data = df, ...)
}

predict_func <- function(obj, newx)
{
  # Column names must match those created by data.frame() in fit_func
  colnames(newx) <- paste0("X", seq_len(ncol(newx)))
  predict(object = obj, data = newx)$predictions
}

# Apply conformalize
conformal_model <- misc::conformalize(  
  x = x,
  y = y,
  fit_func = fit_func,
  predict_func = predict_func,
  split_ratio = 0.8,
  seed = 123
)
```
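
The column-naming requirement noted in the comments can be checked in isolation. The sketch below uses only base R; the `*_demo` objects are illustrative names and not part of the `misc` package.

```r
# An unnamed matrix passed to data.frame() receives default names X1, X2, ...
x_demo <- matrix(runif(6), ncol = 2)
y_demo <- rnorm(3)
df_demo <- data.frame(y = y_demo, x_demo)
names(df_demo)

# New data must carry the same predictor names before it reaches predict()
newx_demo <- matrix(runif(4), ncol = 2)
colnames(newx_demo) <- paste0("X", seq_len(ncol(newx_demo)))
identical(colnames(newx_demo), setdiff(names(df_demo), "y"))
```

If the training matrix already has column names, as in a data set with named variables, those names are kept and the renaming step is not needed.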

### Generate Predictions and Prediction Intervals

We will use the `predict.conformalize` method to generate predictions and calculate prediction intervals.

```{r predict-intervals}
# New data for prediction
new_data <- data.frame(X1 = runif(50), X2 = runif(50))

# Predict with split conformal method
predictions <- predict(
  conformal_model,
  newdata = new_data,
  level = 0.95,
  method = "split",
  predict_func = predict_func
)

head(predictions)

residuals(conformal_model)
```
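
To make the idea behind the `"split"` method concrete, here is a minimal base-R sketch of one common form of split conformal prediction. It uses `lm` for simplicity and is not taken from the `misc` source; `conformalize` may differ in details such as the quantile rule. All `*_s` names are illustrative.

```r
set.seed(1)
n_s <- 200
x_s <- matrix(runif(n_s * 2), ncol = 2)
y_s <- 3 * x_s[, 1] + 2 * x_s[, 2] + rnorm(n_s, sd = 0.5)
df_s <- data.frame(y = y_s, x_s) # columns: y, X1, X2

# 1. Split into a proper training set and a calibration set
train_idx_s <- sample(seq_len(n_s), size = floor(0.8 * n_s))
fit_s <- lm(y ~ ., data = df_s[train_idx_s, ])

# 2. Conformity scores: absolute residuals on the calibration set
calib_s <- df_s[-train_idx_s, ]
scores_s <- abs(calib_s$y - predict(fit_s, newdata = calib_s))

# 3. Finite-sample-corrected quantile of the scores
level_s <- 0.95
n_cal_s <- length(scores_s)
k_s <- ceiling((n_cal_s + 1) * level_s)
q_hat_s <- if (k_s > n_cal_s) Inf else sort(scores_s)[k_s]

# 4. Symmetric interval around the point prediction
new_df_s <- data.frame(X1 = runif(5), X2 = runif(5))
pred_s <- predict(fit_s, newdata = new_df_s)
intervals_s <- cbind(fit = pred_s, lwr = pred_s - q_hat_s, upr = pred_s + q_hat_s)
intervals_s
```

Every interval in this sketch has the same width, `2 * q_hat_s`, because a single quantile of the calibration residuals is applied to all new points.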

### Calculate Out-of-Sample Coverage Rate

The coverage rate is the proportion of true values that fall within the prediction intervals.

```{r calculate-coverage}
# Simulate true values for the new data (column names are X1 and X2)
true_y <- 3 * new_data$X1 + 2 * new_data$X2 + rnorm(50, sd = 0.5)

# Check if true values fall within the prediction intervals
coverage <- mean(true_y >= predictions[, "lwr"] & true_y <= predictions[, "upr"])

cat("Out-of-sample coverage rate:", coverage)
```

### Results

- The prediction intervals are calculated using the split conformal method.
- The out-of-sample coverage rate should be close to the nominal level (here 0.95). With only 50 test points, some deviation from that value is expected.
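
As a rough guide to how far the empirical coverage can drift from the nominal level, one can treat each test point as an independent draw that is covered with probability equal to the level. This is a simplifying assumption for illustration only, not a property stated by the `misc` package.

```r
level_nominal <- 0.95
n_test <- 50

# Binomial standard error of a coverage estimate based on n_test points
se_coverage <- sqrt(level_nominal * (1 - level_nominal) / n_test)
round(se_coverage, 3) # 0.031
```

Under this approximation, an observed coverage a few percentage points away from 0.95 is unremarkable for a test set of this size.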

## Example: Conformal Prediction with the `MASS::Boston` Dataset

In this example, we use the `MASS::Boston` dataset to demonstrate conformal prediction.

### Load the Data

We will use the `MASS` package to access the `Boston` dataset.

```{r load-boston-data}
library(MASS)

# Load the Boston dataset
data(Boston)

# Inspect the dataset
head(Boston)
```

### Split the Data

We will split the data into disjoint training and test sets, so that coverage is measured on observations the model has never seen.

```{r split-data-boston}
set.seed(123)
n <- nrow(MASS::Boston)
train_indices <- sample(seq_len(n), size = floor(0.8 * n))
train_data <- MASS::Boston[train_indices, ]
test_data <- MASS::Boston[-train_indices, ]
```
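
A quick sanity check on such a split is to confirm that the two index sets do not overlap and together cover every row. The sketch below uses a hypothetical row count, `n_demo`, rather than the Boston data, so it runs with base R alone.

```r
set.seed(123)
n_demo <- 100 # hypothetical number of rows
train_idx_demo <- sample(seq_len(n_demo), size = floor(0.8 * n_demo))
test_idx_demo <- setdiff(seq_len(n_demo), train_idx_demo)

# No index appears in both sets, and together they cover all rows
stopifnot(length(intersect(train_idx_demo, test_idx_demo)) == 0)
stopifnot(length(train_idx_demo) + length(test_idx_demo) == n_demo)
```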

### Fit Conformal Model 

```{r fit-conformal-boston}
# The Boston predictors are already named, so no renaming is needed here
predict_func <- function(obj, newx)
{
  predict(object = obj, data = newx)$predictions
}

# Apply conformalize using the training data
conformal_model_boston <- misc::conformalize(
  x = as.matrix(train_data[, -which(names(train_data) == "medv")]),
  y = train_data$medv,
  fit_func = fit_func,
  predict_func = predict_func,
  seed = 123
)
```

### Generate Predictions and Prediction Intervals 

We will use the `predict.conformalize` method to generate predictions and calculate prediction intervals for the test set.

```{r predict-intervals-boston}
# Predict with split conformal method on the test data
predictions_boston <- predict(
  conformal_model_boston,
  newdata = as.matrix(test_data[, names(test_data) != "medv"]), # predictors only, as in training
  level = 0.95,
  method = "split",
  predict_func = predict_func
)

head(predictions_boston)

residuals(conformal_model_boston)
```

### Calculate Out-of-Sample Coverage Rate

The coverage rate is the proportion of true values in the test set that fall within the prediction intervals.

```{r calculate-coverage-boston}
# True values for the test set
true_y_boston <- test_data$medv

# Check if true values fall within the prediction intervals
coverage_boston <- mean(true_y_boston >= predictions_boston[, "lwr"] & true_y_boston <= predictions_boston[, "upr"])

cat("Out-of-sample coverage rate for Boston dataset:", coverage_boston)
```
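
Coverage can always be raised by widening the intervals, so it is often reported together with the average interval width. The helper below is a hypothetical convenience function, not part of `misc`, shown on toy numbers.

```r
# Hypothetical helper: empirical coverage and mean width of a set of intervals
interval_summary <- function(y_true, lwr, upr) {
  stopifnot(length(y_true) == length(lwr), length(lwr) == length(upr))
  c(
    coverage = mean(y_true >= lwr & y_true <= upr),
    mean_width = mean(upr - lwr)
  )
}

y_toy <- c(1, 2, 3, 4)
lwr_toy <- c(0, 2.5, 2, 3)
upr_toy <- c(2, 3.5, 4, 5)
res_toy <- interval_summary(y_toy, lwr_toy, upr_toy)
res_toy # coverage 0.75, mean_width 1.75
```

The same function could be applied to the `lwr` and `upr` columns of a prediction matrix and the corresponding true responses.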