--- title: "Beyond GARCH" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Beyond GARCH} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction Probabilistic stock forecasting often relies on parametric models like ARIMA for the mean and GARCH for volatility. The `mlarchf` function in the `ahead` package offers a flexible hybrid alternative to ARMA-GARCH by combining machine learning approaches with ARCH effects. The model decomposes the time series into two components: 1. Mean component: $y_t = \mu_t + \sigma_t \varepsilon_t$ 2. Volatility component: $\sigma_t^2 = f(\varepsilon_{t-1}^2, \varepsilon_{t-2}^2, ...)$ where: - $\mu_t$ is the conditional mean (modeled using **any forecasting method**) - $\sigma_t$ is the conditional volatility (modeled using **machine learning**) - $\varepsilon_t$ are standardized residuals The **key innovation** is using machine learning methods and conformal prediction to model the volatility component, allowing for more flexible and potentially more accurate volatility forecasts than traditional GARCH models. The function supports various machine learning methods through parameters `fit_func` and `predict_func` as in other `ahead` models, and through the `caret` package. The forecasting process involves: - Fitting a mean model (default: `auto.arima`) - Modeling the squared residuals using machine learning. For this to work, the residuals from the mean model need to be centered, so that $$ \mathbb{E}[\epsilon_t^2|F_{t-1}] $$ (basically a supervised regression of squared residuals on their lags) is a good approximation of the latent conditional volatility - Conformalizing the standardized residuals for prediction intervals This new approach combines the interpretability of traditional time series models with the flexibility of machine learning, while maintaining proper uncertainty quantification through conformal prediction. # Basic Usage Let's start with a simple example using the Google stock price data from the `fpp2` package: ```{r setup, include=FALSE} library(forecast) library(ahead) library(randomForest) library(e1071) library(glmnet) ``` ```{r basic_example, fig.width=7.2} y <- fpp2::goog200 # Default model for volatility (Ridge regression for volatility) (obj_ridge <- ahead::mlarchf(y, h=20L, B=500L)) ``` # Different Machine Learning Methods The package supports various machine learning methods for volatility modeling. Here are some examples: ```{r ml_methods, fig.width=7.2} # Random Forest (obj_rf <- ahead::mlarchf(y, fit_func = randomForest::randomForest, predict_func = predict, h=20L, B=500L)) # Support Vector Machine (obj_svm <- ahead::mlarchf(y, fit_func = e1071::svm, predict_func = predict, h=20L, B=500L)) # Elastic Net (obj_glmnet <- ahead::mlarchf(y, fit_func = glmnet::cv.glmnet, predict_func = predict, h=20L, B=500L)) ``` Let's visualize the forecasts: ```{r plot_ridge_rf, fig.width=7.2} par(mfrow=c(1, 2)) plot(obj_ridge, main="Ridge Regression") plot(obj_rf, main="Random Forest") ``` ```{r plot_svm_glmnet, fig.width=7.2} par(mfrow=c(1, 2)) plot(obj_svm, main="Support Vector Machine") plot(obj_glmnet, main="Elastic Net") ``` # Using caret Models The package also supports models from the `caret` package, which provides access to hundreds of machine learning methods. Here's how to use them: ```{r caret_example, fig.width=7} y <- window(fpp2::goog200, start=100) # Random Forest via caret (obj_rf <- ahead::mlarchf(y, ml_method="ranger", h=20L)) # Gradient Boosting via caret (obj_glmboost <- ahead::mlarchf(y, ml_method="glmboost", h=20L)) # Gradient Boosting via caret and using clustering explicitly (obj_glmboost2 <- ahead::mlarchf(y, ml_method="glmboost", n_clusters=2, h=20L)) (obj_glmboost3 <- ahead::mlarchf(y, ml_method="glmboost", n_clusters=3, h=20L)) ``` Visualizing the forecasts: ```{r plot_caret, fig.width=7} par(mfrow=c(1, 2)) plot(obj_rf, main="Random Forest (caret)") plot(obj_glmboost, main="Gradient Boosting (caret)") par(mfrow=c(1, 2)) plot(obj_glmboost2, main="Gradient Boosting (caret) \n 2 clusters") plot(obj_glmboost3, main="Gradient Boosting (caret) \n 3 clusters") ``` Looking at the simulation paths: ```{r plot_sims, fig.width=7.2} par(mfrow=c(1, 2)) matplot(obj_rf$sims, type='l', main="RF Simulation Paths") matplot(obj_glmboost$sims, type='l', main="GBM Simulation Paths") ``` # Customizing Mean and Residual Models You can also customize both the mean forecasting model and the model for forecasting standardized residuals: ```{r custom_models, fig.width=7} # Using RW + Theta method for mean and residuals along with SVM for volatility (obj_svm <- ahead::mlarchf(y, fit_func = e1071::svm, predict_func = predict, h=20L, mean_model=forecast::rwf, model_residuals=forecast::thetaf)) # Using Theta + Theta method for mean and residuals along with GLMNET for volatility (obj_glmnet <- ahead::mlarchf(y, fit_func = glmnet::cv.glmnet, predict_func = predict, h=20L, mean_model=forecast::thetaf, model_residuals=forecast::thetaf)) ``` ```{r plot_custom, fig.width=7.2} plot(obj_svm, main="SVM with RW + Theta") plot(obj_glmnet, main="Elastic Net with Theta + Theta") ``` When using non-ARIMA models for the mean forecast, it's important to check if the residuals of the mean forecasting model are centered and stationary: ```{r diagnostics} # Diagnostic tests for residuals print(obj_svm$resids_t_test) print(obj_svm$resids_kpss_test) print(obj_glmnet$resids_t_test) print(obj_glmnet$resids_kpss_test) ```