| Title: | Time Series dataset collection (real-world and synthetic) |
|---|---|
| Description: | Collection of functions for simulating various types of time series data and accessing real-world time series datasets. |
| Authors: | T. Moudiki [aut, cre] |
| Maintainer: | T. Moudiki <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-05-23 10:52:39 UTC |
| Source: | https://github.com/thierrymoudiki/simulatetimeseries |
Index of help topics:
check_list_contains           Check if a list contains an object
create_correlation_matrix     Create correlation matrix
debug_print                   Print a variable for debugging
get_data_1                    Get data 1
lstmf                         LSTM Forecasting for Financial Returns
martingale_test               Martingale Test for Time Series
meboot                        Maximum Entropy Bootstrap for Time Series using Rcpp
rbootstrap                    Simulate using bootstrap resampling
rcpp_hello_world              Simple function using Rcpp
removenas                     Remove NA values from a time series
rfitdistr                     Simulate from parametric distribution
rgan                          Generate Synthetic Data using GANs
rgaussiandens                 Simulate Gaussian Kernel Density
rsurrogate                    Simulate using surrogate data
simulate_correlated_gaussian  Simulate Correlated Gaussian Random Variables
simulate_time_series_1        Simulate Univariate Time Series Dataset 1
simulate_time_series_2        Simulate a univariate time series dataset 2
simulate_time_series_3        Simulate a univariate time series dataset 3
simulate_time_series_4        Simulate a univariate time series dataset 4
simulatetimeseries-package    Time Series dataset collection (real-world and synthetic)
splitts                       Split time series sequentially
train_gan                     Train a Simple GAN Model
ts_dataset                    Time Series Dataset Collection
Checks whether a list of time series objects already contains a given element, comparing values, start, and frequency.
check_list_contains(list_elts, new_elt)
list_elts |
List of time series objects. |
new_elt |
A time series object to check for presence in the list. |
Logical. TRUE if the element is found, FALSE otherwise.
x <- ts(1:10)
y <- ts(1:10)
check_list_contains(list(x), y)
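The comparison described above (values, start, and frequency) can be sketched in base R. The helper name `contains_ts` is hypothetical and is an illustration under that description, not the package's implementation.

```r
# Hypothetical helper (not part of the package): TRUE if `list_elts` holds a
# series matching `new_elt` in values, start, and frequency.
contains_ts <- function(list_elts, new_elt) {
  for (elt in list_elts) {
    same_values <- length(elt) == length(new_elt) &&
      isTRUE(all.equal(as.numeric(elt), as.numeric(new_elt)))
    same_start <- isTRUE(all.equal(start(elt), start(new_elt)))
    same_freq <- frequency(elt) == frequency(new_elt)
    if (same_values && same_start && same_freq) return(TRUE)
  }
  FALSE
}

x <- ts(1:10)
y <- ts(1:10)
z <- ts(1:10, start = c(2000, 1), frequency = 12)  # same values, other index

contains_ts(list(x), y)  # TRUE
contains_ts(list(x), z)  # FALSE: start and frequency differ
```

Comparing the time attributes as well as the values keeps two series with identical numbers but different calendars from being treated as duplicates.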
Create a correlation matrix
create_correlation_matrix(cor_values)
Prints the name and value of a variable to the console, with a newline for clarity.
debug_print(x)
x |
Any R object to print. |
Invisibly returns the input object.
debug_print(iris)
Data from Task Views + synthetic
get_data_1(diffs = TRUE, install_pkgs = FALSE)
diffs |
Logical. Whether to return the differenced series (lag = 1). |
a list of time series objects
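Lag-1 differencing, as named in the `diffs` argument, is what base R's `diff()` computes. The sketch below uses the built-in `AirPassengers` series as a stand-in and does not call `get_data_1()` itself.

```r
# Lag-1 differencing of a built-in monthly series with base R's diff()
y <- AirPassengers
dy <- diff(y, lag = 1)

length(y)   # 144
length(dy)  # 143: differencing drops the first observation
start(dy)   # 1949 2: the differenced series starts one period later
```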
LSTM Forecasting for Financial Returns
lstmf(y, h = 10, level = c(80, 95), lookback = 20, units = 50, epochs = 100, batch_size = 32)
y |
Univariate time series of returns |
h |
Forecast horizon |
level |
Confidence levels for prediction intervals |
lookback |
Number of previous periods to use for prediction |
units |
Number of LSTM units |
epochs |
Training epochs |
batch_size |
Batch size for training |
Object of class 'forecast' with LSTM predictions
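A `lookback` window is commonly turned into supervised learning pairs in which each target is predicted from the preceding `lookback` values. The sketch below shows that construction with base R's `embed()`. This page does not say how `lstmf()` builds its inputs internally, so treat this as an assumption-laden illustration rather than the function's method.

```r
set.seed(123)
y <- rnorm(30)   # stand-in for a univariate return series
lookback <- 5

# embed() places y[t] in column 1 and y[t-1], ..., y[t-lookback] in the others
lagged <- embed(y, lookback + 1)
target <- lagged[, 1]
# Reorder the lag columns so that each row runs from oldest to newest value
inputs <- lagged[, -1, drop = FALSE][, lookback:1, drop = FALSE]

dim(inputs)     # 25 5: n - lookback windows of length lookback
length(target)  # 25
```

The first window holds `y[1:5]` and its target is `y[6]`, so a series of length n yields n - lookback training pairs.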
Performs a set of martingale tests on a numeric vector, using polynomial and inverse polynomial trends.
martingale_test(x)
x |
Numeric vector. The time series to test. |
A named list of p-values for each martingale test.
x <- rnorm(100)
martingale_test(x)
Generates bootstrap replicates of a time series using the maximum entropy bootstrap algorithm with Rcpp implementation for improved performance. This method is particularly useful for non-stationary time series and preserves the dependence structure of the original data.
meboot(x, reps = 999, trim = list(trim = 0.1, xmin = NULL, xmax = NULL), reachbnd = TRUE, expand.sd = TRUE, force.clt = TRUE, scl.adjustment = FALSE, sym = FALSE, colsubj, coldata, coltimes, ...)
x |
A numeric vector or time series object to be bootstrapped. |
reps |
Number of bootstrap replicates to generate (default: 999). |
trim |
Controls tail behavior. Can be a single numeric value specifying the trim proportion (e.g., 0.10 for 10%) or, as in the default shown in the usage, a list with elements trim, xmin, and xmax. |
reachbnd |
Logical indicating whether to allow generated values to reach the boundaries xmin and xmax (default: TRUE). |
expand.sd |
Logical indicating whether to expand the standard deviation of the ensemble (default: TRUE). |
force.clt |
Logical indicating whether to force the central limit theorem compliance by centering each replicate (default: TRUE). |
scl.adjustment |
Logical indicating whether to adjust the scale of the ensemble to match the original data's variance (default: FALSE). |
sym |
Logical indicating whether to force symmetry in the maximum entropy density (default: FALSE). |
colsubj |
Deprecated parameter from original meboot (included for compatibility). |
coldata |
Deprecated parameter from original meboot (included for compatibility). |
coltimes |
Deprecated parameter from original meboot (included for compatibility). |
... |
Additional arguments passed to expansion functions. |
The maximum entropy bootstrap algorithm generates replicates that:
Preserve the dependence structure of the original time series
Can handle non-stationary time series
Satisfy the ergodic theorem and central limit theorem
Maintain the mean and autocorrelation structure
The Rcpp implementation provides significant performance improvements over the original R implementation, especially for large datasets and many replications.
A list with the following components:
x |
Original time series data |
ensemble |
Matrix of bootstrap replicates (n x reps) |
xx |
Sorted original data |
z |
Intermediate points between sorted values |
dv |
Absolute differences between consecutive observations |
dvtrim |
Trimmed mean of differences |
xmin |
Lower bound used for generation |
xmax |
Upper bound used for generation |
desintxb |
Interval means satisfying mean-preserving constraint |
ordxx |
Ordering index of original data |
kappa |
Scale adjustment factor (if scl.adjustment = TRUE) |
Vinod, H. D., & Lopez-de-Lacalle, J. (2009). Maximum entropy bootstrap for time series: The meboot R package. Journal of Statistical Software, 29(5), 1-19.
# Basic usage with a time series
set.seed(123)
x <- ts(rnorm(100), start = c(2000, 1), frequency = 12)
result <- meboot(x, reps = 1000)

# Plot first few replicates
matplot(result$ensemble[, 1:5], type = "l", lty = 1)
# Drop the time index so the original series shares matplot's 1..n x-axis
lines(as.numeric(result$x), col = "black", lwd = 2)

# With custom bounds
result_bounded <- meboot(x, reps = 100, trim = list(trim = 0.1, xmin = -3, xmax = 3))

# With scale adjustment
result_scaled <- meboot(x, reps = 100, scl.adjustment = TRUE)
Generates bootstrap samples from a numeric vector, optionally producing multiple replicates.
rbootstrap(x, n = length(x), p = 1, seed = 123)
x |
Numeric vector to resample. |
n |
Integer. Number of samples per replicate (default: length of x). |
p |
Integer. Number of replicates (default: 1). |
seed |
Integer. Random seed for reproducibility (default: 123). |
A vector or matrix of bootstrap samples.
x <- rnorm(10)
rbootstrap(x, n = 10, p = 3)
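Plain bootstrap resampling with replicates can be sketched in base R as below. The function name `bootstrap_sketch` is hypothetical, and the assumption that replicates are stored as columns of an n x p matrix is mine; the package's own layout may differ.

```r
# Hypothetical sketch of i.i.d. bootstrap resampling (not the package's code)
bootstrap_sketch <- function(x, n = length(x), p = 1, seed = 123) {
  set.seed(seed)
  # Draw indices rather than values so a length-one x is handled correctly
  draws <- x[sample.int(length(x), size = n * p, replace = TRUE)]
  if (p == 1) draws else matrix(draws, nrow = n, ncol = p)
}

x <- rnorm(10)
res <- bootstrap_sketch(x, n = 10, p = 3)

dim(res)         # 10 3
all(res %in% x)  # TRUE: every draw is one of the observed values
```

Because the draws are independent, this kind of resampling does not preserve the temporal ordering of the original series.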
Simple function using Rcpp
rcpp_hello_world()
## Not run:
rcpp_hello_world()
## End(Not run)
Remove NA values from a time series
Removes NAs by linear interpolation.
removenas(y)
y |
time series object |
time series object without NA values
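Linear interpolation of missing values can be sketched with base R's `approx()`. This illustrates the technique on a small quarterly series and is not the body of `removenas()`.

```r
# Fill interior NAs by linear interpolation, keeping the time attributes
y <- ts(c(1, NA, 3, NA, NA, 6), start = c(2020, 1), frequency = 4)

idx <- seq_along(y)
ok <- !is.na(y)
filled <- approx(x = idx[ok], y = as.numeric(y)[ok], xout = idx)$y
y_filled <- ts(filled, start = start(y), frequency = frequency(y))

as.numeric(y_filled)  # 1 2 3 4 5 6
```

With the default `rule = 1`, `approx()` leaves leading or trailing NAs in place, since there is no observation on one side to interpolate from.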
Simulate from parametric distribution
rfitdistr(x, n = length(x), p = 1)
Creates synthetic data using Generative Adversarial Networks with predefined architectures optimized for different data types.
rgan(x, n, p = 1, type_input = c("unimodal", "mixture", "financial"))
x |
Matrix or data.frame. Input data to learn distribution from. |
n |
Integer. Number of synthetic samples (rows) to generate. |
p |
Integer. Number of features (columns) to generate. Currently must match input dimension. |
type_input |
Character. Type of data distribution: "unimodal", "mixture", or "financial". |
A matrix of synthetic data with n rows and p columns.
## Not run:
# Unimodal normal distribution
real_data <- matrix(rnorm(1000, mean = 2, sd = 4))
synthetic <- rgan(real_data, n = 500, p = 1, type_input = "unimodal")

# Mixture distribution
mixture <- rbinom(1000, 1, 0.5)
real_mixture <- matrix(mixture * rnorm(1000, 2, 1) + (1 - mixture) * rnorm(1000, 8, 2))
synthetic <- rgan(real_mixture, n = 500, p = 1, type_input = "mixture")

# Financial data
synthetic <- rgan(EuStockMarkets[1:500, 1], n = 500, p = 1, type_input = "financial")
## End(Not run)
Generates samples from a Gaussian kernel density estimate of a numeric vector.
rgaussiandens(x, n = length(x), p = 1, seed = 123, method = c("antithetic", "traditional"))
x |
Numeric vector to estimate density from. |
n |
Integer. Number of samples per replicate (default: length of x). |
p |
Integer. Number of replicates (default: 1). |
seed |
Integer. Random seed for reproducibility (default: 123). |
method |
Character. Sampling method: "antithetic" or "traditional". |
A vector or matrix of samples from the estimated density.
x <- rnorm(10)
rgaussiandens(x, n = 10, p = 3)
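Sampling from a Gaussian kernel density estimate amounts to picking an observed point at random and adding Gaussian noise whose standard deviation equals the bandwidth. The sketch below shows only this plain scheme and does not cover antithetic sampling. The name `kde_sample_sketch` and the choice of `bw.nrd0()` as bandwidth are assumptions, not taken from the package.

```r
# Hypothetical sketch: draw from a Gaussian KDE by resampling plus jitter
kde_sample_sketch <- function(x, n = length(x), seed = 123) {
  set.seed(seed)
  h <- bw.nrd0(x)  # default bandwidth rule of stats::density()
  centers <- x[sample.int(length(x), size = n, replace = TRUE)]
  centers + rnorm(n, mean = 0, sd = h)
}

set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)
s <- kde_sample_sketch(x, n = 1000)

c(mean(x), mean(s))  # the two means should be close
```

Unlike a plain bootstrap, the draws are not restricted to the observed values, which smooths the simulated distribution.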
Simulate using surrogate data
rsurrogate(x, n = length(x), p = 1, seed = 123)
Generates a univariate time series with specified trend, seasonality, and noise distribution.
simulate_time_series_1(n, trend = c("linear", "quadratic"), seasonality = c("none", "sinusoidal"), distribution = c("normal", "student"), noise_sd = 10, seed = 123)
n |
Integer. Number of data points. |
trend |
Character. "linear" or "quadratic". |
seasonality |
Character. "none" or "sinusoidal". |
distribution |
Character. "normal" or "student". |
noise_sd |
Numeric. Standard deviation of noise. |
seed |
Integer. Random seed for reproducibility. |
A time series object.
ts_data <- simulate_time_series_1(n = 100L, trend = "quadratic", seasonality = "sinusoidal", noise_sd = 2500, distribution = "normal")
plot(ts_data, type = "l", main = "Simulated Time Series")
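The additive structure described for this function (trend plus seasonality plus noise) can be sketched in base R. The function `sim_sketch`, the seasonal amplitude of 10, and the period of 12 are assumptions made for illustration, and only normal noise is covered. The package's actual coefficients are not given on this page.

```r
# Hypothetical sketch of an additive trend + seasonality + noise simulator
sim_sketch <- function(n, trend = c("linear", "quadratic"),
                       seasonality = c("none", "sinusoidal"),
                       noise_sd = 10, seed = 123) {
  trend <- match.arg(trend)
  seasonality <- match.arg(seasonality)
  set.seed(seed)
  t <- seq_len(n)
  trend_part <- if (trend == "linear") t else t^2
  # Amplitude 10 and period 12 are illustrative choices
  season_part <- if (seasonality == "sinusoidal") 10 * sin(2 * pi * t / 12) else 0
  ts(trend_part + season_part + rnorm(n, sd = noise_sd))
}

y <- sim_sketch(100, trend = "quadratic", seasonality = "sinusoidal")
plot(y, main = "Sketch: quadratic trend + sinusoidal seasonality + noise")
```

Fixing the seed inside the function makes repeated calls with the same arguments return the same series.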
Simulate a univariate time series dataset 2
simulate_time_series_2(n, trend = c("linear", "sinusoidal"), seasonality = FALSE, noise_sd = 0.1, ar = 0, ma = 0, seed = 123)
n |
numerical, number of data points |
trend |
string, "linear" or "sinusoidal" |
seasonality |
logical, whether to add a seasonal component (default: FALSE) |
noise_sd |
numerical, standard deviation of noise |
ar |
autoregressive order |
ma |
moving average order |
seed |
int, reproducibility seed |
a native time series object
ts_data <- simulate_time_series_2(n = 100L, trend = "sinusoidal", seasonality = TRUE, noise_sd = runif(n = 1, min = 20, max = 50))
plot(ts_data, type = "l", main = "Simulated Time Series")
Simulate a univariate time series dataset 3
simulate_time_series_3(n = 100, seed = 123)
n |
numerical, number of data points |
seed |
int, reproducibility seed |
a native time series object
print(simulate_time_series_3(10))
Simulate a univariate time series dataset 4
simulate_time_series_4(n = 600, psi = 0.1, theta = 0.1, seed = 123)
n |
numerical, number of data points |
psi |
1st parameter for innovation variance (in [0, 1]) |
theta |
2nd parameter for innovation variance (in [0, 1]) |
seed |
int, reproducibility seed |
a native time series object
plot(simulate_time_series_4())
Split time series sequentially
splitts(y, split_prob = 0.5, return_indices = FALSE, ...)
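A sequential split keeps the first part of the series for training and the remainder for testing, so no future observation leaks into the training set. The sketch below does this with base R's `window()`. The names `training` and `testing` and the use of `floor()` for the cut point are assumptions, since this page does not describe the return value of `splitts()`.

```r
# Sequential (non-random) train/test split that preserves time attributes
y <- ts(1:20, start = c(2015, 1), frequency = 4)
split_prob <- 0.5

n_train <- floor(split_prob * length(y))
tt <- time(y)
training <- window(y, end = tt[n_train])
testing <- window(y, start = tt[n_train + 1])

length(training)  # 10
length(testing)   # 10
start(testing)    # 2017 3: testing begins right after training ends
```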
Trains a simple Generative Adversarial Network (GAN) using user-supplied generator and discriminator functions.
train_gan(train_dat, generator_fn, discriminator_fn, n_iter = 5, epochs_per_iter = 30, num_resamples = 500, seed = 123L)
train_dat |
Matrix or data.frame. Training data for the GAN. |
generator_fn |
Function. Returns a generator model given latent dimension. |
discriminator_fn |
Function. Returns a discriminator model given input dimension. |
n_iter |
Integer. Number of training iterations (default: 5). |
epochs_per_iter |
Integer. Number of epochs per iteration (default: 30). |
num_resamples |
Integer. Number of synthetic samples to generate after training (default: 500). |
seed |
Integer. Random seed for reproducibility (default: 123). |
Requires both 'tensorflow' and 'keras3' packages to be installed. The generator and discriminator functions must return valid Keras models.
A list containing the trained GAN model ('model'), a matrix of synthetic resamples ('resamples'), and elapsed training time in seconds ('time').
# Example usage (requires keras3 and tensorflow)
# train_gan(train_dat = matrix(rnorm(100)), generator_fn = my_gen, discriminator_fn = my_disc)
A collection of univariate time series from various R packages and synthetic data
ts_dataset
A list containing multiple time series objects
Various R packages including astsa, datasets, expsmooth, fma, forecast, fpp2, MASS, tswge, and synthetic data