Title: | Penalized Regression with Inferred Seasonality Module - Forecasting Unemployment Initial Claims using 'Google Trends' Data |
---|---|
Description: | Implements the Penalized Regression with Inferred Seasonality Module (PRISM) to forecast weekly unemployment initial claims using 'Google Trends' data. It includes the data and tools required for backtesting performance over 2007-2020. |
Authors: | Dingdong Yi [aut, cre], Samuel Kou [aut], Shaoyang Ning [aut] |
Maintainer: | Dingdong Yi <[email protected]> |
License: | GPL-2 |
Version: | 0.2.1 |
Built: | 2024-10-24 02:57:31 UTC |
Source: | https://github.com/ryanddyi/prism |
Out-of-sample prediction for whole period
back_test(n.lag = 1:52, s.window = 52, n.history = 700, stl = TRUE, n.training = 156, UseGoogle = T, alpha = 1, nPred = 0, discount = 0.01, sepL1 = F)
n.lag |
the number of lags to be used as regressor in Stage 2 of PRISM (by default = 1:52 for weekly data) |
s.window |
seasonality span in seasonal decomposition (by default = 52 for weekly data) |
n.history |
length of training period (e.g. in weeks) for seasonal decomposition. |
stl |
if TRUE, use STL seasonal decomposition; if FALSE, use classic additive seasonal decomposition. |
n.training |
length of training period in Stage 2, penalized linear regression (by default = 156) |
UseGoogle |
boolean variable indicating whether to use Google Trend data. |
alpha |
penalty between lasso and ridge. alpha=1 represents lasso, alpha=0 represents ridge, alpha=NA represents no penalty (by default alpha = 1). |
nPred |
the number of periods ahead to forecast; one of 0, 1, 2, 3. |
discount |
exponential weighting: (1-discount)^lag. |
sepL1 |
if TRUE, use separate L1 regularization parameters for time series components and exogenous variables (Google Trends data) |
prediction
nPred-week-ahead prediction over the whole period (2007-2020).
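The discount argument can be read as a geometric down-weighting of older observations. The following is a minimal base R sketch of that weighting, assuming that lag counts weeks back from the most recent training observation (lag = 0 for the newest week). The helper name discount_weights is hypothetical, and the package's own indexing may differ.

```r
# Hypothetical helper: geometric observation weights (1 - discount)^lag.
# Assumption: lag = 0 is the most recent week in the training window.
discount_weights <- function(n.training = 156, discount = 0.01) {
  lag <- (n.training - 1):0          # oldest observation first
  (1 - discount)^lag
}

w <- discount_weights()
length(w)        # one weight per training week
w[length(w)]     # weight of the newest observation
w[1]             # weight of the oldest observation
```

With the defaults, the oldest week in a 156-week window still keeps a non-trivial weight, so the discount softens the window rather than truncating it.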
claim_data = load_claim_data()
# It may take a few minutes.
prism_prediction = back_test()
# evaluate the out-of-sample prediction error as a ratio to naive method
evaluation_table(claim_data, prism_prediction)
Out-of-sample prediction evaluation
evaluation_table(claim_data, prism_prediction)
claim_data |
the output of load_claim_data(). |
prism_prediction |
the output of back_test(). |
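The back_test example describes the evaluation as the out-of-sample prediction error expressed as a ratio to a naive method. The sketch below shows one way such a ratio could be computed, assuming RMSE as the error metric and a naive forecast that carries the previous week's value forward. Both choices are assumptions, and the numbers are invented for illustration; evaluation_table may use different metrics or a different benchmark.

```r
# Toy series, invented for illustration only (not real claims data).
actual   <- c(100, 104, 103, 110, 108, 115)
forecast <- c(NA,  103, 104, 108, 109, 113)   # hypothetical model output

# Assumed naive benchmark: predict each week with the previous week's value.
naive <- c(NA, head(actual, -1))

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

# A ratio below 1 means the model beats the naive benchmark on this sample.
ratio <- rmse(forecast, actual) / rmse(naive, actual)
ratio
```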
Load weekly unemployment initial claim data and related Google Trends data over a 5-year span (each week ends on a Saturday). The list of Google search terms is the same as in the paper.
load_5y_search_data(folder = "0408")
folder |
folder name for a given period of Google Trends data. Valid names are "0408", "0610", "0812", "1014", "1216", "1418", "1620". For example, the folder "0408" covers 2004-2008. |
A list of the following named xts objects:
claim.data
unemployment initial claim data of the same span as Google Trend data.
claim.all
all unemployment initial claim data since 1967
claim.early
unemployment initial claim data from 1980-01-06 to the start of claim.data.
allSearch
Google Trends data of a span over five years. It is in the scale of 0 – 100.
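The folder names encode the covered years. As a small illustration, the sketch below decodes a folder name into its year span, assuming the first two digits are the start year and the last two the end year (both in the 2000s), as in the "0408" example. The helper folder_span is hypothetical and not part of the package.

```r
# Hypothetical helper: decode a folder name such as "0408" into its year span.
# Assumption: digits 1-2 are the start year and digits 3-4 the end year.
folder_span <- function(folder) {
  stopifnot(grepl("^[0-9]{4}$", folder))
  c(start = 2000 + as.integer(substr(folder, 1, 2)),
    end   = 2000 + as.integer(substr(folder, 3, 4)))
}

folders <- c("0408", "0610", "0812", "1014", "1216", "1418", "1620")
spans <- t(sapply(folders, folder_span))   # one row per folder
spans
```

Consecutive folders overlap by two years, so the same week can appear in more than one folder.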
Load weekly unemployment initial claim data (each week ends on a Saturday).
load_claim_data(GT.startDate = "2004-01-03", GT.endDate = "2016-12-31")
GT.startDate |
start date of claim data |
GT.endDate |
end date of claim data |
A list of the following named xts objects:
claim.data
unemployment initial claim data from GT.startDate to GT.endDate.
claim.all
all unemployment initial claim data since 1967
claim.early
unemployment initial claim data prior to GT.startDate
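The relationship between claim.early and claim.data can be pictured as a split of one weekly series at GT.startDate. The sketch below reproduces that split on a synthetic series using a plain data frame instead of xts; the data values and the chosen end date are invented for illustration.

```r
# Synthetic weekly series, invented for illustration (weeks end on Saturday).
dates  <- seq(as.Date("2003-01-04"), as.Date("2005-12-31"), by = "week")
claims <- data.frame(date = dates, value = seq_along(dates))

GT.startDate <- as.Date("2004-01-03")
GT.endDate   <- as.Date("2004-12-25")

# Assumed split: rows before GT.startDate form the early history, and rows
# within [GT.startDate, GT.endDate] form the modelling sample.
claim.early <- claims[claims$date <  GT.startDate, ]
claim.data  <- claims[claims$date >= GT.startDate & claims$date <= GT.endDate, ]

unique(format(claims$date, "%u"))   # ISO weekday number; 6 is Saturday
nrow(claim.early)
nrow(claim.data)
```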
A function for nowcasting and forecasting time series.
prism(data, data.early, GTdata, stl = TRUE, n.history = 700, n.training = 156, alpha = 1, UseGoogle = T, nPred.vec = 0:3, discount = 0.01, sepL1 = F)
data |
time series of interest as xts, last element can be NA. (e.g., unemployment initial claim data in the same period as GTdata) |
data.early |
historical time series of response variable before the contemporaneous exogenous data (GTdata) is available. |
GTdata |
contemporaneous exogenous data as xts. (e.g., Google Trend data) |
stl |
if TRUE, use STL seasonal decomposition; if FALSE, use classic additive seasonal decomposition. |
n.history |
training period for seasonal decomposition. (by default = 700 wks) |
n.training |
length of regression training period (by default = 156) |
alpha |
penalty between lasso and ridge. alpha=1 represents lasso, alpha=0 represents ridge, alpha=NA represents no penalty. |
UseGoogle |
boolean variable indicating whether to use Google Trend data. |
nPred.vec |
the number of periods ahead to forecast. nPred.vec can be a vector of integers, e.g. nPred.vec=0:3 gives results from nowcast to 3-week-ahead forecast. |
discount |
exponential weighting: (1-discount)^lag (by default = 0.01). |
sepL1 |
if TRUE, use separate L1 regularization parameters for time series components and exogenous variables (Google Trends data) |
A list of the following named objects:
coef
coefficients for Intercept, z.lags, seasonal.lags and exogenous variables.
pred
a vector of predictions, nPred.vec weeks forward.
prism_data = load_5y_search_data('0610')
data = prism_data$claim.data[1:200] # load claim data from 2006-01-07 to 2009-10-31
data[200] = NA # delete the data for the latest date and try to nowcast it.
data.early = prism_data$claim.earlyData # load claim prior to 2006
GTdata = prism_data$allSearch[1:200] # load Google trend data from 2006-01-07 to 2009-10-31
result = prism(data, data.early, GTdata) # call prism method
result$pred # output 0-3wk forward prediction
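To illustrate what nPred.vec = 0:3 means, the sketch below produces a nowcast and 1- to 3-step-ahead forecasts on synthetic data. It assumes a direct approach with one regression per horizon and uses ordinary least squares via lm() as a stand-in for the penalized regression; whether prism() works this way internally is an assumption, and the data are simulated.

```r
set.seed(1)
n <- 120
x <- as.numeric(arima.sim(list(ar = 0.8), n = n))   # synthetic exogenous signal
y <- 2 + 0.5 * x + rnorm(n, sd = 0.2)                # synthetic response
y[n] <- NA                                           # latest value not yet observed

nPred.vec <- 0:3
pred <- sapply(nPred.vec, function(h) {
  # Direct forecasting (assumed): regress y[t + h] on x[t], one fit per horizon.
  idx <- 1:(n - h)
  fit <- lm(yh ~ xt, data = data.frame(yh = y[idx + h], xt = x[idx]))
  # Rows with a missing response are dropped by lm(); predict from the latest x.
  unname(predict(fit, newdata = data.frame(xt = x[n])))
})
names(pred) <- paste0("h", nPred.vec)
pred   # h0 is the nowcast for week n; h1..h3 look further ahead
```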
PRISM penalized linear regression function for a range of time (only used internally for back testing)
prism_batch(data, GTdata, var, n.training = 156, UseGoogle = T, alpha = 1, nPred.vec = 0:3, start.date = NULL, n.weeks = NULL, discount = 0.01, sepL1 = F)
data |
time series of interest as xts, last element can be NA. (e.g., unemployment initial claim data in the same period as GTdata) |
GTdata |
contemporaneous exogenous data as xts. (e.g., Google Trend data) |
var |
generated regressors from stage 1. |
n.training |
length of regression training period (by default = 156) |
UseGoogle |
boolean variable indicating whether to use Google Trend data. |
alpha |
penalty between lasso and ridge. alpha=1 represents lasso, alpha=0 represents ridge, alpha=NA represents no penalty. |
nPred.vec |
the number of periods ahead to forecast. nPred.vec can be a vector of integers, e.g. nPred.vec=0:3 gives results from nowcast to 3-week-ahead forecast. |
start.date |
the starting date for the forecast. If NULL, the forecast starts at the earliest possible date. |
n.weeks |
the number of weeks in the batch. If NULL, the forecast ends at the latest possible date. |
discount |
exponential weighting: (1-discount)^lag (by default = 0.01) |
sepL1 |
if TRUE, use separate L1 regularization parameters for time series components and exogenous variables (Google Trends data) |
A list of the following named objects:
coef
coefficients for Intercept, z.lags, seasonal.lags and exogenous variables.
pred
prediction results for n.weeks from start.date.
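The interplay of start.date, n.weeks and n.training can be sketched as a rolling-window loop: for each week in the batch, fit on the preceding n.training weeks and predict that week. The example below is a simplified illustration on simulated data, with a plain AR(1) fit via lm() standing in for the Stage 2 regression and an integer index standing in for start.date. It is not the package's implementation.

```r
set.seed(42)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 300)) + 10   # synthetic series

n.training <- 156
start      <- 200      # hypothetical index standing in for start.date
n.weeks    <- 20

pred <- numeric(n.weeks)
for (i in seq_len(n.weeks)) {
  tt  <- start + i - 1                      # week being forecast
  win <- (tt - n.training):(tt - 1)         # the n.training weeks before tt
  train <- data.frame(y = y[win], lag1 = y[win - 1])
  fit <- lm(y ~ lag1, data = train)         # AR(1) stands in for Stage 2
  pred[i] <- predict(fit, newdata = data.frame(lag1 = y[tt - 1]))
}

# Out-of-sample RMSE over the batch.
sqrt(mean((pred - y[start:(start + n.weeks - 1)])^2))
```

Each forecast uses only observations dated before the target week, which is what makes the batch a genuine out-of-sample test.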
Stage 1 of PRISM. The function generates prism seasonal components and seasonally adjusted lag components.
var_generator(data, data.early, stl = TRUE, n.lag = 1:52, s.window = 52, n.history = 700)
data |
time series of interest as xts, last element can be NA. |
data.early |
historical time series of response variable before Google Trend data is available. (e.g., unemployment initial claim prior to 2004) |
stl |
if TRUE, use STL seasonal decomposition; if FALSE, use classic additive seasonal decomposition. |
n.lag |
the number of lags to be used as regressor in Stage 2 of PRISM (by default = 1:52 for weekly data) |
s.window |
seasonality span (by default = 52 for weekly data) |
n.history |
training period for seasonal decomposition. (by default = 700 wks) |
A list of the following named objects:
y.lags
seasonally adjusted components, z_lag, and seasonal components, s_lag.
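A rough picture of Stage 1 can be given with base R alone: decompose the series with stats::stl(), subtract the seasonal component to get the seasonally adjusted series, and build lagged copies of both as regressors. The sketch below does this on a simulated weekly series with only 4 lags for readability. The column names z_lag and s_lag follow the description above, but the construction itself is an assumption about how var_generator() works, not its actual code.

```r
set.seed(7)
n <- 52 * 6
# Synthetic weekly series: trend + annual cycle + noise (illustration only).
y <- ts(100 + 0.05 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 52) +
          rnorm(n), frequency = 52)

# Seasonal decomposition with base R's stl(); s.window is the loess span
# used for the seasonal component.
dec <- stl(y, s.window = 52)
s <- as.numeric(dec$time.series[, "seasonal"])
z <- as.numeric(y) - s                      # seasonally adjusted series

# Lag matrices: row i holds lags 1..n.lag of the value at time n.lag + i.
n.lag  <- 4
z.lags <- embed(z, n.lag + 1)[, -1, drop = FALSE]
s.lags <- embed(s, n.lag + 1)[, -1, drop = FALSE]
colnames(z.lags) <- paste0("z_lag", 1:n.lag)
colnames(s.lags) <- paste0("s_lag", 1:n.lag)
dim(cbind(z.lags, s.lags))
```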