This is the fourth issue in our ARMA Unplugged modeling series. In this issue, we start by examining the nature of the input time series and classify each one as a flow-type or stock-type series. This classification determines how exposed a series is to calendar-based events.
In econometric theory, a time series sample is one that has equally-spaced observations across time. But are all periods treated equally in these series? What about when the sample period is a trading day, calendar month, or quarter? As we will see, it depends.
Why do we care?
Let’s try to answer this question with two examples:
Example 1: Consider the monthly production level of a given factory. Assuming the factory is only operational on weekdays, and production capacity (machine and labor) is held constant, then a month with more weekdays is expected to show higher output.
Example 2: Consider an asset’s monthly returns. Let’s assume that no major events, shifts in market sentiment, or major news affecting the asset’s returns occur during any holding period in our sample data horizon. If that is the case, one may argue that returns in months with more trading days will be higher in expectation, and more volatile, than returns in months with fewer trading days.
In short, calendar events influence the values of the time series sample, and adjusting for those events beforehand helps us better understand the process, model it, and forecast it.
By now, your interest should be piqued enough for us to introduce the central questions we are trying to answer here: What are calendar events? How do we test for their influence on a given time series? How do we quantify and/or adjust for their effect before and after the analysis phase? And how do we exploit this calendar information to our advantage?
By the end of this paper, we hope to leave you with a solid understanding of calendar events and of the adjustments you can make to the observed values before you embark on a serious analysis exercise.
Background
In the econometric literature, time series are divided into two groups: stock and flow. A stock time series measures an attribute at a point in time (e.g. unemployment, inflation, etc.), while a flow time series measures an activity over a period (e.g. production level, asset’s return, etc.).
Why do we need to make this distinction? Stock time series are not affected by calendar events (e.g. trading days in a period, Easter, etc.), but flow-type time series are.
For flow-type time series, the observed value depends on (1) the length of the period in calendar days, (2) the number of workdays in the period, or (3) the occurrence of special calendar events in the period. Consider the following examples:
- Monthly production level of a factory: Holding everything else (e.g. capacity) constant, we would expect a higher throughput in months with more workdays.
- Monthly retail sales: Holding everything else (e.g. inventory) constant, we expect a higher sales level in the months around holidays (e.g. Easter, Christmas, Ramadan, etc.).
Hang on, what about leap years? Is the month of February the same in a leap year versus a non-leap year? For a flow-type series, it is not.
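To make the February question concrete, here is a minimal sketch that counts calendar days and Monday-to-Friday days in a month using only the Python standard library. The function name `day_counts` is our own, and the weekday count ignores public holidays, which a real workday calendar would have to remove.

```python
import calendar
from datetime import date


def day_counts(year: int, month: int) -> tuple[int, int]:
    """Return (calendar days, Monday-to-Friday days) in the given month."""
    n_days = calendar.monthrange(year, month)[1]
    n_weekdays = sum(
        1
        for d in range(1, n_days + 1)
        if date(year, month, d).weekday() < 5  # Monday=0 ... Friday=4
    )
    return n_days, n_weekdays


# Compare February in a non-leap year and in a leap year.
for year in (2023, 2024):
    print(year, day_counts(year, 2))
```

For a flow-type series, the extra day (and the extra weekday) in the leap-year February is additional time during which the activity accumulates.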
Calendar Event
A calendar event is a deterministic (predictable) factor that is exogenous to the time series process but may influence the observed values in each period.
Examples
- Number of workdays in a period
- Number of occurrences of a given weekday (e.g. Monday) in a period
- Occurrence of a moving holiday (e.g. Easter) in a period
- Occurrence of a holiday in a period
- Long weekend effect?
- For events that can span a few periods (e.g. Ramadan), the number of days (or weekdays) of the holiday in each period
Framework
Calendar events (weekdays, holidays, events, etc.) are expressed as regression variables which affect the time series conditional mean.
$$y_t=\beta_o+\sum_{j=1}^N{\beta_j x_{j,t}}+z_t$$ $$z_t=y_t-\beta_o-\sum_{j=1}^N{\beta_j x_{j,t}}$$
Where
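- $N$ is the number of regression variables
- $x_{j,t}$ is the value of the j-th regression variable in period t
- $\beta_j$ is the regression coefficient of the j-th variable, and $\beta_o$ is the intercept
- $\{z_t\}$ are the correlated residuals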
Unlike in ordinary multiple linear regression, the regression residuals $\{z_t\}$ are correlated, so we use an econometric model (e.g. ARMA/ARIMA) to capture the correlation and yield white-noise residuals. The new model can be expressed as follows:
$$z_t=f(z_{t-1},z_{t-2}\cdots z_1, a_{t-1},a_{t-2}\cdots a_1)+a_t$$ $$a_t \sim \mathrm{i.i.d}\sim\Phi(0,\sigma^2)$$
The formulation above is similar to an ARMAX-type model.
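As one concrete instance of $f$, assume (purely for illustration) that the residuals follow an ARMA(1,1) process with coefficients $\phi_1$ and $\theta_1$. These two symbols are our own notation and are not part of the general formulation above:

$$z_t=\phi_1 z_{t-1}+\theta_1 a_{t-1}+a_t$$

Substituting this into the regression equation gives a single expression for the observed series:

$$y_t=\beta_o+\sum_{j=1}^N{\beta_j x_{j,t}}+\phi_1 z_{t-1}+\theta_1 a_{t-1}+a_t$$

In this form, the calendar regressors shift the conditional mean, while the ARMA terms describe how the deviations from that mean persist over time.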
IMPORTANT
- This formulation is described as prior-adjusted because, conceptually, the regression effects are removed first and the residuals are then modeled with ARIMA.
- In practice, the regression coefficients and the ARMA/ARIMA parameters are estimated concurrently: the log-likelihood function (LLF) optimizer includes the regression coefficients in its pool of free parameters as it searches for the maximum.
- To evaluate whether a regression variable is significant in the model, the AIC (Akaike information criterion) values of the candidate models are compared (i.e. AIC test). AIC penalizes models with more free parameters.
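The AIC comparison in the last point can be sketched in a few lines. The formula $\mathrm{AIC}=2k-2\ln L$ is standard, but the log-likelihood values and parameter counts below are made-up numbers chosen only to illustrate the mechanics. They do not come from any fitted model.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits of the same series (illustrative numbers only):
# a model without a calendar regressor, and one with a single extra regressor.
llf_without, k_without = -512.4, 4
llf_with, k_with = -507.9, 5

aic_without = aic(llf_without, k_without)
aic_with = aic(llf_with, k_with)

# Keep the regressor only if it lowers the AIC despite the extra parameter.
keep_regressor = aic_with < aic_without
print(aic_without, aic_with, keep_regressor)
```

The extra regressor costs 2 AIC points, so it is kept only when it raises the log-likelihood by more than 1.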
The U.S. Census Bureau fitted a seasonal ARIMA model with regression variables and called it regARIMA. A regARIMA model of order $(p,d,q)\times(P,D,Q)_s$ is expressed as follows:
$$\Phi(L^s)\phi(L)(1-L)^d (1-L^s)^D z_t=\Theta(L^s)\theta(L)a_t$$ $$\Phi(L^s)\phi(L)(1-L)^d (1-L^s)^D (y_t-\beta_o-\sum_{j=1}^N{\beta_j x_{j,t}})=\Theta(L^s)\theta(L)a_t$$ $$a_t \sim \mathrm{i.i.d}\sim\Phi(0,\sigma^2)$$
The order of the ARIMA model $(p,d,q)\times(P,D,Q)_s$ is identified using standard model identification procedures (e.g. Box-Jenkins, etc.), and the regression coefficients and the ARIMA parameters are estimated in the same process.
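Counting the intercept $\beta_o$, the $N$ regression coefficients, and the innovation variance $\sigma^2$, the model has a total of $p+q+P+Q+N+2$ free parameters.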
Now, we need to define the regression variable for different calendar events: moving holidays, leap years, and the trading day’s effect.
1. Moving Holiday
Moving holidays are holidays that occur each year but whose exact date shifts under the Gregorian calendar system. Examples include Easter Sunday, Labor Day, and Thanksgiving Day. They receive special treatment because their effects on a series can spill over into more than one month.
The moving holiday variable assumes that the level of activity in the time series changes for a fixed number of days before each of these three holidays. For Thanksgiving Day, the effect continues until Christmas Eve (Dec. 24th). Moving holidays typically affect retail sales, tourism, and travel.
How do we define the variables?
Example 1: Easter Holiday
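The Easter holiday effect is assumed to start w days before the moving holiday. Because Easter Sunday falls between March 22nd and April 25th, the effect can land in March, in April, or in both (and, for a very early Easter with a long window, in late February).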
$$E(w,t)=\frac{N_g}{w}$$
Where
- $N_g$ is the number of the w days before Easter that fall in month t
So, if we assume the Easter effect lasts for the 10 days before the holiday, then we distribute the effect across the months in which those 10 days fall.
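The 10-day example can be worked through with a short sketch. It assumes the window consists of the w days strictly before Easter Sunday, and it computes the date with the well-known anonymous Gregorian (Meeus/Jones/Butcher) algorithm. The function names are our own.

```python
from collections import Counter
from datetime import date, timedelta


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday (anonymous Meeus/Jones/Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_regressor(year: int, w: int) -> dict[int, float]:
    """Map month number -> E(w, t) = N_g / w for the months the window touches.

    Assumption: the window is the w days strictly before Easter Sunday.
    Months not in the returned dict have E(w, t) = 0.
    """
    easter = easter_sunday(year)
    window = [easter - timedelta(days=k) for k in range(1, w + 1)]
    counts = Counter(day.month for day in window)
    return {month: n / w for month, n in sorted(counts.items())}


print(easter_sunday(2023), easter_regressor(2023, 10))
print(easter_sunday(2024), easter_regressor(2024, 10))
```

In 2023 Easter fell on April 9th, so two of the ten days land in March and eight in April. In 2024 it fell on March 31st, so the whole effect is assigned to March.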
Another method of treating Easter is one adopted by Statistics Canada: If Easter falls on or before April w-th, then the Easter variable is defined as follows:
$$E(w,t)=\left\{\begin{matrix} n_g/w & \mathrm{March}\\ -n_g/w & \mathrm{April} \\ 0 & \mathrm{Else}& \end{matrix}\right. $$
If Easter falls after April w, then $E(w,t)=0$ for all months that year.
Where:
- $n_g$ is the number of the w days that fall in March
Although both definitions refer to the same moving holiday, the regression variable differs between the Canadian and U.S. statistical agencies.
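A minimal sketch of the Statistics Canada variant, following the definition exactly as stated above. It assumes the same window as in Example 1 (the w days strictly before Easter Sunday) and takes the Easter date as an input. The function name `sc_easter_regressor` is our own.

```python
from datetime import date, timedelta


def sc_easter_regressor(easter: date, w: int) -> dict[int, float]:
    """Return the regressor values for March (3) and April (4).

    All other months are 0. Assumption: the window is the w days
    strictly before Easter Sunday, with w no larger than 30.
    """
    if easter > date(easter.year, 4, w):
        # Easter falls after April w: the variable is 0 for the whole year.
        return {3: 0.0, 4: 0.0}
    window = [easter - timedelta(days=k) for k in range(1, w + 1)]
    n_g = sum(1 for day in window if day.month == 3)
    return {3: n_g / w, 4: -n_g / w}


print(sc_easter_regressor(date(2023, 4, 9), 10))   # Easter on or before April 10
print(sc_easter_regressor(date(2025, 4, 20), 10))  # Easter after April 10
```

Note that the March and April values always cancel, so this variable shifts activity between the two months without changing their combined level.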
Example 2: Thanksgiving holiday
The Thanksgiving holiday effect is assumed to start w days before Thanksgiving and continue until Christmas Eve (i.e. Dec. 24th). If $w<0$, it is assumed that the effect starts after Thanksgiving.
$$\mathrm{ThC}(w,t)=\frac{N_g}{N}$$
Where:
- $N$ is the number of days from w days before Thanksgiving through Dec. 24th
- $N_g$ is the number of those $N$ days that fall in month t
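The Thanksgiving variable can be sketched the same way. We assume here that both end points (the day w days before Thanksgiving, and Dec. 24th) are counted in $N$. US Thanksgiving is the fourth Thursday of November, and the function names are our own.

```python
from collections import Counter
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """US Thanksgiving: the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    first_thursday = nov1 + timedelta(days=(3 - nov1.weekday()) % 7)
    return first_thursday + timedelta(weeks=3)


def thanksgiving_regressor(year: int, w: int) -> dict[int, float]:
    """Map month number -> ThC(w, t) = N_g / N.

    Assumption: the window runs from w days before Thanksgiving
    through Dec. 24th, counting both end points. A negative w
    starts the window after Thanksgiving.
    """
    start = thanksgiving(year) - timedelta(days=w)
    end = date(year, 12, 24)
    n_total = (end - start).days + 1
    counts = Counter((start + timedelta(days=k)).month for k in range(n_total))
    return {month: n / n_total for month, n in sorted(counts.items())}


print(thanksgiving(2023), thanksgiving_regressor(2023, 3))
```

For 2023 and w = 3, the window runs from Nov. 20th through Dec. 24th (35 days), with 11 days in November and 24 in December.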
2. Fixed Holidays
Unlike moving holidays, fixed holidays occur either on a fixed date or on a particular day of a given month and do not typically affect other months. For this reason, fixed holidays are believed to be absorbed by the seasonal component of the series, and no special treatment for them is needed.
3. Trading Day (or day-of-the-week) effect
The trading day effect arises because a month contains different numbers of each day of the week from year to year. Every month contains four full weeks plus zero (a non-leap February) to three additional days, so each weekday occurs at least four times per month and some weekdays occur five times. Why is this important? Recall our example of the factory, or think of a retail store open only on weekdays.
The trading-day effect occurs when a series is affected by the differing day-of-the-week compositions of the same calendar month in different years.
Trading-day effects in monthly and quarterly flow series can be modeled with seven variables that represent (no. of Mondays), . . . , (no. of Sundays) in month t.
$$D_{1,t}=\{\mathrm{no.\:of\:Mondays\:in\:month}\}$$ $$D_{2,t}=\{\mathrm{no.\:of\:Tuesdays\:in\:month}\}$$ $$\cdots$$ $$D_{7,t}=\{\mathrm{no.\:of\:Sundays\:in\:month}\}$$
Bell and Hillmer (1983) note, however, that a better parametrization of the same effect uses six contrast variables, defined as (no. of Mondays) - (no. of Sundays), . . . , (no. of Saturdays) - (no. of Sundays):
$$TD_{1,t}=(\mathrm{no.\:of\:Mondays})-(\mathrm{no.\:of\:Sundays})$$ $$TD_{2,t}=(\mathrm{no.\:of\:Tuesdays})-(\mathrm{no.\:of\:Sundays})$$ $$\cdots$$ $$TD_{6,t}=(\mathrm{no.\:of\:Saturdays})-(\mathrm{no.\:of\:Sundays})$$
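Both sets of trading-day variables are easy to generate from the calendar. The sketch below builds the seven counts $D_{1,t},\ldots,D_{7,t}$ and the six contrasts $TD_{1,t},\ldots,TD_{6,t}$ for a single month. The function names are our own.

```python
import calendar
from datetime import date


def weekday_counts(year: int, month: int) -> list[int]:
    """[D_1, ..., D_7]: occurrences of Monday, ..., Sunday in the month."""
    counts = [0] * 7
    n_days = calendar.monthrange(year, month)[1]
    for d in range(1, n_days + 1):
        counts[date(year, month, d).weekday()] += 1  # Monday=0 ... Sunday=6
    return counts


def trading_day_contrasts(year: int, month: int) -> list[int]:
    """[TD_1, ..., TD_6]: each weekday count minus the Sunday count."""
    d = weekday_counts(year, month)
    return [d[i] - d[6] for i in range(6)]


print(weekday_counts(2024, 3), trading_day_contrasts(2024, 3))
print(weekday_counts(2023, 2), trading_day_contrasts(2023, 2))
```

March 2024 starts on a Friday, so Friday, Saturday, and Sunday occur five times. A 28-day February has exactly four of every weekday, so all six contrasts are zero.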
In addition, we can also use a more parsimonious model to capture the trading days effect. This model reduces the number of trading day regressors from six to one by assuming the daily effect of weekdays (Monday through Friday) is the same, and the daily effect of weekend days (Saturday and Sunday) is the same.
$$TD_{t}=\sum_{i=1}^5 D_{i,t}-\frac{5}{2}\sum_{j=6}^7 D_{j,t}$$
While this constrained model has fewer regression variables (aka regressors), it is potentially less precise due to its fundamental assumption that weekdays have the same effect, and Saturdays and Sundays have the same effect.
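The single-regressor version can be computed as follows. This is a sketch using the standard library `calendar.monthcalendar`, which by default returns weeks starting on Monday and pads days outside the month with zeros. The function name is our own.

```python
import calendar


def one_coefficient_td(year: int, month: int) -> float:
    """TD_t = (no. of weekdays) - 5/2 * (no. of Saturdays and Sundays)."""
    weeks = calendar.monthcalendar(year, month)  # rows are Monday..Sunday
    n_weekdays = sum(1 for week in weeks for day in week[:5] if day)
    n_weekend_days = sum(1 for week in weeks for day in week[5:] if day)
    return n_weekdays - 2.5 * n_weekend_days


for year, month in [(2023, 2), (2024, 1), (2024, 3)]:
    print(year, month, one_coefficient_td(year, month))
```

The factor 5/2 makes the regressor zero for any whole number of weeks, so it only measures the surplus or deficit of weekdays relative to weekend days in the leftover days of the month.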
4. Leap Year Effect
$$\mathrm{LY}_{t}=\left\{\begin{matrix} 0.75 & \mathrm{Feb\:in\:Leap\:Year\:(Q1)}\\ -0.25 &\mathrm{Feb\:in\:non-Leap\:Year\:(Q1)}\\ \: 0 & \mathrm{Else} \end{matrix}\right.$$
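One way to read these values: over a four-year cycle, February averages 28.25 days, so the regressor is simply the deviation of the actual length from that average (29 - 28.25 = 0.75 and 28 - 28.25 = -0.25). A minimal sketch for monthly data, with a function name of our own choosing:

```python
import calendar


def leap_year_regressor(year: int, month: int) -> float:
    """LY_t for monthly data: February's deviation from its 28.25-day average."""
    if month != 2:
        return 0.0
    return 0.75 if calendar.isleap(year) else -0.25


# Over one four-year cycle the February values cancel out.
cycle_total = sum(leap_year_regressor(y, 2) for y in (2021, 2022, 2023, 2024))
print(leap_year_regressor(2024, 2), leap_year_regressor(2023, 2), cycle_total)
```

Because the values sum to zero over a cycle, the variable captures only the leap-year deviation and leaves the average February level to the seasonal component.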
5. Length of the month effect
A time series will not exhibit a trading day effect if levels of activity are constant over each day of the week. However, different months have different lengths (28, 29, 30, and 31 days), hence monthly activity can vary purely because certain months are longer than others. This is known as the length of month effect.
The length of month effect is handled in two ways. (1) For non-February months, the effect is automatically absorbed into the seasonal component of the decomposition of the series, because these months have the same length every year. For the month of February, the length of the month is handled with a leap year regression variable.
(2) If a series has trading day corrections, then these adjustments already include the length of month effect. If there is no trading day effect in a time series, then the length of month effect is accounted for in the seasonal component.
In sum, we do not need a separate regressor for the length of month effect in monthly data, or for the length of quarter effect in quarterly data.
Conclusion
We have covered a lot of ground in our discussion here, but the main takeaway is to understand the underlying sample data and its potential sensitivity (influence) to calendar events (e.g. trading day, moving holiday(s), leap year, etc.).
Once we have an initial set of suspects, we can decide how to quantify their occurrence or effect for each period. The choice between different variables (for the same holiday effect) is mainly driven by experience and, to some extent, by the software tools available. For instance, in our example of the Easter holiday, I restricted the variables to those supported by the X-12-ARIMA program.
The burning question I suspect is on your mind now: can I design my own variable(s) to model the effect of a moving holiday? You bet! After all, you only need to generate the time series for this variable for past data points and include it as an external regression variable in the model.
In addition, if you are conducting a forecast, then you’ll need to provide the values for this variable covering the out-of-sample forecast horizon.
By defining your own variable, you can support moving holidays beyond those observed in the US or Canada (e.g. Chinese New Year, Ramadan, Hanukkah, etc.), or you can redefine the effect (weight) of the days before or after the holiday. As you can see, the possibilities are endless.
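As a rough sketch of such a user-defined variable, the function below spreads a holiday effect evenly over a window of days around a holiday date that you supply, and returns the share that falls in each month. The function name, the equal daily weights, and the window lengths are all our own illustrative choices, not a definition taken from any statistical agency or software package.

```python
from collections import Counter
from datetime import date, timedelta


def holiday_window_regressor(
    holiday: date, days_before: int, days_after: int
) -> dict[tuple[int, int], float]:
    """Map (year, month) -> share of the holiday window falling in that month.

    Hypothetical design: the window covers `days_before` days before the
    holiday, the holiday itself, and `days_after` days after it, with
    equal weight on every day.
    """
    start = holiday - timedelta(days=days_before)
    n_total = days_before + days_after + 1
    counts: Counter = Counter()
    for k in range(n_total):
        day = start + timedelta(days=k)
        counts[(day.year, day.month)] += 1
    return {period: n / n_total for period, n in sorted(counts.items())}


# Example input: Chinese New Year on January 29, 2025, five days either side.
print(holiday_window_regressor(date(2025, 1, 29), 5, 5))
```

To use it in a model, you would evaluate the function for every year in the sample, and for every year in the forecast horizon, then align the resulting values with the periods of your series.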
In our next issue, we’ll delve into the seasonal adjustment procedure using non-parametric methods (e.g. X11/X-12-ARIMA) and model-based (e.g. SEATS/TRAMO in X-13ARIMA-SEATS) methods.