Results from M3 forecasting competition

July 03, 2017 - John-Paul Clarke, Chief Science Officer

Two key questions for any person or group doing forecasting are: (1) How good are we at predicting reality? (2) How good are we compared to others? And, as you might expect, Jens (my co-founder) kept asking me these two questions. So, to get Jens off my back, my team and I decided to compare ourselves to the 24 methods used in the International Institute of Forecasters (IIF) M3-Competition.

Per Wikipedia, the M3-Competition was intended to both replicate and extend the features of the two prior IIF competitions, the M-Competition and the M2-Competition, by including more time series, more methods (e.g., neural networks), and more researchers. A total of 3003 time-series were used, and the paper documenting the results of the competition was published in the International Journal of Forecasting in 2000. The raw data was also made available on the IIF website so that other researchers, such as ourselves, could compare their methods against those that had already been evaluated in the competition.

Our core strength at Pace is the prediction of discrete events with discrete values. Thus, we randomly selected 51 discrete-event, discrete-value time-series from the M3 database and used them as the basis for comparing the performance of our Sibyl forecasting engine to 5 of the aforementioned 24 methods. These five were selected because they are among the most popular methods and because they gave the best results in the competition. The M3 time series included yearly, quarterly, monthly, daily, and other series, taken from different domains such as micro, industry, macro, finance, demographic, and other. To ensure that accurate forecasting models could be developed, minimum thresholds were set for the number of observations: 14 for yearly series, 16 for quarterly series, 48 for monthly series, and 60 for other series.
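For readers who want to reproduce this kind of sampling, the sketch below shows one way to filter an M3 catalogue to series that meet the minimum-observation thresholds and draw a random subset. The column names and file layout are assumptions for illustration, not the actual M3 file schema, and our screening for discrete-event, discrete-value series involved additional steps not shown here.

```python
import pandas as pd

# Minimum observations per series type, as set for the M3-Competition.
MIN_OBS = {"Yearly": 14, "Quarterly": 16, "Monthly": 48, "Other": 60}

def sample_m3_series(catalog: pd.DataFrame, n: int = 51, seed: int = 42) -> pd.DataFrame:
    """Randomly draw n series that meet the minimum-observation thresholds.

    `catalog` is assumed to hold one row per series, with hypothetical columns
    'series_id', 'frequency' ('Yearly', 'Quarterly', 'Monthly', 'Other'),
    and 'n_obs' (number of observations in the series).
    """
    thresholds = catalog["frequency"].map(MIN_OBS).fillna(60)
    eligible = catalog[catalog["n_obs"] >= thresholds]
    return eligible.sample(n=n, random_state=seed)
```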

The Mean Absolute Percentage Error (MAPE) values for the 51 time-series are shown in the table below. For the sake of clarity, we have provided the numbers only for Sibyl and the best five pre-existing forecasting methods. As you can see, Sibyl has a lower MAPE than the other methods. In fact, the Sibyl forecasts were closest to the actual values in more than half of the instances.

Table 1: Comparison of Sibyl to Best 5 Forecasting Methods
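For reference, MAPE is the mean of the absolute percentage deviations between forecast and actual values over the holdout period. The snippet below is a minimal, self-contained sketch of how per-series and overall figures behind a table like this can be computed; the numbers are toy values, not data from the comparison.

```python
import numpy as np

def mape(actual, forecast) -> float:
    """Mean Absolute Percentage Error, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

# Toy example: two holdout windows and their forecasts for one method.
series_pairs = [
    ([120.0, 135.0, 150.0], [118.0, 140.0, 149.0]),
    ([80.0, 82.0, 90.0], [85.0, 81.0, 88.0]),
]
per_series = [mape(actual, forecast) for actual, forecast in series_pairs]
overall = float(np.mean(per_series))  # average MAPE across the series
print(per_series, overall)
```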

To give you an indication of performance in specific instances, we selected and present below the forecast and actual time histories for arguably the 5 nastiest time-series in the set of 51. That is, the time-series where the training data was either highly stochastic or bore little resemblance to the comparison data. As you can see, Sibyl does incredibly well, better than even I expected, in capturing the underlying dynamics of these time series.
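The notion of "nastiness" above is informal, but as a rough way to quantify it, the sketch below combines training-window volatility with the level shift between the training and comparison windows. This score and the assumed data layout are illustrative only; it is not the criterion behind our selection.

```python
import numpy as np

def nastiness(train, test) -> float:
    """Rough difficulty score: noisy training data and/or a large level shift
    between the training and comparison windows both increase the score."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    level = abs(float(np.mean(train))) or 1.0     # guard against a zero-mean series
    volatility = float(np.std(np.diff(train))) / level
    level_shift = abs(float(np.mean(test)) - float(np.mean(train))) / level
    return volatility + level_shift

# Rank the 51 series by this score and keep the five hardest, assuming
# `series` is a dict of {series_id: (train_values, comparison_values)}:
# hardest = sorted(series, key=lambda k: nastiness(*series[k]), reverse=True)[:5]
```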