Predicting New Car Registrations: Nowcasting with Google Search and Macroeconomic Data

Published as: E. Tomczyk, T. Doligalski, Predicting New Car Registrations: Nowcasting with Google Search and Macroeconomic Data, [in:] Sł. Partycki (ed.),  E-społeczeństwo w Europie Środkowej i Wschodniej. Teraźniejszość i perspektywy rozwoju (e-Society in Middle and Eastern Europe. Present and Development Perspectives), Wydawnictwo KUL, Lublin 2015, p. 228-236.

Download the paper as pdf from SSRN:  Predicting New Car Registrations: Nowcasting with Google Search and Macroeconomic Data


Abstract

Using search query data and a macroeconomic index (PMI), we attempt to predict new car registrations. As the forecasting horizon is short, the modelling follows the idea of nowcasting. The study covers 48 monthly observations for sixteen car producers present on the Polish market. The proposed model explains the level of new car registrations for the five major makes and allows the number of registrations to be forecast for the current and next month.

Keywords: nowcasting, prediction, modelling, car, automotive, demand, registrations,  Internet, search, Google, Poland, CEE, PMI

 

Introduction

In modern economies there are several sources of data on real-time activities which may help in modelling the behaviour of various entities, such as consumers or businesses.  These sources of information include online auctions, parcel shipment companies, credit card or mobile operators, as they possess precise data on transactions in certain locations [4, p.1]. A special role among them is played by search engines which provide data on frequencies of their queries. Possibly the most popular is Google Trends presenting both number and location of chosen searches.  Availability of such data enables modelling in accordance with the idea of  nowcasting.

Nowcasting is defined as the prediction of the present, the very near future and the very recent past [2, p.4]. The reason for modelling the current or recent past events is the delay in availability of data, which in modern economies can amount to weeks or even months. The other circumstances increasing the application value of nowcasting are macroeconomic turbulences, great uncertainty and unique shocks, as they cause past values to lose their predictive power [8, p.4].  As Bańbura et al. state ‘nowcasting is based on exploitation of data which is published early and possibly at higher frequencies than the target variable of interest in order to obtain an ‘early estimate’ before the official figure becomes available’ [2, p.4].  The scope of nowcasted activities is wide and includes current changes in unemployment, private consumption or – beyond just the economic activities – development of infectious diseases [4, p. 2].

As mentioned above, search engines are a valuable source of data on various entities’ behaviour (mostly consumers’ behaviour) in modern economies. Their usefulness results from both their popularity and type of data gathered. Search engine queries, as opposed to questionnaires, are not biased by submitting false information aimed at creating a desired image. Below we present three conditions the fulfillment of which should increase the application value of nowcasting with search data.

  • The nowcasted activity is preceded with search engine queries.

Search behaviour depends heavily on the type of activity. Consumers are more likely to search for information on high-involvement products or on matters that carry some risk [5, p. 1340]. Under certain circumstances (e.g. on local markets), people tend to rely more on friends’ opinions and less on searching for information online.

  • It is possible to identify search queries associated with the nowcasted activity.

A common problem is selecting queries that are not only typical of the activity but also precede it. Queries including a brand or product name may be related not only to pre-purchase search, but also to post-purchase services or to buying a used product. On the other hand, the list of specific queries related strictly to the pre-purchase phase may be long and difficult to identify. Moreover, these phrases may be entered rarely, and data on their search frequency may not be available.

  • Searches lead to the nowcasted activity to a similar extent.

In other words, searching consumers have similar purchasing potential. A consumer looking for information on a movie to watch in the cinema is likely to do it only once (or not at all). An investor entering the abbreviation of a company quoted on the stock exchange may in theory buy or sell any number of the company’s shares. In the first case, search queries are more likely to serve as a valuable predictor of demand. Demand nowcasting with Google search is probably easier on the B2C than on the B2B market, as the size of transactions following the search varies less. Some brands or products may, however, attract the interest of consumers who do not intend to purchase them (e.g. prestigious brands, innovative products).

Interestingly, Choi and Varian provide an example which demonstrates that more sophisticated methods may help when the three conditions listed above are not met [4, p. 2]. They modelled a confidence index for Australian consumers by identifying categories of phrases whose search frequency is correlated with historical levels of consumer confidence. The prediction of the consumer confidence index rests on the assumption that the identified correlations will persist in the future.

 

Nowcasting of automotive markets

We chose car registrations as the modelled variable for two reasons. First, data on car registrations are available at monthly frequency, thus offering relatively long time series. This is not typical; many macroeconomic data are available at quarterly or yearly intervals only. Second, these data cover the 20 best-selling car manufacturers in a given month and offer a wide cross-section of the Polish new car market.

Customer behaviour on the automotive market meets the three conditions mentioned in the previous section only imperfectly. Potential buyers are likely to run search queries before the purchase in order to learn about vehicle parameters or find a dealer’s location. The names of car makes are the phrases associated with the purchase. Unfortunately, as broad matches these phrases may also refer to other activities (e.g. looking for spare parts). As we decided to model both consumer and business car registrations, a single search may lead to the purchase of more than one car. However, the great majority of Polish companies are small and medium businesses, so we can safely assume that an average transaction will include a rather small number of vehicles.

The automotive market has been the subject of a number of nowcasting analyses. Sun, Li, Li and Zhang forecast automotive demand in China with macroeconomic, price, consumer and other factors (i.e. sales of competitive automobile types, advertising investment) [9, p. 431]. The category of consumer factors includes a consumer satisfaction index as well as a search index based on queries in Baidu, the leading Chinese search engine. Their model offers more accurate car sales predictions than popular benchmark models, especially during market fluctuations. Researching the Chilean automotive market, Carrière-Swallow and Labbé create a Google Trends Automotive Index, which together with an autoregressive component forms the set of explanatory variables. The proposed model also outperforms benchmark models and provides forecasts more rapidly [3, p. 5].

Results obtained in our previous article point to the importance of the autoregressive component (that is, the lagged number of car registrations) and Internet search queries in modelling the number of passenger car registrations in Poland. For four automotive brands (i.e. Fiat, Opel, Skoda, Toyota), the autoregressive component and search data were the two major factors influencing the number of first registrations. The registrations of the remaining two brands (i.e. Peugeot, Renault) can be explained by the previous level of registrations and car manufacturer website traffic [6, p.6].

In this paper, we add a macroeconomic component to verify the dependence of car registrations on the general situation in the economy. We also extend the scope of research to the 16 best-selling brands on the Polish market. The purpose of this paper is to model new car registrations with Google search and macroeconomic data and to evaluate the forecasting quality of the nowcasting models.


Description of data

Our sample covers 48 monthly observations from January 2011 to December 2014.[1] Monthly data on new registrations of passenger cars are provided by the Polish Association of Automotive Industry (PAAI)[2] on the basis of the Central Register of Vehicles database administered by the Ministry of the Interior. A one-month lag is allowed by Polish law between the purchase of a private car and its registration. Current data on number of registrations becomes available on the PAAI webpage around the 5th day of the next month. Due to availability of data, the following 16 passenger car makes were included in the initial empirical analysis: BMW, Citroen, Dacia, Fiat, Ford, Honda, Hyundai, Kia, Nissan, Opel, Peugeot, Renault, Skoda, Suzuki, Toyota, and Volkswagen. Together, they constitute 85% of new car registrations in Poland.

Relative numbers of queries pertaining to car makes are defined in the way proposed by Choi and Varian [4, p.3]. Original search data is defined relative to BMW registrations in the week of June 6–14, 2014. For the aggregated monthly series, data is rescaled relative to the maximum monthly query in the period analysed (that is, the number of BMW queries in June 2014, equal to 280.71) and expressed in percentage terms.
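The rescaling step described above can be illustrated with a minimal sketch. It assumes that the monthly series for all makes are divided by a single peak value (the largest monthly query volume across makes) and multiplied by 100. The function name `rescale_to_global_peak` and the sample numbers are hypothetical; only the peak value of 280.71 is taken from the text.

```python
def rescale_to_global_peak(queries_by_make):
    """Express every observation as a percentage of the largest one."""
    # Assumption: one common peak across all makes, as the text describes
    # rescaling relative to the maximum monthly query in the period analysed.
    peak = max(max(series) for series in queries_by_make.values())
    if peak <= 0:
        raise ValueError("the largest query volume must be positive")
    return {
        make: [100.0 * value / peak for value in series]
        for make, series in queries_by_make.items()
    }


# Hypothetical monthly query volumes; only the peak of 280.71 comes from the text.
monthly_queries = {
    "BMW": [140.355, 280.71],
    "Skoda": [70.1775, 28.071],
}
rescaled = rescale_to_global_peak(monthly_queries)
# The peak month now equals 100 (up to rounding) and every other value
# is a percentage of that peak.
```

Using one common peak keeps the series comparable across makes, which is what the share calculations in Table 1 rely on.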

Table 1 presents each car producer’s share in the total volume of searches and the ratio of the number of searches to the number of registrations (henceforth, S/R ratio). The share of a car producer in searches is calculated as the ratio of searches for that producer to the sum of searches for all producers. Thus, the shares sum to 100%. The S/R ratio is calculated by dividing the producer’s share in Google searches by its share in registrations. If the ratio exceeds 1, the make is searched proportionally more often than it is registered. The ratios are highest for BMW and Honda. This illustrates that these brands attract, to the greatest extent, the interest of consumers who do not follow through with an actual purchase. On the other hand, Dacia, Volkswagen and Skoda are searched relatively rarely compared to their number of registrations. This difference can be explained by the fact that BMW and Honda can be perceived as aspirational brands, while the latter three makes offer utilitarian rather than symbolic value, and thus draw proportionally less interest.


Table 1. Car manufacturers’ shares in Google searches and ratios of number of searches to number of registrations (S/R)

 

Make         Share in Google searches   S/R ratio
BMW          12%                        4.98
Citroen      8%                         3.06
Dacia        4%                         1.73
Fiat         8%                         1.63
Ford         7%                         1.47
Honda        12%                        1.37
Hyundai      10%                        1.16
Kia          5%                         1.15
Nissan       5%                         0.94
Opel         6%                         0.90
Peugeot      7%                         0.74
Renault      4%                         0.58
Skoda        3%                         0.46
Suzuki       6%                         0.44
Toyota       3%                         0.29
Volkswagen   1%                         0.28
Total        100%

Source: authors’ calculations
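The share and S/R calculations behind Table 1 can be sketched as follows. This is a minimal illustration under the definitions given in the text (share in searches divided by share in registrations). The function names and the two-make input data are hypothetical and are not the authors’ figures.

```python
def shares(totals):
    """Return each make's share of the grand total (shares sum to 1)."""
    grand_total = sum(totals.values())
    return {make: value / grand_total for make, value in totals.items()}


def search_to_registration_ratio(searches, registrations):
    """S/R ratio: share in searches divided by share in registrations."""
    search_share = shares(searches)
    registration_share = shares(registrations)
    return {
        make: search_share[make] / registration_share[make]
        for make in searches
    }


# Hypothetical totals for two makes (illustrative only).
searches = {"MakeA": 300.0, "MakeB": 100.0}
registrations = {"MakeA": 1000, "MakeB": 1000}
sr_ratio = search_to_registration_ratio(searches, registrations)
# MakeA holds 75% of searches but 50% of registrations, so its ratio is 1.5;
# MakeB holds 25% of searches and 50% of registrations, so its ratio is 0.5.
```

A ratio above 1 marks a make that is searched proportionally more often than it is registered, as discussed in the text.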

Our previous results [6, p.6] suggest that web traffic variables and seasonal effects do not exhibit a statistically significant or economically substantial influence on the number of car registrations in Poland, and that the autoregressive component and search data remain the major factors in explaining the dependent variable. To extend our analysis, we add the Purchasing Managers’ Index (PMI) to our set of regressors to account for the impact of the macroeconomic environment on car registrations. PMI data, published by the Polish economic portal Bankier.pl,[3] become available on its webpage with a one-month lag and free of charge. In comparison with other macroeconomic data, PMI may be considered current and easily accessible, and our preliminary data analysis suggested that it is better suited to modelling car registrations than the indicator of business conditions in retail trade. It also reflects changes in activity related to the B2B rather than the B2C market, as opposed to data on the frequency of search queries, which may over-represent consumer purchasing behaviour. We do not include explanatory variables taken from the Polish car market, for two reasons. First, the autoregressive component in our models takes account of the recent (lagged one month) number of car registrations; second, car market data are not easily accessible on the Internet and are therefore less useful for real-time analysis.

 

Estimation results

Based on the average number of registrations per month, car makes considered for empirical analysis can be grouped into three categories of car sellers: major (five makes), medium (three makes), and small (the remaining eight car makes; see Table 2).

 

Table 2. Average number of car registrations in 2011-2014

 

No.   Make         Category of producer   Average monthly registrations
1.    Skoda        major                  2 958
2.    Volkswagen   major                  2 075
3.    Toyota       major                  1 904
4.    Opel         major                  1 763
5.    Ford         major                  1 724
6.    Renault      medium                 1 288
7.    Hyundai      medium                 1 251
8.    Kia          medium                 1 260
9.    Nissan       small                  1 020
10.   Peugeot      small                  999
11.   Fiat         small                  985
12.   Citroen      small                  825
13.   Dacia        small                  795
14.   Honda        small                  523
15.   Suzuki       small                  506
16.   BMW          small                  502

Source: authors’ calculations

 

We found that the results of the subsequent stages of empirical analysis depend strongly on the number of car registrations. For medium and small producers we did not find an economically valid and statistically significant dependence of the number of car registrations on PMI, and only a limited dependence on lagged search queries. However, for the five major players, the results look more promising.

Linear models with HAC standard errors (to account for serial correlation in the error term) have been estimated for five dependent variables describing number of first registrations of Ford, Opel, Skoda, Toyota and Volkswagen. Results are summarised in Table 3. AR(1) component (that is, number of car registrations lagged one month) and internet search data lagged two months were included on the basis of our previous analysis of car registrations data. Also, PMI data lagged two months were added; the lag is meant to account for one-month delay in publishing the index plus an additional one-month lag before it can be reflected in car registration numbers because of the delay allowed by Polish law between the purchase of a vehicle and its registration.
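The lag structure described above corresponds to a model of the form R(t) = b0 + b1*R(t-1) + b2*S(t-2) + b3*PMI(t-2) + error, where R denotes registrations and S search queries. The sketch below shows how the lagged regressors could be aligned and the coefficients estimated by ordinary least squares. It is an illustration only: the function names are hypothetical, the data are synthetic and noise-free, and the HAC standard errors used in the paper are not computed here (only point estimates).

```python
def build_lagged_design(registrations, searches, pmi):
    """Return (X, y) for R(t) = b0 + b1*R(t-1) + b2*S(t-2) + b3*PMI(t-2)."""
    if not (len(registrations) == len(searches) == len(pmi)):
        raise ValueError("all series must have the same length")
    X, y = [], []
    # The first usable month is t = 2, because two-month lags are needed.
    for t in range(2, len(registrations)):
        X.append([1.0, registrations[t - 1], searches[t - 2], pmi[t - 2]])
        y.append(registrations[t])
    return X, y


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic, noise-free data (hypothetical values, not the paper's data).
true_beta = [2.0, 0.5, 0.3, 0.1]
n_months = 48
searches = [float((7 * t) % 11) for t in range(n_months)]
pmi = [float((5 * t) % 7) for t in range(n_months)]
registrations = [7.0, 8.0]
for t in range(2, n_months):
    registrations.append(
        true_beta[0]
        + true_beta[1] * registrations[t - 1]
        + true_beta[2] * searches[t - 2]
        + true_beta[3] * pmi[t - 2]
    )

X, y = build_lagged_design(registrations, searches, pmi)
beta_hat = ols(X, y)  # should recover true_beta up to rounding error
```

With 48 monthly observations, the two-month lags leave 46 usable rows. In practice a statistical package would be used to obtain the HAC standard errors and diagnostic tests reported in Table 3.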

 

 

Table 3. Summary of estimation results

 

                                                 Ford          Opel          Skoda         Toyota        Volkswagen
constant                                         −3778.03 **   −2451.22 **   −3339.64 **   −2667.55 *    −1102.73
AR(1)                                            0.281 **      0.348 **      0.394 **      0.457 **      0.563 **
S(t-2)                                           18.190 *      24.497 **     69.425 **     49.779 **     79.434
PMI(t-2)                                         79.708 **     39.612 **     53.611 **     36.241        14.579
R²                                               0.525         0.421         0.485         0.448         0.451
RESET p-value                                    0.745         0.341         0.592         0.206         0.157
normality p-value                                0.084         0.016         0.956         0.900         0.101
maximum VIF                                      1.234         1.189         1.287         1.297         1.099
omit S(t-2): improved information criteria       2/3           0/3           0/3           0/3           0/3

** – coefficient statistically different from zero at 0.05 significance level; * – coefficient statistically different from zero at 0.10 significance level

Source: authors’ calculations

 

The strongest explanatory power is exhibited by the autoregressive component (that is, number of registrations lagged one month) and search queries lagged two months. Lagged dependent variable has positive and statistically significant influence in all cases, and so does lagged search data, with the sole exception of Volkswagen. Hypothesis of dependence of car registrations on Purchasing Managers’ Index is confirmed in three out of five cases: for Ford, Opel and Skoda.

All five models are characterized by satisfactory coefficients of determination, comparable to those obtained in previous research, and all are correctly specified according to the RESET test. All but one (that is, Opel) have normally distributed residuals when tested at the 0.05 significance level; since the sample size is adequate, the departure from normality does not negatively influence estimation results. The variance inflation factors (maximum VIF below 1.3 in every model) give no indication of multicollinearity. In addition, we conducted the omitted variable test for lagged search queries to verify whether ignoring internet search data improves the statistical quality of the estimated models. In only one case (that is, Ford) do two out of three information criteria point to higher quality of the reduced model; in the four remaining cases, models with the search variable perform better than models without it.
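The comparison of full and reduced models by information criteria can be sketched as below. The text does not name the three criteria, so this example assumes the commonly reported Akaike (AIC), Schwarz (BIC) and Hannan-Quinn (HQC) criteria, computed from the Gaussian log-likelihood of a linear model. The function names and the residual sums of squares are hypothetical.

```python
import math


def information_criteria(rss, n_obs, n_params):
    """AIC, BIC and HQC for a linear model with Gaussian errors."""
    log_lik = -0.5 * n_obs * (
        1.0 + math.log(2.0 * math.pi) + math.log(rss / n_obs)
    )
    return {
        "AIC": -2.0 * log_lik + 2.0 * n_params,
        "BIC": -2.0 * log_lik + n_params * math.log(n_obs),
        "HQC": -2.0 * log_lik + 2.0 * n_params * math.log(math.log(n_obs)),
    }


def criteria_favouring_reduced(rss_full, k_full, rss_reduced, k_reduced, n_obs):
    """Count how many criteria are lower (better) for the reduced model."""
    full = information_criteria(rss_full, n_obs, k_full)
    reduced = information_criteria(rss_reduced, n_obs, k_reduced)
    return sum(1 for name in full if reduced[name] < full[name])


# Hypothetical residual sums of squares (assumed values, 46 usable observations).
# Dropping one regressor raises RSS from 100.0 to 105.1: the fit worsens
# slightly, and the criteria disagree on whether the saved parameter is worth it.
votes_for_reduced = criteria_favouring_reduced(100.0, 4, 105.1, 3, 46)
```

In this made-up case BIC and HQC prefer the reduced model while AIC does not, which mirrors the kind of "2/3" outcome reported for Ford in Table 3.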

 

Evaluation of forecasting quality

 

To assess forecasting quality of the models, they were re-estimated on the basis of a limited sample: from January 2011 to June 2014 (that is, on the basis of 42 observations). The remaining six months of 2014 were used to evaluate forecasting quality of the models using mean errors (ME) and mean absolute percentage errors (MAPE). Results are reported in Table 4.

 

Table 4. Measures of forecasting quality

 

Ford Opel Skoda Toyota Volkswagen
ME -160.12 316.57 -200.57 346.79 122.23
MAPE 12.8% 17.3% 15.8% 14.6% 13.8%

Source: authors’ calculations

 

 

Results presented in Table 4 suggest that the numbers of registrations of Ford and Skoda are somewhat overestimated by the models, and those of Opel, Toyota and Volkswagen slightly underestimated. However, the size of the bias does not appear substantial compared to the actual number of registrations. Out-of-sample forecast errors, as measured by mean absolute percentage errors, range from 12.8% for Ford to 17.3% for Opel. As far as we are aware, there are no similar studies for the Polish market with which to compare our results, but we consider them acceptable.
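The two accuracy measures used above can be computed as in the following sketch. It assumes the error is defined as actual minus forecast, which matches the reading in the text that a negative ME indicates overestimation. The six actual and forecast values are hypothetical and stand in for a six-month hold-out period.

```python
def mean_error(actual, forecast):
    """ME: average of (actual - forecast); negative means overestimation."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """MAPE in percent: average of |actual - forecast| / |actual|."""
    return 100.0 * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast)
    ) / len(actual)


# Hypothetical registrations and forecasts for a six-month hold-out period.
actual = [2000, 1500, 1000, 2500, 1600, 2000]
forecast = [2200, 1350, 1100, 2500, 1760, 1800]

me = mean_error(actual, forecast)
mape = mean_absolute_percentage_error(actual, forecast)
# Here ME is negative: on average the forecasts exceed the actual values.
```

ME reveals the direction of the bias because positive and negative errors offset each other, while MAPE measures the typical size of the error relative to the actual level.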

 

Discussion and limitations

 

Empirical analysis of passenger car registrations suggests a notable disparity between small/medium and major car producers. For the first category of car makes, we were not able to define and estimate economically meaningful and statistically significant relationships with lagged search queries and Purchasing Managers’ Index representing the macroeconomic environment. It seems that registration volumes of minor car producers are not directly influenced by aggregated economic variables, and it is generally difficult to fit a model which would meet standard quality criteria.

For the five major sellers, we find that number of registrations of Ford, Opel, Skoda and Toyota cars are adequately explained by the autoregressive component, internet search data lagged two months, and – with the exception of Toyota –  Purchasing Managers’ Index lagged two months. It appears that major new car sellers share a common pattern of dependence of their registration numbers on internet search and macroeconomic factors. It is interesting to note that Volkswagen registration numbers are statistically significantly influenced only by the autoregressive component.  It seems that web searches of this car make do not fall in step with purchasing decisions. As shown in Table 1, Volkswagen is characterized by one of the lowest ratios of number of searches to number of registrations.

Forecasting quality of the models constructed for the five major sellers seems satisfactory. We also found that among the estimated coefficients, the ones that exhibited the largest instability when shortened sample was used were those associated with lagged search variables. This may suggest that search data, being volatile and subject to major variation from month to month, reduces stability of results of econometric analysis and therefore presents additional challenges when used for nowcasting.

Estimation results show that Google search and macroeconomic data can be used to explain the number of registrations of the five major car producers in the short term (i.e. one month). The estimates are based on publicly available data and do not require any insider knowledge. The nowcasting estimates can also serve to assess the current sales level of major producers, as sales precede registrations by about two weeks (to be exact: from zero to four weeks). This conclusion is in accordance with the definition of nowcasting as the prediction of the present [4, p. 2; 2, p.4].

Nowcasting models which use search data may serve as a source of marketing insight into current trends in consumer purchasing behaviour. Knowledge of this type is difficult to acquire via traditional research methods such as surveys or in-depth interviews. As mentioned above, nowcasting with Google search data also helps to identify changes in consumer purchasing behaviour. This is especially useful for businesses that must maintain efficient infrastructure or carry out extensive resource planning.

Our results, as well as the conclusions of other studies [8, 7, 3], show that the modelled activity is also explained by the autoregressive component (i.e. the level of the activity in the previous period). Thus nowcasting seems to be more feasible in industries in which data on subjects’ behaviour is recorded and publicly available. If the data is collected at higher frequencies (e.g. weekly instead of monthly), nowcasting models may provide faster and more accurate results.

The research is subject to the following limitations. Data on the popularity of Google searches are so-called “broad matches”, presenting the volume of all searches that include the given keyword (here: automotive make). Thus they also include queries not related to purchasing a new car, referring for example to spare parts or used cars.

Furthermore, in this paper we attempt to model the number of car registrations made by both consumers and businesses. Many models using Google search data as an explanatory variable refer to private consumption [8]. Data on registrations on the separate markets are available, but only in shorter time series. In the future it will be possible to model private car registrations and, hopefully, achieve better results. On the other hand, the current research reflects the entire sales volume and thus is of greater practical use.

Among other potentially productive directions of further analysis we would suggest readdressing the question of factors influencing number of registrations for smaller car producers since they seem to differ from those influencing registrations of major players; and searching for additional macroeconomic explanatory variables that would be available easily, free of charge and (almost) in real time.

 

Literature

 

  1. Askitas N., Zimmermann K.F. (2009). Google Econometrics and Unemployment Forecasting. Applied Economics Quarterly, 55 (2), 107-120.
  2. Bańbura M., Giannone D., Modugno M., Reichlin L. (2013). Now-Casting and the Real-Time Data Flow, Working Papers, European Central Bank, no. 1564.
  3. Carrière-Swallow Y., Labbé F. (2013). Nowcasting with Google Trends in an Emerging Market. Journal of Forecasting, 32 (4), 289–298.
  4. Choi H., Varian H. (2011). Predicting the Present with Google Trends. Economic Record, 88, 2–9. doi: 10.1111/j.1475-4932.2012.00809.x
  5. Dholakia U.M. (2001). A Motivational Process Model of Product Involvement and Consumer Risk Perception. European Journal of Marketing, 35 (11/12), 1340–1362.
  6. Doligalski T., Tomczyk E. (2015). Nowcasting New Car Registrations with Google Search Data and Car Manufacturers’ Website Traffic. Working paper.
  7. Li N., Peng G., Chen H., Bao J. (2013). A Prediction Study on E-commerce Orders Based on Site Search Data. 6th International Conference on Information Management, Innovation Management and Industrial Engineering, 2, 314-318.
  8. Schmidt T., Vosen S. (2009). Forecasting Private Consumption: Survey-based Indicators vs. Google Trends. Ruhr Economic Papers, 155.
  9. Sun B., Li B., Li G., Zhang K. (2013). Automobile Demand Forecasting: An Integrated Model of PLS Regression and ANFIS. Advances in Information Sciences & Service Sciences, 5(8), 429-436.

 

 

[1] With the exception of Honda registrations data which are available up to September 2014 (45 observations).

[2] Polish Association of Automotive Industry, http://www.pzpm.org.pl/en, [2015.04.02].

[3] http://www.bankier.pl/gospodarka/wskazniki-makroekonomiczne/pmi-polska-pol, [2015.03.10].