Disclaimer: The views in the article are those of the author and do not necessarily represent the views of Zonda or its employees.
Abstract
This article reviews recent literature related to house price forecasting at the national and regional levels. After discussing the existing literature, the article briefly discusses packages and modeling in R. The article also covers future areas of research and possible innovations in the house price forecasting space.
Keywords: House Price Forecasting; Forecasting; Economic Modeling
Acknowledgements: I would like to thank Madeline Hintz-Meszaros for helping me review this article as well as two anonymous reviewers.
1 Introduction
Case and Shiller [1989, 1990] generated substantial interest in house prices in the United States and in forecasting those prices. Their work showed that the real estate market is not efficient, meaning price changes tended to move in the same direction across several years. Real estate price forecasting remains of great interest to economists in general, as well as to real estate researchers and practitioners, because residential investment is a key component of the economy. Leamer [2007], in his influential paper, makes the point that the housing market strongly influences the performance of the overall economy.
Thus, being able to predict home prices (or at least turning points in the market) is of vital importance to policymakers and practitioners. In fact, in the private sector, prices are among the items clients ask about most often. This makes sense, as companies and home buyers stand to gain or lose substantially depending on the condition of the market and the timing of sales or purchases. Further, although much research has focused on national or international pricing, significantly less attention has gone to forecasting at sub-national levels (state, metropolitan area, county). More localized forecasting is an area with room for expansion, as housing functions more on a local level than a national level. However, as Hendershott and Weicher [2002] noted in a review of the forecasting work up to that point, a variety of forecasts were substantially off-base. Forecasting is very difficult, but the field is always advancing.
The rest of this paper will focus on past research related to house price forecasting. I will mostly focus on studies relating to the United States and will move through the modeling advancements by the scope of models, whether national or regional. Following this, I will briefly touch on methods in R that have proved useful and comment on possible expansions to these programs that would be beneficial to users. The final two sections will offer thoughts on advancements followed by the conclusion. My aim is to give readers a (brief) review of the field as well as ideas and resources for future work.
2 Forecasting Literature
2.1 Time Series Modeling
2.1.1 National Level
Case and Shiller’s classic work created a strong interest in house price forecasting. In an earlier work, Sklarz et al. [1987] used autoregressive methods to forecast housing starts. Cho [1996] summarized a variety of research that had been completed up to that time and commented on the efficiency of the housing market. Like Case and Shiller, he found evidence of positive serial correlation in housing prices, at least in the short term. With this in mind, time series methods have been the main focus of researchers studying house price forecasting. Hoffman and Rasche [1997], although not focused explicitly on housing, used vector error-correction models (VECMs) to forecast key components of the U.S. economy. As many macroeconomic variables show evidence of cointegration, VECMs can be very useful for forecasting purposes.
Larson [2010] also used VECMs to forecast home prices. In particular, his estimation of home prices using a VECM with income produced good multi-step-ahead forecasts (p.26). Leung [2014] also provided evidence that incomes and house prices have a cointegrating relationship. More recently, Kiefer [24 October 2019] created a framework for out-of-sample forecasts conditioned on one input variable in a vector error-correction setup. For example, given a reliable out-of-sample forecast of income levels, the cointegrating relationship can be used to generate out-of-sample estimates of house prices. As Kiefer [24 October 2019] showed, VECMs with a small number of variables could be useful for out-of-sample forecasting. This approach could certainly be expanded and generalized for other forecasting questions.
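As a rough illustration of forecasting prices conditional on an outside income forecast, the sketch below iterates a single-equation error-correction model forward in plain Python. It is a deliberately simplified stand-in, not the bivariate VECM procedure of Kiefer [24 October 2019] and not the interface of any R package. The function name, the coefficient values (alpha, beta, gamma), the absence of an intercept in the long-run relation, and the input numbers are all assumptions made for illustration.

```python
def conditional_ecm_forecast(last_log_price, last_log_income, income_path,
                             alpha=-0.2, beta=1.0, gamma=0.5):
    """Iterate a stylized error-correction equation forward.

    d_price[t] = alpha * (price[t-1] - beta * income[t-1]) + gamma * d_income[t]

    alpha (speed of adjustment), beta (long-run coefficient) and gamma
    (short-run income effect) are illustrative values, not estimates.
    """
    prices = []
    price, income = last_log_price, last_log_income
    for next_income in income_path:
        gap = price - beta * income        # deviation from the long-run relation
        d_income = next_income - income    # change implied by the income forecast
        price = price + alpha * gap + gamma * d_income
        income = next_income
        prices.append(price)
    return prices


# Hypothetical inputs: log price starts above the assumed long-run relation,
# and an outside forecast has log income rising 0.01 per quarter for 8 quarters.
income_forecast = [11.0 + 0.01 * h for h in range(1, 9)]
price_forecast = conditional_ecm_forecast(11.2, 11.0, income_forecast)

for h, p in enumerate(price_forecast, start=1):
    print(f"h={h}: forecast log price = {p:.4f}")
```

With a negative adjustment coefficient, the forecast price path is pulled toward the level implied by the income path, which is the mechanism that lets a reliable input forecast drive the price forecast.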
Milunovich [2020] compared a broad range of forecasting models for Australian real house prices. He found that, in general, univariate models performed better at short forecast horizons, while multivariate forecasts may have some benefit at longer horizons, although overall forecast performance deteriorates as the horizon grows (p.28). McGurk [2020] included capital flow information in his forecasting model and found that it added some explanatory power to U.S. home price forecasts. This was an interesting study because the forecast included international financial flow information, which may be useful given that Bernanke [10 March 2005] hypothesized that a global savings glut drove macroeconomic behavior in the U.S. prior to the Great Recession.
Lastly, Gravatt et al. [2022] created a simple method to determine the possibility of overpricing in metropolitan housing markets. This simple method is a great addition to the above modeling techniques as it is intuitive and easily understood by practitioners and the public. Sometimes, ease of use and access bring more benefit than more extensive and computationally heavy analyses.
2.1.2 Regional Level
Moving to regional models, Miller et al. [2005] used an autoregressive model to estimate house prices. Interestingly, this was an early example of forecasting at the metropolitan level. The authors ran their model on 316 metropolitan areas to allow the forecasts to catch local effects. This is a useful approach because forecasts at the metro level are of great interest to practitioners. Rapach and Strauss [2007] demonstrated that combinations of various autoregressive distributed lag (ARDL) models helped accurately forecast housing prices in various states. In another study, Miller and Sklarz [2012] noted that incorporating short-term to medium-term market conditions improved the timing and accuracy of forecasts.
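For readers less familiar with the notation, the two model classes mentioned above can be written in a generic textbook form. The symbols (y_t for the house price series, x_t for an explanatory variable, lag orders p and q, error term \varepsilon_t) are notation introduced here for illustration and are not taken from the cited papers.

```latex
% Autoregressive model of order p, AR(p)
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t

% Autoregressive distributed lag model, ARDL(p, q), with one explanatory variable
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=0}^{q} \beta_j \, x_{t-j} + \varepsilon_t
```

The ARDL form nests the AR form: setting every \beta_j to zero removes the explanatory variable and leaves the pure autoregression.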
In another regional study, Gupta and Miller [2012] forecasted house prices for Los Angeles, Las Vegas, and Phoenix using a variety of models. They found that recursive forecasts (updated on a quarterly frequency) fairly accurately captured turning points in the metro housing markets they considered. Balcilar et al. [2015] examined a wide range of linear and non-linear forecasting models for U.S. house prices at the national level and for the U.S. Census regions. They found, in general, that for short-term forecasting the gains from non-linear models do not justify their additional costs, in terms of complexity and computation, relative to linear models. Bork and Møller [2015] used dynamic model averaging methods to create forecasts that change over time and across the 50 U.S. states. They found that this technique improved the accuracy of their forecasts. Kholodilin and Siliverstovs [2017] performed a forecasting exercise on property prices in 71 German cities. They found that no single indicator predicted all markets, but forecast combinations were helpful in improving forecast accuracy.
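To make the idea of a forecast combination concrete, the sketch below weights two competing forecasts by the inverse of their mean squared error over a past evaluation window. This is one common, simple weighting scheme and is not necessarily the one used in any of the studies cited above. The model names and all numbers are made up for illustration.

```python
def mse(forecasts, actuals):
    """Mean squared error of a list of forecasts against realized values."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def inverse_mse_weights(past_forecasts, actuals):
    """Weights proportional to 1/MSE, normalized to sum to one.

    Assumes every model has a strictly positive MSE on the window.
    """
    inverse = {name: 1.0 / mse(f, actuals) for name, f in past_forecasts.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(new_forecasts, weights):
    """Weighted average of the models' newest forecasts."""
    return sum(weights[name] * value for name, value in new_forecasts.items())


# Hypothetical past performance of two models on realized price growth (percent)
actuals = [2.0, 3.0, 4.0]
past_forecasts = {
    "ar": [2.5, 3.5, 4.5],    # smaller errors on the window
    "ardl": [1.0, 2.0, 3.0],  # larger errors on the window
}
weights = inverse_mse_weights(past_forecasts, actuals)

# Hypothetical next-period forecasts from the same two models
combined = combine({"ar": 5.0, "ardl": 4.0}, weights)
print(weights, combined)
```

The model with the better recent track record receives the larger weight, so the combined forecast leans toward it without discarding the other model entirely.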
To close this section, Moody’s Analytics developed a method for forecasting house prices down to the metropolitan level. Chen et al. [2013] documented the model and input variables. This involved creating and evaluating a model that generated the equilibrium price for each unit of interest (in this case, metropolitan areas) and an adjustment equation that estimated how quickly prices that have deviated from equilibrium return to those levels. All of these methods can be useful for modeling regional or metropolitan level prices.
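One generic way to write such a two-part structure is sketched below. This is a stylized textbook form, not the specification documented in Chen et al. [2013]. The symbols are assumptions introduced here: p_{i,t} is the log price in metro i, p^{*}_{i,t} its equilibrium level, x_{i,t} a vector of fundamentals, and \lambda the speed of adjustment.

```latex
% Equilibrium (long-run) price as a function of local fundamentals
p^{*}_{i,t} = x_{i,t}^{\top} \beta

% Adjustment equation: a fraction \lambda of last period's gap closes each period
\Delta p_{i,t} = \lambda \left( p^{*}_{i,t-1} - p_{i,t-1} \right) + u_{i,t},
\qquad 0 < \lambda < 1
```

A value of \lambda near zero implies that prices return to equilibrium slowly, while a value near one implies that most of a deviation disappears within a single period.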
2.2 Other Modeling Approaches
Auterson [2014] developed a theory-driven model based on demand and household utility to simulate house prices in the United Kingdom. Saunders and Tulip [2019] created a structural macroeconomic model to simulate house prices in Australia. Models constructed similarly would have the advantage of capturing inter-relationships among various macroeconomic variables (interest rates, vacancy rates, etc.) and how they impact house prices. Agnello and Schuknecht [2009] investigated the drivers of housing cycles in advanced economies using a panel model. Their model showed some promise in identifying price boom periods. Similarly, Geng [2018] used a cross-country panel model to investigate fundamental drivers of housing prices in advanced economies. Panel models (including ones that cover the U.S.) can be useful for giving studies more statistical power. These types of models could certainly be adapted more widely at the state or metropolitan level in the United States.
To conclude this subsection, I would refer all interested readers to Petropoulos et al. [2022] which contains a massive, current literature review of all aspects of forecasting and to Ghysels et al. [2013] which extensively covers modeling related to real estate price forecasting.
3 Modeling Considerations
In R, there are several packages that are useful for time series testing and modeling. An older package that is still widely used is forecast [Hyndman et al., 2008]. This is a go-to for many practitioners. It has a variety of forecasting methods, including a useful command (auto.arima) for automatically selecting an ARIMA specification for a given series. More recently, Hyndman and Athanasopoulos [2021] published a forecasting textbook that is freely available online (a paper copy can also be purchased). The book details the fable package in R, which has a variety of built-in forecasting methods. Krispin [2019] also has an inexpensive book that details forecasting methods and practice in R, featuring his TSstudio package and many worked examples.
Several other packages offer additional time series methods. urca [Pfaff et al.] is a useful package that can be used for cointegration and unit-root testing. vars [Pfaff and Stigler] is another R package which allows the user to model using VECMs. Additionally, Weiss et al. [2018] have a useful package called ForecastComb that allows for forecast combinations, which have been shown to help improve the accuracy of forecasts. modeltime [Dancho] is a newer package that allows for streamlined forecasting across many different segments (such as metropolitan areas). modeltime is also specifically built for forecasting at scale, meaning it can create automated forecasts for many different units (counties, metros, states). Additionally, I have found the R Journal and the Journal of Statistical Software to be very useful for staying up to date on the latest open-source forecasting packages in R. Many of these papers feature vignettes and applications for those interested.
4 Areas for Future Research
Much literature has been devoted to forecasting national house price levels both for the US and other advanced economies. Typically, these papers have used VECMs and other methods, possibly with a panel setup. Relatively less literature has focused on forecasting at the metropolitan level. This is not necessarily because researchers have not thought of this (see, for example, footnote 5 in Miller et al. [2005]) but there have been ongoing issues with availability and breadth of data at the metropolitan level. So, one area of future development would be to have more metro level data available for longer time periods at a monthly or quarterly interval (preferably publicly accessible through the U.S. Census Bureau or a similar public source).
Another area that is somewhat neglected is modeling for multiple areas. On the practitioner side, many private companies and individuals are interested in forecasts not so much for the national level, but for specific metros (typically, the more metro areas covered the better). As far as I know, there is relatively limited information and package availability on forecasting on a large scale (fable has some capability for forecasting across multiple units; modeltime is explicitly set up for this, but information on that package is somewhat less readily available). Forecasting routinely for many units is a key area of development for the real estate field (and for forecasters in general). I think this will be a major area of development in the future for many companies in real estate as well as other industries.
Further, many organizations are interested in true out-of-sample forecasting. This means not a holdout sample from a time series model, but forecasting values for dates that have not yet occurred (July 2024, for instance). A problem with many methods is that they require forecasts of the inputs for those out-of-sample dates. One of the few papers focused on this problem is Kiefer [24 October 2019]. His method, which could be further generalized, takes a VECM and, using an observable input forecasted from a known series, can make true out-of-sample forecasts. Combined with the fairly good track record of VECMs in forecasting prices, this could be a key area for future forecast development.
Lastly, the research from Gravatt et al. [2022] was very interesting. The method the authors used was simple and easy to understand and use. The authors also freely presented their method and have made the output available publicly. This type of research is of great use to non-academic audiences, as practitioners and non-academic researchers can easily access and use the model and paper. Greater collaboration between academic and non-academic researchers seems like an easy way to further promote understanding of housing markets.
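The basic pattern behind forecasting for many units is to apply the same fitting and forecasting routine to each series in turn. The sketch below does this with a hand-rolled AR(1) model in plain Python. It is only meant to show the loop structure and does not reproduce the interface of fable or modeltime. The metro names and growth figures are made up.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi).

    Assumes the series is not constant, so the denominator is non-zero.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(series, horizon):
    """Recursive multi-step forecast from the fitted AR(1)."""
    c, phi = fit_ar1(series)
    path, last = [], series[-1]
    for _ in range(horizon):
        last = c + phi * last   # each step feeds on the previous forecast
        path.append(last)
    return path


# Hypothetical quarterly house price growth (percent) for made-up metros
growth_by_metro = {
    "Metro A": [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.1],
    "Metro B": [0.5, 0.7, 0.8, 0.6, 0.9, 1.0, 0.9, 1.1],
}

# One model per metro, fitted and forecast with the same routine
forecasts = {metro: forecast_ar1(series, horizon=4)
             for metro, series in growth_by_metro.items()}

for metro, path in forecasts.items():
    print(metro, [round(value, 3) for value in path])
```

In practice the per-unit routine would be a richer model, but the same dictionary-of-series structure extends to hundreds of metros without changing the loop.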
5 Conclusion
Dating back to Case and Shiller [1989, 1990], house price forecasting has generated great interest in the economics and forecasting fields. Housing is one of the most important components of U.S. economic performance [Leamer, 2007], so it makes sense that a great deal of research has gone into this topic. Many researchers have focused on the problem at the national level from a time series approach. Some research (although not quite as much) has come at the forecasting problem from a metropolitan or state level. Free software, such as R, is available for researchers and practitioners to use for forecasting. Various R packages have been developed that can be of use specifically for forecasting. More focus could go into developing packages that can be used for forecasting at scale (many metropolitan areas). This is an area of great interest, especially to non-academic audiences. Also, a focus on true out-of-sample forecasting is an area that could yield strong benefits to those in academia and the private sector.
Finally, more collaboration between academic and non-academic researchers could be of benefit to the field. Non-academics have strengths in bringing needs/wants from practitioners while academics can bring cutting-edge research and software development. Encouraging interaction and dialogue between the two types of researchers could yield great benefits.
Declaration: The author received no funding for this article and reports no conflict of interest.
6 Glossary
Autoregressive Model (AR) – time series model that depends on its own prior values as well as a random component which will drive changes.
Autoregressive distributed lag (ARDL) model – time series model, similar to AR but includes an explanatory variable and its lags which will influence the behavior of the outcome variable.
Fable package – a package in R. It is an update of the forecast package that includes more options for forecasting.
Forecast package – a package in R that has a variety of useful built-in forecasting methods. Good for "off-the-shelf" forecasting.
ForecastComb package – a package in R that allows a user to combine various forecasts to create a single, weighted forecast.
Journal of Statistical Software – a free online journal that publishes articles on software innovations in R as well as other programs. Good for general statistical modeling knowledge as well.
Modeltime package – a package in R that lets the user create forecasts at scale, meaning you can create automated forecasts of a unit of interest (different cities, products, etc.) all at once.
Panel model – an econometric model that will generate an estimate using inputs from various units of interest (states, countries, etc.) across time.
R Journal – a free online modeling/econometric journal specifically focused on packages and programming innovations in R.
TSstudio package – a package in R for time series analysis and forecasting, featured in Krispin [2019].
Urca package – a package in R that can be used for cointegration testing as well as a variety of other time series modeling tests.
Vars package – a package in R which can be used to create vector error-correction models and other time series models.
Vector Error-Correction Model (VECM) – time series model that incorporates the cointegrating relationship between variables. Cointegration is where there is a relationship that will cause two variables to move together (this can be tested for statistically).
References
Luca Agnello and Ludger Schuknecht. Booms and busts in housing markets: Determinants and implications. European Central Bank Working Paper Series, (1071), 2009.
Toby Auterson. Forecasting house prices. U.K. Office for Budget Responsibility Working Paper, (6), 2014.
Mehmet Balcilar, Rangan Gupta, and Stephen M. Miller. The out-of-sample forecasting performance of nonlinear models of regional housing prices in the us. Applied Economics, 47(22):2259–2277, 2015.
Ben S. Bernanke. Remarks by governor ben s. bernanke. In Sandridge Lecture Virginia Association of Economists, 10 March 2005.
Lasse Bork and Stig Vinther Møller. Forecasting house prices in the 50 states using dynamic model averaging and dynamic model selection. International Journal of Forecasting, 31(1):63–78, 2015.
Karl E. Case and Robert J. Shiller. The efficiency of the market for single-family homes. The American Economic Review, 79(1):125–137, 1989.
Karl E. Case and Robert J. Shiller. Forecasting prices and excess returns in the housing market. AREUEA Journal, 18(3):253–273, 1990.
Cella Chen, Andres Carbacho-Burges, Sunayana Mehra, and Mike Zoller. The moody’s analytics case-shiller home price index forecast methodology. Technical report, Moody’s Analytics, 2013.
Man Cho. House price dynamics: A survey of theoretical and empirical issues. Journal of Housing Research, 7(2):145–172, 1996.
Matt Dancho. modeltime: The Tidymodels Extension for Time Series Modeling.
Nan Geng. Fundamental drivers of house prices in advanced economies. International Monetary Fund Working Paper, (WP/18/164):1–24, 2018.
Eric Ghysels, Alberto Plazzi, Rossen Valkanov, and Walter Torous. Chapter 9 – forecasting real estate prices. In Graham Elliott and Allan Timmermann, editors, Handbook of Economic Forecasting Volume 2 Part A. Elsevier, 2013.
Denise Gravatt, Eli Baracha, and Ken H. Johnson. A note on the estimation of the degree of over or under-pricing of housing markets relative to their long-term pricing trend. Journal of Housing Research, 31(1):1–3, 2022.
Rangan Gupta and Stephen M. Miller. ’ripple effects’ and forecasting home prices in los angeles, las vegas, and phoenix. The Annals of Regional Science, 48:763–782, 2012.
Patric H. Hendershott and John C. Weicher. Forecasting housing markets: Lessons learned. Real Estate Economics, 30(1):1–11, 2002.
Dennis L. Hoffman and Robert H. Rasche. Stls/us-vecm6.1: A vector error-correction forecasting model of the u.s. economy. Federal Reserve Bank of St. Louis Working Paper, (1997-008A), 1997.
Rob Hyndman et al. forecast: Forecasting Functions for Time Series and Linear Models, 2008.
Rob J. Hyndman and George Athanasopoulos. Forecasting: Principles and Practice, 3rd Edition. Otexts, 2021.
Konstantin A. Kholodilin and Boriss Siliverstovs. Think national, forecast local: A case study of 71 german urban housing markets. Applied Economics, 49(42):4271–4297, 2017.
Len Kiefer. Forecasts from a bivariate vecm conditional on one of the variables, 24 October 2019. This article can be found on Kiefer’s blog: lenkiefer.com.
Rami Krispin. Hands-On Time Series Analysis with R. Packt Publishing, 2019.
William D. Larson. Evaluating alternative methods of forecasting house prices: A post-crisis reassessment. George Washington University Research Program on Forecasting Working Paper, (2010-004), 2010.
Edward E. Leamer. Housing is the business cycle. NBER Working Paper Series, (13428), 2007.
Charles Ka Yui Leung. Error correction dynamics of house prices: An equilibrium benchmark. Federal Reserve Bank of Dallas Globalization and Monetary Policy Institute Working Paper, (177), 2014.
Zachary McGurk. Us real estate inflation prediction: Exchange rates and net foreign assets. The Quarterly Review of Economics and Finance, 75:53–66, 2020.
Norman G. Miller and Michael Sklarz. Integrating real estate market conditions into home price forecasting systems. Journal of Housing Research, 21(2):183–214, 2012.
Norman G. Miller, Michael Sklarz, and Thomas G. Thibodeau. The impact of interest rates and employment on nominal housing prices. International Real Estate Review, 8(1):26–42, 2005.
George Milunovich. Forecasting australia’s real house price index: A comparison of time series and machine learning methods. Journal of Forecasting, 39(7):1098–1118, 2020.
Fotios Petropoulos et al. Forecasting: Theory and practice. International Journal of Forecasting, 38(3):705–871, 2022.
Bernhard Pfaff and Matthieu Stigler. vars: VAR Modelling.
Bernhard Pfaff, Eric Zivot, and Matthieu Stigler. urca: Unit Root and Cointegration Test for Time Series Data.
David E. Rapach and Jack K. Strauss. Forecasting real housing price growth in the eighth district states. Regional Economic Development, 31(2):33–42, 2007.
Trent Saunders and Peter Tulip. A model of the australian housing market. Reserve Bank of Australia Research Discussion Paper, (RDP 2019-01), 2019.
Michael A. Sklarz, Norman G. Miller, and Will Gersch. Forecasting using long-order autoregressive processes: An example using housing starts. AREUEA Journal, 15(4):375–388, 1987.
Christoph E. Weiss, Eran Raviv, and Gernot Roetzer. Forecast combinations in r using the forecastcomb package. R Journal, 10(2):262–281, 2018.