Forecasting with large datasets: A time varying covariance matrix

Periodic Reporting for period 1 - FORECASTING (Forecasting with large datasets: A time varying covariance matrix)

Reporting period: 2016-09-06 to 2018-09-05

The basic aim of this project is to propose novel estimation methodologies for high-dimensional datasets. More specifically, we propose a general framework for estimating links, or connections, across an increasingly large set of variables (economic or otherwise) when those links are not constant over time. To this end, we address two realistic features of observed datasets that have rarely been tackled together in the literature so far: the time-varying structure and the large dimensionality of economic datasets.

Time variation in economic relationships has been studied extensively in economics. It can take the form either of abrupt shifts in the assumed generating mechanisms of the variables, or of smooth stochastic or deterministic changes in those mechanisms. Either way, it can be seen as the result of forces such as institutional switching, economic transitions, preference fluctuations, policy transformations or technological changes, inter alia. All of these can imply instabilities in the assumed economic relationships.

Large datasets are now a key characteristic of human development (e.g. computers, which sit in the middle of most economic transactions, generate huge amounts of data that can be analyzed to extract critical information). Such data are relevant for answering economic policy questions and can be key to various scientific discoveries. In large datasets, conventional statistical and econometric techniques such as sample covariance estimation or regression coefficient estimation fail to work consistently because of the dimensionality of the object being estimated. For instance, in a linear economic relationship we frequently have T observations of a dependent variable (y) as a function of many potential predictors (p predictors). When the number of predictors p is large relative to, or larger than, the temporal dimension T, a regression with all available covariates becomes extremely problematic, if not impossible. Analogously, when the aim is to estimate the large covariance matrix of the p predictors, the sample estimate becomes highly unreliable. It is also particularly demanding computationally, since the dimension of the estimated object grows as the square of the dimension of the dataset under analysis. The current literature provides some novel answers, but only under the assumption that the covariance matrix of the true data generating mechanism is fixed across time.
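
The dimensionality problem described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the project's own code: the function names, the Gaussian data, and the choice T = 50, p = 200 are all assumptions made for illustration. It shows that when p exceeds T the sample covariance matrix cannot have full rank (so it cannot be inverted), and that the number of distinct entries to estimate grows quadratically in p.

```python
import numpy as np

def sample_covariance_rank(T, p, seed=0):
    """Rank of the p x p sample covariance built from T simulated observations."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, p))      # illustrative data: T rows, p variables
    S = np.cov(X, rowvar=False)          # p x p sample covariance matrix
    return int(np.linalg.matrix_rank(S))

def n_distinct_entries(p):
    """Number of distinct elements in a symmetric p x p matrix."""
    return p * (p + 1) // 2

T, p = 50, 200                           # hypothetical sizes with p > T
rank = sample_covariance_rank(T, p)
# After demeaning, the rank can be at most T - 1, far below p.
print(rank, "<=", T - 1, "<", p)
print(n_distinct_entries(p))             # 20100 distinct entries for p = 200
```

Because the rank is bounded by T - 1, any procedure that needs the inverse of the sample covariance matrix, including a regression on all p covariates, breaks down in this setting.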

These two aspects are important characteristics of real datasets, and failure to provide a framework that accommodates both simultaneously will result in unreliable scientific discoveries. In economics, this implies that the developed models will be insufficient to capture important characteristics of the economy, delivering false or unsuccessful policy suggestions.

We provide a unified framework, with good theoretical properties, that accommodates both aspects of real datasets. To this end, the large-dimensional econometrics literature is combined with the nonparametric estimation literature in an innovative fashion, and novel methodologies for large covariance matrix estimation and large-dimensional regression are proposed. As we show, our methods deliver significant improvements, across a wide range of applications and metrics, over the methodologies that currently dominate the literature.

All the objectives of the project were fully accomplished, and additional results emerged as a by-product of the work undertaken during the project. We first derived formulas for large covariance matrix estimators when the true generating mechanism of the large dataset is sparse and non-constant across time. The derived rates of convergence of the estimates to the true parameters establish good theoretical properties in very general setups that have not been considered in the literature so far. Additionally, we propose a completely new framework for estimation and model selection in large-dimensional regression, which rests on the theoretical foundations of the proposed large covariance matrix estimator. Extensive simulations support the developed theory, including comparisons with other popular estimators currently considered state of the art. Moreover, comprehensive empirical investigations of optimal portfolio selection and of large-dimensional model selection and estimation support our theoretical contributions.
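
To give a rough idea of what an estimator for a sparse, time-varying covariance matrix can look like, the sketch below combines two generic ingredients: kernel weighting around a target date (a standard nonparametric device) and hard thresholding of small off-diagonal entries (a standard sparsity device). This is an assumption-laden illustration, not the estimator derived in the project: the function name, the Gaussian kernel, the bandwidth, and the threshold value are all hypothetical choices.

```python
import numpy as np

def kernel_thresholded_cov(X, t, bandwidth, threshold):
    """Kernel-weighted covariance at time index t with hard-thresholded off-diagonals.

    X         : (T, p) array of observations ordered in time
    t         : target time index (0 <= t < T)
    bandwidth : kernel bandwidth as a fraction of the sample length (assumed tuning choice)
    threshold : off-diagonal entries smaller than this in absolute value are set to zero
    """
    T, p = X.shape
    u = (np.arange(T) - t) / (bandwidth * T)
    w = np.exp(-0.5 * u**2)              # Gaussian kernel weights (an assumption)
    w /= w.sum()                         # normalise so the weights sum to one
    mean = w @ X                         # local (kernel-weighted) mean
    Xc = X - mean
    S = (Xc * w[:, None]).T @ Xc         # local covariance estimate
    S = 0.5 * (S + S.T)                  # enforce exact numerical symmetry
    out = np.where(np.abs(S) >= threshold, S, 0.0)
    np.fill_diagonal(out, np.diag(S))    # variances are never thresholded
    return out

# Illustrative use on simulated data (sizes and tuning values are hypothetical).
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((120, 6))
S_mid = kernel_thresholded_cov(X_demo, t=60, bandwidth=0.2, threshold=0.1)
print(S_mid.shape)
```

Observations close to the target date receive larger weights, so the estimate can change over time, while thresholding keeps only the larger estimated links. In practice the bandwidth and threshold would have to be chosen by a data-driven rule, which this sketch leaves out.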

All results were well disseminated across the academic community in Europe and the USA. Presentations, accompanied by short research visits to relevant academic institutions such as the University of Southern California (USA), King's College (UK), Queen Mary University of London (UK), Universitat Pompeu Fabra (Spain) and Athens University of Economics and Business (Greece), and to members of the Bank of Greece, aimed to inform interested scholars, accelerate the research agenda further and maximise networking opportunities. Together these constitute significant dissemination activities targeted at econometricians. The results were also presented at the University of Cyprus, within the brown bag seminar series, where the well-established local academic community had the chance to gain a deeper understanding of the theoretical problems that arise in large-dimensional econometric theory and to see the strong theoretical and empirical contributions of the project. The proposed methodologies were received with enthusiasm, and the audiences' comments and discussions helped to better exploit this research area.

We have extended the state of the art in two significant research areas of theoretical and empirical large-dimensional econometrics. The results show that the proposed machinery for high-dimensional datasets is at the frontier of the relevant literature, as supported by direct comparison with other popular methods that currently dominate this literature. The suggested frameworks apply under a less restrictive set of assumptions, have desirable theoretical properties, and provide significant gains in the two major empirical exercises that were part of this project. There are also important socio-economic gains. The empirical results show that, when the proposed frameworks are used, we can form a better portfolio of assets in terms of several measures of optimality. This means that the resources of an economy can be directed to more beneficial investment options, yielding higher levels of social welfare. For instance, European pension funds can invest their reserves more efficiently, in terms of the risk undertaken for a specific level of expected return. This is of increasing importance in the aftermath of the recent financial crisis. Similarly, consider policy makers who are increasingly integrating new economic variables into their models in order to design optimal policy decisions. A misspecified econometric model will lead to suboptimal decisions. Moreover, forecasting with such a model will result in poor performance. Economic forecasts are important because they can supply society with early warning mechanisms on the success of adopted policies, allowing for favourable modifications, which is also highly relevant at present. These are only two of a long list of real-world problems for which our proposals can provide better solutions than existing ones.
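
To make the link between covariance estimation and portfolio choice concrete, the sketch below computes global minimum-variance weights, w = Σ⁻¹1 / (1'Σ⁻¹1), from a given covariance matrix. This textbook rule is used here only as an assumed example of a risk-based portfolio; the source does not state which portfolio rules or optimality measures the project used, and the function names and the two-asset numbers are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights: proportional to inv(cov) @ 1, summing to one."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)     # solves cov @ raw = 1; requires invertible cov
    return raw / raw.sum()

def portfolio_variance(weights, cov):
    """Variance of a portfolio with the given weights."""
    weights = np.asarray(weights, dtype=float)
    return float(weights @ np.asarray(cov, dtype=float) @ weights)

# Hypothetical two-asset example with uncorrelated returns.
cov_demo = np.diag([0.04, 0.01])
w_demo = min_variance_weights(cov_demo)
print(w_demo)                                # [0.2 0.8]
print(portfolio_variance(w_demo, cov_demo))  # approximately 0.008
```

The weights depend on the inverse of the covariance matrix, so the rule requires an invertible, well-conditioned estimate. This is why the quality of a large covariance matrix estimator feeds directly into portfolio risk.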