Explainable Performance (with S. Hué, C. Hurlin and S. Saurin) New!
We introduce the XPER (eXplainable PERformance) methodology to measure the specific contribution of the input features to the predictive or economic performance of a model. Our methodology offers several advantages. First, it is both model-agnostic and performance metric-agnostic. Second, XPER is theoretically founded, as it is based on Shapley values. Third, the interpretation of the benchmark, which is inherent in any Shapley value decomposition, is meaningful in our context. Fourth, XPER is not plagued by model specification error, as it does not require re-estimating the model. Fifth, it can be implemented either at the model level or at the individual level. In an application based on auto loans, we find that performance can be explained by a surprisingly small number of features and that XPER decompositions are rather stable across metrics, although some feature contributions switch sign across metrics. Our analysis also shows that explaining model forecasts and explaining model performance are two distinct tasks.
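To fix ideas, the sketch below illustrates the general idea of a Shapley-value decomposition of a performance metric (here, AUC), approximated by Monte Carlo sampling over feature orderings, with "absent" features replaced by draws from their marginal distribution. It is an illustration only, not the XPER implementation; the helper names (`metric_with_subset`, `shapley_performance`) are made up.

```python
# Rough sketch of a Shapley-style decomposition of a performance metric (AUC).
# Not the XPER package API; helper names and the marginal-resampling shortcut
# are assumptions made for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def metric_with_subset(model, X, y, subset, n_draws=10):
    """Average AUC when features outside `subset` are replaced by draws from
    their marginal distribution (one crude way of 'removing' them)."""
    scores = []
    for _ in range(n_draws):
        X_mix = X.copy()
        for j in range(X.shape[1]):
            if j not in subset:
                X_mix[:, j] = rng.permutation(X[:, j])  # break the link with y
        scores.append(roc_auc_score(y, model.predict_proba(X_mix)[:, 1]))
    return np.mean(scores)

def shapley_performance(model, X, y, n_perm=10):
    """Monte Carlo Shapley contributions of each feature to the AUC."""
    p = X.shape[1]
    phi = np.zeros(p)
    for _ in range(n_perm):
        order = rng.permutation(p)
        subset = set()
        prev = metric_with_subset(model, X, y, subset)  # benchmark: no feature informative
        for j in order:
            subset.add(j)
            curr = metric_with_subset(model, X, y, subset)
            phi[j] += curr - prev
            prev = curr
    return phi / n_perm

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
# Contributions sum (up to Monte Carlo noise) to the full-sample AUC minus the benchmark.
print(shapley_performance(model, X, y))
```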
Computational Reproducibility in Finance: Evidence from 1,000 Tests (with O. Akmansoy, C. Hurlin, A. Menkveld, A. Dreber, F. Holzmeister, J. Huber, M. Johannesson, M. Kirchler, M. Razen, U. Weitzel) Review of Financial Studies, R&R
We analyze the computational reproducibility of more than 1,000 empirical answers to six research questions in finance provided by 168 international research teams. Surprisingly, neither researcher seniority nor the quality of the research paper seems related to the level of reproducibility. Moreover, researchers exhibit strong overconfidence when assessing the reproducibility of their own research and underestimate the difficulty their peers face when attempting to reproduce their results. We further find that reproducibility is higher for researchers with better coding skills and for those exerting more effort. It is lower for more technical research questions and more complex code.
Non-Standard Errors (with A. Menkveld, A. Dreber, F. Holzmeister, J. Huber, M. Johannesson, M. Kirchler, M. Razen, U. Weitzel et al.) Journal of Finance, forthcoming
My contribution: I was in charge of the reproducibility verification policy of the #fincap project
In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that EGP variation across researchers adds uncertainty: non-standard errors (NSEs). We study NSEs by letting 164 teams test the same hypotheses on the same data. NSEs turn out to be sizable, but smaller for more reproducible or higher-rated research. Adding peer-review stages reduces NSEs. We further find that participants underestimate this type of uncertainty.
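Loosely speaking, a non-standard error is the dispersion of point estimates across teams analyzing the same data, on top of each team's own standard error. The toy example below (simulated numbers only, not the #fincap data) contrasts the two quantities.

```python
# Toy illustration of non-standard errors (NSEs): dispersion of estimates
# across teams analyzing the same data. All numbers are simulated.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.5
n_teams = 164

# Teams reach different estimates because of analytical choices (the EGP),
# not only because of sampling noise in the data (the DGP).
team_estimates = true_effect + rng.normal(0.0, 0.3, n_teams)   # EGP variation
team_std_errors = np.full(n_teams, 0.1)                        # within-team SEs

standard_error = team_std_errors.mean()          # usual sampling uncertainty
non_standard_error = team_estimates.std(ddof=1)  # across-team dispersion

print(f"average standard error: {standard_error:.2f}")
print(f"non-standard error:     {non_standard_error:.2f}")
```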
The Fairness of Credit Scoring Models (with C. Hurlin and S. Saurin) Management Science, R&R
In credit markets, screening algorithms aim to discriminate between good-type and bad-type borrowers. However, in doing so, they also often discriminate between individuals sharing a protected attribute (e.g., gender, age, racial origin) and the rest of the population. In this paper, we show how (1) to test whether there exists a statistically significant difference between protected and unprotected groups, which we call a lack of fairness, and (2) to identify the variables that cause the lack of fairness. We then use these variables to optimize the fairness-performance trade-off. Our framework provides guidance on how algorithmic fairness can be monitored by lenders, controlled by their regulators, and improved for the benefit of protected groups.
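As an illustration of step (1), the sketch below tests whether approval rates differ between the protected group and the rest of the population, using a simple two-proportion z-test on simulated data. It is a stand-in for the idea, not the test developed in the paper.

```python
# Minimal sketch: is there a statistically significant gap in approval rates
# between the protected group and the rest (statistical parity)?
# Simulated data and a plain two-proportion z-test, for illustration only.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(2)

protected = rng.binomial(1, 0.3, 5000)               # 1 = holds the protected attribute
approve_prob = np.where(protected == 1, 0.55, 0.65)  # scoring model feeds a cutoff rule
approved = rng.binomial(1, approve_prob)

counts = np.array([approved[protected == 1].sum(), approved[protected == 0].sum()])
nobs = np.array([(protected == 1).sum(), (protected == 0).sum()])

z_stat, p_value = proportions_ztest(counts, nobs)
gap = counts[0] / nobs[0] - counts[1] / nobs[1]
print(f"approval gap: {gap:+.3f}, p-value: {p_value:.4f}")
# A small p-value signals a statistically significant lack of fairness
# under this (statistical parity) definition.
```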
The Economics of Computational Reproducibility (with J.-E. Colliard and C. Hurlin) Updated
We investigate why economics displays a relatively low level of computational reproducibility. We first study the benefits and costs of reproducibility for readers, authors, and academic journals. Second, we show that the equilibrium level of reproducibility may be suboptimally low due to three market failures: a competitive bottleneck effect due to the competition between journals to attract authors, the public good dimension of reproducibility, and the positive externalities of reproducibility outside academia. Third, we discuss different policies to address these market failures and move out of a low reproducibility equilibrium. In particular, we show that coordination among journals could reduce by half the cost of verifying the reproducibility of accepted papers.