RWP 21-12, November 2021
Machine learning and artificial intelligence methods are often referred to as “black boxes” when compared with traditional regression-based approaches. However, both traditional and machine learning methods are concerned with modeling the joint distribution of endogenous (target) and exogenous (input) variables. Whereas linear models describe the fitted relationship between the target and input variables through the slope of that relationship (the coefficient estimates), the same fitted relationship can be described rigorously for any machine learning model by first-differencing its partial dependence functions. Bootstrapping these first-differenced functionals provides standard errors and confidence intervals for the estimated relationships. We show that this approach replicates the point estimates of OLS coefficients and demonstrate how it generalizes to marginal relationships in machine learning and artificial intelligence models. We also discuss the relationship of partial dependence functions to Shapley value decompositions and explore how they can be used to further explain model outputs.
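The core procedure in the abstract (compute a partial dependence function, first-difference it, and bootstrap the differences) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`fit_ols`, `partial_dependence`, `pd_slopes`, `bootstrap_pd_slopes`) are hypothetical. OLS stands in for an arbitrary fitted model so that the replication of the coefficient can be checked. The partial dependence function is evaluated on a user-chosen grid, and the confidence interval is a simple percentile bootstrap, which is an assumed choice.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS with an intercept and return a predict function.
    Stand-in for any fitted ML model (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta

def partial_dependence(predict, X, j, grid):
    """PD_j(g): average prediction with column j set to g for every row."""
    pd_vals = np.empty(len(grid))
    for k, g in enumerate(grid):
        Xg = X.copy()
        Xg[:, j] = g
        pd_vals[k] = predict(Xg).mean()
    return pd_vals

def pd_slopes(predict, X, j, grid):
    """First-difference the PD function: slope between adjacent grid points."""
    return np.diff(partial_dependence(predict, X, j, grid)) / np.diff(grid)

def bootstrap_pd_slopes(fit, X, y, j, grid, n_boot=200, seed=0):
    """Resample rows, refit, and recompute the first-differenced PD.
    Percentile intervals are an assumed choice, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, len(grid) - 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = pd_slopes(fit(X[idx], y[idx]), X[idx], j, grid)
    point = pd_slopes(fit(X, y), X, j, grid)
    se = draws.std(axis=0, ddof=1)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return point, se, lo, hi

# Usage: simulated linear data, so the PD slope should equal the OLS coefficient.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)
grid = np.linspace(-2, 2, 9)
point, se, lo, hi = bootstrap_pd_slopes(fit_ols, X, y, j=0, grid=grid)
beta = np.linalg.lstsq(np.column_stack([np.ones(500), X]), y, rcond=None)[0]
print(np.allclose(point, beta[1]))  # True: every first difference equals the slope
```

For a linear model the partial dependence function is itself linear in the grid value, so every first difference equals the fitted coefficient exactly. For a nonlinear learner passed in place of `fit_ols`, the first-differenced slopes will generally vary across the grid, and the bootstrap gives a standard error at each grid interval.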
JEL Classifications: C14, C18, C15
Article Citation
Cook, Thomas R., Greg Gupton, Zach Modig, and Nathan M. Palmer. 2021. “Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values.” Federal Reserve Bank of Kansas City, Research Working Paper no. 21-12, November. Available at https://doi.org/10.18651/RWP2021-12