
RWP 21-12, November 2021

Machine learning and artificial intelligence methods are often referred to as “black boxes” when compared with traditional regression-based approaches. However, both traditional and machine learning methods are concerned with modeling the joint distribution of endogenous (target) and exogenous (input) variables. Whereas linear models describe the fitted relationship between the target and input variables via the slope of that relationship (the coefficient estimates), the same fitted relationship can be described rigorously for any machine learning model by first-differencing the partial dependence functions. Bootstrapping these first-differenced functionals provides standard errors and confidence intervals for the estimated relationships. We show that this approach replicates the point estimates of OLS coefficients and demonstrate how it generalizes to marginal relationships in machine learning and artificial intelligence models. We further discuss the relationship of partial dependence functions to Shapley value decompositions and explore how they can be used to further explain model outputs.

JEL Classifications: C14, C15, C18

Article Citation

  • Cook, Thomas R., Greg Gupton, Zach Modig, and Nathan M. Palmer. 2021. “Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values.” Federal Reserve Bank of Kansas City, Research Working Paper no. 21-12, November. Available at https://doi.org/10.18651/RWP2021-12

Author

Thomas R. Cook

Data Scientist

Tom Cook is a Data Scientist in the Economic Research Department of the Federal Reserve Bank of Kansas City. He joined the bank in August 2016 after completing his PhD in Politic…