SHAP is the most popular model-agnostic technique for explaining model predictions. SHAP stands for SHapley Additive exPlanations.

Shapley values are obtained by combining concepts from cooperative game theory with local explanations.

Mathematical and Algorithmic Foundation

Shapley Values

Shapley values come from game theory and were invented by Lloyd Shapley as a way of providing a fair answer to the following question:

Question

If we have a coalition C that collaborates to produce a value V, how much did each individual member contribute to that final value?

The way we assess each individual member's contribution is to remove that member from the coalition and compare what the coalition produces with and without them. Concretely, for member 1 we enumerate every pair of coalitions that differ only in whether member 1 is included. Subtracting the value produced without member 1 from the value produced with member 1 gives each pair's difference, and member 1's contribution is then the mean of these differences (weighted over coalition sizes, as the formula below makes precise).
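
For reference, this is exactly the classical Shapley value. In standard notation, member $i$'s contribution is

$$\phi_i \;=\; \sum_{S \subseteq C \setminus \{i\}} \frac{|S|!\,\bigl(|C| - |S| - 1\bigr)!}{|C|!}\;\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

where $v(S)$ is the value produced by coalition $S$ and the factorial term is the weight given to each coalition size.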

Shapely Additive Explanations

First, we need to know what "additive" means here. A feature vector $x$ is mapped to a simplified local input $x' \in \{0,1\}^M$: we turn the feature vector into a discrete binary vector, where features are either included (1) or excluded (0), and a mapping $x = h_x(x')$ recovers the original input. Lundberg and Lee define an additive feature attribution method as an explanation model $g$ of the form:

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$

  • $\phi_0$ is the null output of the model, that is, the average output of the model
  • $\phi_i$ is the feature effect: how much feature $i$ changes the output of the model, as introduced above. It is called the attribution of feature $i$

Lundberg and Lee go on to describe a set of three desirable properties of such an additive feature attribution method: local accuracy, missingness, and consistency.

Local accuracy

The explanation model must match the original model's output on the instance being explained: $f(x) = g(x')$ when $x = h_x(x')$.

Missingness

If a feature is excluded from the simplified input ($x'_i = 0$), its attribution must be zero ($\phi_i = 0$); that is, the only thing that can affect the output of the explanation model is the inclusion of features, not their exclusion.

Consistency

If the model changes so that a feature's marginal contribution increases or stays the same (regardless of the other features), that feature's attribution cannot decrease; the feature effect never moves in the opposite direction of the feature's contribution.
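
Formally, writing $f_x(z') = f(h_x(z'))$ and $z' \setminus i$ for $z'$ with $z'_i$ set to zero: for any two models $f$ and $f'$, if

$$f'_x(z') - f'_x(z' \setminus i) \;\ge\; f_x(z') - f_x(z' \setminus i) \quad \text{for all } z' \in \{0,1\}^M,$$

then $\phi_i(f', x) \ge \phi_i(f, x)$.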

Why SHAP

Lundberg and Lee argue in their paper that, among additive explanation models, only one satisfies all three properties: the one whose feature attributions are specifically chosen to be the Shapley values of those features, which is SHAP.

SHAP, Step-by-Step Process (same as shap.explainer)

As an example, consider an ice cream shop in an airport; we know four features with which to predict its business.

Suppose we want the Shapley value of the temperature feature, with value 80, in the sample [80 1 100 4]. Here are the steps:

  • Step 1. Get a random permutation of the features, and put a bracket around the feature we care about and everything to its right.
  • Step 2. Pick a random sample from the dataset.

For example, [200 5 70 8], where the features are ordered [F D T H].

  • Step 3. Form two vectors

The first vector, $x_{+}$, is partially from the original sample and partially from the randomly chosen one: the bracketed features take their values from the random sample, except the feature we care about, which keeps its original value.

The second vector, $x_{-}$, is the same, except the feature we care about is also changed to the random sample's value.

Then calculate the difference $f(x_{+}) - f(x_{-})$ and record it.

  • Step 4. Return to Step 1 and repeat many times; the mean of the recorded differences is the estimate of the Shapley value (see the sketch after these steps).
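
Putting the four steps together, here is a minimal Monte Carlo sketch of this estimate. The names (`sample_shapley_value`, `X_background`) are hypothetical, and `model` is assumed to be any callable that maps a 2D array of samples to a 1D array of predictions:

```python
import numpy as np

def sample_shapley_value(model, x, X_background, feature, n_iter=2000, seed=None):
    """Monte Carlo estimate of one feature's Shapley value for the sample x."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    diffs = np.empty(n_iter)
    for t in range(n_iter):
        # Step 1: random permutation; the feature we care about and
        # everything to its right form the "bracket".
        order = rng.permutation(n_features)
        pos = int(np.where(order == feature)[0][0])
        bracketed = order[pos + 1:]        # the bracket, excluding the feature itself
        # Step 2: pick a random sample from the dataset.
        z = X_background[rng.integers(len(X_background))]
        # Step 3: form the two vectors.
        x_plus = x.astype(float)           # copy of x
        x_plus[bracketed] = z[bracketed]   # bracketed features from the random sample
        x_minus = x_plus.copy()
        x_minus[feature] = z[feature]      # also swap the feature we care about
        # Step 4: record the difference in model output.
        diffs[t] = model(x_plus[None, :])[0] - model(x_minus[None, :])[0]
    return diffs.mean()
```

With enough iterations, the mean difference converges to the feature's Shapley value relative to the background data.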

Shapley kernel

Too many coalitions need to be sampled

As with the Shapley values introduced above, for each feature we need to sample a lot of coalitions to compute these differences.

For 4 features, we need 64 coalitions in total; for 32 features, about 17.1 billion.

It’s entirely untenable.

To get around this difficulty, Lundberg and Lee devised the Shapley kernel.

Detail

Most ML models won't just let you omit a feature, so instead we define a background dataset B containing a set of representative data points that the model was trained over. We then fill in the omitted feature or features with values from the background dataset, while holding the features that are included in the permutation fixed to their original values. Finally, we take the average of the model output over all of these new synthetic data points as the model output for that coalition, which we can write as $E[f(x) \mid x_S]$ for the included feature set $S$.
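
As a sketch of this fill-in-and-average step (the function name and arguments are hypothetical; `model` is again any batch-callable predictor):

```python
import numpy as np

def coalition_output(model, x, included, X_background):
    """Average model output with only `included` features held at x's values.

    Omitted features are filled in from every row of the background
    dataset, and the model outputs are averaged over the synthetic points.
    """
    synthetic = np.array(X_background, dtype=float)  # one synthetic point per background row
    synthetic[:, included] = x[included]             # included features keep original values
    return model(synthetic).mean()
```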

We then have a number of coalitions and their outputs computed in this way, one pair per sampled coalition.

We can formulate this as a weighted linear regression, with each feature assigned a coefficient.

It can be shown that, for one special choice of sample weights, the regression coefficients are exactly the Shapley values. This weighting scheme is the basis of the Shapley kernel, and the weighted linear regression process as a whole is Kernel SHAP.
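
The special choice is the weight Lundberg and Lee derive in their paper: a coalition $z'$ with $|z'|$ of the $M$ features included gets weight

$$\pi_{x'}(z') = \frac{M - 1}{\binom{M}{|z'|}\,|z'|\,(M - |z'|)}$$

(Coalitions with $|z'| = 0$ or $|z'| = M$ receive infinite weight, so in practice they are enforced as constraints.)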

Different types of SHAP

  • Kernel SHAP
  • Low-order SHAP
  • Linear SHAP
  • Max SHAP
  • Deep SHAP
  • Tree SHAP
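
As a quick illustration of two of these variants using the `shap` Python package (a minimal sketch; the dataset and model choices here are arbitrary):

```python
import shap
from sklearn.ensemble import RandomForestRegressor

# Small built-in regression dataset shipped with the shap package.
X, y = shap.datasets.diabetes()
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Tree SHAP: fast, exact Shapley values specialized to tree ensembles.
tree_explainer = shap.TreeExplainer(model)
tree_values = tree_explainer.shap_values(X.iloc[:100])

# Kernel SHAP: model-agnostic; solves the weighted linear regression above.
background = shap.sample(X, 50)  # background dataset B
kernel_explainer = shap.KernelExplainer(model.predict, background)
kernel_values = kernel_explainer.shap_values(X.iloc[:5])
```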

One thing to notice

As we saw, in the end we compute the Shapley values with a linear regression, so there must be approximation error. Some Python packages do not report an error bound, however, so it is hard to know whether the error comes from the linear regression, the data, or the model.

Reference

Shapley Additive Explanations (SHAP)

SHAP: A reliable way to analyze your model interpretability

SHAP: the Python Library for Interpretable Machine Learning

Shapley Values: Data Science Concepts

Appendix

Other methods to interpret models:

Papers with Code - SHAP Explained