
Quantitative sensitivity analysis

Generalized Sobol Indices

Here we show how to compute generalized Sobol indices on the EOQ model using the algorithm presented in Kucherenko et al. (2012). We import our model function from temfpy and use the Kucherenko indices function from econsa.

[1]:
import matplotlib.pyplot as plt  # noqa: F401
import numpy as np

from temfpy.uncertainty_quantification import eoq_model

# TODO: Reactivate once Tim's PR is ready.
# from econsa.kucherenko import kucherenko_indices  # noqa: E265

The function kucherenko_indices expects the input function to be broadcastable over rows, that is, each row represents the input arguments for one evaluation. For sampling around the mean parameters we specify a diagonal covariance matrix, where the variances depend on the scaling of the mean. Since the variances of the parameters are unknown prior to our analysis, we choose values such that the probability of sampling negative values is negligible; the EOQ model is not defined for negative parameters, and normal sampling does not naturally account for such bounds.

[2]:
def eoq_model_transposed(x):
    """EOQ Model but with variables stored in columns."""
    return eoq_model(x.T)

mean = np.array([1230, 0.0135, 2.15])
cov = np.diag([1, 0.000001, 0.01])

# indices = kucherenko_indices( # noqa: E265
#    func=eoq_model_transposed, # noqa: E265
#    sampling_mean=mean,  # noqa: E265
#    sampling_cov=cov,  # noqa: E265
#    n_draws=1_000_000,  # noqa: E265
#    sampling_scheme="sobol",  # noqa: E265
# )  # noqa: E265
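As a quick sanity check of the row-wise convention (a sketch that is not part of the original analysis), the wrapper should map an array with one draw per row to one EOQ value per row:

# Five draws of (m, c, s), one draw per row, using the sampling distribution above.
x = np.random.multivariate_normal(mean, cov, size=5)

# The wrapper transposes internally, so each row yields one EOQ value.
y = eoq_model_transposed(x)
y.shape  # expected: (5,)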

Now we are ready to inspect the results.

[4]:
# fig  # noqa: E265

Shapley Effects

econsa offers an implementation of the algorithm for the computation of Shapley effects as proposed by Song et al. (2016). Here we show how to compute Shapley effects using the EOQ model introduced above. We adjust the model from temfpy to accommodate an n-dimensional array for use with the Shapley effects as implemented in econsa.

[6]:
# import necessary packages and functions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import chaospy as cp

from econsa.shapley import get_shapley
from econsa.shapley import _r_condmvn

Sampling via x_all and x_cond, and the model of interest model

First, we load all necessary objects for the estimation of Shapley effects. The following objects are needed in the case of Gaussian model inputs.

  • The functions x_all and x_cond for (un-)conditional sampling. These functions depend on the distribution from which we are sampling. For the purposes of this illustration we sample from a multivariate normal distribution, but the functions can be tailored to the user’s specific needs.

  • A mean vector and covariance matrix of the model inputs. They are necessary for sampling conducted by the above two functions in the case of a Gaussian distribution.

  • The model on which the user wishes to perform sensitivity analysis (SA), which maps model inputs to a model output. Here we consider the EOQ model.

[7]:
# Mean vector and covariance matrix of the model inputs.
n_inputs = 3
mean = np.array([5.345, 0.0135, 2.15])
cov = np.diag([1, 0.000001, 0.01])

Choosing n_perms

Since we are conducting SA on a model with three inputs, the number of permutations on which the computation algorithm is based is \(3! = 6\). For a larger number of inputs it might be worthwhile to consider only a subset of all permutations: a model with 10 inputs, for example, already has 3,628,800 permutations, and considering all of them could be computationally infeasible. Thus, get_shapley allows the user to set a specific number of permutations via the argument n_perms, as illustrated below.
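For instance, the factorial growth in the number of permutations can be checked directly:

import math

math.factorial(3)   # 6 permutations: cheap to enumerate exactly
math.factorial(10)  # 3,628,800 permutations: sampling a subset via n_perms is advisable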

Choosing the number of Monte Carlo (MC) runs n_output, n_outer, and n_inner

\(N_V\), \(N_O\), and \(N_I\) denote the function arguments n_output, n_outer, and n_inner, respectively. The algorithm by Song et al. (2016) requires these three nested MC simulations. The number of model evaluations required for the estimation of Shapley effects by get_shapley is given by

\[N_V+m \cdot N_I \cdot N_O \cdot (k-1),\]

where \(m\) denotes the number of permutations, n_perms, and \(k\) the number of inputs, n_inputs.

Song et al. (2016) show that choosing \(N_I = 3\) is optimal. \(N_V\) needs to be large enough to reliably estimate the total output variance, \(V[Y]\). Given these choices, \(N_O\) is chosen to consume the rest of the computational budget.
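To illustrate the evaluation budget with the settings used further below (exact method, so all \(m = 3! = 6\) permutations are evaluated), a quick back-of-the-envelope calculation:

# Budget implied by the formula above for the exact method on the EOQ model.
n_output, n_outer, n_inner = 10 ** 4, 10 ** 3, 3
n_inputs, n_perms = 3, 6  # exact method: all 3! permutations

n_evals = n_output + n_perms * n_inner * n_outer * (n_inputs - 1)
n_evals  # 46,000 model evaluations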

[8]:
# Model for which sensitivity analysis is being performed.
def eoq_model_ndarray(x, r=0.1):
    """EOQ Model that accepts ndarray."""
    m = x[:, 0]
    c = x[:, 1]
    s = x[:, 2]
    return np.sqrt((24 * m * s) / (r * c))
[9]:
# Function for unconditional sampling.
def x_all(n):
    distribution = cp.MvNormal(mean, cov)
    return distribution.sample(n)

# Function for conditional sampling in the case of Gaussian inputs.
def x_cond(n, subset_j, subsetj_conditional, xjc):
    if subsetj_conditional is None:
        # Unconditional sampling from the marginal distribution of the subset.
        cov_int = np.array(cov)
        cov_int = cov_int.take(subset_j, axis=1)
        cov_int = cov_int[subset_j]
        distribution = cp.MvNormal(mean[subset_j], cov_int)
        return distribution.sample(n)
    else:
        # Sampling of subset_j conditional on the values xjc of the remaining inputs.
        return _r_condmvn(
            n,
            mean=mean,
            cov=cov,
            dependent_ind=subset_j,
            given_ind=subsetj_conditional,
            x_given=xjc,
        )
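For illustration only, the sampling functions can also be called directly (get_shapley invokes them internally); the concrete index choices and the conditioning value below are hypothetical:

# Five unconditional draws of all three inputs (chaospy returns one column per draw).
x_all(5)

# Five unconditional draws of the subset (x_1, x_2).
x_cond(5, subset_j=[0, 1], subsetj_conditional=None, xjc=None)

# Five draws of (x_1, x_3) conditional on x_2 = 0.0135.
x_cond(5, subset_j=[0, 2], subsetj_conditional=[1], xjc=np.array([0.0135]))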
[11]:
# Estimate Shapley effects using the exact method.
np.random.seed(1234)
method = "exact"
n_perms = None
n_output = 10 ** 4
n_outer = 10 ** 3
n_inner = 3

exact_shapley = get_shapley(
    method, eoq_model_ndarray, x_all, x_cond, n_perms, n_inputs, n_output, n_outer, n_inner
)
[12]:
exact_shapley
[12]:
    Shapley effects  std. errors    CI_min    CI_max
X1         0.828423     0.008709  0.811353  0.845493
X2         0.130322     0.004717  0.121077  0.139568
X3         0.041255     0.012115  0.017509  0.065000
[13]:
# Estimate Shapley effects using the random method.
np.random.seed(1234)
method = "random"
n_perms = 5
n_output = 10 ** 4
n_outer = 10 ** 3
n_inner = 3

random_shapley = get_shapley(
    method, eoq_model_ndarray, x_all, x_cond, n_perms, n_inputs, n_output, n_outer, n_inner
)
[15]:
random_shapley
[15]:
    Shapley effects  std. errors    CI_min    CI_max
X1         0.817302     0.015392  0.787134  0.847470
X2         0.129110     0.003608  0.122037  0.136182
X3         0.053588     0.013530  0.027069  0.080108

Now we plot the ranking of the Shapley values below.
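The plotting helper ranking and the data frame df are not shown in this excerpt. A minimal sketch of what such a helper could look like, assuming df stacks the exact and random estimates into long format, is the following (hypothetical, not the notebook’s actual implementation):

# Hypothetical: combine both sets of estimates into a long-format data frame.
df = pd.concat(
    [exact_shapley.assign(method="exact"), random_shapley.assign(method="random")]
).rename_axis("input").reset_index()


def ranking(data):
    """Bar plot of the Shapley effects per input for both methods."""
    fig, ax = plt.subplots()
    sns.barplot(data=data, x="input", y="Shapley effects", hue="method", ax=ax)
    ax.set_ylabel("Shapley effect")
    return fig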

[17]:
# plot for exact and random permutations methods
ranking(data=df)
[Figure: ranking of the Shapley effects, exact and random permutation methods]

As noted above, both methods produce the same ranking. Sometimes it is necessary to compare the parameter estimates with their parameter values. A typical application is hypothesis testing, that is, whether the parameter estimates are significant, and what the contribution of significant and insignificant estimates is to the output variance, as reflected by their Shapley ranking.

We can plot the parameter estimates together with their Shapley ranking as shown below:
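The helper ranking_params_shapley and the data frame ordered_df are again defined outside this excerpt. A purely hypothetical sketch, which plots each Shapley effect together with its confidence interval as a rough indication of significance, could look as follows (the notebook’s actual helper may differ):

# Hypothetical sketch; not the notebook's actual plotting helper.
def ranking_params_shapley(ordered_df):
    """Shapley effects with their confidence intervals, sorted by effect size."""
    errors = [
        ordered_df["Shapley effects"] - ordered_df["CI_min"],
        ordered_df["CI_max"] - ordered_df["Shapley effects"],
    ]
    fig, ax = plt.subplots()
    ax.bar(ordered_df.index, ordered_df["Shapley effects"], yerr=errors, capsize=4)
    ax.set_ylabel("Shapley effect")
    return fig


ordered_df = exact_shapley.sort_values("Shapley effects", ascending=False)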

[19]:
# plot exact method comparison
ranking_params_shapley(ordered_df)
[Figure: comparison plot for the exact method]
[21]:
# plot random method comparison
ranking_params_shapley(ordered_df)
[Figure: comparison plot for the random method]

When do we use randomly sampled permutations? Choosing method

The exact method is appropriate when the number of parameters is low, depending on the computational time it takes to evaluate the model in question. If evaluating the model on which SA is performed is computationally inexpensive, the exact method is always preferable; otherwise the random method is recommended. If one suspects that the computational time is high and there are many parameters to conduct SA on, a good way to proceed is to start with a small number of parameters, e.g. 3, and obtain a benchmark of the Shapley effects using the exact method. Then repeat the exercise with the random method on the same vector of parameters, calibrating the n_perms argument until the random method reproduces the exact results. Once this is complete, scale up the exercise with the random method, increasing the number of parameters to the desired parameter vector.

Quantile Based Sensitivity Measures

We show how to compute global sensitivity measures based on quantiles of the model’s output.

[19]:
# import necessary packages and functions
import numpy as np
import pandas as pd
import chaospy as cp
import matplotlib.pyplot as plt
import seaborn as sns

from temfpy.uncertainty_quantification import eoq_model
from econsa.quantile_measures import mc_quantile_measures

First, we specify the inputs of the function mc_quantile_measures, which computes quantile-based sensitivity measures for various user-provided models. Here we take the EOQ model from temfpy as an example, where the model function is adjusted to accommodate an n-dimensional array. For multivariate distributed samples, the loc and scale keywords are given by a mean vector and a covariance matrix, respectively. Balancing efficient computation against good convergence, we set n_draws to 3000 for the brute force estimator and to \(2^{13}\) for the double loop reordering (DLR) estimator. Note that the DLR estimator is more efficient than the brute force estimator.

[20]:
# model to perform quantile-based sensitivity analysis on
def eoq_model_transposed(x):
    """EOQ Model but with variables stored in columns."""
    return eoq_model(x.T)
[21]:
# mean and covariance matrix inputs
mean = np.array([5.345, 0.0135, 2.15])
cov = np.diag([1, 0.000001, 0.01])
n_params = len(mean)
dist_type = "Normal"

Then we are ready to calculate the numerical results using the algorithm presented in Kucherenko et al. (2019).

[22]:
# compute quantile measures using brute force estimator
bf_measures = mc_quantile_measures(
    "brute force", eoq_model_transposed, n_params, mean, cov, dist_type, 3000
)
[23]:
bf_measures
[23]:
                      x_1        x_2       x_3
Measures alpha
q_1      0.020  64.010462  10.595192  6.522037
         0.052  53.565287  11.764802  7.126140
         0.084  47.203635  11.849184  7.315335
         0.116  43.584702  11.945758  7.390247
         0.148  40.746792  12.074625  7.493912
...        ...        ...        ...       ...
Q_2      0.852   0.851180   0.108055  0.040765
         0.884   0.857788   0.102897  0.039315
         0.916   0.863797   0.099314  0.036889
         0.948   0.871352   0.093822  0.034825
         0.980   0.880632   0.089250  0.030119

124 rows × 3 columns

[24]:
# compute quantile measures using double loop reordering estimator
dlr_measures = mc_quantile_measures(
    "DLR", eoq_model_transposed, n_params, mean, cov, dist_type, 2 ** 13
)
[25]:
dlr_measures
[25]:
                      x_1        x_2       x_3
Measures alpha
q_1      0.020  64.179809  10.414823  6.285291
         0.052  52.267399  11.037341  6.778817
         0.084  46.311146  11.211302  6.917412
         0.116  42.563103  11.407816  7.013578
         0.148  40.321098  11.638289  7.183969
...        ...        ...        ...       ...
Q_2      0.852   0.865417   0.097312  0.037271
         0.884   0.871006   0.093422  0.035572
         0.916   0.876810   0.089801  0.033388
         0.948   0.883911   0.084970  0.031119
         0.980   0.890728   0.081605  0.027667

124 rows × 3 columns

Now we are able to visualize the results.
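The helper plot_quantile_measures is likewise not shown here. A minimal sketch, assuming the measures carry the (Measures, alpha) MultiIndex displayed above and plotting the normalized measures \(Q_i\) against \(\alpha\), could look like this (hypothetical, not the notebook’s actual helper):

# Hypothetical sketch of the plotting helper used below.
def plot_quantile_measures(measures):
    """Plot the normalized quantile measures Q_i against alpha for each input."""
    q_2 = measures.loc["Q_2"]  # rows indexed by alpha, one column per input
    fig, ax = plt.subplots()
    for column in q_2.columns:
        ax.plot(q_2.index, q_2[column], marker="o", label=column)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel(r"$Q_i$")
    ax.legend()
    return fig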

[27]:
# plot brute force estimates
plot_quantile_measures(bf_measures)
[Figure: quantile-based sensitivity measures, brute force estimator]
[28]:
# plot double loop reordering estimates
plot_quantile_measures(dlr_measures)
[Figure: quantile-based sensitivity measures, double loop reordering estimator]

The brute force and DLR estimators generate the same ranking of variables for all quantiles: \(x_1, x_2, x_3\) (in descending order). At \(\alpha = 0.5\), the measures \(Q_i\) reach their minimum for \(i = 1\) and their maximum for \(i = 2, 3\).