Nested Logit and Non-Proportional Patterns of Substitution#
import arviz as az
import matplotlib.pyplot as plt
import pandas as pd
from pymc_extras.prior import Prior
from pymc_marketing.customer_choice.nested_logit import NestedLogit
from pymc_marketing.paths import data_dir
az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
%config InlineBackend.figure_format = "retina"
We’ve seen in the other consumer choice example notebook how the Multinomial Logit model suffers from the IIA property, which leads to implausible counterfactual inferences about market behaviour. If you haven’t read that notebook, we advise reading it before continuing. We will now show how the nested logit model specification avoids this property by adding more explicit structure to the choice scenarios that are modelled.
In this notebook we will re-use the same heating choice data set seen in the Multinomial Logit example and apply a few different specifications of a nested logit model.
data_path = data_dir / "choice_wide_heating.csv"
df = pd.read_csv(data_path)
df
| | idcase | depvar | ic_gc | ic_gr | ic_ec | ic_er | ic_hp | oc_gc | oc_gr | oc_ec | oc_er | oc_hp | income | agehed | rooms | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | gc | 866.00 | 962.64 | 859.90 | 995.76 | 1135.50 | 199.69 | 151.72 | 553.34 | 505.60 | 237.88 | 7 | 25 | 6 | ncostl |
| 1 | 2 | gc | 727.93 | 758.89 | 796.82 | 894.69 | 968.90 | 168.66 | 168.66 | 520.24 | 486.49 | 199.19 | 5 | 60 | 5 | scostl |
| 2 | 3 | gc | 599.48 | 783.05 | 719.86 | 900.11 | 1048.30 | 165.58 | 137.80 | 439.06 | 404.74 | 171.47 | 4 | 65 | 2 | ncostl |
| 3 | 4 | er | 835.17 | 793.06 | 761.25 | 831.04 | 1048.70 | 180.88 | 147.14 | 483.00 | 425.22 | 222.95 | 2 | 50 | 4 | scostl |
| 4 | 5 | er | 755.59 | 846.29 | 858.86 | 985.64 | 883.05 | 174.91 | 138.90 | 404.41 | 389.52 | 178.49 | 2 | 25 | 6 | valley |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 895 | 896 | gc | 766.39 | 877.71 | 751.59 | 869.78 | 942.70 | 142.61 | 136.21 | 474.48 | 420.65 | 203.00 | 6 | 20 | 4 | mountn |
| 896 | 897 | gc | 1128.50 | 1167.80 | 1047.60 | 1292.60 | 1297.10 | 207.40 | 213.77 | 705.36 | 551.61 | 243.76 | 7 | 45 | 7 | scostl |
| 897 | 898 | gc | 787.10 | 1055.20 | 842.79 | 1041.30 | 1064.80 | 175.05 | 141.63 | 478.86 | 448.61 | 254.51 | 5 | 60 | 7 | scostl |
| 898 | 899 | gc | 860.56 | 1081.30 | 799.76 | 1123.20 | 1218.20 | 211.04 | 151.31 | 495.20 | 401.56 | 246.48 | 5 | 50 | 6 | scostl |
| 899 | 900 | gc | 893.94 | 1119.90 | 967.88 | 1091.70 | 1387.50 | 175.80 | 180.11 | 518.68 | 458.53 | 245.13 | 2 | 65 | 4 | ncostl |
900 rows × 16 columns
Single Layer Nesting#
The important addition gained through the nested logit specification is the ability to specify “nests” of products. In this way we can partition the market into “natural” groups of competing products, ensuring that the model has an inherent bias towards a selective pattern of substitution. As before we specify the models using formulas, but now we also add a nesting structure.
## No Fixed Covariates
utility_formulas = [
    "gc ~ ic_gc + oc_gc |",
    "ec ~ ic_ec + oc_ec |",
    "gr ~ ic_gr + oc_gr |",
    "er ~ ic_er + oc_er |",
    "hp ~ ic_hp + oc_hp |",
]
nesting_structure = {"central": ["gc", "ec", "hp"], "room": ["gr", "er"]}
nstL_1 = NestedLogit(
df,
utility_formulas,
"depvar",
covariates=["ic", "oc"],
nesting_structure=nesting_structure,
alphas_nests=True,
)
nstL_1
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x173a81940>
We will dwell a bit on the manner in which these nests are specified and why. The nested logit model partitions the choice set into nests of alternatives that share common unobserved attributes (i.e., are more similar to each other). It computes the overall probability of choosing an alternative as the product of (1) the probability of choosing a nest (marginal probability), and (2) the probability of choosing an alternative within that nest (conditional probability, given that nest). In our case we want to isolate the probability of choosing a central heating system versus a room-based heating system.
Each alternative is indexed to a nest, so that we can determine (1) the marginal probability of choosing room or central, and (2) the conditional probability of choosing ec given that you have chosen central. Our utilities are decomposed into contributions from fixed covariates and alternative-specific covariates:
\(U_{j} = Y_{j} + W_{k} + \epsilon_{j}\)

and the probabilities are derived from these decomposed utilities in the following manner:

\[
P_{j} = P_{j \mid k} \cdot P_{k}
\]

where the conditional probability of alternative \(j\) within nest \(k\) is

\[
P_{j \mid k} = \frac{e^{Y_{j} / \lambda_{k}}}{\sum_{i \in B_{k}} e^{Y_{i} / \lambda_{k}}}
\]

and the marginal probability of nest \(k\) is

\[
P_{k} = \frac{e^{W_{k} + \lambda_{k} I_{k}}}{\sum_{l=1}^{K} e^{W_{l} + \lambda_{l} I_{l}}}
\]

while the inclusive term \(I_{k}\) is:

\[
I_{k} = \ln \sum_{i \in B_{k}} e^{Y_{i} / \lambda_{k}}
\]

such that \(I_{k}\) is used to “aggregate” utilities within a nest and “bubble up” their contribution to the decision through the product of the marginal and conditional probabilities. More extensive details of this mathematical formulation can be found in Kenneth Train’s “Discrete Choice Methods with Simulation”.
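To make the product rule concrete, here is a minimal NumPy sketch of this computation. The utilities and \(\lambda\) values below are made up for illustration, and the nest-level term \(W\) is set to zero, matching a specification with no fixed covariates:

```python
import numpy as np

# Made-up alternative-specific utilities (Y), grouped by nest; the
# nest-level term W is taken to be zero in this sketch.
nests = {
    "central": {"gc": 1.2, "ec": 0.4, "hp": -0.5},
    "room": {"gr": 0.8, "er": 0.3},
}
lambdas = {"central": 0.83, "room": 0.79}  # illustrative lambda values

# Inclusive value per nest: I_k = log(sum_j exp(Y_j / lambda_k))
inclusive = {
    k: np.log(np.exp(np.array(list(y.values())) / lambdas[k]).sum())
    for k, y in nests.items()
}

# Marginal nest probabilities: P_k proportional to exp(lambda_k * I_k)
scores = {k: np.exp(lambdas[k] * inclusive[k]) for k in nests}
p_nest = {k: s / sum(scores.values()) for k, s in scores.items()}

# Conditional probabilities within each nest, then the product rule
probs = {}
for k, y in nests.items():
    u = np.array(list(y.values()))
    cond = np.exp(u / lambdas[k]) / np.exp(u / lambdas[k]).sum()
    for alt, p in zip(y, cond):
        probs[alt] = p_nest[k] * p

print({alt: round(p, 3) for alt, p in probs.items()})
```

The resulting probabilities sum to one across all alternatives, exactly as in the flat multinomial logit, but the route by which each alternative receives its probability now passes through its nest.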
nstL_1.sample(
fit_kwargs={
"idata_kwargs": {"log_likelihood": True},
"nuts_sampler": "nutpie",
"progressbar": True,
}
)
Sampling: [alphas_, alphas_nests, betas, lambdas_nests, likelihood]
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/pymc/sampling/mcmc.py:328: UserWarning: `idata_kwargs` are currently ignored by the nutpie sampler
warnings.warn(
Sampler Progress
Total Chains: 4
Active Chains: 0
Finished Chains: 4
Sampling for a minute
Estimated Time to Completion: now
| Draws | Divergences | Step Size | Gradients/Draw |
|---|---|---|---|
| 2000 | 0 | 0.11 | 63 |
| 2000 | 1 | 0.12 | 175 |
| 2000 | 2 | 0.11 | 95 |
| 2000 | 1 | 0.11 | 63 |
Sampling: [likelihood]
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x173a81940>
The model structure is now quite a bit more complicated than the simpler multinomial logit, as we need to calculate the marginal and conditional probabilities within each of the nests separately and then “bubble up” the probabilities as a product over the branching nests. These probabilities are deterministic functions of the summed utilities and are then fed into our categorical likelihood to calibrate our parameters against the observed data.
nstL_1.graphviz()
But again we are able to derive parameter estimates for the drivers of consumer choice and consult the model implications as in a standard Bayesian model.
az.summary(
nstL_1.idata,
var_names=["betas", "alphas", "lambdas_nests", "alphas_nests"],
round_to=5,
)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
(between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:991: RuntimeWarning: invalid value encountered in scalar divide
varsd = varvar / evar / 4
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| betas[ic] | -0.00245 | 0.00071 | -0.00382 | -0.00114 | 0.00001 | 0.00001 | 3100.30564 | 2664.58751 | 1.00095 |
| betas[oc] | -0.00617 | 0.00165 | -0.00930 | -0.00317 | 0.00004 | 0.00003 | 1798.24975 | 1576.41780 | 1.00171 |
| alphas[gc] | 1.12221 | 0.29903 | 0.57939 | 1.71076 | 0.00647 | 0.00462 | 2128.38565 | 2277.73719 | 1.00115 |
| alphas[ec] | 1.20962 | 0.43109 | 0.40850 | 1.99476 | 0.00952 | 0.00687 | 2043.42174 | 2164.58576 | 1.00033 |
| alphas[gr] | -0.64209 | 3.64331 | -7.16379 | 6.69013 | 0.16307 | 0.09298 | 502.30586 | 735.44040 | 1.00538 |
| alphas[er] | 0.72538 | 3.64851 | -6.06945 | 7.58250 | 0.16347 | 0.09275 | 503.42563 | 696.82660 | 1.00494 |
| alphas[hp] | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | NaN | 4000.00000 | 4000.00000 | NaN |
| lambdas_nests[central] | 0.82972 | 0.12952 | 0.59742 | 0.99990 | 0.00371 | 0.00264 | 1258.11306 | 1893.71359 | 1.00190 |
| lambdas_nests[room] | 0.78966 | 0.14615 | 0.53154 | 0.99998 | 0.00315 | 0.00204 | 1722.77268 | 1141.52465 | 1.00175 |
| alphas_nests[central] | 0.68631 | 0.71647 | -0.67748 | 1.99964 | 0.01827 | 0.01451 | 1536.95101 | 1524.19559 | 1.00477 |
| alphas_nests[room] | -0.67159 | 0.71305 | -2.07221 | 0.59883 | 0.01833 | 0.01457 | 1511.39872 | 1587.96716 | 1.00482 |
Here we additionally see lambda parameters for each of the nests. These terms measure how strongly correlated the unobserved utility components are for alternatives within the same nest. A value closer to 0 indicates high correlation: substitution happens mostly within the nest. A value closer to 1 implies lower within-nest correlation, suggesting IIA approximately holds within the nest. The options available for structuring a market can be quite extensive, as we might have “nests within nests” where the conditional probabilities flow through successive choices within segments of the market.
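The role of the lambda parameters can be illustrated with a toy example (all utilities below are made up). At \(\lambda = 1\) the nested logit collapses to the plain multinomial logit, while a small \(\lambda\) concentrates substitution within a nest when an alternative is removed:

```python
import numpy as np


def nested_probs(nests, lam):
    """Nested logit probabilities; `nests` maps nest -> {alternative: utility}.

    A single lambda is shared across nests for simplicity.
    """
    incl = {
        k: np.log(np.exp(np.array(list(y.values())) / lam).sum())
        for k, y in nests.items()
    }
    denom = sum(np.exp(lam * i) for i in incl.values())
    probs = {}
    for k, y in nests.items():
        p_nest = np.exp(lam * incl[k]) / denom
        u = np.array(list(y.values()))
        cond = np.exp(u / lam) / np.exp(u / lam).sum()
        for alt, c in zip(y, cond):
            probs[alt] = p_nest * c
    return probs


full = {"central": {"gc": 1.2, "ec": 0.4, "hp": -0.5}, "room": {"gr": 0.8, "er": 0.3}}

# With lambda = 1 the nested model reproduces the plain MNL probabilities
mnl_like = nested_probs(full, 1.0)

# With a small lambda, remove `ec` and see where its share goes:
# gc (same nest) should absorb proportionally more than gr (other nest)
reduced = {"central": {"gc": 1.2, "hp": -0.5}, "room": {"gr": 0.8, "er": 0.3}}
before = nested_probs(full, 0.3)
after = nested_probs(reduced, 0.3)
rel_change = {alt: after[alt] / before[alt] - 1 for alt in after}
print(rel_change)
```

This is the non-proportional substitution pattern the nested specification buys us: shrinking \(\lambda\) in a nest forces the alternatives in that nest to compete more directly with one another than with alternatives outside it.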
Again the parameter estimates for the beta coefficients seem to be recovered sensibly, but the question remains whether this additional nesting structure will help support plausible counterfactual reasoning. Note how the model struggles to identify the intercept terms for the room-nest alternatives (gr, er), spreading posterior weight over a wide range of values.
Making Interventions in Structured Markets#
new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5
idata_new_policy_1 = nstL_1.apply_intervention(new_choice_df=new_policy_df)
Sampling: [likelihood]
Here we can see that the nested specification recovers non-proportional patterns of product substitution. We have avoided the IIA property of the multinomial logit and can continue to assess whether or not the behavioural implications of this utility theory make sense.
change_df_1 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_1
| product | policy_share | new_policy_share | relative_change |
|---|---|---|---|
| gc | 0.635648 | 0.703700 | 0.107059 |
| ec | 0.071388 | 0.026369 | -0.630621 |
| gr | 0.149848 | 0.179990 | 0.201150 |
| er | 0.087477 | 0.028179 | -0.677865 |
| hp | 0.055640 | 0.061762 | 0.110034 |
Visualising the Substitution Patterns#
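A quick way to visualise the substitution pattern is a signed bar chart of the relative share changes. This sketch rebuilds the share-change table from the values printed above so the cell stands alone:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the share-change table from the values printed above so this
# cell can run standalone.
change_df_1 = pd.DataFrame(
    {
        "policy_share": [0.635648, 0.071388, 0.149848, 0.087477, 0.055640],
        "new_policy_share": [0.703700, 0.026369, 0.179990, 0.028179, 0.061762],
        "relative_change": [0.107059, -0.630621, 0.201150, -0.677865, 0.110034],
    },
    index=pd.Index(["gc", "ec", "gr", "er", "hp"], name="product"),
)

fig, ax = plt.subplots()
colors = ["firebrick" if v < 0 else "seagreen" for v in change_df_1["relative_change"]]
ax.bar(change_df_1.index, change_df_1["relative_change"], color=colors)
ax.axhline(0, color="black", linewidth=0.8)
ax.set_ylabel("Relative change in market share")
ax.set_title("Substitution after a 50% increase in ic_ec and ic_er")
```

The two electric alternatives lose share together while the remaining alternatives gain in unequal proportion, which is exactly the non-proportional pattern the flat multinomial logit cannot express.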
We can additionally test counterfactuals about the removal of products from the market.
fit_kwargs = {
"target_accept": 0.97,
"tune": 2000,
"nuts_sampler": "nutpie",
"idata_kwargs": {"log_likelihood": True},
"progressbar": False,
}
new_policy_df = nstL_1.choice_df.copy()
new_policy_df = new_policy_df[
(new_policy_df["depvar"] != "hp") & (new_policy_df["depvar"] != "gr")
]
new_utility_formulas = [
    "gc ~ ic_gc + oc_gc | income",
    "ec ~ ic_ec + oc_ec | income",
    "er ~ ic_er + oc_er | income",
]
nstL_1.nesting_structure = {"central": ["gc", "ec"], "room": ["er"]}
nstL_1.alternatives = ["gc", "ec", "er"]
idata_new_policy_3 = nstL_1.apply_intervention(
new_choice_df=new_policy_df,
new_utility_equations=new_utility_formulas,
fit_kwargs=fit_kwargs,
)
change_df_3 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_3
Sampling: [alphas_, alphas_nests, betas, betas_fixed_, lambdas_nests, likelihood]
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/pymc/sampling/mcmc.py:328: UserWarning: `idata_kwargs` are currently ignored by the nutpie sampler
warnings.warn(
Sampling: [likelihood]
| product | policy_share | new_policy_share | relative_change |
|---|---|---|---|
| gc | 0.635648 | 0.792997 | 0.247541 |
| ec | 0.071388 | 0.089120 | 0.248388 |
| gr | 0.149848 | 0.000000 | -1.000000 |
| er | 0.087477 | 0.117884 | 0.347599 |
| hp | 0.055640 | 0.000000 | -1.000000 |
A Different Market#
Let’s now briefly look at a different market that highlights a limitation of the nested logit model.
data_path = data_dir / "choice_crackers.csv"
df_new = pd.read_csv(data_path)
last_chosen = pd.get_dummies(df_new["lastChoice"]).drop("private", axis=1).astype(int)
last_chosen.columns = [col + "_last_chosen" for col in last_chosen.columns]
df_new[last_chosen.columns] = last_chosen
df_new
| | personId | disp_sunshine | disp_keebler | disp_nabisco | disp_private | feat_sunshine | feat_keebler | feat_nabisco | feat_private | price_sunshine | price_keebler | price_nabisco | price_private | choice | lastChoice | personChoiceId | choiceId | keebler_last_chosen | nabisco_last_chosen | sunshine_last_chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.99 | 1.09 | 0.99 | 0.71 | nabisco | nabisco | 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.49 | 1.09 | 1.09 | 0.78 | sunshine | nabisco | 2 | 2 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.03 | 1.09 | 0.89 | 0.78 | nabisco | sunshine | 3 | 3 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.09 | 1.09 | 1.19 | 0.64 | nabisco | nabisco | 4 | 4 | 0 | 1 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.89 | 1.09 | 1.19 | 0.84 | nabisco | nabisco | 5 | 5 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3151 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.09 | 1.19 | 0.99 | 0.55 | private | private | 9 | 3152 | 0 | 0 | 0 |
| 3152 | 136 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.78 | 1.35 | 1.04 | 0.65 | private | private | 10 | 3153 | 0 | 0 | 0 |
| 3153 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.09 | 1.17 | 1.29 | 0.59 | private | private | 11 | 3154 | 0 | 0 | 0 |
| 3154 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.09 | 1.22 | 1.29 | 0.59 | private | private | 12 | 3155 | 0 | 0 | 0 |
| 3155 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.29 | 1.04 | 1.23 | 0.59 | private | private | 13 | 3156 | 0 | 0 | 0 |
3156 rows × 20 columns
utility_formulas = [
"sunshine ~ disp_sunshine + feat_sunshine + price_sunshine ",
"keebler ~ disp_keebler + feat_keebler + price_keebler ",
"nabisco ~ disp_nabisco + feat_nabisco + price_nabisco ",
"private ~ disp_private + feat_private + price_private ",
]
nesting_structure = {
"private": ["private"],
"brand": ["keebler", "sunshine", "nabisco"],
}
nstL_3 = NestedLogit(
choice_df=df_new,
utility_equations=utility_formulas,
depvar="choice",
covariates=["disp", "feat", "price"],
nesting_structure=nesting_structure,
model_config={
"alphas_": Prior("Normal", mu=0, sigma=1, dims="alts"),
"betas": Prior("Normal", mu=0, sigma=1, dims="alt_covariates"),
"lambdas_nests": Prior("Beta", alpha=1, beta=1, dims="nests"),
},
alphas_nests=False,
)
nstL_3
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x29438e5d0>
nstL_3.sample(
fit_kwargs={
"target_accept": 0.95,
"tune": 2000,
"nuts_sampler": "nutpie",
"progressbar": True,
}
)
Sampling: [alphas_, betas, lambdas_nests, likelihood]
Sampler Progress
Total Chains: 4
Active Chains: 0
Finished Chains: 4
Sampling for a minute
Estimated Time to Completion: now
| Draws | Divergences | Step Size | Gradients/Draw |
|---|---|---|---|
| 3000 | 0 | 0.17 | 31 |
| 3000 | 0 | 0.17 | 7 |
| 3000 | 0 | 0.18 | 31 |
| 3000 | 0 | 0.17 | 31 |
Sampling: [likelihood]
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x29438e5d0>
az.summary(
nstL_3.idata,
var_names=["alphas", "betas"],
)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
(between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:991: RuntimeWarning: invalid value encountered in scalar divide
varsd = varvar / evar / 4
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alphas[sunshine] | -0.968 | 0.606 | -2.023 | 0.220 | 0.026 | 0.018 | 511.0 | 645.0 | 1.02 |
| alphas[keebler] | -0.400 | 0.604 | -1.449 | 0.805 | 0.027 | 0.018 | 507.0 | 682.0 | 1.02 |
| alphas[nabisco] | 1.361 | 0.605 | 0.320 | 2.562 | 0.027 | 0.018 | 499.0 | 599.0 | 1.02 |
| alphas[private] | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | NaN | 4000.0 | 4000.0 | NaN |
| betas[disp] | 0.098 | 0.087 | -0.064 | 0.254 | 0.001 | 0.001 | 4290.0 | 2861.0 | 1.00 |
| betas[feat] | 0.336 | 0.143 | 0.075 | 0.607 | 0.002 | 0.002 | 4785.0 | 3045.0 | 1.00 |
| betas[price] | -3.592 | 0.318 | -4.159 | -2.974 | 0.005 | 0.005 | 3520.0 | 2583.0 | 1.00 |
ax = az.plot_forest(
nstL_3.idata,
var_names=["alphas", "betas"],
combined=True,
r_hat=True,
)
ax[0].axvline(0, color="k", linestyle="--")
ax[0].set_title("Parameter Estimates from the Nested Logit Model");
This model plausibly suggests that price increases have a strongly negative effect on the purchase probability of crackers. However, we should note that we’re ignoring information when we use the nested logit in this context. The data set records multiple purchasing decisions for each individual, and by failing to model this fact we lose insight into individual heterogeneity in responses to price shifts. To gain more insight into this aspect of individual purchasing decisions we might augment our model with hierarchical components or switch to an alternative Mixed Logit model.
Choosing a Market Structure#
Causal inference is hard and predicting the counterfactual actions of agents in a competitive market is very hard. There is no guarantee that a nested logit model will accurately represent the choice of any particular agent, but you can be hopeful that it highlights expected patterns of choice when the nesting structure reflects the natural segmentation of a market. Nests should group alternatives that share unobserved similarities, ideally driven by a transparent theory of the market structure. Well-specified nests should show stronger substitution within nests than across nests, and you can inspect the substitution patterns as above. Ideally you can always try and hold out some test data to evaluate the implications of your fitted model. Discrete choice models are causal inference models and their structural specification should support generalisable inference across future and counterfactual situations.
%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor
Last updated: Mon Jan 19 2026
Python implementation: CPython
Python version : 3.12.12
IPython version : 9.8.0
pytensor: 2.36.3
pandas : 2.3.3
pymc_marketing: 0.17.1
matplotlib : 3.10.8
pymc_extras : 0.6.0
arviz : 0.23.0
Watermark: 2.5.0