Nested Logit and Non-Proportional Patterns of Substitution#

import arviz as az
import matplotlib.pyplot as plt
import pandas as pd
from pymc_extras.prior import Prior

from pymc_marketing.customer_choice.nested_logit import NestedLogit
from pymc_marketing.paths import data_dir

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
%config InlineBackend.figure_format = "retina"

We saw in the other consumer choice example notebook how the multinomial logit model suffers from the IIA property, which leads to implausible counterfactual inferences about market behaviour. If you haven't read that notebook, we recommend reading it before continuing. We will now show how the nested logit model specification avoids this property by adding more explicit structure to the choice scenarios being modelled.

In this notebook we will re-use the same heating choice data set seen in the Multinomial Logit example and apply a few different specifications of a nested logit model.

data_path = data_dir / "choice_wide_heating.csv"
df = pd.read_csv(data_path)
df
idcase depvar ic_gc ic_gr ic_ec ic_er ic_hp oc_gc oc_gr oc_ec oc_er oc_hp income agehed rooms region
0 1 gc 866.00 962.64 859.90 995.76 1135.50 199.69 151.72 553.34 505.60 237.88 7 25 6 ncostl
1 2 gc 727.93 758.89 796.82 894.69 968.90 168.66 168.66 520.24 486.49 199.19 5 60 5 scostl
2 3 gc 599.48 783.05 719.86 900.11 1048.30 165.58 137.80 439.06 404.74 171.47 4 65 2 ncostl
3 4 er 835.17 793.06 761.25 831.04 1048.70 180.88 147.14 483.00 425.22 222.95 2 50 4 scostl
4 5 er 755.59 846.29 858.86 985.64 883.05 174.91 138.90 404.41 389.52 178.49 2 25 6 valley
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
895 896 gc 766.39 877.71 751.59 869.78 942.70 142.61 136.21 474.48 420.65 203.00 6 20 4 mountn
896 897 gc 1128.50 1167.80 1047.60 1292.60 1297.10 207.40 213.77 705.36 551.61 243.76 7 45 7 scostl
897 898 gc 787.10 1055.20 842.79 1041.30 1064.80 175.05 141.63 478.86 448.61 254.51 5 60 7 scostl
898 899 gc 860.56 1081.30 799.76 1123.20 1218.20 211.04 151.31 495.20 401.56 246.48 5 50 6 scostl
899 900 gc 893.94 1119.90 967.88 1091.70 1387.50 175.80 180.11 518.68 458.53 245.13 2 65 4 ncostl

900 rows × 16 columns

Single Layer Nesting#

The important addition gained through the nested logit specification is the ability to specify "nests" of products. In this way we can partition the market into "natural" groups of competing products, building into the model a bias towards selective patterns of substitution. As before we specify the models using formulas, but now we also add a nesting structure.

## No Fixed Covariates
utility_formulas = [
    "gc ~ ic_gc + oc_gc |",
    "ec ~ ic_ec + oc_ec | ",
    "gr ~ ic_gr + oc_gr | ",
    "er ~ ic_er + oc_er | ",
    "hp ~ ic_hp + oc_hp | ",
]


nesting_structure = {"central": ["gc", "ec", "hp"], "room": ["gr", "er"]}


nstL_1 = NestedLogit(
    df,
    utility_formulas,
    "depvar",
    covariates=["ic", "oc"],
    nesting_structure=nesting_structure,
    alphas_nests=True,
)
nstL_1
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x173a81940>

We will dwell a little on how these nests are specified and why. The nested logit model partitions the choice set into nests of alternatives that share common unobserved attributes (i.e., are more similar to each other). It computes the overall probability of choosing an alternative as the product of (1) the probability of choosing a nest (the marginal probability), and (2) the probability of choosing an alternative within that nest (the conditional probability, given that nest). In our case we want to separate the probability of choosing a central heating system from that of choosing a room-based heating system.

Each alternative is indexed to a nest, so that we can determine (1) the marginal probability of choosing room or central and (2) the conditional probability of choosing, say, ec given that you have chosen central. Each utility is decomposed into contributions from fixed covariates (W) and alternative-specific covariates (Y):

\(U = Y + W + \epsilon\)

and the probabilities are derived from these decomposed utilities in the following manner:

\[ P(i) = P( \text{choose nest B}) \cdot P( \text{choose i} | \text{ i } \in \text{B}) \]

where

\[ P(\text{choose nest } B_{k}) = \dfrac{e^{W_{k} + \lambda_{k}I_{k}}}{\sum_{l=1}^{K} e^{W_{l} + \lambda_{l}I_{l}}} \]

and

\[ P(\text{choose i} | \text{ i} \in \text{B}) = \dfrac{e^{Y_{i} / \lambda_{k}}}{\sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}}} \]

while the inclusive term \(I_{k}\) is:

\[I_{k} = \ln \sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}} \text{ and } \lambda_{k} \sim \text{Beta}(1, 1)\]

such that \(I_{k}\) is used to "aggregate" the utilities within a nest and "bubble up" their contribution to the decision through the product of the marginal and conditional probabilities. More extensive details of this mathematical formulation can be found in Kenneth Train's "Discrete Choice Methods with Simulation".
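To make the decomposition concrete, here is a minimal NumPy sketch of these formulas (not the pymc-marketing implementation). For simplicity it drops the fixed-covariate term \(W\), so the nest score reduces to \(\lambda_{k} I_{k}\):

```python
import numpy as np

def nested_logit_probs(utilities, nests, lam):
    """Toy nested logit probabilities.

    utilities: {alt: Y}, nests: {nest: [alts]}, lam: {nest: lambda}.
    Returns {alt: P(alt)} as the product of marginal and conditional probabilities.
    """
    # Inclusive value I_k = log sum_j exp(Y_j / lambda_k) for each nest
    inclusive = {
        k: np.log(sum(np.exp(utilities[a] / lam[k]) for a in alts))
        for k, alts in nests.items()
    }
    # Marginal probability of each nest: softmax over lambda_k * I_k
    nest_scores = {k: np.exp(lam[k] * inclusive[k]) for k in nests}
    denom = sum(nest_scores.values())
    probs = {}
    for k, alts in nests.items():
        p_nest = nest_scores[k] / denom
        # Conditional probability of each alternative within its nest
        within = sum(np.exp(utilities[a] / lam[k]) for a in alts)
        for a in alts:
            probs[a] = p_nest * np.exp(utilities[a] / lam[k]) / within
    return probs
```

Note that with \(\lambda_{k} = 1\) for every nest this collapses exactly to the plain multinomial logit, which is why \(\lambda\) near 1 signals that IIA approximately holds.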

nstL_1.sample(
    fit_kwargs={
        "idata_kwargs": {"log_likelihood": True},
        "nuts_sampler": "nutpie",
        "progressbar": True,
    }
)
Sampling: [alphas_, alphas_nests, betas, lambdas_nests, likelihood]
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/pymc/sampling/mcmc.py:328: UserWarning: `idata_kwargs` are currently ignored by the nutpie sampler
  warnings.warn(

Sampler Progress: 4 chains, all finished (sampling took about a minute).

Chain Draws Divergences Step Size Gradients/Draw
1 2000 0 0.11 63
2 2000 1 0.12 175
3 2000 2 0.11 95
4 2000 1 0.11 63
Sampling: [likelihood]

<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x173a81940>

The model structure is quite a bit more complicated now than the simpler multinomial logit, as we need to calculate the marginal and conditional probabilities within each of the nests separately and then "bubble up" the probabilities as a product over the branching nests. These probabilities are deterministic functions of the summed utilities and are then fed into our categorical likelihood to calibrate our parameters against the observed data.

nstL_1.graphviz()
../../_images/c7e25038286ab2b97098f46aad53349e5851781e382b4365a05843f657dd3924.svg

But again we are able to derive parameter estimates for the drivers of consumer choice and consult the model implications as in any standard Bayesian model.

az.summary(
    nstL_1.idata,
    var_names=["betas", "alphas", "lambdas_nests", "alphas_nests"],
    round_to=5,
)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:991: RuntimeWarning: invalid value encountered in scalar divide
  varsd = varvar / evar / 4
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
betas[ic] -0.00245 0.00071 -0.00382 -0.00114 0.00001 0.00001 3100.30564 2664.58751 1.00095
betas[oc] -0.00617 0.00165 -0.00930 -0.00317 0.00004 0.00003 1798.24975 1576.41780 1.00171
alphas[gc] 1.12221 0.29903 0.57939 1.71076 0.00647 0.00462 2128.38565 2277.73719 1.00115
alphas[ec] 1.20962 0.43109 0.40850 1.99476 0.00952 0.00687 2043.42174 2164.58576 1.00033
alphas[gr] -0.64209 3.64331 -7.16379 6.69013 0.16307 0.09298 502.30586 735.44040 1.00538
alphas[er] 0.72538 3.64851 -6.06945 7.58250 0.16347 0.09275 503.42563 696.82660 1.00494
alphas[hp] 0.00000 0.00000 0.00000 0.00000 0.00000 NaN 4000.00000 4000.00000 NaN
lambdas_nests[central] 0.82972 0.12952 0.59742 0.99990 0.00371 0.00264 1258.11306 1893.71359 1.00190
lambdas_nests[room] 0.78966 0.14615 0.53154 0.99998 0.00315 0.00204 1722.77268 1141.52465 1.00175
alphas_nests[central] 0.68631 0.71647 -0.67748 1.99964 0.01827 0.01451 1536.95101 1524.19559 1.00477
alphas_nests[room] -0.67159 0.71305 -2.07221 0.59883 0.01833 0.01457 1511.39872 1587.96716 1.00482

Here we additionally see lambda parameters for each of the nests. These terms measure how strongly correlated the unobserved utility components are for alternatives within the same nest. A value closer to 0 indicates high correlation: substitution happens mostly within the nest. A value closer to 1 implies low within-nest correlation, suggesting IIA approximately holds within the nest. The options available for structuring a market can be quite extensive, as we might have "nests within nests" where the conditional probabilities flow through successive choices within segments of the market.
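To build intuition for what \(\lambda\) controls, here is a self-contained toy sketch (made-up utilities, no fixed covariates, a single \(\lambda\) shared across nests). We raise the utility of gc and compare the relative losses of its nest-mate ec against the out-of-nest gr:

```python
import numpy as np

def probs(u, nests, lam):
    """Toy nested logit probabilities with a shared lambda."""
    iv = {k: np.log(sum(np.exp(u[a] / lam) for a in alts)) for k, alts in nests.items()}
    denom = sum(np.exp(lam * iv[k]) for k in nests)
    out = {}
    for k, alts in nests.items():
        p_nest = np.exp(lam * iv[k]) / denom
        within = sum(np.exp(u[a] / lam) for a in alts)
        out.update({a: p_nest * np.exp(u[a] / lam) / within for a in alts})
    return out

nests = {"central": ["gc", "ec"], "room": ["gr"]}
base = {"gc": 0.0, "ec": 0.0, "gr": 0.0}
bumped = {"gc": 1.0, "ec": 0.0, "gr": 0.0}  # raise gc's utility

for lam in (0.99, 0.3):
    p0, p1 = probs(base, nests, lam), probs(bumped, nests, lam)
    ec_loss = (p1["ec"] - p0["ec"]) / p0["ec"]  # same nest as gc
    gr_loss = (p1["gr"] - p0["gr"]) / p0["gr"]  # different nest
    print(f"lambda={lam}: ec loses {ec_loss:.2f}, gr loses {gr_loss:.2f}")
```

With \(\lambda\) near 1 the two losses are nearly identical (the IIA proportional-substitution pattern), while with \(\lambda = 0.3\) the in-nest alternative ec absorbs far more of the substitution than gr.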

az.plot_trace(nstL_1.idata, var_names=["betas"]);

Again the parameter estimates for the beta coefficients seem to be recovered sensibly, but the question remains as to whether this additional nesting structure will help support plausible counterfactual reasoning. Note how the model struggles to identify the alternative-specific intercepts for gr and er (their posteriors are very wide) and instead places weight on the nest-level intercepts.

Making Interventions in Structured Markets#

new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5

idata_new_policy_1 = nstL_1.apply_intervention(new_choice_df=new_policy_df)
Sampling: [likelihood]

Here we can see that the nesting structure recovers non-proportional patterns of product substitution. We have avoided the IIA property of the multinomial logit and can continue to assess whether or not the behavioural implications of this utility theory make sense.

change_df_1 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_1
policy_share new_policy_share relative_change
product
gc 0.635648 0.703700 0.107059
ec 0.071388 0.026369 -0.630621
gr 0.149848 0.179990 0.201150
er 0.087477 0.028179 -0.677865
hp 0.055640 0.061762 0.110034
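To see that this substitution pattern is genuinely non-proportional, note that under IIA the alternatives whose costs were untouched (gc, gr, hp) would all change by the same relative amount. A quick check on the table above (values transcribed from the output) shows they do not:

```python
import pandas as pd

# Share changes transcribed from the calculate_share_change output above
change = pd.DataFrame(
    {
        "policy_share": [0.635648, 0.071388, 0.149848, 0.087477, 0.055640],
        "relative_change": [0.107059, -0.630621, 0.201150, -0.677865, 0.110034],
    },
    index=["gc", "ec", "gr", "er", "hp"],
)

# Alternatives not directly hit by the installation-cost increase
unaffected = change.loc[["gc", "gr", "hp"], "relative_change"]
# Under IIA these three relative changes would be identical; here gr gains
# almost twice as much as gc, reflecting within-nest substitution from er.
print(unaffected.round(3))
```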

Visualising the Substitution Patterns#

fig = nstL_1.plot_change(change_df=change_df_1, figsize=(10, 6))

We can additionally test counterfactuals about the removal of products from the market.

fit_kwargs = {
    "target_accept": 0.97,
    "tune": 2000,
    "nuts_sampler": "nutpie",
    "idata_kwargs": {"log_likelihood": True},
    "progressbar": False,
}

new_policy_df = nstL_1.choice_df.copy()
new_policy_df = new_policy_df[
    (new_policy_df["depvar"] != "hp") & (new_policy_df["depvar"] != "gr")
]

new_utility_formulas = [
    "gc ~ ic_gc + oc_gc | income",
    "ec ~ ic_ec + oc_ec | income ",
    "er ~ ic_er + oc_er | income  ",
]
nstL_1.nesting_structure = {"central": ["gc", "ec"], "room": ["er"]}
nstL_1.alternatives = ["gc", "ec", "er"]
idata_new_policy_3 = nstL_1.apply_intervention(
    new_choice_df=new_policy_df,
    new_utility_equations=new_utility_formulas,
    fit_kwargs=fit_kwargs,
)

change_df_3 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_3
Sampling: [alphas_, alphas_nests, betas, betas_fixed_, lambdas_nests, likelihood]
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/pymc/sampling/mcmc.py:328: UserWarning: `idata_kwargs` are currently ignored by the nutpie sampler
  warnings.warn(
Sampling: [likelihood]

policy_share new_policy_share relative_change
product
gc 0.635648 0.792997 0.247541
ec 0.071388 0.089120 0.248388
gr 0.149848 0.000000 -1.000000
er 0.087477 0.117884 0.347599
hp 0.055640 0.000000 -1.000000

A Different Market#

Let’s now briefly look at a different market that highlights a limitation of the nested logit model.

data_path = data_dir / "choice_crackers.csv"
df_new = pd.read_csv(data_path)
last_chosen = pd.get_dummies(df_new["lastChoice"]).drop("private", axis=1).astype(int)
last_chosen.columns = [col + "_last_chosen" for col in last_chosen.columns]
df_new[last_chosen.columns] = last_chosen
df_new
personId disp_sunshine disp_keebler disp_nabisco disp_private feat_sunshine feat_keebler feat_nabisco feat_private price_sunshine price_keebler price_nabisco price_private choice lastChoice personChoiceId choiceId keebler_last_chosen nabisco_last_chosen sunshine_last_chosen
0 1 0 0 0 0 0 0 0 0 0.99 1.09 0.99 0.71 nabisco nabisco 1 1 0 1 0
1 1 1 0 0 0 0 0 0 0 0.49 1.09 1.09 0.78 sunshine nabisco 2 2 0 1 0
2 1 0 0 0 0 0 0 0 0 1.03 1.09 0.89 0.78 nabisco sunshine 3 3 0 0 1
3 1 0 0 0 0 0 0 0 0 1.09 1.09 1.19 0.64 nabisco nabisco 4 4 0 1 0
4 1 0 0 0 0 0 0 0 0 0.89 1.09 1.19 0.84 nabisco nabisco 5 5 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3151 136 0 0 0 0 0 0 0 0 1.09 1.19 0.99 0.55 private private 9 3152 0 0 0
3152 136 0 0 0 1 0 0 0 0 0.78 1.35 1.04 0.65 private private 10 3153 0 0 0
3153 136 0 0 0 0 0 0 0 0 1.09 1.17 1.29 0.59 private private 11 3154 0 0 0
3154 136 0 0 0 0 0 0 0 0 1.09 1.22 1.29 0.59 private private 12 3155 0 0 0
3155 136 0 0 0 0 0 0 0 0 1.29 1.04 1.23 0.59 private private 13 3156 0 0 0

3156 rows × 20 columns

utility_formulas = [
    "sunshine ~ disp_sunshine + feat_sunshine + price_sunshine ",
    "keebler ~ disp_keebler + feat_keebler + price_keebler  ",
    "nabisco ~ disp_nabisco + feat_nabisco + price_nabisco  ",
    "private ~ disp_private + feat_private + price_private  ",
]


nesting_structure = {
    "private": ["private"],
    "brand": ["keebler", "sunshine", "nabisco"],
}


nstL_3 = NestedLogit(
    choice_df=df_new,
    utility_equations=utility_formulas,
    depvar="choice",
    covariates=["disp", "feat", "price"],
    nesting_structure=nesting_structure,
    model_config={
        "alphas_": Prior("Normal", mu=0, sigma=1, dims="alts"),
        "betas": Prior("Normal", mu=0, sigma=1, dims="alt_covariates"),
        "lambdas_nests": Prior("Beta", alpha=1, beta=1, dims="nests"),
    },
    alphas_nests=False,
)
nstL_3
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x29438e5d0>
nstL_3.sample(
    fit_kwargs={
        "target_accept": 0.95,
        "tune": 2000,
        "nuts_sampler": "nutpie",
        "progressbar": True,
    }
)
Sampling: [alphas_, betas, lambdas_nests, likelihood]

Sampler Progress: 4 chains, all finished (sampling took about a minute).

Chain Draws Divergences Step Size Gradients/Draw
1 3000 0 0.17 31
2 3000 0 0.17 7
3 3000 0 0.18 31
4 3000 0 0.17 31
Sampling: [likelihood]

<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x29438e5d0>
az.summary(
    nstL_3.idata,
    var_names=["alphas", "betas"],
)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:991: RuntimeWarning: invalid value encountered in scalar divide
  varsd = varvar / evar / 4
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alphas[sunshine] -0.968 0.606 -2.023 0.220 0.026 0.018 511.0 645.0 1.02
alphas[keebler] -0.400 0.604 -1.449 0.805 0.027 0.018 507.0 682.0 1.02
alphas[nabisco] 1.361 0.605 0.320 2.562 0.027 0.018 499.0 599.0 1.02
alphas[private] 0.000 0.000 0.000 0.000 0.000 NaN 4000.0 4000.0 NaN
betas[disp] 0.098 0.087 -0.064 0.254 0.001 0.001 4290.0 2861.0 1.00
betas[feat] 0.336 0.143 0.075 0.607 0.002 0.002 4785.0 3045.0 1.00
betas[price] -3.592 0.318 -4.159 -2.974 0.005 0.005 3520.0 2583.0 1.00
ax = az.plot_forest(
    nstL_3.idata,
    var_names=["alphas", "betas"],
    combined=True,
    r_hat=True,
)
ax[0].axvline(0, color="k", linestyle="--")
ax[0].set_title("Parameter Estimates from the Nested Logit Model");
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
../../_images/b0de9ffa7663935b476e2a8febb5a21075ddbb86a8611edee8a90b3be7367eac.png

This model plausibly suggests that price increases have a strongly negative effect on the purchase probability of crackers. However, we should note that we're ignoring information when we use the nested logit in this context. The data set records multiple purchasing decisions for each individual, and by failing to model this fact we lose insight into individual heterogeneity in responses to price shifts. To gain more insight into this aspect of the purchasing decisions we might augment our model with hierarchical components or switch to an alternative mixed logit model.
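As a sketch of what the mixed logit idea buys us (illustrative NumPy, not the pymc-marketing API): draw an individual-level price coefficient from a population distribution and average the per-individual multinomial logit probabilities. The resulting population-level probabilities are a mixture and no longer satisfy IIA. The intercepts and mean price coefficient below are loosely based on the posterior means above; the population spread of 1.5 is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

prices = np.array([0.99, 1.09, 0.99, 0.71])   # sunshine, keebler, nabisco, private
alphas = np.array([-0.97, -0.40, 1.36, 0.0])  # intercepts, roughly the means above

# Population distribution of price sensitivity: beta_i ~ Normal(-3.6, 1.5)
betas = rng.normal(-3.6, 1.5, size=5000)

# Per-individual MNL probabilities, then average over the simulated population
u = alphas[None, :] + betas[:, None] * prices[None, :]
u -= u.max(axis=1, keepdims=True)  # subtract row max for numerical stability
p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
mixed_probs = p.mean(axis=0)
print(mixed_probs.round(3))
```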

Choosing a Market Structure#

Causal inference is hard, and predicting the counterfactual actions of agents in a competitive market is very hard. There is no guarantee that a nested logit model will accurately represent the choice of any particular agent, but you can hope that it highlights expected patterns of choice when the nesting structure reflects the natural segmentation of a market. Nests should group alternatives that share unobserved similarities, ideally driven by a transparent theory of the market structure. Well-specified nests should show stronger substitution within nests than across nests, and you can inspect the substitution patterns as above. Ideally, hold out some test data to evaluate the implications of your fitted model. Discrete choice models are causal inference models, and their structural specification should support generalisable inference across future and counterfactual situations.
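One simple way to act on the hold-out advice is to score each candidate nesting structure by the mean log predictive probability it assigns to the choices in a held-out set. A minimal helper (hypothetical names; in practice `prob_matrix` would come from your fitted model's posterior predictive probabilities on the held-out rows):

```python
import numpy as np

def holdout_log_score(prob_matrix, choices, alts):
    """Mean log predictive probability of observed held-out choices.

    prob_matrix: (n_obs, n_alts) array of predicted choice probabilities,
    choices: observed alternative labels, alts: column order of prob_matrix.
    Higher (less negative) is better.
    """
    idx = np.array([alts.index(c) for c in choices])
    return np.log(prob_matrix[np.arange(len(choices)), idx]).mean()
```

Comparing this score across two or more nesting structures gives a direct, predictive check on which market segmentation the data actually supports.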

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor
Last updated: Mon Jan 19 2026

Python implementation: CPython
Python version       : 3.12.12
IPython version      : 9.8.0

pytensor: 2.36.3

pandas        : 2.3.3
pymc_marketing: 0.17.1
matplotlib    : 3.10.8
pymc_extras   : 0.6.0
arviz         : 0.23.0

Watermark: 2.5.0