Nested Logit and Non-Proportional Patterns of Substitution#

import arviz as az
import matplotlib.pyplot as plt
import pandas as pd
from pymc_extras.prior import Prior

from pymc_marketing.customer_choice.nested_logit import NestedLogit
from pymc_marketing.paths import data_dir

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
%config InlineBackend.figure_format = "retina"

We saw in the other consumer choice example notebook how the multinomial logit model suffers from the IIA property, which leads to implausible counterfactual inferences about market behaviour. If you haven't read that notebook, we recommend reading it before continuing. We will now show how the nested logit model specification avoids this property by adding more explicit structure to the choice scenarios being modelled.

In this notebook we will re-use the same heating choice data set seen in the Multinomial Logit example and apply a few different specifications of a nested logit model.

data_path = data_dir / "choice_wide_heating.csv"
df = pd.read_csv(data_path)
df
idcase depvar ic_gc ic_gr ic_ec ic_er ic_hp oc_gc oc_gr oc_ec oc_er oc_hp income agehed rooms region
0 1 gc 866.00 962.64 859.90 995.76 1135.50 199.69 151.72 553.34 505.60 237.88 7 25 6 ncostl
1 2 gc 727.93 758.89 796.82 894.69 968.90 168.66 168.66 520.24 486.49 199.19 5 60 5 scostl
2 3 gc 599.48 783.05 719.86 900.11 1048.30 165.58 137.80 439.06 404.74 171.47 4 65 2 ncostl
3 4 er 835.17 793.06 761.25 831.04 1048.70 180.88 147.14 483.00 425.22 222.95 2 50 4 scostl
4 5 er 755.59 846.29 858.86 985.64 883.05 174.91 138.90 404.41 389.52 178.49 2 25 6 valley
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
895 896 gc 766.39 877.71 751.59 869.78 942.70 142.61 136.21 474.48 420.65 203.00 6 20 4 mountn
896 897 gc 1128.50 1167.80 1047.60 1292.60 1297.10 207.40 213.77 705.36 551.61 243.76 7 45 7 scostl
897 898 gc 787.10 1055.20 842.79 1041.30 1064.80 175.05 141.63 478.86 448.61 254.51 5 60 7 scostl
898 899 gc 860.56 1081.30 799.76 1123.20 1218.20 211.04 151.31 495.20 401.56 246.48 5 50 6 scostl
899 900 gc 893.94 1119.90 967.88 1091.70 1387.50 175.80 180.11 518.68 458.53 245.13 2 65 4 ncostl

900 rows × 16 columns

Single Layer Nesting#

The important addition gained through the nested logit specification is the ability to specify "nests" of products. In this way we can partition the market into "natural" groups of competing products, building into the model a bias towards selective patterns of substitution. As before we specify the models using formulas, but now we also add a nesting structure.

## No Fixed Covariates
utility_formulas = [
    "gc ~ ic_gc + oc_gc |",
    "ec ~ ic_ec + oc_ec | ",
    "gr ~ ic_gr + oc_gr | ",
    "er ~ ic_er + oc_er | ",
    "hp ~ ic_hp + oc_hp | ",
]


nesting_structure = {"central": ["gc", "ec", "hp"], "room": ["gr", "er"]}


nstL_1 = NestedLogit(
    df,
    utility_formulas,
    "depvar",
    covariates=["ic", "oc"],
    nesting_structure=nesting_structure,
    alphas_nests=True,
)
nstL_1
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x173a81940>

We will dwell a little on how these nests are specified and why. The nested logit model partitions the choice set into nests of alternatives that share common unobserved attributes (i.e., are more similar to each other). It computes the overall probability of choosing an alternative as the product of (1) the probability of choosing a nest (the marginal probability), and (2) the probability of choosing an alternative within that nest (the conditional probability, given that nest). In our case we want to separate the probability of choosing a central heating system from that of choosing a room-based heating system.

Each alternative is indexed to a nest, so that we can determine (1) the marginal probability of choosing room or central and (2) the conditional probability of choosing, say, ec given that you have chosen central. Each utility is decomposed into contributions from fixed covariates (W) and alternative-specific covariates (Y):

\(U = Y + W + \epsilon\)

and the probabilities are derived from these decomposed utilities in the following manner:

\[ P(i) = P( \text{choose nest B}) \cdot P( \text{choose i} | \text{ i } \in \text{B}) \]

where

\[ P(\text{choose nest } B_{k}) = \dfrac{e^{W_{k} + \lambda_{k}I_{k}}}{\sum_{l=1}^{K} e^{W_{l} + \lambda_{l}I_{l}}} \]

and

\[ P(\text{choose i} | \text{ i} \in \text{B}) = \dfrac{e^{Y_{i} / \lambda_{k}}}{\sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}}} \]

while the inclusive term \(I_{k}\) is:

\[I_{k} = \ln \sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}} \text{ and } \lambda_{k} \sim \text{Beta}(1, 1)\]

such that \(I_{k}\) is used to "aggregate" the utilities within a nest and "bubble up" their contribution to the decision through the product of the marginal and conditional probabilities. More extensive details of this mathematical formulation can be found in Kenneth Train's "Discrete Choice Methods with Simulation".
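To make the decomposition concrete, here is a minimal NumPy sketch of these formulas (not the pymc-marketing implementation). For simplicity it drops the fixed-covariate term \(W\), so the nest score reduces to \(\lambda_{k} I_{k}\):

```python
import numpy as np

def nested_logit_probs(utilities, nests, lam):
    """Toy nested logit probabilities.

    utilities: {alt: Y}, nests: {nest: [alts]}, lam: {nest: lambda}.
    Returns {alt: P(alt)} as the product of marginal and conditional probabilities.
    """
    # Inclusive value I_k = log sum_j exp(Y_j / lambda_k) for each nest
    inclusive = {
        k: np.log(sum(np.exp(utilities[a] / lam[k]) for a in alts))
        for k, alts in nests.items()
    }
    # Marginal probability of each nest: softmax over lambda_k * I_k
    nest_scores = {k: np.exp(lam[k] * inclusive[k]) for k in nests}
    denom = sum(nest_scores.values())
    probs = {}
    for k, alts in nests.items():
        p_nest = nest_scores[k] / denom
        # Conditional probability of each alternative within its nest
        within = sum(np.exp(utilities[a] / lam[k]) for a in alts)
        for a in alts:
            probs[a] = p_nest * np.exp(utilities[a] / lam[k]) / within
    return probs
```

Note that with \(\lambda_{k} = 1\) for every nest this collapses exactly to the plain multinomial logit, which is why \(\lambda\) near 1 signals that IIA approximately holds.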

nstL_1.sample(
    fit_kwargs={
        "idata_kwargs": {"log_likelihood": True},
        "nuts_sampler": "nutpie",
        "progressbar": True,
    }
)
Sampling: [alphas_, alphas_nests, betas, lambdas_nests, likelihood]
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/pymc/sampling/mcmc.py:328: UserWarning: `idata_kwargs` are currently ignored by the nutpie sampler
  warnings.warn(

Sampler Progress: 4 chains, all finished (sampling took about a minute).

Chain Draws Divergences Step Size Gradients/Draw
1 2000 0 0.11 63
2 2000 1 0.12 175
3 2000 2 0.11 95
4 2000 1 0.11 63
Sampling: [likelihood]

<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x173a81940>

The model structure is quite a bit more complicated now than the simpler multinomial logit, as we need to calculate the marginal and conditional probabilities within each of the nests separately and then "bubble up" the probabilities as a product over the branching nests. These probabilities are deterministic functions of the summed utilities and are then fed into our categorical likelihood to calibrate our parameters against the observed data.

nstL_1.graphviz()
../../_images/c7e25038286ab2b97098f46aad53349e5851781e382b4365a05843f657dd3924.svg

But again we are able to derive parameter estimates for the drivers of consumer choice and consult the model implications as in any standard Bayesian model.

az.summary(
    nstL_1.idata,
    var_names=["betas", "alphas", "lambdas_nests", "alphas_nests"],
    round_to=5,
)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:991: RuntimeWarning: invalid value encountered in scalar divide
  varsd = varvar / evar / 4
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
betas[ic] -0.00245 0.00071 -0.00382 -0.00114 0.00001 0.00001 3100.30564 2664.58751 1.00095
betas[oc] -0.00617 0.00165 -0.00930 -0.00317 0.00004 0.00003 1798.24975 1576.41780 1.00171
alphas[gc] 1.12221 0.29903 0.57939 1.71076 0.00647 0.00462 2128.38565 2277.73719 1.00115
alphas[ec] 1.20962 0.43109 0.40850 1.99476 0.00952 0.00687 2043.42174 2164.58576 1.00033
alphas[gr] -0.64209 3.64331 -7.16379 6.69013 0.16307 0.09298 502.30586 735.44040 1.00538
alphas[er] 0.72538 3.64851 -6.06945 7.58250 0.16347 0.09275 503.42563 696.82660 1.00494
alphas[hp] 0.00000 0.00000 0.00000 0.00000 0.00000 NaN 4000.00000 4000.00000 NaN
lambdas_nests[central] 0.82972 0.12952 0.59742 0.99990 0.00371 0.00264 1258.11306 1893.71359 1.00190
lambdas_nests[room] 0.78966 0.14615 0.53154 0.99998 0.00315 0.00204 1722.77268 1141.52465 1.00175
alphas_nests[central] 0.68631 0.71647 -0.67748 1.99964 0.01827 0.01451 1536.95101 1524.19559 1.00477
alphas_nests[room] -0.67159 0.71305 -2.07221 0.59883 0.01833 0.01457 1511.39872 1587.96716 1.00482

Here we additionally see lambda parameters for each of the nests. These terms measure how strongly correlated the unobserved utility components are for alternatives within the same nest. A value closer to 0 indicates high correlation: substitution happens mostly within the nest. A value closer to 1 implies low within-nest correlation, suggesting IIA approximately holds within the nest. The options available for structuring a market can be quite extensive, as we might have "nests within nests" where the conditional probabilities flow through successive choices within segments of the market.
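To build intuition for what \(\lambda\) controls, here is a self-contained toy sketch (made-up utilities, no fixed covariates, a single \(\lambda\) shared across nests). We raise the utility of gc and compare the relative losses of its nest-mate ec against the out-of-nest gr:

```python
import numpy as np

def probs(u, nests, lam):
    """Toy nested logit probabilities with a shared lambda."""
    iv = {k: np.log(sum(np.exp(u[a] / lam) for a in alts)) for k, alts in nests.items()}
    denom = sum(np.exp(lam * iv[k]) for k in nests)
    out = {}
    for k, alts in nests.items():
        p_nest = np.exp(lam * iv[k]) / denom
        within = sum(np.exp(u[a] / lam) for a in alts)
        out.update({a: p_nest * np.exp(u[a] / lam) / within for a in alts})
    return out

nests = {"central": ["gc", "ec"], "room": ["gr"]}
base = {"gc": 0.0, "ec": 0.0, "gr": 0.0}
bumped = {"gc": 1.0, "ec": 0.0, "gr": 0.0}  # raise gc's utility

for lam in (0.99, 0.3):
    p0, p1 = probs(base, nests, lam), probs(bumped, nests, lam)
    ec_loss = (p1["ec"] - p0["ec"]) / p0["ec"]  # same nest as gc
    gr_loss = (p1["gr"] - p0["gr"]) / p0["gr"]  # different nest
    print(f"lambda={lam}: ec loses {ec_loss:.2f}, gr loses {gr_loss:.2f}")
```

With \(\lambda\) near 1 the two losses are nearly identical (the IIA proportional-substitution pattern), while with \(\lambda = 0.3\) the in-nest alternative ec absorbs far more of the substitution than gr.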

az.plot_trace(nstL_1.idata, var_names=["betas"]);

Again the parameter estimates for the beta coefficients seem to be recovered sensibly, but the question remains as to whether this additional nesting structure will help support plausible counterfactual reasoning. Note how the model struggles to identify the alternative-specific intercepts for gr and er (their posteriors are very wide) and instead places weight on the nest-level intercepts.

Making Interventions in Structured Markets#

new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5

idata_new_policy_1 = nstL_1.apply_intervention(new_choice_df=new_policy_df)
Sampling: [likelihood]

Here we can see that the nesting structure recovers non-proportional patterns of product substitution. We have avoided the IIA property of the multinomial logit and can continue to assess whether or not the behavioural implications of this utility theory make sense.

change_df_1 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_1
policy_share new_policy_share relative_change
product
gc 0.635648 0.703700 0.107059
ec 0.071388 0.026369 -0.630621
gr 0.149848 0.179990 0.201150
er 0.087477 0.028179 -0.677865
hp 0.055640 0.061762 0.110034
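To see that this substitution pattern is genuinely non-proportional, note that under IIA the alternatives whose costs were untouched (gc, gr, hp) would all change by the same relative amount. A quick check on the table above (values transcribed from the output) shows they do not:

```python
import pandas as pd

# Share changes transcribed from the calculate_share_change output above
change = pd.DataFrame(
    {
        "policy_share": [0.635648, 0.071388, 0.149848, 0.087477, 0.055640],
        "relative_change": [0.107059, -0.630621, 0.201150, -0.677865, 0.110034],
    },
    index=["gc", "ec", "gr", "er", "hp"],
)

# Alternatives not directly hit by the installation-cost increase
unaffected = change.loc[["gc", "gr", "hp"], "relative_change"]
# Under IIA these three relative changes would be identical; here gr gains
# almost twice as much as gc, reflecting within-nest substitution from er.
print(unaffected.round(3))
```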

Visualising the Substitution Patterns#

fig = nstL_1.plot_change(change_df=change_df_1, figsize=(10, 6))

We can additionally test counterfactuals about the removal of products from the market.

fit_kwargs = {
    "target_accept": 0.97,
    "tune": 2000,
    "nuts_sampler": "nutpie",
    "idata_kwargs": {"log_likelihood": True},
    "progressbar": False,
}

new_policy_df = nstL_1.choice_df.copy()
new_policy_df = new_policy_df[
    (new_policy_df["depvar"] != "hp") & (new_policy_df["depvar"] != "gr")
]

new_utility_formulas = [
    "gc ~ ic_gc + oc_gc | income",
    "ec ~ ic_ec + oc_ec | income ",
    "er ~ ic_er + oc_er | income  ",
]
nstL_1.nesting_structure = {"central": ["gc", "ec"], "room": ["er"]}
nstL_1.alternatives = ["gc", "ec", "er"]
idata_new_policy_3 = nstL_1.apply_intervention(
    new_choice_df=new_policy_df,
    new_utility_equations=new_utility_formulas,
    fit_kwargs=fit_kwargs,
)

change_df_3 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_3
Sampling: [alphas_, alphas_nests, betas, betas_fixed_, lambdas_nests, likelihood]
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/pymc/sampling/mcmc.py:328: UserWarning: `idata_kwargs` are currently ignored by the nutpie sampler
  warnings.warn(
Sampling: [likelihood]

policy_share new_policy_share relative_change
product
gc 0.635648 0.792997 0.247541
ec 0.071388 0.089120 0.248388
gr 0.149848 0.000000 -1.000000
er 0.087477 0.117884 0.347599
hp 0.055640 0.000000 -1.000000

A Different Market#

Let’s now briefly look at a different market that highlights a limitation of the nested logit model.

data_path = data_dir / "choice_crackers.csv"
df_new = pd.read_csv(data_path)
last_chosen = pd.get_dummies(df_new["lastChoice"]).drop("private", axis=1).astype(int)
last_chosen.columns = [col + "_last_chosen" for col in last_chosen.columns]
df_new[last_chosen.columns] = last_chosen
df_new
personId disp_sunshine disp_keebler disp_nabisco disp_private feat_sunshine feat_keebler feat_nabisco feat_private price_sunshine price_keebler price_nabisco price_private choice lastChoice personChoiceId choiceId keebler_last_chosen nabisco_last_chosen sunshine_last_chosen
0 1 0 0 0 0 0 0 0 0 0.99 1.09 0.99 0.71 nabisco nabisco 1 1 0 1 0
1 1 1 0 0 0 0 0 0 0 0.49 1.09 1.09 0.78 sunshine nabisco 2 2 0 1 0
2 1 0 0 0 0 0 0 0 0 1.03 1.09 0.89 0.78 nabisco sunshine 3 3 0 0 1
3 1 0 0 0 0 0 0 0 0 1.09 1.09 1.19 0.64 nabisco nabisco 4 4 0 1 0
4 1 0 0 0 0 0 0 0 0 0.89 1.09 1.19 0.84 nabisco nabisco 5 5 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3151 136 0 0 0 0 0 0 0 0 1.09 1.19 0.99 0.55 private private 9 3152 0 0 0
3152 136 0 0 0 1 0 0 0 0 0.78 1.35 1.04 0.65 private private 10 3153 0 0 0
3153 136 0 0 0 0 0 0 0 0 1.09 1.17 1.29 0.59 private private 11 3154 0 0 0
3154 136 0 0 0 0 0 0 0 0 1.09 1.22 1.29 0.59 private private 12 3155 0 0 0
3155 136 0 0 0 0 0 0 0 0 1.29 1.04 1.23 0.59 private private 13 3156 0 0 0

3156 rows × 20 columns

utility_formulas = [
    "sunshine ~ disp_sunshine + feat_sunshine + price_sunshine ",
    "keebler ~ disp_keebler + feat_keebler + price_keebler  ",
    "nabisco ~ disp_nabisco + feat_nabisco + price_nabisco  ",
    "private ~ disp_private + feat_private + price_private  ",
]


nesting_structure = {
    "private": ["private"],
    "brand": ["keebler", "sunshine", "nabisco"],
}


nstL_3 = NestedLogit(
    choice_df=df_new,
    utility_equations=utility_formulas,
    depvar="choice",
    covariates=["disp", "feat", "price"],
    nesting_structure=nesting_structure,
    model_config={
        "alphas_": Prior("Normal", mu=0, sigma=1, dims="alts"),
        "betas": Prior("Normal", mu=0, sigma=1, dims="alt_covariates"),
        "lambdas_nests": Prior("Beta", alpha=1, beta=1, dims="nests"),
    },
    alphas_nests=False,
)
nstL_3
<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x29438e5d0>
nstL_3.sample(
    fit_kwargs={
        "target_accept": 0.95,
        "tune": 2000,
        "nuts_sampler": "nutpie",
        "progressbar": True,
    }
)
Sampling: [alphas_, betas, lambdas_nests, likelihood]

Sampler Progress: 4 chains, all finished (sampling took about a minute).

Chain Draws Divergences Step Size Gradients/Draw
1 3000 0 0.17 31
2 3000 0 0.17 7
3 3000 0 0.18 31
4 3000 0 0.17 31
Sampling: [likelihood]

<pymc_marketing.customer_choice.nested_logit.NestedLogit at 0x29438e5d0>
az.summary(
    nstL_3.idata,
    var_names=["alphas", "betas"],
)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:991: RuntimeWarning: invalid value encountered in scalar divide
  varsd = varvar / evar / 4
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alphas[sunshine] -0.968 0.606 -2.023 0.220 0.026 0.018 511.0 645.0 1.02
alphas[keebler] -0.400 0.604 -1.449 0.805 0.027 0.018 507.0 682.0 1.02
alphas[nabisco] 1.361 0.605 0.320 2.562 0.027 0.018 499.0 599.0 1.02
alphas[private] 0.000 0.000 0.000 0.000 0.000 NaN 4000.0 4000.0 NaN
betas[disp] 0.098 0.087 -0.064 0.254 0.001 0.001 4290.0 2861.0 1.00
betas[feat] 0.336 0.143 0.075 0.607 0.002 0.002 4785.0 3045.0 1.00
betas[price] -3.592 0.318 -4.159 -2.974 0.005 0.005 3520.0 2583.0 1.00
ax = az.plot_forest(
    nstL_3.idata,
    var_names=["alphas", "betas"],
    combined=True,
    r_hat=True,
)
ax[0].axvline(0, color="k", linestyle="--")
ax[0].set_title("Parameter Estimates from the Nested Logit Model");
/Users/nathanielforde/mambaforge/envs/pymc-marketing-dev/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
../../_images/b0de9ffa7663935b476e2a8febb5a21075ddbb86a8611edee8a90b3be7367eac.png

This model plausibly suggests that price increases have a strongly negative effect on the purchase probability of crackers. However, we should note that we're ignoring information when we use the nested logit in this context. The data set records multiple purchasing decisions for each individual, and by failing to model this fact we lose insight into individual heterogeneity in responses to price shifts. To gain more insight into this aspect of the purchasing decisions we might augment our model with hierarchical components or switch to an alternative mixed logit model.
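As a sketch of what the mixed logit idea buys us (illustrative NumPy, not the pymc-marketing API): draw an individual-level price coefficient from a population distribution and average the per-individual multinomial logit probabilities. The resulting population-level probabilities are a mixture and no longer satisfy IIA. The intercepts and mean price coefficient below are loosely based on the posterior means above; the population spread of 1.5 is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

prices = np.array([0.99, 1.09, 0.99, 0.71])   # sunshine, keebler, nabisco, private
alphas = np.array([-0.97, -0.40, 1.36, 0.0])  # intercepts, roughly the means above

# Population distribution of price sensitivity: beta_i ~ Normal(-3.6, 1.5)
betas = rng.normal(-3.6, 1.5, size=5000)

# Per-individual MNL probabilities, then average over the simulated population
u = alphas[None, :] + betas[:, None] * prices[None, :]
u -= u.max(axis=1, keepdims=True)  # subtract row max for numerical stability
p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
mixed_probs = p.mean(axis=0)
print(mixed_probs.round(3))
```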

Choosing a Market Structure#

Causal inference is hard, and predicting the counterfactual actions of agents in a competitive market is very hard. There is no guarantee that a nested logit model will accurately represent the choice of any particular agent, but you can hope that it highlights expected patterns of choice when the nesting structure reflects the natural segmentation of a market. Nests should group alternatives that share unobserved similarities, ideally driven by a transparent theory of the market structure. Well-specified nests should show stronger substitution within nests than across nests, and you can inspect the substitution patterns as above. Ideally, hold out some test data to evaluate the implications of your fitted model. Discrete choice models are causal inference models, and their structural specification should support generalisable inference across future and counterfactual situations.
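One simple way to act on the hold-out advice is to score each candidate nesting structure by the mean log predictive probability it assigns to the choices in a held-out set. A minimal helper (hypothetical names; in practice `prob_matrix` would come from your fitted model's posterior predictive probabilities on the held-out rows):

```python
import numpy as np

def holdout_log_score(prob_matrix, choices, alts):
    """Mean log predictive probability of observed held-out choices.

    prob_matrix: (n_obs, n_alts) array of predicted choice probabilities,
    choices: observed alternative labels, alts: column order of prob_matrix.
    Higher (less negative) is better.
    """
    idx = np.array([alts.index(c) for c in choices])
    return np.log(prob_matrix[np.arange(len(choices)), idx]).mean()
```

Comparing this score across two or more nesting structures gives a direct, predictive check on which market segmentation the data actually supports.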

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor
Last updated: Mon Jan 19 2026

Python implementation: CPython
Python version       : 3.12.12
IPython version      : 9.8.0

pytensor: 2.36.3

pandas        : 2.3.3
pymc_marketing: 0.17.1
matplotlib    : 3.10.8
pymc_extras   : 0.6.0
arviz         : 0.23.0

Watermark: 2.5.0