---
title: "Mixed logit and preference heterogeneity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mixed logit and preference heterogeneity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(digits = 4)
```

The mixed (random-coefficients) logit lets tastes vary across people. Instead of
a single coefficient on each random attribute, choicer estimates a *distribution*
of coefficients. Substitution then reflects both observed covariates and the
estimated distribution of tastes. Estimation is by simulated maximum likelihood
using Halton draws, with the likelihood, gradient and Hessian evaluated in
parallel C++.

A useful way to see the mechanism is to write the mixed-logit probability as a
logit kernel averaged over the taste distribution,
$P_{ij} = \int L_{ij}(\beta)\, f(\beta)\, d\beta$, where
$L_{ij}(\beta) = \exp(X_{ij}\beta) / \sum_k \exp(X_{ik}\beta)$. *Conditional on a
draw* $\beta$, the kernel is an ordinary logit. The empirical content of the
mixed logit is in the averaging: people with different $\beta$'s place different
values on the same attributes, so demand leaving an alternative need not go to
the same destinations for all consumers. For counterfactual work, the distinction
between observed heterogeneity and the estimated mixing distribution is central:
flexibility that is not disciplined by the available variation is supplied by the
maintained distribution $f(\beta)$.
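
The averaging can be sketched in a few lines of base R. This is an illustration of the formula above, not choicer's internal C++ code: the `halton()` helper, the taste parameters `mu` and `chol_L`, and the attribute matrix `X` are all made up for the example.

```{r sim-prob-sketch}
# Halton sequence (radical inverse of 1..n) in a given prime base
halton <- function(n, base) {
  vapply(seq_len(n), function(i) {
    f <- 1
    r <- 0
    while (i > 0) {
      f <- f / base
      r <- r + f * (i %% base)
      i <- i %/% base
    }
    r
  }, numeric(1))
}

S <- 200
# Two standard-normal draws per s, built from bases 2 and 3
z <- cbind(qnorm(halton(S, 2)), qnorm(halton(S, 3)))

# Hypothetical taste distribution: beta = mu + chol_L %*% z
mu     <- c(1.0, -0.5)
chol_L <- matrix(c(0.8, 0.3, 0, 0.5), 2, 2)    # lower triangular
beta   <- sweep(z %*% t(chol_L), 2, mu, "+")   # S x 2 matrix of draws

# One choice situation: 4 alternatives (rows), 2 attributes (columns)
X <- matrix(c(0.2, 1.0, -0.5,  0.0,
              1.0, 0.3,  0.8, -1.0), nrow = 4)

V      <- beta %*% t(X)               # S x 4 utilities, one row per draw
kernel <- exp(V) / rowSums(exp(V))    # each row is an ordinary logit
P_sim  <- colMeans(kernel)            # simulated P_ij: kernel averaged over draws
P_sim
```

Each row of `kernel` satisfies the logit's proportional-substitution property; the averaged `P_sim` generally does not, which is where the extra substitution structure comes from.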

```{r setup}
library(choicer)
set_num_threads(2)
```

## Simulate correlated random coefficients

`simulate_mxl_data()` draws choices in which the coefficients on `w1` and `w2`
are themselves random and *correlated* across the population.

```{r sim}
sim <- simulate_mxl_data(N = 2000, J = 4, seed = 1)
sim
```

## Fit

A robust recipe for mixed logit: warm-start from a plain MNL, scale the
variables so the Hessian is well conditioned, and use enough Halton draws. Here
we estimate a full (correlated) covariance of the random coefficients.

```{r fit}
fit <- run_mxlogit(
  data            = sim$data,
  id_col          = "id",
  alt_col         = "alt",
  choice_col      = "choice",
  covariate_cols  = c("x1", "x2"),  # fixed coefficients
  random_var_cols = c("w1", "w2"),  # random coefficients
  rc_correlation  = TRUE,           # estimate their full covariance
  S               = 100L,           # Halton draws per person
  draws           = "generate",     # generate draws on the fly (low memory)
  seed            = 7L,
  scale_vars      = "sd",           # condition the Hessian across blocks
  se_method       = "bhhh"
)
summary(fit)
```

This vignette uses `se_method = "bhhh"` because the outer-product calculation is
fast and keeps package-build time short. For final empirical work, compare it
with the default analytical-Hessian standard errors; when the data come from a
choice-based or otherwise weighted sample, use `se_method = "sandwich"` so the
reported covariance is the robust WESML sandwich rather than the inverse weighted
Hessian.
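
In generic notation (a sketch of the textbook estimators, not necessarily choicer's exact scaling), write $g_i$ for the score of person $i$'s possibly weighted log-likelihood contribution at the estimate and $H = \sum_i \partial^2 \ell_i / \partial\theta\,\partial\theta^\top$ for the Hessian. The three options then correspond to

$$
\hat V_{\text{BHHH}} = \Big(\sum_i g_i g_i^\top\Big)^{-1}, \qquad
\hat V_{\text{Hessian}} = (-H)^{-1}, \qquad
\hat V_{\text{sandwich}} = H^{-1} \Big(\sum_i g_i g_i^\top\Big) H^{-1}.
$$

Under a correctly specified model and an unweighted random sample, the information equality makes all three estimate the same matrix. With sampling weights folded into $\ell_i$ that equality no longer holds, which is why the sandwich form is the appropriate one in that case.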

> **Tip.** For real applications increase the number of draws (`S`) until your
> estimates are stable, and keep `scale_vars = "sd"`. If the solver struggles,
> pass an explicit `theta_init` (for example the MNL coefficients) and bounds on
> the Cholesky diagonal. See `inst/simulations/mxl_simulation.R` for a fully
> hardened example.
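
A minimal sketch of the draw-count check from the tip, reusing only the arguments already shown in the `fit` chunk. The grid of `S` values is arbitrary, and the chunk is marked `eval = FALSE` so that it does not add to the vignette's build time.

```{r draws-sensitivity, eval = FALSE}
# Refit with an increasing number of Halton draws; everything else is unchanged
S_grid <- c(50L, 100L, 200L, 400L)

fits <- lapply(S_grid, function(S) {
  run_mxlogit(
    data            = sim$data,
    id_col          = "id",
    alt_col         = "alt",
    choice_col      = "choice",
    covariate_cols  = c("x1", "x2"),
    random_var_cols = c("w1", "w2"),
    rc_correlation  = TRUE,
    S               = S,
    draws           = "generate",
    seed            = 7L,
    scale_vars      = "sd",
    se_method       = "bhhh"
  )
})
names(fits) <- paste0("S = ", S_grid)

# Compare the summaries side by side; stop increasing S once the estimates
# move by less than their standard errors
lapply(fits, summary)
```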

## Parameter recovery

```{r recovery}
recovery_table(fit, sim$true_params)
```

The `beta` rows are the fixed coefficients, the `sigma` rows describe the
covariance of the random coefficients (its Cholesky elements), and the `asc`
rows are the alternative-specific constants.
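
To read Cholesky elements as a covariance matrix, rebuild the lower-triangular factor and multiply it by its transpose. The sketch below uses made-up numbers and assumes the elements are ordered column by column (`l11`, `l21`, `l22`); check that ordering against the row labels in the recovery table before applying it to a real fit.

```{r chol-sketch}
# Hypothetical Cholesky elements for two random coefficients
l <- c(l11 = 0.8, l21 = 0.3, l22 = 0.5)

chol_L <- matrix(0, 2, 2)
chol_L[lower.tri(chol_L, diag = TRUE)] <- l   # fills column by column

Sigma <- chol_L %*% t(chol_L)    # implied covariance of the two coefficients
sds   <- sqrt(diag(Sigma))       # implied standard deviations
rho   <- cov2cor(Sigma)[2, 1]    # implied correlation

Sigma
c(sd1 = sds[1], sd2 = sds[2], rho = rho)
```

Because `Sigma` is a nonlinear function of the Cholesky elements, standard errors for variances and correlations need the delta method or simulation rather than a direct read-off from the table.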

## Substitution through taste heterogeneity

With random coefficients, diversion is mediated by the distribution of tastes. If
the estimated mixing distribution captures economically meaningful heterogeneity,
people who value one alternative tend to value nearby substitutes on that latent
margin. Diversion can therefore depend on *which* attribute is changing, so
`diversion_ratios()` takes a `wrt_var`, just as `elasticities()` takes an
`elast_var`:

```{r diversion}
elasticities(fit, elast_var = "x2")
diversion_ratios(fit, wrt_var = "x2")
```
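
Why the attribute matters can be seen from the textbook definition of diversion from $j$ to $k$, $-(\partial s_k / \partial z_j) / (\partial s_j / \partial z_j)$, averaged over taste draws. The base-R sketch below uses that definition with made-up attribute levels and draws; `diversion()` is a hypothetical helper, and choicer's own `diversion_ratios()` may differ in details.

```{r diversion-sketch}
# Hypothetical setup: 3 alternatives, one attribute w with a random coefficient
w <- c(1.0, 0.5, -0.5)                 # attribute levels (made up)
b <- 1 + 0.8 * qnorm(ppoints(500))     # evenly spaced normal quantiles as draws

V <- outer(b, w)                       # rows = draws, columns = alternatives
L <- exp(V) / rowSums(exp(V))          # logit kernel for each draw

# Diversion from alternative j when attribute z_j changes; coef_z is the
# coefficient on z, either a constant or one value per draw
diversion <- function(L, j, coef_z) {
  cross <- colMeans(coef_z * L[, j] * L)          # E[coef * L_j * L_k]
  own   <- mean(coef_z * L[, j] * (1 - L[, j]))   # E[coef * L_j * (1 - L_j)]
  d <- cross / own
  d[j] <- NA                                      # own diversion is undefined
  d
}

d_fixed  <- diversion(L, j = 1, coef_z = 1)   # attribute with a fixed coefficient
d_random <- diversion(L, j = 1, coef_z = b)   # the random-coefficient attribute

# Plain-logit benchmark at the mean coefficient: s_k / (1 - s_j)
s <- exp(mean(b) * w) / sum(exp(mean(b) * w))
d_logit <- s / (1 - s[1])
d_logit[1] <- NA

rbind(logit = d_logit, fixed = d_fixed, random = d_random)
```

A fixed coefficient cancels from the ratio, so only the kernel probabilities matter. A random coefficient stays inside both averages and reweights the draws, which is why the two mixed-logit rows differ from each other as well as from the logit benchmark.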

The rest of the toolkit — `predict()`, `wtp()`, `consumer_surplus()`, `blp()` —
behaves exactly as in the [getting-started vignette](choicer.html), integrating
over the distribution of tastes automatically.

## Identification and tails

The mixed logit is a genuine generalization of the MNL — if tastes are in fact
homogeneous, the estimator simply returns a near-zero variance and you are back
to a logit. The issue is not that random coefficients are intrinsically fragile.
The issue is identification: the additional substitution structure is carried by
the mixing distribution $f(\beta)$, and $f(\beta)$ is often hardest to pin down
where it matters most for welfare and diversion — in the tails.

Two consequences are worth keeping in front of you:

- **The tails drive the economics you report.** A lognormal price coefficient
  puts a slice of the population at near-zero price sensitivity, which gives the
  willingness-to-pay distribution a heavy right tail: its mean is finite but can
  be explosive, and welfare numbers inherit that. An unbounded normal price
  coefficient is worse on both counts: it has positive density at zero, so the
  WTP ratio has *no finite mean*, and it implies a fraction of consumers with
  the *wrong sign* (who prefer paying more). These artifacts come from the
  assumed shape of $f$, not from the data, and the estimator will happily contort
  a tail to match an aggregate moment.

- **$f(\beta)$ is hard to identify and estimate in practice.** A single
  cross-section of choices — one decision per person, a fixed menu — carries
  little information about the spread of tastes. Reliable estimation typically
  needs **repeated choices from the same individual** (panel data), **substantial
  variation in choice sets or attributes across markets** (the BLP setting), or
  the rich, designed attribute variation of a stated-preference experiment.
  Without one of these, the random-coefficient variances are weakly identified
  and the estimates can be fragile.
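
A rough base-R illustration of the first point, with made-up parameter values and the attribute coefficient fixed at 1 so that WTP is simply the inverse of price sensitivity. It shows how far the tail pulls the mean of WTP away from its median under a lognormal, and how large the wrong-sign share is under a normal.

```{r tails-sketch}
# Evenly spaced standard-normal quantiles stand in for simulation draws
z <- qnorm(ppoints(10000))

# Lognormal price sensitivity alpha = exp(m + s * z); WTP = 1 / alpha
m <- 0
s <- 1.2                                  # hypothetical values
wtp_ln <- 1 / exp(m + s * z)
c(median        = median(wtp_ln),
  mean_analytic = exp(-m + s^2 / 2),      # mean of 1 / alpha for lognormal alpha
  q99           = unname(quantile(wtp_ln, 0.99)))

# Normal price coefficient beta_p ~ N(-1, 0.6^2): share with beta_p > 0
wrong_sign <- pnorm(0, mean = -1, sd = 0.6, lower.tail = FALSE)
wrong_sign

# WTP = 1 / (-beta_p): draws near beta_p = 0 give extreme values of either sign
alpha_n <- 1 - 0.6 * z
wtp_n   <- 1 / alpha_n
range(wtp_n)
```

None of these numbers depends on data: they follow entirely from the assumed shape and spread of the mixing distribution.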

Practical defenses follow directly. Report sensitivity to the number of draws,
starting values, and distributional assumptions, and compare substitution and
welfare under a simpler baseline. Beyond that, prefer **bounded or
sign-constrained mixing distributions** (triangular, censored normal) so a tail
cannot run away, **estimate in WTP-space** (Train & Weeks, 2005) to sidestep the
ratio-of-normals pathology, or use a **latent-class** specification when a
handful of discrete types is more credible than a continuous distribution. None
of these is free: each trades one restriction for another, but each puts the
restriction in a place you can defend. The broader tradeoff is laid out in
[Choosing among choice models](choicer.html#choosing-among-choice-models).