---
title: "Mixed logit and preference heterogeneity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mixed logit and preference heterogeneity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(digits = 4)
```

The mixed (random-coefficients) logit lets tastes vary across people. Instead of
a single coefficient on each random attribute, choicer estimates a
*distribution* of coefficients. Substitution then reflects both observed
covariates and the estimated distribution of tastes. Estimation is by simulated
maximum likelihood using Halton draws, with the likelihood, gradient and Hessian
evaluated in parallel C++.

A useful way to see the mechanism is to write the mixed-logit probability as a
logit kernel averaged over the taste distribution,
$P_{ij} = \int L_{ij}(\beta)\, f(\beta)\, d\beta$, where
$L_{ij}(\beta) = \exp(X_{ij}\beta) / \sum_k \exp(X_{ik}\beta)$.
*Conditional on a draw* $\beta$, the kernel is an ordinary logit. The empirical
content of the mixed logit is in the averaging: people with different $\beta$'s
place different values on the same attributes, so demand leaving an alternative
need not go to the same destinations for all consumers. For counterfactual work,
the distinction between observed heterogeneity and the estimated mixing
distribution is central: flexibility that is not disciplined by the available
variation is supplied by the maintained distribution $f(\beta)$.

```{r setup}
library(choicer)
set_num_threads(2)
```

## Simulate correlated random coefficients

`simulate_mxl_data()` draws choices in which the coefficients on `w1` and `w2`
are themselves random and *correlated* across the population.

```{r sim}
sim <- simulate_mxl_data(N = 2000, J = 4, seed = 1)
sim
```

## Fit

A robust recipe for mixed logit: warm-start from a plain MNL, scale the
variables so the Hessian is well conditioned, and use enough Halton draws.
Here we estimate a full (correlated) covariance of the random coefficients.

```{r fit}
fit <- run_mxlogit(
  data = sim$data,
  id_col = "id",
  alt_col = "alt",
  choice_col = "choice",
  covariate_cols = c("x1", "x2"),   # fixed coefficients
  random_var_cols = c("w1", "w2"),  # random coefficients
  rc_correlation = TRUE,            # estimate their full covariance
  S = 100L,                         # Halton draws per person
  draws = "generate",               # generate draws on the fly (low memory)
  seed = 7L,
  scale_vars = "sd",                # condition the Hessian across blocks
  se_method = "bhhh"
)
summary(fit)
```

This vignette uses `se_method = "bhhh"` because the outer-product calculation is
fast and keeps package-build time short. For final empirical work, compare it
with the default analytical-Hessian standard errors; when the data come from a
choice-based or otherwise weighted sample, use `se_method = "sandwich"` so the
reported covariance is the robust WESML sandwich rather than the inverse
weighted Hessian.

> **Tip.** For real applications increase the number of draws (`S`) until your
> estimates are stable, and keep `scale_vars = "sd"`. If the solver struggles,
> pass an explicit `theta_init` (for example the MNL coefficients) and bounds on
> the Cholesky diagonal. See `inst/simulations/mxl_simulation.R` for a fully
> hardened example.

## Parameter recovery

```{r recovery}
recovery_table(fit, sim$true_params)
```

The `beta` rows are the fixed coefficients, the `sigma` rows describe the
covariance of the random coefficients (its Cholesky elements), and the `asc`
rows are the alternative-specific constants.

## Substitution through taste heterogeneity

With random coefficients, diversion is mediated by the distribution of tastes.
If the estimated mixing distribution captures economically meaningful
heterogeneity, people who value one alternative tend to value nearby substitutes
on that latent margin.
Diversion can therefore depend on *which* attribute is changing, so
`diversion_ratios()` takes a `wrt_var`:

```{r diversion}
elasticities(fit, elast_var = "x2")
diversion_ratios(fit, wrt_var = "x2")
```

The rest of the toolkit (`predict()`, `wtp()`, `consumer_surplus()`, `blp()`)
behaves exactly as in the [getting-started vignette](choicer.html), integrating
over the distribution of tastes automatically.

## Identification and tails

The mixed logit is a genuine generalization of the MNL: if tastes are in fact
homogeneous, the estimator simply returns a near-zero variance and you are back
to a logit. The issue is not that random coefficients are intrinsically fragile.
The issue is identification: the additional substitution structure is carried by
the mixing distribution $f(\beta)$, and $f(\beta)$ is often hardest to pin down
where it matters most for welfare and diversion, namely in the tails. Two
consequences are worth keeping in front of you:

- **The tails drive the economics you report.** A lognormal price coefficient
  puts a slice of the population at near-zero price sensitivity, which gives a
  willingness-to-pay distribution with a very heavy right tail and can produce
  explosive welfare numbers. An unbounded normal coefficient implies a fraction
  of consumers with the *wrong sign* (who prefer paying more); when that
  coefficient is the price coefficient, its density is positive at zero and the
  implied willingness-to-pay ratio has *no finite mean*. These artifacts come
  from the assumed shape of $f$, not from the data, and the estimator will
  happily contort a tail to match an aggregate moment.
- **$f(\beta)$ is hard to identify and estimate in practice.** A single
  cross-section of choices (one decision per person, a fixed menu) carries
  little information about the spread of tastes. Reliable estimation typically
  needs **repeated choices from the same individual** (panel data),
  **substantial variation in choice sets or attributes across markets** (the BLP
  setting), or the rich, designed attribute variation of a stated-preference
  experiment.
  Without one of these, the random-coefficient variances are weakly identified
  and the estimates can be fragile.

Practical defenses follow directly: report sensitivity to the number of draws,
starting values, and distributional assumptions; compare substitution and
welfare under a simpler baseline; prefer **bounded or sign-constrained mixing
distributions** (triangular, censored normal) so a tail cannot run away;
**estimate in WTP-space** (Train & Weeks, 2005) to sidestep the ratio-of-normals
pathology; or use a **latent-class** specification when a handful of discrete
types is more credible than a continuous distribution. None of these is free:
each trades one restriction for another, but each puts the restriction in a
place you can defend. The broader tradeoff is laid out in
[Choosing among choice models](choicer.html#choosing-among-choice-models).
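
The role of the assumed shape of $f(\beta)$ in the tails can be seen without any
estimation at all. The chunk below is a minimal base-R sketch, independent of
choicer, that draws a price sensitivity from a normal and from a lognormal with
the same mean and looks at the implied willingness-to-pay ratio. All parameter
values and object names (`beta_attr`, `alpha_normal`, `alpha_lognormal`) are
hypothetical and chosen only for illustration.

```{r wtp-tails}
set.seed(42)
n <- 1e5
beta_attr <- 1  # hypothetical attribute coefficient, held fixed

# Price sensitivity alpha (positive = dislikes price); both cases have mean 1.
alpha_normal    <- rnorm(n, mean = 1, sd = 0.5)
alpha_lognormal <- rlnorm(n, meanlog = -0.125, sdlog = 0.5)

# Implied willingness to pay is the ratio beta / alpha.
wtp_normal    <- beta_attr / alpha_normal
wtp_lognormal <- beta_attr / alpha_lognormal

# Share of simulated consumers with the wrong sign under the normal.
share_wrong_sign <- mean(alpha_normal < 0)
share_wrong_sign

# Compare central and extreme quantiles of the two WTP distributions.
probs <- c(0.001, 0.01, 0.5, 0.99, 0.999)
rbind(normal    = quantile(wtp_normal, probs),
      lognormal = quantile(wtp_lognormal, probs))

# Largest absolute WTP draw in each case.
c(normal = max(abs(wtp_normal)), lognormal = max(abs(wtp_lognormal)))
```

In this sketch the normal case produces a small share of wrong-sign consumers
and a few enormous ratios from draws near zero, while the lognormal case stays
positive but right-skewed. Neither pattern uses any choice data; both follow
from the assumed distribution alone, which is why the distributional sensitivity
checks above are worth reporting.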