---
title: "Multinomial logit"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Multinomial logit}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(digits = 4)
```

The multinomial logit (MNL) is the workhorse of discrete choice. choicer fits it by maximum likelihood with a C++ core, analytical gradients and an analytical Hessian, so estimation and standard errors are fast even with many alternative-specific constants.

This vignette does two things: it shows the full MNL workflow and, because the data are simulated from a known process, it checks that choicer **recovers the true parameters**.

```{r setup}
library(choicer)
set_num_threads(2)
```

## Simulate a known data-generating process

`simulate_mnl_data()` draws choices from a logit model with i.i.d. Gumbel errors. The returned object carries both the data and the true parameters.

```{r sim}
sim <- simulate_mnl_data(N = 2000, J = 4, seed = 1)
sim
```

## Fit

```{r fit}
fit <- run_mnlogit(
  data = sim$data,
  id_col = "id",
  alt_col = "alt",
  choice_col = "choice",
  covariate_cols = c("x1", "x2")
)
summary(fit)
```

## Did we recover the truth?

`recovery_table()` lines up each estimate against the value that generated the data, with the bias, a z-score, and whether the 95% confidence interval covers the truth.

```{r recovery}
recovery_table(fit, sim$true_params)
```

In this run the intervals cover their true values, which is indicative of correct behavior. What matters formally, though, is coverage over repeated simulations; a full Monte Carlo exercise is left outside this vignette for brevity.

## Post-estimation

The full demand and welfare toolkit is available on the fitted object.
Treating `x2` as price:

```{r post}
predict(fit, type = "shares")          # predicted market shares
elasticities(fit, elast_var = "x2")    # own- and cross-price elasticities
diversion_ratios(fit)                  # where demand goes
wtp(fit, price_var = "x2")             # willingness to pay, with delta-method SEs
gof(fit)                               # McFadden R2 and hit rate
```

## Substitution restrictions

The MNL is useful partly because its substitution structure is transparent. Conditional on the included covariates, the ratio of choice probabilities $P_{ij}/P_{ik} = \exp(V_{ij} - V_{ik})$ depends only on the utilities of alternatives $j$ and $k$, not on any other alternative in the choice set. This is the familiar individual-level IIA implication, but the more useful empirical statement is about counterfactual diversion: for a given decision maker, demand leaving one alternative is reallocated across the remaining alternatives in proportion to their fitted probabilities.

Aggregate substitution can be less mechanical than that statement suggests. The elasticities and diversion ratios reported above are averages over choice situations. When covariates, demographics or choice sets vary across situations, the aggregate diversion matrix need not equal the simple market-share formula $DR(j\to k) = s_k / (1 - s_j)$. The [math companion](multinomial_logit_math.html) gives the derivation.

The remaining restriction is substantive: all heterogeneity that matters for substitution must be observed and included in the utility index. If closeness is driven by unobserved tastes, product groupings, networks, peer groups or other latent features, the MNL will not recover that margin. It is then not enough to fit shares well; the model may still give the wrong diversion matrix for the counterfactual of interest.

That makes the MNL a disciplined baseline, not a straw man. Use it when the included variables carry the relevant substitution margin, or when the target is an object that does not require richer unobserved structure.
Move to [nested logit](nl.html), [mixed logit](mxl.html), or [multinomial probit](mnp.html) when the empirical question requires grouped substitution, random tastes or correlated utility shocks. The [getting-started vignette](choicer.html#choosing-among-choice-models) compares those choices directly.
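
The gap between individual-level and aggregate diversion discussed above can be sketched in a few lines. This is a minimal illustration, not the derivation in the math companion, and it rests on two assumptions that this vignette does not state: the change is a small common shift in $V_{ij}$ for every decision maker $i = 1, \dots, N$ (for example a price change when price enters linearly with a common coefficient), and aggregate diversion is defined as a ratio of summed derivatives, which may differ from what `diversion_ratios()` actually computes. For one decision maker, the logit derivatives give

$$
\frac{\partial P_{ij}}{\partial V_{ij}} = P_{ij}(1 - P_{ij}), \qquad
\frac{\partial P_{ik}}{\partial V_{ij}} = -P_{ij} P_{ik}
\quad\Longrightarrow\quad
DR_i(j \to k) = -\frac{\partial P_{ik} / \partial V_{ij}}{\partial P_{ij} / \partial V_{ij}} = \frac{P_{ik}}{1 - P_{ij}} .
$$

Summing numerator and denominator over decision makers under the assumed definition gives

$$
DR(j \to k) = \frac{\sum_{i} P_{ij} P_{ik}}{\sum_{i} P_{ij}(1 - P_{ij})} ,
$$

which collapses to $s_k / (1 - s_j)$, with $s_j = N^{-1} \sum_i P_{ij}$, only when the probabilities are identical across $i$. As a hypothetical example with three alternatives and two decision makers whose probabilities are $(0.8, 0.1, 0.1)$ and $(0.2, 0.7, 0.1)$, the expression above gives $DR(1 \to 2) = 0.22 / 0.32 \approx 0.69$, while the market-share formula gives $0.4 / 0.5 = 0.8$.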