In an A/B test, covariates are often added to a model to reduce variance, improve the precision of estimates, or look for conditional effects. However, this usually either assumes that the covariate effects are linear or relies on unwieldy basis expansions, such as polynomials, to account for nonlinear relationships. In this post I show how to use generalized additive models (GAMs) to account for nonlinearities in the relationships between covariates and outcome measures. I use data from a randomized A/B test that looked for differences in profit between two groups of businesses. The effect is small and the data are noisy, but this sort of messy data is exactly what we often see in applications, and it’s exactly where the easy gains in variance explained by nonlinearities are most useful.
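To make the idea concrete, here is a minimal sketch in Python using only NumPy. It is not the analysis from the post: the data are simulated, the helper names (`hinge_basis`, `treatment_effect`) and the effect size are hypothetical, and the smooth is a simple ridge-penalized piecewise-linear spline with a fixed penalty, standing in for the smoothers and automatic smoothness selection that a full GAM package would provide. It compares the standard error of the treatment effect under a linear covariate adjustment with the one under a smooth adjustment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for an A/B test (NOT the data from the post).
n = 2000
treat = rng.integers(0, 2, size=n)   # random assignment: 0 = A, 1 = B
x = rng.uniform(0, 1, size=n)        # pre-treatment covariate
true_effect = 0.2                    # assumed effect, for illustration only
profit = (true_effect * treat
          + 2.0 * np.cos(2 * np.pi * x)   # nonlinear covariate effect
          + rng.normal(0, 1, size=n))


def hinge_basis(x, n_knots=10):
    """Piecewise-linear spline basis: x plus hinge terms max(x - knot, 0)."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])


def treatment_effect(y, treat, covariates, n_penalized=0, lam=0.0):
    """Fit y ~ 1 + treat + covariates by (ridge-)penalized least squares.

    The last `n_penalized` columns get ridge penalty `lam`; the intercept
    and treatment indicator are never penalized. Returns the treatment
    coefficient and an approximate standard error.
    """
    X = np.column_stack([np.ones(len(y)), treat, covariates])
    penalty = np.zeros(X.shape[1])
    if n_penalized > 0:
        penalty[-n_penalized:] = lam
    XtX = X.T @ X
    A_inv = np.linalg.inv(XtX + np.diag(penalty))
    beta = A_inv @ X.T @ y
    resid = y - X @ beta
    edf = np.trace(A_inv @ XtX)                 # effective degrees of freedom
    sigma2 = resid @ resid / (len(y) - edf)     # residual variance estimate
    cov = sigma2 * A_inv @ XtX @ A_inv
    return beta[1], np.sqrt(cov[1, 1])


# Linear adjustment: the covariate enters as a single straight-line term.
est_lin, se_lin = treatment_effect(profit, treat, x)

# Smooth adjustment: penalize only the hinge terms, not the linear term.
n_knots = 10
est_gam, se_gam = treatment_effect(
    profit, treat, hinge_basis(x, n_knots), n_penalized=n_knots, lam=0.1
)

print(f"linear adjustment: effect = {est_lin:.3f}, se = {se_lin:.3f}")
print(f"smooth adjustment: effect = {est_gam:.3f}, se = {se_gam:.3f}")
```

In this simulation the straight line captures almost none of the covariate's relationship with the outcome, so the smooth term should leave less residual variance and a tighter standard error on the treatment effect. How large that gain is on real data depends on how nonlinear the covariate effects actually are.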
Over the pandemic, I’ve been trying to better understand the algorithms and mathematics that underpin commonly used statistical models, and I’ve been reading a lot of great books. At the moment, I’m reading Wood’s book on Generalized Additive Models, which has a great chapter on mixed models. I see mixed models used a lot, but in many contexts there isn’t much discussion of what’s going on under the hood of whatever software package fit the model. So in this post, I’ll describe the logic of fitting these models with maximum likelihood estimation and write a function that does so in Python.
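As a rough preview of that logic, here is a minimal sketch for the simplest case, a random-intercept model $y = X\beta + Zb + \varepsilon$ with $b \sim N(0, \sigma_b^2 I)$ and $\varepsilon \sim N(0, \sigma^2 I)$, so that marginally $y \sim N(X\beta, \sigma^2 W)$ with $W = I + \theta Z Z^\top$ and $\theta = \sigma_b^2 / \sigma^2$. For a fixed $\theta$, both $\beta$ and $\sigma^2$ have closed-form maximizers, which leaves a one-dimensional search over $\theta$. The function names, the simulated data, and the use of a plain grid search in place of a numerical optimizer are all assumptions made for this illustration, not the implementation developed in the post.

```python
import numpy as np


def neg2_profile_loglik(theta, y, X, Z):
    """-2 * log-likelihood with beta and sigma^2 profiled out.

    theta is the variance ratio sigma_b^2 / sigma^2.
    """
    n = len(y)
    W = np.eye(n) + theta * (Z @ Z.T)
    W_inv_X = np.linalg.solve(W, X)
    W_inv_y = np.linalg.solve(W, y)
    # Generalized least squares estimate of the fixed effects.
    beta = np.linalg.solve(X.T @ W_inv_X, X.T @ W_inv_y)
    resid = y - X @ beta
    # ML estimate of the residual variance for this theta.
    sigma2 = resid @ np.linalg.solve(W, resid) / n
    _, logdet_W = np.linalg.slogdet(W)
    neg2ll = n * np.log(2 * np.pi * sigma2) + logdet_W + n
    return neg2ll, beta, sigma2


def fit_random_intercept_ml(y, X, groups, grid=None):
    """Grid-search ML fit of a random-intercept model (hypothetical helper)."""
    if grid is None:
        grid = np.logspace(-3, 3, 121)          # candidate values of theta
    labels, idx = np.unique(groups, return_inverse=True)
    Z = np.eye(len(labels))[idx]                # one-hot group membership
    fits = [neg2_profile_loglik(t, y, X, Z) for t in grid]
    best = int(np.argmin([f[0] for f in fits]))
    neg2ll, beta, sigma2 = fits[best]
    return {
        "beta": beta,
        "sigma2": sigma2,
        "sigma2_group": grid[best] * sigma2,
        "loglik": -0.5 * neg2ll,
    }


# Simulated example: 30 groups with 10 observations each.
rng = np.random.default_rng(1)
n_groups, n_per = 30, 10
groups = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
X = np.column_stack([np.ones_like(x), x])
b = rng.normal(0, 1.0, size=n_groups)           # true sigma_b^2 = 1.0
y = X @ np.array([1.0, 2.0]) + b[groups] + rng.normal(0, 0.5, size=len(x))

fit = fit_random_intercept_ml(y, X, groups)
print(fit)
```

A grid keeps the sketch dependency-free and easy to inspect. Building the dense $n \times n$ matrix $W$ is only reasonable for small data sets, and real software exploits the block structure of $W$ and typically uses a proper optimizer, often with REML rather than ML.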