Python implementation of regularized generalized linear models

Pyglmnet is a Python 3.5+ library implementing generalized linear models (GLMs) with advanced regularization options. It provides several noise models (with paired canonical link functions), including gaussian, binomial, probit, gamma, poisson, and softplus. It supports the following regularizers: ridge, lasso, elastic net, group lasso, and Tikhonov regularization.

[Repository] [Documentation (stable release)] [Documentation (development version)]

A brief introduction to GLMs

For linear models specified as

\[y = \beta_0 + X\beta + \epsilon,\]

the parameters \(\beta_0, \beta\) are estimated using ordinary least squares, under the implicit assumption that the noise \(\epsilon\), and hence \(y\) given \(X\), is normally distributed.
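As a minimal sketch of the ordinary least squares estimate above, the single-predictor case has a closed form. The function name `fit_ols_1d` and the toy data are illustrative assumptions, not part of pyglmnet.

```python
def fit_ols_1d(x, y):
    """Closed-form OLS for y = beta0 + beta * x with one predictor.

    Hypothetical helper for illustration only; not a pyglmnet function.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sample covariance of (x, y) divided by sample variance of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta = sxy / sxx
    # Intercept: forces the fitted line through the point of means.
    beta0 = y_mean - beta * x_mean
    return beta0, beta


# Noise-free toy data generated from y = 1 + 2 * x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi for xi in x]
beta0, beta = fit_ols_1d(x, y)
```

With noise-free data the estimates recover the generating intercept and slope up to floating-point error.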

Generalized linear models extend this approach to point-wise nonlinearities \(q(\cdot)\) and corresponding exponential family noise distributions for \(\epsilon\):

\[y = q(\beta_0 + X\beta) + \epsilon\]
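To make the role of \(q(\cdot)\) concrete, here is a small sketch for one common choice: Poisson noise with \(q(z) = \exp(z)\), the inverse of the canonical log link. The function names are hypothetical and the code does not use pyglmnet. It only evaluates the conditional mean and the per-sample negative log-likelihood, dropping the additive \(\log(y!)\) term.

```python
import math


def linear_predictor(beta0, beta, x):
    """z = beta0 + beta^T x for a single sample x (a list of features)."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))


def poisson_mean(beta0, beta, x):
    """Conditional mean q(z) = exp(z), assuming a Poisson noise model."""
    return math.exp(linear_predictor(beta0, beta, x))


def poisson_neg_log_lik(y, mu):
    """Negative Poisson log-likelihood of count y given mean mu,
    omitting the constant log(y!) term, which does not depend on beta."""
    return mu - y * math.log(mu)


# Illustrative values: the linear predictor is 0 + 1*2 - 1*2 = 0.
mu = poisson_mean(0.0, [1.0, -1.0], [2.0, 2.0])
loss = poisson_neg_log_lik(3, mu)
```

Swapping in a different nonlinearity and likelihood pair (for example, the logistic function with a binomial likelihood) follows the same pattern.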

Regularized GLMs are estimated by minimizing a loss function specified by the penalized negative log-likelihood. The elastic net penalty interpolates between the L2 and L1 norms. We solve the following optimization problem:

\[\min_{\beta_0, \beta} \frac{1}{N} \sum_{i = 1}^N \mathcal{L} (y_i, \beta_0 + \beta^T x_i) + \lambda \left[ \frac{1}{2}(1 - \alpha) \mathcal{P}_2 + \alpha \mathcal{P}_1 \right]\]

where \(\mathcal{P}_2\) and \(\mathcal{P}_1\) are the generalized L2 (Tikhonov) and generalized L1 (Group Lasso) penalties, given by:

\[\begin{split}\mathcal{P}_2 & = & \|\Gamma \beta \|_2^2 \\ \mathcal{P}_1 & = & \sum_g \|\beta_{j,g}\|_2\end{split}\]

where \(\Gamma\) is the Tikhonov matrix (a square factorization of the inverse covariance matrix), and \(\beta_{j,g}\) is the \(j\)-th coefficient of group \(g\).
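The penalty term can be evaluated directly from these definitions. The sketch below is illustrative only: the function names, the `groups` encoding (one group label per coefficient), and the example values are assumptions, and this is not how pyglmnet computes the penalty internally.

```python
import math


def tikhonov_penalty(gamma, beta):
    """P2 = ||Gamma beta||_2^2, with gamma given as a list of rows."""
    return sum(
        sum(g * b for g, b in zip(row, beta)) ** 2
        for row in gamma
    )


def group_lasso_penalty(beta, groups):
    """P1 = sum over groups g of the L2 norm of the coefficients in g.

    groups[j] is the (assumed) group label of coefficient beta[j].
    """
    squared_norms = {}
    for b, g in zip(beta, groups):
        squared_norms[g] = squared_norms.get(g, 0.0) + b * b
    return sum(math.sqrt(v) for v in squared_norms.values())


def elastic_net_penalty(beta, gamma, groups, reg_lambda, alpha):
    """lambda * [ 0.5 * (1 - alpha) * P2 + alpha * P1 ]."""
    p2 = tikhonov_penalty(gamma, beta)
    p1 = group_lasso_penalty(beta, groups)
    return reg_lambda * (0.5 * (1.0 - alpha) * p2 + alpha * p1)


# Illustrative values: identity Gamma, two groups {beta_1, beta_2}, {beta_3}.
beta = [3.0, 4.0, 0.0]
gamma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
groups = [1, 1, 2]
p2 = tikhonov_penalty(gamma, beta)
p1 = group_lasso_penalty(beta, groups)
penalty = elastic_net_penalty(beta, gamma, groups, reg_lambda=0.1, alpha=0.5)
```

When \(\Gamma\) is the identity, \(\mathcal{P}_2\) reduces to the ordinary squared L2 norm (ridge), and when every coefficient is its own group, \(\mathcal{P}_1\) reduces to the ordinary L1 norm (lasso).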

Questions / Errors / Bugs

If you have questions about the code or find errors or bugs, please report them here. For more specific questions, feel free to email us directly.