API Documentation

Classes

class pyglmnet.GLM(distr='poisson', alpha=0.5, Tau=None, group=None, reg_lambda=None, solver='batch-gradient', learning_rate=0.2, max_iter=1000, tol=0.001, eta=2.0, score_metric='deviance', random_state=0, verbose=False)

Class for estimating regularized generalized linear models (GLM). The regularized GLM minimizes the penalized negative log likelihood:

\[\min_{\beta_0, \beta} \frac{1}{N} \sum_{i = 1}^N \mathcal{L} (y_i, \beta_0 + \beta^T x_i) + \lambda [ \frac{1}{2}(1 - \alpha) \mathcal{P}_2 + \alpha \mathcal{P}_1 ]\]

where \(\mathcal{P}_2\) and \(\mathcal{P}_1\) are the generalized L2 (Tikhonov) and generalized L1 (Group Lasso) penalties, given by:

\[\mathcal{P}_2 = \|\Gamma \beta \|_2^2, \quad \mathcal{P}_1 = \sum_g \|\beta_{j,g}\|_2\]

where \(\Gamma\) is the Tikhonov matrix (a square factorization of the inverse covariance matrix), and \(\beta_{j,g}\) is the \(j\)-th coefficient of group \(g\).

The generalized L2 penalty defaults to the ridge penalty when \(\Gamma\) is identity.

The generalized L1 penalty defaults to the lasso penalty when each \(\beta\) belongs to its own group.
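As a rough numpy sketch (an illustration of the two penalty terms above, not pyglmnet's internal implementation), the generalized L2 and L1 penalties can be computed as:

```python
import numpy as np

def l2_penalty(beta, Tau):
    """Generalized L2 (Tikhonov) penalty: ||Tau @ beta||_2^2."""
    return np.linalg.norm(Tau @ beta) ** 2

def l1_penalty(beta, group):
    """Generalized L1 (group lasso) penalty: sum over groups of ||beta_g||_2.

    Entries with group id 0 are excluded from the penalty, matching the
    group-parameter convention described below.
    """
    return sum(np.linalg.norm(beta[group == g])
               for g in np.unique(group[group > 0]))

beta = np.array([1.0, -2.0, 3.0, 0.5])
Tau = np.eye(4)                   # identity Tau -> plain ridge penalty
group = np.array([1, 1, 2, 2])    # two groups of two coefficients

print(l2_penalty(beta, Tau))      # ridge: 1 + 4 + 9 + 0.25 = 14.25
print(l1_penalty(beta, group))    # ||(1, -2)||_2 + ||(3, 0.5)||_2
```

With each coefficient in its own group, `l1_penalty` reduces to the ordinary lasso penalty (the sum of absolute values), as noted above.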

Parameters:

distr : str

distribution family; one of ‘gaussian’ | ‘binomial’ | ‘poisson’ | ‘softplus’ | ‘multinomial’

default: ‘poisson’.

alpha : float

the weighting between the L1 and L2 penalty terms of the loss function.

default: 0.5

Tau : array | None

the (n_features, n_features) Tikhonov matrix.

default: None, in which case Tau is identity and the L2 penalty is ridge-like

group : array | list | None

the (n_features, ) list or array of group identities for each parameter \(\beta\).

Each entry of the list/array should be an int from 1 to n_groups specifying the group membership of each parameter (excluding \(\beta_0\)).

If you do not want to specify a group for a specific parameter, set it to zero.

default: None, in which case it defaults to L1 regularization

reg_lambda : array | list | None

array of regularization parameters \(\lambda\) for the penalty term.

default: None, a list of 10 floats spaced logarithmically (base e) between 0.5 and 0.01.
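The documented default grid can be reproduced with numpy (a sketch of the described default, not necessarily pyglmnet's exact code):

```python
import numpy as np

# 10 values spaced logarithmically (base e) from 0.5 down to 0.01,
# as described for the reg_lambda default above.
reg_lambda = np.logspace(np.log(0.5), np.log(0.01), 10, base=np.e)

print(reg_lambda[0], reg_lambda[-1])  # 0.5 ... 0.01
```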

solver : str

optimization method; one of ‘batch-gradient’ (vanilla batch gradient descent) or ‘cdfast’ (Newton coordinate gradient descent).

default: ‘batch-gradient’

learning_rate : float

learning rate for gradient descent.

default: 2e-1

max_iter : int

maximum number of iterations for the solver.

default: 1000

tol : float

convergence threshold (stopping criterion); the optimization loop stops once the change between iterations falls below this threshold.

default: 1e-3

eta : float

a threshold parameter that linearizes the exp() function above eta.

default: 2.0
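One way to picture this stabilization (an illustrative sketch; the exact form used internally may differ) is a first-order Taylor continuation of exp(z) above eta:

```python
import numpy as np

def linearized_exp(z, eta=2.0):
    """exp(z) below eta; first-order linear continuation above eta.

    The linear branch exp(eta) * (1 + (z - eta)) matches exp(z) in value
    and slope at z = eta, keeping the mean function continuous while
    avoiding overflow for large inputs.
    """
    z = np.asarray(z, dtype=float)
    return np.where(z <= eta,
                    np.exp(np.minimum(z, eta)),   # clipped to avoid overflow
                    np.exp(eta) * (1 + (z - eta)))

print(linearized_exp(1.0))     # equals np.exp(1.0)
print(linearized_exp(100.0))   # grows linearly instead of overflowing
```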

score_metric : str

specifies the scoring metric; one of ‘deviance’ or ‘pseudo_R2’.

default: ‘deviance’

random_state : int

seed of the random number generator used to initialize the solution.

default: 0

verbose : boolean or int

default: False
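For intuition, the conditional mean implied by each distr corresponds to a standard inverse link function. The sketch below shows the usual choices (an illustration; check the pyglmnet source for the exact forms, e.g. ‘poisson’ is linearized above eta as described, and ‘multinomial’ uses a softmax over classes):

```python
import numpy as np

def mean_function(z, distr):
    """Typical inverse link q(z) for each supported distribution."""
    if distr == 'gaussian':
        return z                            # identity link
    if distr == 'binomial':
        return 1.0 / (1.0 + np.exp(-z))     # logistic
    if distr == 'poisson':
        return np.exp(z)                    # exponential
    if distr == 'softplus':
        return np.log1p(np.exp(z))          # log(1 + exp(z))
    raise ValueError(f"unsupported distr: {distr}")

z = 0.0
print(mean_function(z, 'gaussian'))   # 0.0
print(mean_function(z, 'binomial'))   # 0.5
```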

Notes

To select a subset of fitted GLM models (one per regularization parameter), you can slice the estimator:

>>> glm = glm[1:3]
>>> glm[2].predict(X_test)

copy()

Return a copy of the object.

Parameters:

none:

Returns:

self: instance of GLM

A copy of the GLM instance.

fit(X, y)

Fit the model.

Parameters:

X : array

The input data of shape (n_samples, n_features)

y : array

The target data

Returns:

self : instance of GLM

The fitted model.

fit_predict(X, y)

Fit the model and predict on the same data.

Parameters:

X : array

The input data to fit and predict, of shape (n_samples, n_features)

y : array

The target data

Returns:

yhat : array

The predicted targets of shape ([n_lambda], n_samples).

A 1D array if predicting on only one lambda (compatible with scikit-learn API).

Otherwise, returns a 2D array.

predict(X)

Predict targets.

Parameters:

X : array

Input data for prediction, of shape (n_samples, n_features)

Returns:

yhat : array

The predicted targets of shape ([n_lambda], n_samples)

A 1D array if predicting on only one reg_lambda (compatible with scikit-learn API).

Otherwise, returns a 2D array.

predict_proba(X)

Predict class probabilities for the multinomial distribution.

Parameters:

X : array

Input data for prediction, of shape (n_samples, n_features)

Returns:

yhat : array

The predicted targets of shape ([n_lambda], n_samples, n_classes).

A 2D array if predicting on only one lambda (compatible with scikit-learn API).

Otherwise, returns a 3D array.

Raises:

This method works only for the multinomial distribution; an error is raised for any other distribution.

score(X, y)

Score the model.

Parameters:

X : array

The input data whose prediction will be scored, of shape (n_samples, n_features).

y : array

The true targets against which to score the predicted targets, of shape (n_samples, [n_classes]).

Returns:

score: array

an array when score is called on a list of estimators: glm.score(X, y)

a singleton array when score is called on a sliced estimator: glm[0].score(X, y)

Note that if you want compatibility with scikit-learn’s Pipeline(), cross_val_score(), or GridSearchCV(), then you should only pass sliced estimators:

from sklearn.model_selection import GridSearchCV, cross_val_score
# GridSearchCV requires a param_grid; here a grid over alpha, for example
grid = GridSearchCV(glm[0], param_grid={'alpha': [0.1, 0.5, 0.9]})
scores = cross_val_score(glm[0], X, y, cv=10)
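As a reference for the two score metrics, here is a numpy sketch of Poisson deviance and pseudo-\(R^2\) (illustrative only; pyglmnet's implementation may handle edge cases and other distributions differently):

```python
import numpy as np

def poisson_deviance(y, yhat):
    """Poisson deviance: 2 * sum(y * log(y / yhat) - (y - yhat)).

    The convention 0 * log(0) = 0 is applied for zero counts.
    """
    with np.errstate(divide='ignore', invalid='ignore'):
        term = np.where(y > 0, y * np.log(y / yhat), 0.0)
    return 2.0 * np.sum(term - (y - yhat))

def pseudo_r2(y, yhat):
    """1 - deviance(model) / deviance(null model predicting the mean)."""
    null = np.full_like(y, y.mean(), dtype=float)
    return 1.0 - poisson_deviance(y, yhat) / poisson_deviance(y, null)

y = np.array([1.0, 2.0, 0.0, 3.0])
print(poisson_deviance(y, y + 1e-12))  # near 0 for a near-perfect fit
print(pseudo_r2(y, y + 1e-12))         # near 1 for a near-perfect fit
```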

simulate(beta0, beta, X)

Simulate target data under a generative model.

Parameters:

beta0: float

intercept coefficient

beta: array

coefficients of shape (n_features, 1)

X: array

design matrix of shape (n_samples, n_features)

Returns:

y: array

simulated target data of shape (n_samples, 1)
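For intuition, here is a generative-model sketch of what simulate does for the Poisson case (illustrative only; shapes follow the parameter descriptions above):

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 100, 5
X = rng.normal(size=(n_samples, n_features))       # design matrix
beta0 = 0.1                                        # intercept coefficient
beta = rng.normal(scale=0.2, size=(n_features, 1))  # coefficients

# Poisson generative model: y ~ Poisson(exp(beta0 + X @ beta))
mu = np.exp(beta0 + X @ beta)
y = rng.poisson(mu)

print(y.shape)  # (100, 1), i.e. (n_samples, 1)
```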