API Documentation
Classes
- class pyglmnet.GLM(distr='poisson', alpha=0.5, Tau=None, group=None, reg_lambda=0.1, solver='batch-gradient', learning_rate=0.2, max_iter=1000, tol=1e-06, eta=2.0, score_metric='deviance', fit_intercept=True, random_state=0, callback=None, verbose=False)
Class for estimating regularized generalized linear models (GLM). The regularized GLM minimizes the penalized negative log likelihood:
\[\min_{\beta_0, \beta} \frac{1}{N} \sum_{i = 1}^N \mathcal{L} (y_i, \beta_0 + \beta^T x_i) + \lambda \left[ \frac{1}{2}(1 - \alpha) \mathcal{P}_2 + \alpha \mathcal{P}_1 \right]\]
where \(\mathcal{P}_2\) and \(\mathcal{P}_1\) are the generalized L2 (Tikhonov) and generalized L1 (Group Lasso) penalties, given by:
\[\mathcal{P}_2 = \|\Gamma \beta \|_2^2, \qquad \mathcal{P}_1 = \sum_g \|\beta_{j,g}\|_2\]
where \(\Gamma\) is the Tikhonov matrix (a square factorization of the inverse covariance matrix) and \(\beta_{j,g}\) is the \(j\)th coefficient of group \(g\).
The generalized L2 penalty defaults to the ridge penalty when \(\Gamma\) is identity.
The generalized L1 penalty defaults to the lasso penalty when each \(\beta\) belongs to its own group.
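The two special cases above can be checked numerically. The following is a minimal pure-Python sketch of the penalty term only, written directly from the formulas; the function names are hypothetical and are not part of pyglmnet, and groups are passed as lists of coefficient indices purely for illustration.

```python
import math


def tikhonov_penalty(gamma, beta):
    """P2 = ||Gamma beta||_2^2, with gamma given as a list of rows."""
    projected = [sum(g * b for g, b in zip(row, beta)) for row in gamma]
    return sum(v * v for v in projected)


def group_lasso_penalty(beta, groups):
    """P1 = sum over groups of the L2 norm of that group's coefficients.

    `groups` is a list of index lists (a hypothetical representation).
    """
    return sum(math.sqrt(sum(beta[j] ** 2 for j in idx)) for idx in groups)


def penalty(beta, gamma, groups, reg_lambda, alpha):
    """lambda * [0.5 * (1 - alpha) * P2 + alpha * P1]."""
    p2 = tikhonov_penalty(gamma, beta)
    p1 = group_lasso_penalty(beta, groups)
    return reg_lambda * (0.5 * (1.0 - alpha) * p2 + alpha * p1)


beta = [3.0, -4.0, 0.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
singletons = [[0], [1], [2]]

ridge_term = tikhonov_penalty(identity, beta)       # sum of squares (ridge)
lasso_term = group_lasso_penalty(beta, singletons)  # sum of absolute values (lasso)
one_group = group_lasso_penalty(beta, [[0, 1, 2]])  # L2 norm of the whole vector
total = penalty(beta, identity, singletons, reg_lambda=0.1, alpha=0.5)
```

With an identity \(\Gamma\) and singleton groups, the penalty reduces to the familiar elastic-net form, which is what the defaults `Tau=None` and `group=None` are described as giving.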
- Parameters
- distr: str
distribution family. One of ‘gaussian’ | ‘binomial’ | ‘poisson’ | ‘softplus’ | ‘probit’ | ‘gamma’. default: ‘poisson’
- alpha: float
the weighting between the L1 and L2 penalty terms of the loss function: alpha = 1 keeps only the L1 term, alpha = 0 keeps only the L2 term. default: 0.5
- Tau: array | None
the (n_features, n_features) Tikhonov matrix. default: None, in which case Tau is identity and the L2 penalty is ridge-like
- group: array | list | None
the (n_features,) list or array of group identities for each parameter \(\beta\). Each entry should be an int from 1 to n_groups that specifies the group membership of the corresponding parameter (\(\beta_0\) is excluded). To leave a parameter without a group, set its entry to zero. default: None, in which case the penalty is plain L1 regularization
- reg_lambda: float
regularization parameter \(\lambda\) of penalty term. default: 0.1
- solver: str
optimization method. One of ‘batch-gradient’ (vanilla batch gradient descent) | ‘cdfast’ (Newton coordinate gradient descent). default: ‘batch-gradient’
- learning_rate: float
learning rate for gradient descent. default: 2e-1
- max_iter: int
maximum iterations for the model. default: 1000
- tol: float
convergence threshold (stopping criterion). The optimization loop stops when norm(gradient) falls below this threshold. default: 1e-6
- eta: float
a threshold parameter that linearizes the exp() function above eta. default: 2.0
- score_metric: str
the scoring metric, either ‘deviance’ or ‘pseudo_R2’. default: ‘deviance’
- fit_intercept: boolean
specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. default: True
- random_state: int
seed of the random number generator used to initialize the solution. default: 0
- verbose: boolean or int
default: False
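The `eta` parameter is described only as linearizing exp() above a threshold. One natural reading, assumed here and not confirmed by this page, is that exp(z) is replaced by its tangent line at `eta` for larger inputs, which keeps the function continuous and stops it from overflowing. A minimal sketch with a hypothetical helper name:

```python
import math


def linearized_exp(z, eta=2.0):
    """exp(z) for z <= eta; the tangent line of exp at eta for z > eta.

    Assumption: "linearizes" means a first-order expansion around eta.
    """
    if z <= eta:
        return math.exp(z)
    # Tangent line at eta: exp(eta) + exp(eta) * (z - eta)
    return math.exp(eta) * (1.0 + z - eta)


values = [linearized_exp(z) for z in (0.0, 2.0, 10.0)]
```

Under this reading, large linear predictors grow linearly instead of exponentially, which tends to keep gradients in a manageable range for exponential-type nonlinearities.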
Examples
>>> import numpy as np
>>> from pyglmnet import GLM
>>> random_state = 1
>>> n_samples, n_features = 100, 4
>>> rng = np.random.RandomState(random_state)
>>> X = rng.normal(0, 1, (n_samples, n_features))
>>> y = 2.2 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * X[:, 3] + 1.0
>>> glm = GLM(distr='gaussian', verbose=False, random_state=random_state)
>>> glm = glm.fit(X, y)
>>> glm.beta0_  # The intercept
1.005380485553247
>>> glm.beta_  # The coefficients
array([ 1.90216711, -0.78782533, -0.        ,  0.03227455])
>>> y_pred = glm.predict(X)
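The `group` parameter described above assigns each coefficient an integer id, with zero meaning "no group". The sketch below shows how such an array could translate into the group lasso penalty. It is not pyglmnet code: the helper name is hypothetical, and treating ungrouped (zero) entries with a plain absolute-value term is an assumption made for illustration.

```python
import math
from collections import defaultdict


def group_penalty(beta, group):
    """Sum of per-group L2 norms; entries with group id 0 are penalized
    individually by |beta_j| (an assumption, see the text above)."""
    members = defaultdict(list)
    ungrouped = 0.0
    for coef, gid in zip(beta, group):
        if gid == 0:
            ungrouped += abs(coef)
        else:
            members[gid].append(coef)
    grouped = sum(
        math.sqrt(sum(c * c for c in coefs)) for coefs in members.values()
    )
    return grouped + ungrouped


beta = [3.0, 4.0, -2.0, 0.5]
group = [1, 1, 2, 0]  # two coefficients in group 1, one in group 2, one ungrouped
value = group_penalty(beta, group)
all_ungrouped = group_penalty(beta, [0, 0, 0, 0])  # reduces to the L1 norm
```

Because a group's norm is zero only when all of its members are zero, this kind of penalty tends to switch whole groups on or off together.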
- Attributes
- beta0_: float
The intercept
- beta_: array, shape (n_features,)
The learned coefficients
- n_iter_: int
The number of iterations
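To make the roles of `learning_rate`, `max_iter`, `tol`, and the resulting iteration count concrete, here is a minimal batch gradient descent sketch for an unpenalized Gaussian (least-squares) model. It is an illustration under simplifying assumptions (no penalty, pure Python, hypothetical function name), not the library's solver.

```python
import math


def batch_gradient_descent(X, y, learning_rate=0.2, max_iter=1000, tol=1e-6):
    """Minimize the mean squared error of beta0 + beta . x by gradient descent.

    Returns (beta0, beta, n_iter), where n_iter counts the passes performed.
    """
    n_samples, n_features = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * n_features
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        residuals = [
            beta0 + sum(b * x for b, x in zip(beta, row)) - target
            for row, target in zip(X, y)
        ]
        grad0 = sum(residuals) / n_samples
        grad = [
            sum(r * row[j] for r, row in zip(residuals, X)) / n_samples
            for j in range(n_features)
        ]
        # Stop once the gradient norm falls below the tolerance.
        if math.sqrt(grad0 ** 2 + sum(g * g for g in grad)) < tol:
            break
        beta0 -= learning_rate * grad0
        beta = [b - learning_rate * g for b, g in zip(beta, grad)]
    return beta0, beta, n_iter


X = [[-1.0], [0.0], [1.0], [2.0]]
y = [-1.0, 1.0, 3.0, 5.0]  # exactly y = 1 + 2 * x
beta0, beta, n_iter = batch_gradient_descent(X, y)
```

On this toy data the loop stops well before `max_iter` because the gradient norm drops below `tol`; a smaller `learning_rate` would need more iterations to reach the same threshold.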
- copy(self)
Return a copy of the object.
- Parameters
- none
- Returns
- self: instance of GLM
A copy of the GLM instance.
- fit(self, X, y)
Fit the model to the data.
- Parameters
- X: array
The 2D input data of shape (n_samples, n_features)
- y: array
The 1D target data of shape (n_samples,)
- Returns
- self: instance of GLM
The fitted model.
- fit_predict(self, X, y)
Fit the model and predict on the same data.
- Parameters
- X: array
The input data to fit and predict, of shape (n_samples, n_features)
- y: array
The target data to fit, of shape (n_samples,)
- Returns
- yhat: array
The predicted targets of shape (n_samples,).
- predict(self, X)
Predict targets.
- Parameters
- X: array
Input data for prediction, of shape (n_samples, n_features)
- Returns
- yhat: array
The predicted targets of shape (n_samples,)
- class pyglmnet.GLMCV(distr='poisson', alpha=0.5, Tau=None, group=None, reg_lambda=None, cv=10, solver='batch-gradient', learning_rate=0.2, max_iter=1000, tol=1e-06, eta=2.0, score_metric='deviance', fit_intercept=True, random_state=0, verbose=False)
Class for estimating regularized generalized linear models (GLM) along a regularization path with warm restarts.
The regularized GLM minimizes the penalized negative log likelihood:
\[\min_{\beta_0, \beta} \frac{1}{N} \sum_{i = 1}^N \mathcal{L} (y_i, \beta_0 + \beta^T x_i) + \lambda \left[ \frac{1}{2}(1 - \alpha) \mathcal{P}_2 + \alpha \mathcal{P}_1 \right]\]
where \(\mathcal{P}_2\) and \(\mathcal{P}_1\) are the generalized L2 (Tikhonov) and generalized L1 (Group Lasso) penalties, given by:
\[\mathcal{P}_2 = \|\Gamma \beta \|_2^2, \qquad \mathcal{P}_1 = \sum_g \|\beta_{j,g}\|_2\]
where \(\Gamma\) is the Tikhonov matrix (a square factorization of the inverse covariance matrix) and \(\beta_{j,g}\) is the \(j\)th coefficient of group \(g\).
The generalized L2 penalty defaults to the ridge penalty when \(\Gamma\) is identity.
The generalized L1 penalty defaults to the lasso penalty when each \(\beta\) belongs to its own group.
- Parameters
- distr: str
distribution family. One of ‘gaussian’ | ‘binomial’ | ‘poisson’ | ‘softplus’ | ‘probit’ | ‘gamma’. default: ‘poisson’
- alpha: float
the weighting between the L1 and L2 penalty terms of the loss function: alpha = 1 keeps only the L1 term, alpha = 0 keeps only the L2 term. default: 0.5
- Tau: array | None
the (n_features, n_features) Tikhonov matrix. default: None, in which case Tau is identity and the L2 penalty is ridge-like
- group: array | list | None
the (n_features,) list or array of group identities for each parameter \(\beta\). Each entry should be an int from 1 to n_groups that specifies the group membership of the corresponding parameter (\(\beta_0\) is excluded). To leave a parameter without a group, set its entry to zero. default: None, in which case the penalty is plain L1 regularization
- reg_lambda: array | list | None
array of regularization parameters \(\lambda\) of the penalty term. default: None, in which case a list of 10 floats spaced logarithmically (base e) between 0.5 and 0.01 is used.
- cv: int | cross validation object
number of folds, or an iterator for doing cross validation. default: 10
- solver: str
optimization method. One of ‘batch-gradient’ (vanilla batch gradient descent) | ‘cdfast’ (Newton coordinate gradient descent). default: ‘batch-gradient’
- learning_rate: float
learning rate for gradient descent. default: 2e-1
- max_iter: int
maximum iterations for the model. default: 1000
- tol: float
convergence threshold (stopping criterion). The optimization loop stops when norm(gradient) falls below this threshold. default: 1e-6
- eta: float
a threshold parameter that linearizes the exp() function above eta. default: 2.0
- score_metric: str
the scoring metric, either ‘deviance’ or ‘pseudo_R2’. default: ‘deviance’
- fit_intercept: boolean
specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. default: True
- random_state: int
seed of the random number generator used to initialize the solution. default: 0
- verbose: boolean or int
default: False
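The default `reg_lambda` grid is described above as 10 floats spaced logarithmically (base e) between 0.5 and 0.01. A minimal sketch of how such a grid can be built with the standard library follows; the helper name is hypothetical and the exact construction used by the library is not shown on this page.

```python
import math


def default_reg_lambda(start=0.5, stop=0.01, num=10):
    """Return `num` values evenly spaced in log space from start to stop."""
    log_start, log_stop = math.log(start), math.log(stop)
    step = (log_stop - log_start) / (num - 1)
    return [math.exp(log_start + i * step) for i in range(num)]


grid = default_reg_lambda()
```

The values run from the strongest penalty to the weakest with a constant ratio between neighbours. A decreasing order is what one would expect for a regularization path with warm restarts, since each fit can start from the previous, more heavily penalized solution.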
Notes
To select a subset of fitted GLM models, you can simply do:
glm = glm[1:3]
glm[2].predict(X_test)
- Attributes
- beta0_: float
The intercept
- beta_: array, shape (n_features,)
The learned coefficients
- glm_: instance of GLM
The GLM object with best score
- reg_lambda_opt_: float
The reg_lambda parameter for best GLM object
- n_iter_: int
The number of iterations
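The attributes `glm_` and `reg_lambda_opt_` refer to the model with the best score. As a sketch of what "best" could mean, the snippet below averages per-fold scores for each candidate `reg_lambda` and picks the minimum for deviance or the maximum for pseudo-R2. The selection rule, the helper name, and the fold scores are all assumptions made up for illustration; they are not taken from the library.

```python
def select_reg_lambda(fold_scores, score_metric="deviance"):
    """Pick the reg_lambda with the best mean cross-validation score.

    fold_scores maps each candidate reg_lambda to a list of per-fold scores.
    Assumption: lower is better for deviance, higher for pseudo_R2.
    """
    mean_scores = {lam: sum(s) / len(s) for lam, s in fold_scores.items()}
    if score_metric == "deviance":
        return min(mean_scores, key=mean_scores.get)
    if score_metric == "pseudo_R2":
        return max(mean_scores, key=mean_scores.get)
    raise ValueError("score_metric must be 'deviance' or 'pseudo_R2'")


# Illustrative, made-up fold scores for three candidate values.
fold_scores = {
    0.5: [12.0, 14.0, 13.0],
    0.1: [9.0, 10.0, 8.0],
    0.01: [9.5, 11.0, 10.5],
}
best = select_reg_lambda(fold_scores)
```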
- copy(self)
Return a copy of the object.
- Parameters
- none
- Returns
- self: instance of GLMCV
A copy of the GLMCV instance.
- fit(self, X, y)
Fit the model to the data.
- Parameters
- X: array
The input data of shape (n_samples, n_features)
- y: array
The target data of shape (n_samples,)
- Returns
- self: instance of GLMCV
The fitted model.
- fit_predict(self, X, y)
Fit the model and predict on the same data.
- Parameters
- X: array
The input data to fit and predict, of shape (n_samples, n_features)
- y: array
The target data to fit, of shape (n_samples,)
- Returns
- yhat: array
The predicted targets of shape (n_samples,), based on the model with optimal reg_lambda
- predict(self, X)
Predict targets.
- Parameters
- X: array
Input data for prediction, of shape (n_samples, n_features)
- Returns
- yhat: array
The predicted targets of shape (n_samples,), based on the model with optimal reg_lambda
- predict_proba(self, X)
Predict class probabilities for the binomial distribution.
- Parameters
- X: array
Input data for prediction, of shape (n_samples, n_features)
- Returns
- yhat: array
The predicted class probabilities, of shape (n_samples,).
- Raises
An error if the distribution is not binomial; this method works only for the binomial distribution.
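For a binomial model with the usual logistic link, a class probability is the sigmoid of the linear predictor \(\beta_0 + \beta^T x\). The sketch below illustrates that computation and the binomial-only restriction. It is a standalone illustration with a hypothetical function; the logistic link and the use of `ValueError` are assumptions, not statements about the library's internals.

```python
import math


def predict_proba(X, beta0, beta, distr="binomial"):
    """Return P(y = 1 | x) for each row of X under a logistic model."""
    if distr != "binomial":
        # Assumed exception type; this page does not name the error raised.
        raise ValueError("predict_proba is only defined for distr='binomial'")
    probs = []
    for row in X:
        z = beta0 + sum(b * x for b, x in zip(beta, row))
        # Numerically stable sigmoid: never exponentiate a large positive value.
        if z >= 0:
            probs.append(1.0 / (1.0 + math.exp(-z)))
        else:
            ez = math.exp(z)
            probs.append(ez / (1.0 + ez))
    return probs


probs = predict_proba([[0.0, 0.0], [1000.0, 0.0], [-1000.0, 0.0]], 0.0, [1.0, -1.0])
```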
- score(self, X, y)
Score the model.
- Parameters
- X: array
The input data whose prediction will be scored, of shape (n_samples, n_features).
- y: array
The true targets against which to score the predicted targets, of shape (n_samples,).
- Returns
- score: float
The score metric for the optimal reg_lambda
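To give a feel for the two score metrics named above, here is a sketch using the textbook Poisson deviance and a deviance-based pseudo-R2 (one minus the ratio of model deviance to null-model deviance). These are common definitions assumed for illustration; the library may scale or define its scores differently, and the function names are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """2 * sum(y * log(y / mu) - (y - mu)), taking y * log(y / mu) = 0 when y = 0.

    mu must be positive wherever y is positive.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def pseudo_r2(y, mu):
    """1 - D(model) / D(null), where the null model predicts the mean of y."""
    null_mu = [sum(y) / len(y)] * len(y)
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)


y = [0.0, 1.0, 2.0, 5.0]
perfect = pseudo_r2(y, y)            # predictions equal to the targets
baseline = pseudo_r2(y, [2.0] * 4)   # predicting the mean everywhere
```

Under these definitions a lower deviance and a higher pseudo-R2 both indicate a better fit, with the pseudo-R2 of the constant-mean model equal to zero.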