Lasso regression (lasso)
The lasso (least absolute shrinkage and selection operator) is a regularized version of least squares regression. It minimizes the sum of squared errors while also penalizing the $L_1$ norm (sum of absolute values) of the coefficients.
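Concretely, the function that is minimized in Orange is:
$$\frac{1}{n}\|Y - X\beta\|_2^2 + \frac{\lambda}{m}\|\beta\|_1$$
where $X$ is an $n \times m$ data matrix, $Y$ the vector of class values, $\beta$ the regression coefficients to be estimated and $\lambda$ the regularization parameter (lasso_lambda).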
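To check the objective (1/n)‖Y − Xβ‖₂² + (λ/m)‖β‖₁ by hand, the following minimal sketch evaluates it in plain Python. X is a list of n rows with m entries each and `lam` plays the role of λ; the function name `lasso_objective` is hypothetical and not part of Orange.

```python
def lasso_objective(X, y, beta, lam):
    """Evaluate (1/n) * ||y - X beta||_2^2 + (lam/m) * ||beta||_1 for an n x m matrix X."""
    n, m = len(X), len(X[0])
    # residuals y_i - x_i . beta of the linear predictor
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
    squared_error = sum(r * r for r in residuals)
    l1_norm = sum(abs(bj) for bj in beta)
    return squared_error / n + lam / m * l1_norm


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(lasso_objective(X, y, [1.0, 2.0], lam=0.5))  # 0.75: zero residuals, penalty 0.5 / 2 * 3
```

Multiplying the objective by n/2 leaves its minimizer unchanged and gives ½‖Y − Xβ‖₂² + l‖β‖₁ with l = λn/(2m). Presumably the `l` argument of `fista` plays this role, but that mapping is an assumption, not something stated in this documentation.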
- class Orange.regression.lasso.LassoRegressionLearner(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name=Lasso)
Bases: Orange.regression.base.BaseRegressionLearner
Fits the lasso regression model using FISTA (Fast Iterative Shrinkage-Thresholding Algorithm).
- __call__(data, weight=None)
Parameters: - data (Orange.data.Table) – Training data.
- weight – Weights for instances. Not implemented yet.
- __init__(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name=Lasso)
Parameters: - lasso_lambda (float) – Regularization parameter.
- max_iter (int) – Maximum number of iterations for the optimization method.
- eps (float) – Stop optimization when improvements are lower than eps.
- n_boot (int) – Number of bootstrap samples used for non-parametric estimation of standard errors.
- n_perm (int) – Number of permutations used for non-parametric estimation of p-values.
- name (str) – Learner name.
- fista(X, y, l, lipschitz, w_init=None)
Fast Iterative Shrinkage-Thresholding Algorithm (FISTA).
- get_lipschitz(X)
Return the Lipschitz constant of $\nabla f$, where $f(w) = \frac{1}{2}\|Xw - y\|^2$.
- class Orange.regression.lasso.LassoRegression(domain=None, class_var=None, coef0=None, coefficients=None, std_errors=None, p_vals=None, model=None, mu_x=None)
Bases: Orange.classification.Classifier
Lasso regression predicts the value of the response variable based on the values of independent variables.
- coef0
Intercept (sample mean of the response variable).
- coefficients
Regression coefficients.
- std_errors
Standard errors of coefficient estimates for a fixed regularization parameter. The standard errors are estimated using the bootstrapping method.
- p_vals
List of p-values for the null hypotheses that the regression coefficients equal 0 based on a non-parametric permutation test.
- model
Dictionary with the statistical properties of the model. Keys are the names of the independent variables; values are tuples (coefficient, standard error, p-value).
- mu_x
Sample mean of independent variables.
- __call__(instance, result_type=0)
Parameters: instance (Orange.data.Instance) – Data instance for which the value of the response variable will be predicted.
- to_string(skip_zero=True)
Pretty-prints a lasso regression model, i.e. the estimated regression coefficients with their standard errors and significances. Standard errors are obtained using the bootstrapping method and significances by a permutation test.
Parameters: skip_zero (bool) – If True, variables with estimated coefficient equal to 0 are omitted.
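A plain-Python sketch of how the attributes above fit together. It assumes (this is not stated above) that the coefficients were fitted on centered data, so that a prediction is coef0 + Σⱼ coefficients[j]·(xⱼ − mu_x[j]). The formatter reads a dictionary laid out like model and maps p-values to the significance codes of the legend printed in the example output. Both `predict` and `format_model` are hypothetical names, not Orange functions.

```python
def predict(x, coef0, coefficients, mu_x):
    """Sketch of a lasso prediction from coefficients fitted on centered data."""
    return coef0 + sum(b * (xj - mj) for b, xj, mj in zip(coefficients, x, mu_x))


def signif_code(p):
    """Significance code: *** below 0.001, ** below 0.01, * below 0.05, . below 0.1."""
    for threshold, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < threshold:
            return code
    return ""


def format_model(coef0, model, skip_zero=True):
    """Pretty-print {name: (coefficient, std_error, p_value)} as a table."""
    lines = ["%-10s %9s %9s %6s" % ("Variable", "Coeff Est", "Std Error", "p"),
             "%-10s %9.3f" % ("Intercept", coef0)]
    zero = []
    for name, (coef, std_err, p) in model.items():
        if skip_zero and coef == 0.0:
            zero.append(name)  # reported separately after the table
            continue
        lines.append("%-10s %9.3f %9.3f %6.3f %s" % (name, coef, std_err, p, signif_code(p)))
    if zero:
        lines.append("For %d variables the regression coefficient equals 0:" % len(zero))
        lines.append(", ".join(zero))
    return "\n".join(lines)


# two rows copied from the housing example output
print(format_model(22.533, {"CRIM": (-0.023, 0.024, 0.050), "RM": (4.270, 0.934, 0.000)}))
```

An instance whose values equal mu_x is predicted as coef0, and a variable with a zero coefficient has no effect on the prediction.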
Utility functions
- Orange.regression.lasso.get_bootstrap_sample(data)
Generate a bootstrap sample of a given data set.
Parameters: data (Orange.data.Table) – the original data sample
- Orange.regression.lasso.permute_responses(data)
Permute values of the class (response) variable. This removes any dependence between the independent variables and the response while keeping the distribution of the response variable unchanged.
Parameters: data (Orange.data.Table) – Original data.
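A plain-Python sketch of how these two utilities can drive the std_errors and p_vals estimates of a fitted model. Rows are lists whose last entry is the class value, and `fit` is any function returning a list of coefficients for such rows. The standard error is taken as the sample standard deviation over bootstrap refits, and a p-value as the fraction of permuted refits whose coefficient is at least as large in absolute value as the original one. These choices are assumptions about the procedure, not Orange's code, and all function names here (including the toy `slope` fit) are hypothetical.

```python
import random
import statistics


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random with replacement."""
    return [rng.choice(rows) for _ in rows]


def permute_class(rows, rng):
    """Shuffle the class values (last entry of each row); features stay in place."""
    classes = [row[-1] for row in rows]
    rng.shuffle(classes)
    return [list(row[:-1]) + [c] for row, c in zip(rows, classes)]


def bootstrap_std_errors(fit, rows, n_boot=100, seed=0):
    """Sample standard deviation of each coefficient over n_boot (>= 2) bootstrap refits."""
    rng = random.Random(seed)
    estimates = [fit(bootstrap_sample(rows, rng)) for _ in range(n_boot)]
    return [statistics.stdev(column) for column in zip(*estimates)]


def permutation_p_values(fit, rows, n_perm=100, seed=0):
    """Fraction of refits on permuted classes whose |coefficient| reaches the observed one."""
    rng = random.Random(seed)
    observed = fit(rows)
    counts = [0] * len(observed)
    for _ in range(n_perm):
        for j, b in enumerate(fit(permute_class(rows, rng))):
            if abs(b) >= abs(observed[j]):
                counts[j] += 1
    return [c / n_perm for c in counts]


def slope(rows):
    """Hypothetical one-coefficient fit: least-squares slope of the class on feature 0."""
    xs = [r[0] for r in rows]
    ys = [r[-1] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) * (x - mx) for x in xs)
    return [sxy / sxx]


rows = [[float(x), 2.0 * x] for x in range(20)]  # toy data with an exact linear relation
print(permutation_p_values(slope, rows))  # [0.0]
print(bootstrap_std_errors(slope, rows))  # [0.0]: every bootstrap refit recovers slope 2
```

A bootstrap sample of n rows contains on average about 63% (1 − 1/e) distinct rows, which is where the variability of the refitted coefficients comes from.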
Examples
To fit the regression parameters on the housing data set, use the following code:
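import Orange
housing = Orange.data.Table("housing")
# n_boot and n_perm enable the bootstrap standard errors and permutation p-values
learner = Orange.regression.lasso.LassoRegressionLearner(
    lasso_lambda=1, n_boot=100, n_perm=100)
classifier = learner(housing)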
To predict values of the response for the first five instances:
for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f" % (
        ins.get_class(), classifier(ins))
Output:
Actual: 24.00, predicted: 30.45
Actual: 21.60, predicted: 25.60
Actual: 34.70, predicted: 31.48
Actual: 33.40, predicted: 30.18
Actual: 36.20, predicted: 29.59
To see the fitted regression coefficients, print the model:
print classifier
Output:
Variable   Coeff Est  Std Error      p
Intercept     22.533
CRIM          -0.023      0.024  0.050 .
CHAS           1.970      1.331  0.040 *
NOX           -4.226      2.944  0.010 *
RM             4.270      0.934  0.000 ***
DIS           -0.373      0.170  0.010 *
PTRATIO       -0.798      0.117  0.000 ***
B              0.007      0.003  0.020 *
LSTAT         -0.519      0.102  0.000 ***
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1
For 5 variables the regression coefficient equals 0:
ZN, INDUS, AGE, RAD, TAX
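Note that the lasso sets some of the regression coefficients exactly to 0; these variables are left out of the table and listed separately after the significance codes.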
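The exact zeros are a property of the $L_1$ penalty. In the textbook special case of an orthonormal design (XᵀX = I) and the objective written as ½‖Y − Xβ‖² + l‖β‖₁, the lasso solution is obtained by soft-thresholding the least-squares coefficients: each is shrunk toward 0 by l, and any coefficient smaller than l in absolute value becomes exactly 0. The sketch below illustrates only this special case with made-up least-squares values; it does not reproduce the housing fit, whose design is not orthonormal, and `soft_threshold` is a hypothetical helper name.

```python
def soft_threshold(coefficients, l):
    """Shrink each least-squares coefficient toward 0 by l, clipping at exactly 0."""
    return [0.0 if abs(b) <= l else (b - l if b > 0 else b + l) for b in coefficients]


least_squares = [2.5, -0.3, 0.8, -1.75]  # illustrative values, not from the housing data
print(soft_threshold(least_squares, 1.0))  # [1.5, 0.0, 0.0, -0.75]
```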