
LEAST ANGLE REGRESSION


Author : Aryan Tiwari

Least Angle Regression (LARS) is a regression algorithm for high-dimensional data (i.e., data with a large number of attributes). It is somewhat similar to forward stepwise regression: at each step, LARS finds the attribute most highly correlated with the target value. When two or more attributes are equally correlated, LARS does not commit to just one of them; instead it proceeds in a direction that makes equal angles with those attributes, which is exactly why the algorithm is called Least Angle Regression. In this way LARS advances the coefficients in a carefully chosen direction, growing the model gradually rather than adding variables all at once, which helps it avoid overfitting.
 

Contents of the Blog:

  • Introduction to Least Angle Regression.

  • Least Angle Regression Geometric Representation.

  • Least Angle Regression Algorithm.

  • Advantages and Disadvantages of LARS.

  • Implementation of LARS in Python.

  • Conclusion.

Figure-1

Least Angle Regression (aka LARS) is a model selection method for linear regression (when you’re worried about overfitting or want your model to be easily interpretable). To motivate it, let’s consider some other model selection methods:

  • Forward selection starts with no variables in the model, and at each step it adds to the model the variable with the most explanatory power, stopping if the explanatory power falls below some threshold. This is a fast and simple method, but it can also be too greedy: we fully add variables at each step, so correlated predictors don’t get much of a chance to be included in the model. (For example, suppose we want to build a model for the deliciousness of a PB&J sandwich, and two of our variables are the amount of peanut butter and the amount of jelly. We’d like both variables to appear in our model, but since amount of peanut butter is (let’s assume) strongly correlated with the amount of jelly, once we fully add peanut butter to our model, jelly doesn’t add much explanatory power anymore, and so it’s unlikely to be added.)

  • Forward stagewise regression tries to remedy the greediness of forward selection by only partially adding variables. Whereas forward selection finds the variable with the most explanatory power and goes all out in adding it to the model, forward stagewise finds the variable with the most explanatory power and updates its weight by only epsilon in the correct direction. (So we might first increase the weight of peanut butter a little bit, then increase the weight of peanut butter again, then increase the weight of jelly, then increase the weight of bread, and then increase the weight of peanut butter once more.) The problem now is that we have to make a ton of updates, so forward stagewise can be very inefficient. (A minimal sketch of this update rule appears right after this list.)
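To make the stagewise idea concrete, here is a minimal NumPy sketch of the epsilon-update loop described above. The function name, step size eps and iteration count are illustrative choices, not part of any library API:

import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=1000):
    # X: (n_samples, n_features) standardized predictors, y: centered response
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        residual = y - X @ beta
        corr = X.T @ residual              # correlation of each predictor with the residual
        j = np.argmax(np.abs(corr))        # predictor most correlated with the residual
        beta[j] += eps * np.sign(corr[j])  # nudge only that weight by a small epsilon
    return beta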

LARS builds on ordinary linear regression, which models the relationship between the input variables and the target variable. With a single input variable this relationship is a line, and in higher dimensions it can be thought of as a hyperplane connecting the input variables to the target variable. The coefficients of the model are found via an optimization process that seeks to minimize the sum of squared errors between the predictions (yhat) and the expected target values (y):

  • $ \text{loss} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $

A problem with linear regression is that the estimated coefficients of the model can become large, making the model sensitive to its inputs and possibly unstable. This is particularly true for problems with few observations (samples, n) or with more input predictors (p) than samples (so-called p >> n problems).
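As a quick illustration of this loss, the following sketch fits ordinary least squares on synthetic data with NumPy and evaluates the sum of squared errors (the data, seed and coefficient values are made up purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 100 samples, 3 predictors
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # ordinary least squares coefficients
yhat = X @ coef
loss = np.sum((y - yhat) ** 2)                   # sum of squared errors from the formula above
print(coef, loss)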

Figure-2

Least angle regression algorithm:

  • Start with all coefficients bj equal to zero.

  • Find the predictor xj most correlated with y.

  • Increase the coefficient bj in the direction of the sign of its correlation with y. Take residuals r=y-yhat along the way. Stop when some other predictor xk has as much correlation with r as xj has.

  • Increase (bj, bk) in their joint least squares direction, until some other predictor xm has as much correlation with the residual r.

  • Continue until all predictors are in the model. (These steps are traced with scikit-learn's lars_path in the sketch below.)
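scikit-learn exposes this procedure through its lars_path helper. The sketch below runs it on synthetic data (the data, seed and coefficient values are illustrative) and prints the order in which predictors enter the model along with the coefficients at each knot of the path:

import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                         # 5 candidate predictors
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# method='lar' traces the pure Least Angle Regression path
alphas, active, coefs = lars_path(X, y, method='lar')

print("order in which predictors entered:", active)
print("coefficients at each knot of the path:")
print(coefs.T)                                        # one row per knot, one column per predictor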

Figure-3

The Lasso is a constrained version of ordinary least squares (OLS). Let $ x_1, x_2, \ldots, x_m $ be n-vectors representing the covariates ($ m = 10 $ and $ n = 442 $ in the diabetes study), and let $ y $ be the vector of responses for the $ n $ cases. By location and scale transformations we can always assume that the covariates have been standardized to have mean 0 and unit length, and that the response has mean 0:

$ \sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} x_{ij} = 0, \qquad \sum_{i=1}^{n} x_{ij}^2 = 1 \quad \text{for } j = 1, 2, \ldots, m. \tag{1.1} $

This is assumed to be the case in the theory that follows, except that numerical results are expressed in the original units of the diabetes example. A candidate vector of regression coefficients $ \hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_m)' $ gives the prediction vector $ \hat{\mu} $,

$ \hat{\mu} = \sum_{j=1}^{m} x_j \hat{\beta}_j = X \hat{\beta}, \qquad X_{n \times m} = (x_1, x_2, \ldots, x_m), \tag{1.2} $

with total squared error

$ S(\hat{\beta}) = \lVert y - \hat{\mu} \rVert^2 = \sum_{i=1}^{n} (y_i - \hat{\mu}_i)^2. \tag{1.3} $

Let $ T(\hat{\beta}) $ be the absolute norm of $ \hat{\beta} $,

$ T(\hat{\beta}) = \sum_{j=1}^{m} |\hat{\beta}_j|. \tag{1.4} $
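With $ S(\hat{\beta}) $ and $ T(\hat{\beta}) $ defined as above, the Lasso estimate is the coefficient vector that minimizes the squared error subject to a budget on the absolute norm, with tuning parameter $ t \ge 0 $:

$ \hat{\beta}_{\text{lasso}} = \arg\min_{\beta} S(\beta) \quad \text{subject to} \quad T(\beta) \le t. $

Shrinking $ t $ toward zero forces some coefficients exactly to zero, which is what makes the Lasso a model selection method, and a slight modification of LARS computes the entire Lasso path efficiently.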

In more detail, LARS works as follows:

  • Assume for simplicity that we’ve standardized our explanatory variables to have zero mean and unit variance, and that our response variable also has zero mean.

  • Start with no variables in your model.

  • Find the variable $ x_1 $ most correlated with the residual. (Note that the variable most correlated with the residual is equivalently the one that makes the least angle with the residual, whence the name.)

  • Move in the direction of this variable until some other variable $ x_2 $ is just as correlated.

  • At this point, start moving in a direction such that the residual stays equally correlated with $ x_1 $ and $ x_2 $ (i.e., so that the residual makes equal angles with both variables), and keep moving until some variable $ x_3 $ becomes equally correlated with our residual.

  • And so on, stopping when we’ve decided our model is big enough. (The equal-correlation property at each step is checked numerically in the sketch below.)
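One way to see the “equal angles” property in practice is to check that, at each knot of the LARS path, every predictor already in the model has the same absolute correlation with the current residual. A small self-contained sketch on synthetic data (all numbers are illustrative):

import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

alphas, active, coefs = lars_path(X, y, method='lar')

# At every knot, the predictors with nonzero coefficients are tied for the
# largest absolute correlation with the residual (it shrinks toward 0 at the end).
for step in range(1, coefs.shape[1]):
    residual = y - X @ coefs[:, step]
    corr = np.abs(X.T @ residual)
    in_model = np.flatnonzero(coefs[:, step])
    print(f"knot {step}: active |correlations| = {np.round(corr[in_model], 4)}")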

Figure-4

Advantages of using LARS:

  • Computationally as fast as forward selection but may sometimes be more accurate.

  • Numerically very efficient when the number of features is much larger than the number of data instances.

  • It can easily be modified to produce solutions for other estimators, such as the Lasso.

Disadvantages of using LARS:

  • Because LARS is based on iteratively refitting the residuals, it is highly sensitive to noise and can sometimes produce unpredictable results.

Implementation of LARS in Python3:

Importing Necessary Libraries:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

Loading the Dataset:

data=pd.read_csv('../input/graduate-admissions/Admission_Predict.csv')
data

data.columns

Importing the LARS-based Lasso regressor (LassoLars) from sklearn:

from sklearn.linear_model import LassoLars

# 'Serial No.' is just a row identifier, so it is dropped; the target in this
# dataset is the admission probability column ('Chance of Admit ', which
# includes a trailing space in the original CSV header).
x = data.drop(['Serial No.', 'Chance of Admit '], axis=1)
y = data['Chance of Admit ']

TRAINING AND TESTING DATA:


# Splitting training and testing data
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y,
                     test_size = 0.3, random_state = 30)

Creating and fitting the regressor:

regressor = LassoLars(alpha = 0.1)
regressor.fit(x_train, y_train)

Prediction and R2 Score on the Test Set:


from sklearn.metrics import r2_score

y_pred = regressor.predict(x_test)

print(f"r2 Score of test set : {r2_score(y_test, y_pred)}")

regressor.score(x_test,y_test)
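Because LassoLars also performs variable selection, it is worth checking which coefficients were driven exactly to zero. This short follow-up simply reuses the fitted regressor and the feature columns of x from above:

# Coefficients paired with their feature names; zeros mark dropped variables
print(pd.Series(regressor.coef_, index=x_train.columns))
print("Number of features kept:", np.count_nonzero(regressor.coef_))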

Conclusion :-

LARS is an efficient algorithm for solving L1-regularized linear regression (and related problems such as L1-regularized logistic regression). It offers a unified view of forward selection, forward stagewise regression (the idea underlying boosting/GBM) and the Lasso.

Github Link of Implementation :-

References:-
