
Linear and logistic regression analysis


Updated: Aug 26, 2021

Authors: Rimpa Poria, Abhinav Kale


This article, "Linear and logistic regression analysis", gives an overview of two of the fundamental techniques through which artificially intelligent systems learn representations from data. Linear Regression and Logistic Regression are two famous Machine Learning algorithms that come under the supervised learning technique. Since both algorithms are supervised, they use labelled datasets to make predictions. The main difference between them lies in how they are used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.

Linear Regression

Linear Regression is a commonly used supervised Machine Learning algorithm that predicts continuous values. It assumes a linear relationship between the dependent and independent variables: the model predicts the target value as a linear function of the independent variables.


Figure 1

Advantages:

  • Linear Regression is simple to implement, and its output coefficients are easy to interpret.

  • When the relationship between the independent and dependent variables is linear, this algorithm is the best choice because of its low complexity compared to other algorithms.

Disadvantages:

  • Outliers can have a large effect on the fitted regression, and the decision boundaries produced by this technique are linear.
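A minimal sketch of fitting a linear regression with scikit-learn on synthetic data (the data and the true slope/intercept values below are illustrative, not from the article's notebook):

```python
# Minimal linear regression sketch on synthetic data (illustrative values).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                    # single independent variable
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 0.5, size=50)   # linear relationship plus noise

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # recovers approximately the true slope 3 and intercept 5
```

With enough data and low noise, the fitted coefficients sit close to the true slope and intercept; a single large outlier in `y` would visibly pull them away, illustrating the sensitivity noted above.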


Logistic Regression

Logistic Regression is another supervised Machine Learning algorithm, used fundamentally for binary classification (separating discrete values). It categorizes data into discrete classes by learning the relationship from a given set of labelled data.


Logistic regression can be binomial, ordinal or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types, "0" and "1" (which may represent, for example, "dead" vs. "alive" or "win" vs. "loss"). Multinomial logistic regression deals with situations where the outcome can have three or more possible types (e.g., "disease A" vs. "disease B" vs. "disease C") that are not ordered. Ordinal logistic regression deals with dependent variables that are ordered.


In binary logistic regression, the outcome is usually coded as "0" or "1", as this leads to the most straightforward interpretation. If a particular observed outcome for the dependent variable is the noteworthy possible outcome (referred to as a "success", an "instance" or a "case"), it is usually coded as "1", and the contrary outcome (a "failure", "noninstance" or "noncase") as "0". Binary logistic regression is used to predict the odds of being a case based on the values of the independent variables (predictors). The odds are defined as the probability that a particular outcome is a case divided by the probability that it is a noncase.
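The odds definition can be made concrete with a small sketch: fit a binary (0/1-coded) logistic regression on synthetic data and convert a predicted probability into odds (all data below is made up for illustration):

```python
# Sketch: odds of a "case" (y = 1) from a fitted binary logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # outcome coded "0"/"1"

clf = LogisticRegression().fit(X, y)
p = clf.predict_proba([[1.0]])[0, 1]   # P(case) at x = 1.0
odds = p / (1 - p)                     # odds = P(case) / P(noncase), as defined above
print(p, odds)
```

Since the decision boundary here sits near x = 0, the probability of being a case at x = 1.0 comes out well above 0.5, so the odds exceed 1.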


Figure 2

Advantages:

  • Logistic regression is easy to implement and interpret, and very efficient to train.

  • It is very fast at classifying unknown records.

  • It extends easily to multiple classes (multinomial regression) and gives a natural probabilistic view of class predictions.

Disadvantages:

  • If the number of observations is smaller than the number of features, Logistic Regression should not be used, as it may lead to overfitting.

  • It constructs linear decision boundaries.

Simple Linear Regression Model:

Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear (a sloped straight line), hence the name Simple Linear Regression.

The key point in Simple Linear Regression is that the dependent variable must be a continuous/real value. The independent variable, however, can be continuous or categorical.


The Simple Linear Regression algorithm has two main objectives:

  • Modelling the relationship between the two variables, such as the relationship between income and expenditure, or experience and salary.

  • Forecasting new observations, such as weather forecasting from temperature, or a company's revenue from its investments in a year.
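Both objectives can be shown in one tiny sketch, using made-up experience/salary figures: fit the model on the observed pairs, then forecast a new observation.

```python
# Illustrative sketch: modelling salary from years of experience, then forecasting.
# All numbers below are made up for demonstration.
import numpy as np
from sklearn.linear_model import LinearRegression

experience = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])    # years of experience
salary = np.array([40_000, 48_000, 56_000, 64_000, 72_000])   # exactly linear toy data

model = LinearRegression().fit(experience, salary)
forecast = model.predict([[6.0]])[0]   # forecast a new, unseen observation
print(forecast)                        # 80000.0 on this exactly linear toy data
```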

Simple logistic regression analysis refers to the regression application with one dichotomous outcome and one independent variable.


Step 1: Data Pre-processing step

Step 2: Fitting Logistic Regression to the Training set

Step 3: Predicting the test result

Step 4: Test accuracy of the result (Creation of Confusion matrix)

Step 5: Visualizing the test set result.
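Steps 1–4 above can be sketched with scikit-learn; the built-in breast cancer dataset stands in here for whatever binary-outcome dataset the analysis uses, and the visualization of Step 5 is omitted from the sketch.

```python
# Sketch of the logistic regression workflow (Steps 1-4 above).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Step 1: data pre-processing (load, split, scale)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 2: fitting logistic regression to the training set
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 3: predicting the test result
y_pred = clf.predict(X_test)

# Step 4: test accuracy of the result (creation of confusion matrix)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```

The confusion matrix counts true/false positives and negatives, from which the accuracy in Step 4 follows directly.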


Multiple linear regression analysis:

In the previous topic, we have learned about Simple Linear Regression, where a single Independent/Predictor(X) variable is used to model the response variable (Y). But there may be various cases in which the response variable is affected by more than one predictor variable; for such cases, the Multiple Linear Regression algorithm is used.

Multiple Linear Regression is thus an extension of Simple Linear Regression, as it takes more than one predictor variable to predict the response variable.


Some key points about MLR:

  • For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.

  • Each feature variable should have a linear relationship with the dependent variable.

  • MLR tries to fit a regression hyperplane through a multidimensional space of data points.

MLR equation:

In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an extension of Simple Linear Regression, the same form applies, and the equation becomes:


Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn        (a)


Where,

  • Y= Output/Response variable

  • b0, b1, b2, b3, ..., bn = Coefficients of the model.

  • x1, x2, x3, ..., xn = Independent/feature variables.
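Equation (a) can be evaluated directly with NumPy; the coefficient and feature values below are purely illustrative:

```python
# Evaluating equation (a): Y = b0 + b1*x1 + b2*x2 + b3*x3 (illustrative values).
import numpy as np

b0 = 2.0                          # intercept
b = np.array([1.5, -0.5, 3.0])    # coefficients b1..b3
x = np.array([4.0, 2.0, 1.0])     # feature values x1..x3

Y = b0 + b @ x                    # dot product computes b1*x1 + b2*x2 + b3*x3
print(Y)                          # 2 + 6 - 1 + 3 = 10.0
```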

Assumptions for Multiple Linear Regression:

  • A linear relationship should exist between the Target and predictor variables.

  • The regression residuals must be normally distributed.

  • MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
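The multicollinearity assumption can be checked quickly with a pairwise correlation matrix; in this synthetic sketch, one predictor is deliberately built as a near-copy of another:

```python
# Sketch: checking the multicollinearity assumption via pairwise correlations.
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                    # independent of x1

corr = np.corrcoef([x1, x2, x3])
print(corr[0, 1])   # close to 1 -> multicollinearity problem between x1 and x2
print(corr[0, 2])   # close to 0 -> x1 and x3 are fine
```

A correlation near 1 between two predictors means the model cannot separate their individual effects, which destabilizes the coefficient estimates.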

Step 1: Data Pre Processing

  1. Importing The Libraries.

  2. Importing the Data Set.

  3. Encoding the Categorical Data.

  4. Avoiding the Dummy Variable Trap.

  5. Splitting the Data set into Training Set and Test Set.

Step 2: Fitting Multiple Linear Regression to the Training set

Step 3: Predicting the Test set results.
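These steps can be sketched on a toy dataset with one categorical column; `drop_first=True` in the one-hot encoding is what avoids the dummy variable trap (Step 1.4). The column names and values are invented for illustration.

```python
# Sketch of the MLR workflow: encoding, dummy variable trap, split, fit, predict.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Step 1.2: a toy dataset with a numeric and a categorical predictor
df = pd.DataFrame({
    "spend":   [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    "city":    ["A", "B", "A", "C", "B", "C", "A", "B"],
    "revenue": [15.0, 25.0, 35.0, 45.0, 55.0, 65.0, 75.0, 85.0],
})

# Steps 1.3-1.4: encode the categorical column, dropping one dummy per category
X = pd.get_dummies(df[["spend", "city"]], columns=["city"], drop_first=True)
y = df["revenue"]

# Step 1.5: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Steps 2-3: fit multiple linear regression and predict the test set
model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)
print(preds)
```

Dropping the first dummy matters because the full set of dummies for a category sums to 1 and is therefore perfectly collinear with the intercept.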


Below is a GitHub link to a notebook in which related examples of both linear and logistic regression analysis are performed.


GitHub Link:

  • https://github.com/rimpaporia/Linear-logisticRegressionAnalysis/blob/main/Logistic%20Regression.ipynb

References:

  • https://en.wikipedia.org/wiki/Regression_analysis




Madras Scientific Research Foundation
