This is a question I’ve had for a long time, ever since I began my data science journey. I have heard the terms ‘logit’ and ‘logistic regression’ used interchangeably, yet I have never been sure what the difference is. In this post, I go through what each term means.

Logistic Regression

  • Logistic regression models the probability that an observation belongs to a particular category. This is different from linear regression, which models y, the dependent variable, on a continuous scale.

  • Logistic regression is used for classification problems, which ask: given a set of characteristics, what is the probability of an event happening? For example, logistic regression can be used for many marketing problems, such as determining whether or not a customer will purchase. It is also frequently used in banking, to estimate the probability that a customer will default, given characteristics such as credit score and income.
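To make the banking example concrete, here is a minimal sketch of a one-predictor logistic regression fitted by plain gradient ascent, using only the standard library. Everything in it is an assumption for illustration: the data are made up, the predictor is a centered credit score (score minus 650, in hundreds), and the names `sigmoid` and `fit_logistic` are hypothetical helpers, not part of any library.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    # Branching on the sign avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed label minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up data: (credit score - 650) / 100, and whether the customer defaulted (1) or not (0).
scores = [-1.5, -1.0, -0.5, -0.3, 0.0, 0.3, 0.5, 0.7, 1.0, 1.5]
defaulted = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

b0, b1 = fit_logistic(scores, defaulted)

# Predicted default probability for a low and a high credit score.
p_low = sigmoid(b0 + b1 * -1.0)   # score of 550
p_high = sigmoid(b0 + b1 * 1.0)   # score of 750
print(f"b0={b0:.3f}, b1={b1:.3f}, P(default | 550)={p_low:.3f}, P(default | 750)={p_high:.3f}")
```

The fitted model returns a probability, not a category. To classify, one would pick a cutoff (0.5 is a common choice) and label a customer as a likely defaulter when the predicted probability exceeds it.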

Logit

  • If one uses a linear regression model to categorize data, the predicted value on the left-hand side can be negative or greater than 1. This is nonsensical, as a probability must be between 0 and 1. Therefore, one must use a function that forces the left-hand side to fall between 0 and 1. This is the logistic function:

[Image: the logistic function]
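For readers who cannot see the image, here is a sketch of the logistic function for a single predictor, followed by the rearrangement into log-odds. The notation (p(X) for the probability, β0 and β1 for the coefficients) is my assumption and may not match the original image exactly.

```latex
\begin{align*}
p(X) &= \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \\
\frac{p(X)}{1 - p(X)} &= e^{\beta_0 + \beta_1 X} \\
\log\left(\frac{p(X)}{1 - p(X)}\right) &= \beta_0 + \beta_1 X
\end{align*}
```

The second line is the odds, and taking the log of both sides gives the third line, where the right-hand side is linear in X.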
  • One can see from the function that, with some manipulation, the right-hand side becomes the same linear expression one would get using a linear regression. The left-hand side then becomes the log of the odds, log(p / (1 − p)), which is called the ‘logit’ or log-odds.

  • The graph of the function, called a sigmoid function, is below:

The sigmoid function


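  • Hence, while related, a logistic regression is not the same as logit! Logistic regression is the model, and its output is a probability. The logit is the log-odds of that probability, which is the quantity the model expresses as a linear function of the predictors.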
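The relationship between the two can be sketched in a few lines of Python: the logistic function turns log-odds into a probability, and the logit turns that probability back into log-odds. The function names `logistic` and `logit` below are my own choices for illustration, and the input values are arbitrary.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Logit (log-odds): maps a probability in (0, 1) back to the real line."""
    return math.log(p / (1.0 - p))

# Round trip: log-odds -> probability -> log-odds.
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    p = logistic(z)
    print(f"log-odds {z:+.1f} -> probability {p:.3f} -> logit {logit(p):+.1f}")
```

Because the two functions are inverses, a logistic regression can be read on either scale: as a probability between 0 and 1, or as log-odds that can be any real number.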
