top of page

Logistic Regression

Logistic Regression Theory:

Logistic Regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Unlike linear regression, which is used for estimating continuous variables, logistic regression is used for estimating discrete outcomes. It is used for classification problems, where the target variable can take on one of two or more discrete values.


The main idea behind logistic regression is to find the best fitting model to describe the relationship between the independent variables and the dependent variable, where the dependent variable is binary (e.g., 0 or 1, yes or no, true or false). The model is represented by a logistic function that maps the inputs to a probability between 0 and 1, where 0 represents an unlikely event and 1 represents a certain event.

The logistic regression model is trained using a maximum likelihood estimation method, which finds the parameters that maximize the likelihood of observing the training data given the model. The predicted probabilities are then transformed into binary class labels using a threshold, typically set to 0.5.

Logistic Regression is a simple yet powerful technique for binary classification problems, and it can also be extended to handle multiclass classification problems by training one logistic regression model per class. It can also handle non-linear relationships between the independent variables and the dependent variable by transforming the inputs or adding polynomial features.

Yes, here are some examples of when logistic regression might be used:

  1. Credit risk analysis: Given information about an applicant's income, employment history, and credit score, a logistic regression model can be used to predict the likelihood of the applicant defaulting on a loan.

  2. Email spam classification: Given the contents of an email, a logistic regression model can be used to predict whether the email is spam or not.

  3. Cancer diagnosis: Given information about a patient's medical history and test results, a logistic regression model can be used to predict the likelihood of the patient having a particular type of cancer.

  4. Customer churn prediction: Given information about a customer's usage patterns and demographics, a logistic regression model can be used to predict the likelihood of the customer cancelling their subscription.


These are just a few examples, but logistic regression can be applied to many other domains where a binary outcome needs to be predicted based on a set of independent variables.


Here's an example implementation of Logistic Regression in Python using the scikit-learn library:

import numpy as np
from sklearn.linear_model import LogisticRegression


# Training data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])


# Create a Logistic Regression object
clf = LogisticRegression()


# Train the model, y)


# Predict using the model
pred = clf.predict([[-0.8, -1]])


In this example, we have training data X and y representing the independent and dependent variables respectively. The LogisticRegression class from the scikit-learn library is used to fit a Logistic Regression model to the data. Finally, the model is used to make a prediction for a new input value [-0.8, -1]. The predicted value is [0], which indicates that the input belongs to class 0.

bottom of page