Logistic Regression
Despite its name, Logistic Regression is a classification algorithm. It uses the sigmoid function to predict probabilities and is the go-to baseline for binary and multi-class classification problems.
What is Logistic Regression?
Logistic Regression is a supervised learning algorithm used for classification tasks. Unlike linear regression which predicts continuous values, logistic regression predicts the probability that an input belongs to a particular class.
For binary classification (two classes), the output is a probability between 0 and 1. If the probability exceeds a threshold (typically 0.5), the input is classified as class 1; otherwise class 0.
Common applications include:
- Email spam detection
- Disease diagnosis (positive/negative)
- Credit risk assessment
- Customer churn prediction
The Sigmoid Function
The key to logistic regression is the sigmoid (logistic) function, which maps any real number to a value between 0 and 1:
sigma(z) = 1 / (1 + e^(-z))
Where z = w1*x1 + w2*x2 + ... + b (the linear combination of inputs).
Properties of the sigmoid:
- Output is always between 0 and 1 (interpretable as a probability)
- sigma(0) = 0.5 (the decision boundary)
- Smooth and differentiable (required for gradient descent)
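These properties are easy to verify numerically; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the decision boundary
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```

Note the symmetry sigma(-z) = 1 - sigma(z), which is why the two class probabilities always sum to 1.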
Binary Cross-Entropy Loss
Logistic regression uses Binary Cross-Entropy (also called Log Loss) as its cost function:
Loss = -(1/n) * sum[y*log(y_pred) + (1-y)*log(1-y_pred)]
This function:
- Heavily penalizes confident wrong predictions (e.g., predicting 0.99 when the true label is 0)
- Is convex for logistic regression, guaranteeing a global minimum
- Is the negative log-likelihood of a Bernoulli model, so minimizing it is maximum likelihood estimation
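To make the penalty on confident mistakes concrete, here is a hand-rolled version of the loss (scikit-learn's `log_loss` computes the same quantity; the example probabilities are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.95, 0.05, 0.90, 0.10])
confident_wrong = np.array([0.05, 0.95, 0.10, 0.90])

print(binary_cross_entropy(y_true, confident_right))  # small loss (~0.08)
print(binary_cross_entropy(y_true, confident_wrong))  # large loss (~2.65)
```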
Multi-Class Classification
For problems with more than two classes, logistic regression extends in two ways:
- One-vs-Rest (OvR): Train one binary classifier per class. Each classifier predicts "is this class X or not?"
- Softmax Regression (Multinomial): Extends sigmoid to multiple classes. Outputs a probability distribution over all classes that sums to 1.
- Scikit-learn handles multi-class targets automatically; with the lbfgs solver, LogisticRegression fits a multinomial (softmax) model by default (the older multi_class parameter is deprecated in recent versions).
- Softmax is the foundation of the output layer in neural network classifiers.
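Softmax itself is short to write down; a minimal sketch (subtracting the max before exponentiating is the standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(z):
    # Shift by the max so np.exp never overflows; ratios are unchanged
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw class scores (logits)
probs = softmax(scores)
print(probs)        # one probability per class, highest score wins
print(probs.sum())  # 1.0
```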
Implementing Logistic Regression in Python
Complete example with binary and multi-class classification:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# ---- Binary Classification: Breast Cancer ----
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling is important for logistic regression
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression(max_iter=1000, C=1.0) # C = 1/lambda (regularization)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Binary Classification (Breast Cancer)")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))
# ---- Multi-Class: Iris ----
iris = load_iris()
X2, y2 = iris.data, iris.target
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.2, random_state=42)
multi_model = LogisticRegression(solver='lbfgs', max_iter=200)  # multinomial (softmax) by default
multi_model.fit(X2_train, y2_train)
print(f"\nIris Multi-Class Accuracy: {multi_model.score(X2_test, y2_test):.4f}")
# ---- Predict probabilities ----
probs = model.predict_proba(X_test[:3])
print(f"\nProbabilities for first 3 test samples:\n{probs}")

Decision Boundary and Threshold Tuning
The default decision threshold is 0.5, but this can be adjusted based on the problem:
- High precision needed (e.g., spam filter - avoid false positives): raise the threshold to 0.7+
- High recall needed (e.g., cancer screening - avoid false negatives): lower the threshold to 0.3
The ROC curve and AUC score help evaluate model performance across all thresholds. An AUC of 1.0 is perfect; 0.5 is random guessing.
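A sketch of threshold tuning and AUC on the breast cancer data from the earlier example (the 0.3 threshold is illustrative, not a recommendation):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, recall_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # P(class 1) for each sample

# AUC summarizes performance across all possible thresholds
print(f"AUC: {roc_auc_score(y_test, probs):.4f}")

# Lowering the threshold trades precision for recall (fewer false negatives)
for threshold in (0.5, 0.3):
    preds = (probs >= threshold).astype(int)
    print(f"Threshold {threshold}: recall = {recall_score(y_test, preds):.4f}")
```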
Key Takeaways
- Logistic regression is a classification algorithm that outputs probabilities via the sigmoid function.
- It is trained by minimizing Binary Cross-Entropy loss using gradient descent.
- Feature scaling (StandardScaler) significantly improves convergence and performance.
- The decision threshold (default 0.5) can be tuned to balance precision and recall.
- Despite its simplicity, logistic regression is a strong baseline and highly interpretable.