625 — Logistic regression

Step 1 of 16

Binary problems

We start with a simple binary dataset: study hours (x) mapped to pass/fail labels (0/1). Logistic regression is built for this kind of yes/no prediction.

Check understanding

What kind of target does logistic regression model?

  1. Binary 0/1 labels
  2. Any real number
  3. Only categories with 3+ classes
Step 2 of 16

Why regression will not work

A straight regression line can output any value, including numbers below 0 or above 1. Those numbers cannot be interpreted as probabilities.

Check understanding

Why is plain regression unsuitable here?

  1. It predicts outside 0–1
  2. It is too slow
  3. It cannot use gradients
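To see the issue concretely, here is a minimal sketch: a least-squares line fitted to hypothetical study-hours data (the values are illustrative, not the lesson's actual dataset):

```python
# Fit a least-squares line to binary labels and show it predicts
# outside [0, 1]. Hypothetical study-hours data (illustrative values).
hours = [1, 2, 3, 6, 7, 8]
passed = [0, 0, 0, 1, 1, 1]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(passed) / n
# Closed-form simple linear regression: slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, passed)) / \
        sum((x - mean_x) ** 2 for x in hours)
intercept = mean_y - slope * mean_x

def predict(x):
    return intercept + slope * x

print(predict(0))   # below 0: not a valid probability
print(predict(10))  # above 1: not a valid probability
```

The line has no way to stay inside the 0–1 range, so far-out inputs always produce invalid "probabilities".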
Step 3 of 16

Probabilities between 0 and 1

Instead of raw scores, we want a probability between 0 and 1 for each input. That probability is later converted to a class decision.

Step 4 of 16

The sigmoid curve

The sigmoid squashes any input into the 0–1 range. Its S-shape makes it perfect for turning linear combinations into probabilities.
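A minimal sketch of the sigmoid in Python:

```python
import math

def sigmoid(z):
    """Squash any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-10))  # near 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # near 1
```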

Step 5 of 16

Logistic model equation

Our model predicts p = 1 / (1 + exp(-(w₀ + w₁ × x))). The weights w₀ and w₁ shift and tilt the curve.

Check understanding

What does w₁ control in the logistic model?

  1. Curve slope
  2. Vertical shift
  3. Random noise
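A sketch of the full model equation. The weight values below are made up for illustration; note that scaling w₀ and w₁ together steepens the curve without moving the point where p = 0.5:

```python
import math

def predict_prob(x, w0, w1):
    # p = 1 / (1 + exp(-(w0 + w1 * x)))
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

# Both weight pairs put p = 0.5 at x = 4, but the second curve is steeper,
# so it is more confident at the same distance from the midpoint.
print(predict_prob(5, w0=-4, w1=1))  # gentler curve
print(predict_prob(5, w0=-8, w1=2))  # steeper curve: closer to 1
```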
Step 6 of 16

Initial untrained model

With weights near zero, the model predicts ~0.5 for every x, so every decision is essentially a coin flip: it ignores the structure of the data.

Step 7 of 16

Loss function (cross-entropy)

Cross-entropy penalises confident wrong predictions heavily. This pushes the model to assign high probability to the correct class.

Check understanding

What happens to loss when the model is confidently wrong?

  1. Loss increases a lot
  2. Loss stays small
  3. Loss becomes zero
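Cross-entropy for a single example can be sketched like this (the probabilities are illustrative):

```python
import math

def cross_entropy(y, p):
    """Binary cross-entropy for one example with label y and predicted p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(cross_entropy(1, 0.9))   # confident and right: small loss
print(cross_entropy(1, 0.5))   # unsure: moderate loss
print(cross_entropy(1, 0.01))  # confident and wrong: large loss
```

The last case dominates: being 99% sure of the wrong class costs far more than admitting uncertainty.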
Step 8 of 16

Gradient descent updates

Gradients show how to adjust w₀ and w₁ to reduce loss. Each update nudges the sigmoid to better separate the classes.
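For sigmoid plus cross-entropy the gradients take a particularly simple form: the average prediction error (p − y), and the same error weighted by x. A sketch of one update, using hypothetical data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(xs, ys, w0, w1, lr=0.1):
    """One gradient-descent update for the cross-entropy loss.
    With a sigmoid output the gradients simplify to
    dL/dw0 = mean(p - y) and dL/dw1 = mean((p - y) * x)."""
    n = len(xs)
    errors = [sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys)]
    g0 = sum(errors) / n
    g1 = sum(e * x for e, x in zip(errors, xs)) / n
    return w0 - lr * g0, w1 - lr * g1

# Hypothetical study-hours data (illustrative values).
hours = [1, 2, 3, 6, 7, 8]
passed = [0, 0, 0, 1, 1, 1]

w0_new, w1_new = gradient_step(hours, passed, 0.0, 0.0)
print(w0_new, w1_new)  # w1 moves positive: higher hours should mean pass
```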

Step 9 of 16

Training the model

After many updates, the sigmoid aligns with the data: low x predicts fail (0), high x predicts pass (1).
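Putting the pieces together, a minimal training loop on hypothetical study-hours data (values are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical study-hours data (illustrative values).
hours = [1, 2, 3, 6, 7, 8]
passed = [0, 0, 0, 1, 1, 1]

w0, w1 = 0.0, 0.0
lr = 0.1
for _ in range(10000):
    errors = [sigmoid(w0 + w1 * x) - y for x, y in zip(hours, passed)]
    w0 -= lr * sum(errors) / len(hours)
    w1 -= lr * sum(e * x for e, x in zip(errors, hours)) / len(hours)

print(sigmoid(w0 + w1 * 1))  # low hours -> near 0 (fail)
print(sigmoid(w0 + w1 * 8))  # high hours -> near 1 (pass)
```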

Step 10 of 16

Decision boundary

The decision boundary is the x where p = 0.5. With a positive slope w₁, points to the right are classified as 1 and points to the left as 0.

Check understanding

What probability marks the decision boundary?

  1. 0.5
  2. 0.0
  3. 1.0
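Solving w₀ + w₁ × x = 0 gives the boundary at x = −w₀ / w₁, since the sigmoid outputs exactly 0.5 there. A sketch with made-up trained weights:

```python
import math

# Hypothetical trained weights (illustrative values).
w0, w1 = -4.5, 1.0

# p = 0.5 exactly where w0 + w1 * x = 0, i.e. x = -w0 / w1.
boundary = -w0 / w1
p_at_boundary = 1.0 / (1.0 + math.exp(-(w0 + w1 * boundary)))
print(boundary, p_at_boundary)  # 4.5 0.5
```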
Step 11 of 16

Inspect probabilities

By sampling different x-values you can see how the predicted probability climbs smoothly from 0 toward 1.

Step 12 of 16

Probability to class

To classify, we threshold the probability: if p ≥ 0.5, predict class 1; otherwise predict class 0.

Check understanding

How do we convert probability to a class?

  1. Threshold at 0.5
  2. Always choose 1
  3. Always choose 0
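The thresholding rule as a one-liner:

```python
def to_class(p, threshold=0.5):
    """Convert a predicted probability into a 0/1 class label."""
    return 1 if p >= threshold else 0

print(to_class(0.73))  # 1
print(to_class(0.21))  # 0
```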
Step 13 of 16

Model evaluation

Accuracy tells us what fraction of labels we got right. The confusion matrix breaks the results down further into true positives, true negatives, false positives, and false negatives.
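Both metrics reduce to a few counts; the labels below are illustrative:

```python
def evaluate(y_true, y_pred):
    """Accuracy plus confusion-matrix counts (tp, tn, fp, fn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    return accuracy, (tp, tn, fp, fn)

acc, counts = evaluate([0, 0, 1, 1], [0, 1, 1, 1])
print(acc, counts)  # 0.75 (2, 1, 1, 0): one false positive
```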

Step 14 of 16

Poor-fit dataset

On an XOR-style dataset, labels alternate along x. A single sigmoid can only make one boundary, so it misclassifies alternating sections.
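A sketch of the failure, using made-up XOR-style labels that flip twice along x. The same training loop as before converges, but the single boundary cannot match the alternating pattern:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR-style 1D data: labels flip twice along x (hypothetical values).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 0, 0]

w0, w1 = 0.0, 0.0
for _ in range(5000):
    errors = [sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys)]
    w0 -= 0.1 * sum(errors) / len(xs)
    w1 -= 0.1 * sum(e * x for e, x in zip(errors, xs)) / len(xs)

preds = [1 if sigmoid(w0 + w1 * x) >= 0.5 else 0 for x in xs]
print(preds)  # the model settles on the majority class everywhere;
              # the two 1-labels in the middle stay misclassified
```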

Step 15 of 16

Why it fails

Because logistic regression draws a single boundary, it cannot follow multiple flips between 0 and 1, so some points remain misclassified no matter how long we train.

Check understanding

Why does one sigmoid fail on XOR?

  1. Only one boundary
  2. Too many parameters
  3. No gradient available
Step 16 of 16

Next stop: kNN

To handle shapes with multiple boundaries we need models like kNN that adapt locally instead of forcing a single global split.