624 — Polynomial regression

Step 1 of 16

Western Sydney scatter

These points show floor area versus price for Western Sydney homes. Instead of a clean straight-line trend, the cloud bends upward and spreads out, hinting that a simple linear model will struggle.

Check understanding

What pattern do you see in the scatter?

  1. Linear trend only
  2. Curved trend visible
  3. No trend
Step 2 of 16

Fit linear model

A straight regression line is laid over the data. It cuts through the middle of the cloud but cannot follow the curvature, so its predictions are systematically off at the low and high ends.
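As a rough sketch of what "fitting a line" means, the snippet below computes the least-squares slope and intercept by hand. The floor areas and prices are made-up illustrative numbers, not the Western Sydney data, and the name `fit_line` is a hypothetical helper.

```python
# Hypothetical sample: floor area in m^2, price in $1000s (illustrative only).
areas = [80, 100, 120, 140, 160, 180, 200]
prices = [560, 650, 725, 780, 820, 845, 860]

def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(areas, prices)
line_predictions = [intercept + slope * x for x in areas]
print(f"price ~ {intercept:.1f} + {slope:.3f} * area")
```

The line is chosen to minimise the total squared gap between observed and predicted prices, which is why it runs through the middle of the cloud even when the cloud is curved.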

Check understanding

How does the linear fit summarise the curved pattern?

  1. Good fit
  2. Moderate fit
  3. Poor fit
Step 3 of 16

Fit polynomial (degree 2)

A quadratic curve bends upward to follow the middle of the cloud. It captures the lift in mid-range floor areas that a straight line ignores.
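One way to try a degree-2 fit is NumPy's `np.polyfit`. This is a minimal sketch under assumptions: the data are invented for illustration, and standardising the floor area before fitting is a choice made here for numerical comfort, not something the lesson specifies.

```python
import numpy as np

# Hypothetical sample: floor area in m^2, price in $1000s (illustrative only).
area = np.array([80, 100, 120, 140, 160, 180, 200], dtype=float)
price = np.array([560, 650, 725, 780, 820, 845, 860], dtype=float)

# Centre and scale the predictor so the squared term stays well behaved.
x = (area - area.mean()) / area.std()

linear_coeffs = np.polyfit(x, price, deg=1)     # [slope, intercept]
quadratic_coeffs = np.polyfit(x, price, deg=2)  # [x^2 term, x term, intercept]

def sse(coeffs):
    """Sum of squared errors of a fitted polynomial on the sample."""
    return float(np.sum((price - np.polyval(coeffs, x)) ** 2))

sse_linear = sse(linear_coeffs)
sse_quadratic = sse(quadratic_coeffs)
print(f"SSE linear: {sse_linear:.1f}  SSE quadratic: {sse_quadratic:.1f}")
```

`np.polyfit` returns coefficients from the highest power down, so the first entry of `quadratic_coeffs` is the curvature term.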

Check understanding

How does the quadratic compare to the linear fit?

  1. Better than linear
  2. Same as linear
  3. Worse
Step 4 of 16

Fit polynomial (degree 3)

A cubic curve adds another bend. Here it only tweaks the shape slightly, which might not justify the extra complexity.

Step 5 of 16

Compare fits

See the linear, quadratic, and cubic fits together. The quadratic hugs the data best; the linear underfits; the cubic adds tiny wiggles.

Check understanding

Which model explains the curved pattern best here?

  1. Quadratic best
  2. Linear best
  3. Cubic clearly superior
Step 6 of 16

Predictions using quadratic

To predict price for a given floor area, find that area on the horizontal axis and read off the height of the quadratic curve. The prediction changes with floor area, unlike a single overall average price.
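Reading a value off the curve amounts to plugging the floor area into the fitted equation. In the sketch below the coefficients `B0`, `B1`, `B2` and the sample mean are hypothetical placeholders chosen for round numbers, not fitted values from the lesson.

```python
# Hypothetical coefficients for: price ($1000s) = B0 + B1*area + B2*area^2
B0, B1, B2 = -100.0, 8.5, -0.018

def predict_price(area_m2):
    """Prediction read off the quadratic curve at the given floor area."""
    return B0 + B1 * area_m2 + B2 * area_m2 ** 2

def predict_average(area_m2, sample_mean=750.0):
    """Baseline that ignores floor area and always returns the mean price."""
    return sample_mean

for area_m2 in (100, 150, 200):
    print(area_m2, round(predict_price(area_m2), 1), predict_average(area_m2))
```

The curve gives a different answer for each floor area, while the average baseline returns the same figure every time.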

Check understanding

Where do these predictions come from?

  1. Based on curve
  2. Based on line
  3. Based on average
Step 7 of 16

Residuals (linear)

Residuals from the linear model show a clear curve: the line underestimates mid-range prices and overestimates at the edges.

Check understanding

What pattern do you see in the linear residuals?

  1. Patterned
  2. Random
  3. Zero
Step 8 of 16

Residuals (quadratic)

Residuals from the quadratic model look more random, signalling that the curve has removed most of the systematic bend.
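A simple way to check for leftover pattern is to average the residuals (observed minus predicted) within bands of floor area. The sketch below does this for both models on invented data. The three-band split and the helper name `band_means` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sample: floor area in m^2, price in $1000s (illustrative only).
area = np.array([80, 100, 120, 140, 160, 180, 200], dtype=float)
price = np.array([560, 650, 725, 780, 820, 845, 860], dtype=float)
x = area - area.mean()  # centred predictor

def band_means(degree):
    """Mean residual in the low, middle and high floor-area bands."""
    coeffs = np.polyfit(x, price, deg=degree)
    residuals = price - np.polyval(coeffs, x)
    bands = (residuals[:2], residuals[2:5], residuals[5:])
    return [float(band.mean()) for band in bands]

linear_bands = band_means(1)
quadratic_bands = band_means(2)
print("linear   :", [round(v, 1) for v in linear_bands])
print("quadratic:", [round(v, 1) for v in quadratic_bands])
```

Band averages far from zero, with the sign flipping between the middle and the edges, point to curvature the model has missed. Averages close to zero in every band are what random-looking residuals tend to produce.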

Check understanding

How do the quadratic residuals look compared to linear?

  1. More random
  2. More patterned
  3. Residuals larger
Step 9 of 16

MSE comparison

Compare mean squared error (MSE) for the linear, quadratic, and cubic fits. Lower MSE means the predictions sit closer, on average, to the observed prices.
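MSE is the average of the squared gaps between observed and predicted values. The sketch below computes it for degrees 1 to 3 on invented data. Note that it scores each model on the same points used for fitting, where least-squares MSE can only stay level or fall as the degree rises, so a held-out check is a useful companion to this in-sample comparison.

```python
import numpy as np

# Hypothetical sample: floor area in m^2, price in $1000s (illustrative only).
area = np.array([80, 100, 120, 140, 160, 180, 200], dtype=float)
price = np.array([560, 650, 725, 780, 820, 845, 860], dtype=float)
x = (area - area.mean()) / area.std()

def mse(observed, predicted):
    """Mean squared error between two equal-length arrays."""
    return float(np.mean((observed - predicted) ** 2))

mse_by_degree = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, price, deg=degree)
    mse_by_degree[degree] = mse(price, np.polyval(coeffs, x))
    print(f"degree {degree}: in-sample MSE = {mse_by_degree[degree]:.2f}")
```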

Check understanding

Which model has the lowest MSE here?

  1. Quadratic lowest
  2. Linear lowest
  3. All equal
Step 10 of 16

R² comparison

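R² is the share of the variation in price that a model accounts for. It rises when added curvature captures variation that the straight line leaves unexplained. Compare how much variance each model explains.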
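R² can be computed as one minus the ratio of residual variation to total variation. The sketch below uses invented prices and hypothetical, rounded model outputs to show the calculation. The numbers are illustrative only.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for equal-length sequences."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices ($1000s) and rounded predictions from two models.
observed = [560, 650, 725, 780, 820, 845, 860]
linear_pred = [600.2, 649.6, 699.1, 748.6, 798.0, 847.5, 897.0]
quadratic_pred = [561.2, 649.6, 722.5, 779.8, 821.4, 847.5, 858.0]

r2_linear = r_squared(observed, linear_pred)
r2_quadratic = r_squared(observed, quadratic_pred)
print(f"R^2 linear: {r2_linear:.3f}  R^2 quadratic: {r2_quadratic:.4f}")
```

A model that always predicts the mean price scores 0, and a model that reproduces every observation scores 1.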

Check understanding

How does adding curvature affect R²?

  1. Polynomial ↑ R²
  2. Polynomial ↓ R²
  3. No change
Step 11 of 16

Overfitting warning

The cubic curve starts to chase noise at the edges. Small wiggles signal overfitting when the data do not support the extra bend.
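A common way to detect overfitting is to score each model on points it was not fitted to. The sketch below is an exaggerated illustration under assumptions: all numbers are invented, the held-out prices were made up to lie near a smooth trend, and degree 6 is included only because it passes through all seven training points. Results on real data will differ.

```python
import numpy as np

# Hypothetical training and held-out homes: area in m^2, price in $1000s.
train_area = np.array([80, 100, 120, 140, 160, 180, 200], dtype=float)
train_price = np.array([560, 650, 725, 780, 820, 845, 860], dtype=float)
test_area = np.array([90, 150, 190], dtype=float)
test_price = np.array([607, 803, 855], dtype=float)

def scale(area):
    """Map floor area to a small centred range for stable fitting."""
    return (area - 140.0) / 20.0

def mse(observed, predicted):
    return float(np.mean((observed - predicted) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 2, 3, 6):
    coeffs = np.polyfit(scale(train_area), train_price, deg=degree)
    train_mse[degree] = mse(train_price, np.polyval(coeffs, scale(train_area)))
    test_mse[degree] = mse(test_price, np.polyval(coeffs, scale(test_area)))
    print(f"degree {degree}: train {train_mse[degree]:.2f}  held-out {test_mse[degree]:.2f}")
```

Training error keeps shrinking as the degree grows. The held-out column is the one that shows whether the extra bends reflect the trend or just the noise.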

Step 12 of 16

New dataset: approaching a ceiling

This dataset climbs and then flattens toward a ceiling. A linear fit would miss the plateau and overshoot.

Check understanding

What pattern does this dataset suggest?

  1. Linear
  2. Polynomial
  3. Logistic-like
Step 13 of 16

Polynomial on ceiling data

A polynomial fit on this saturation data oscillates near the ends, showing instability when extrapolating.
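The instability is easiest to see by asking the fitted polynomial for a value outside the observed range. This sketch generates noise-free saturating data from a logistic curve with a ceiling of 1.0 and fits a cubic. The data, the degree, and the evaluation points are all assumptions chosen for illustration.

```python
import numpy as np

CEILING = 1.0

# Synthetic saturating data: rises, then flattens toward CEILING.
x = np.linspace(-6.0, 6.0, 25)
y = CEILING / (1.0 + np.exp(-x))

# Fit a cubic polynomial to the saturating data.
coeffs = np.polyfit(x, y, deg=3)

inside = float(np.polyval(coeffs, 5.0))    # within the observed range
outside = float(np.polyval(coeffs, 12.0))  # well beyond the observed range

print(f"prediction at x=5 : {inside:.2f}")
print(f"prediction at x=12: {outside:.2f}")
```

Inside the data range the cubic stays near the plateau, but beyond it the cubic term takes over and the prediction leaves the band between 0 and the ceiling.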

Check understanding

How does the polynomial behave on the saturation data?

  1. Good fit
  2. Overfitting
  3. Underfitting
Step 14 of 16

Visualise logistic shape

A logistic-shaped curve rises and then levels off. This shape better matches the ceiling behaviour than a high-degree polynomial.
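A standard form of the logistic curve is ceiling / (1 + exp(-steepness * (x - midpoint))). The sketch below evaluates it with hypothetical parameter values to show the rise and the levelling off. The parameter names and numbers are illustrative assumptions.

```python
import math

def logistic(x, ceiling=1.0, steepness=1.0, midpoint=0.0):
    """Logistic curve: near 0 far left, ceiling/2 at midpoint, near ceiling far right."""
    return ceiling / (1.0 + math.exp(-steepness * (x - midpoint)))

# Hypothetical parameters: ceiling of 900, halfway point at x = 120.
params = dict(ceiling=900.0, steepness=0.05, midpoint=120.0)

for x in (40, 120, 200, 400):
    print(x, round(logistic(x, **params), 1))
```

However large x becomes, the output approaches the ceiling but does not pass it, which is the behaviour a plateau calls for.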

Step 15 of 16

Compare poly vs logistic idea

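Setting the polynomial beside the logistic idea shows why polynomials handle saturation poorly: every non-constant polynomial eventually heads off toward plus or minus infinity, so it cannot level off at a ceiling and tends to swing away from the plateau near and beyond the edges of the data.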

Check understanding

Which model is conceptually better for saturation?

  1. Polynomial better
  2. Logistic conceptually better
  3. Both terrible
Step 16 of 16

Summary

Polynomial regression can capture curvature and improve fit, but overfitting and instability are risks. For saturation patterns, a logistic model is often more appropriate.