Step 2 of 8
The dataset — two clusters
Our dataset has two clear clusters of points. Each point is labelled as either Class A or Class B, forming distinct groups in the space.
Check understanding
What pattern do you see in the dataset?
- Random scatter
- Two distinct clusters
- Single group
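A minimal sketch of such a dataset, using illustrative hand-picked coordinates (the actual points in the visualisation may differ):

```python
# Two well-separated clusters of 2-D points (coordinates are illustrative).
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (1.2, 1.8)]
cluster_b = [(6.0, 6.0), (6.5, 5.5), (7.0, 6.5), (6.2, 7.0)]

# Attach a class label to each point: (point, label) pairs.
dataset = [(p, "A") for p in cluster_a] + [(p, "B") for p in cluster_b]
print(len(dataset))  # 8 labelled points
```

The `(point, label)` pair format keeps each training example and its class together, which the later steps rely on.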
Step 3 of 8
Classifying with k = 1
With k = 1, the algorithm looks at only the single nearest neighbour. The new point gets classified as whatever class that nearest neighbour belongs to.
Check understanding
How many neighbours does k = 1 consider?
- All neighbours
- Just one
- Three neighbours
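The k = 1 rule can be sketched in a few lines; the dataset coordinates here are illustrative, not the tutorial's actual points:

```python
import math

def nearest_neighbour_class(dataset, query):
    """k = 1: return the class of the single closest training point."""
    point, label = min(dataset, key=lambda item: math.dist(item[0], query))
    return label

dataset = [((1.0, 1.0), "A"), ((2.0, 1.5), "A"),
           ((6.0, 6.0), "B"), ((6.5, 5.5), "B")]
print(nearest_neighbour_class(dataset, (1.4, 1.2)))  # A — closest point is in cluster A
```

With k = 1 there is no voting at all: whichever training point is closest decides the answer outright.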
Step 4 of 8
Classifying with k = 3
With k = 3, the algorithm examines the three nearest neighbours and uses majority voting. If two are Class A and one is Class B, the prediction is Class A.
Check understanding
How does k = 3 make its prediction?
- Uses the closest point only
- Majority vote of 3 neighbours
- Averages all points
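Majority voting over the k closest points can be sketched like this (again with illustrative coordinates, including one Class B point sitting near the query):

```python
import math
from collections import Counter

def knn_predict(dataset, query, k=3):
    """Sort by distance, keep the k closest, and majority-vote their labels."""
    neighbours = sorted(dataset, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

dataset = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
           ((2.8, 2.2), "B"), ((6.0, 6.0), "B"), ((6.5, 5.5), "B")]
print(knn_predict(dataset, (2.5, 2.0), k=3))  # A — two of the three nearest are Class A
```

Here the single closest point is Class B, but the two next-closest are Class A, so the 2-to-1 vote goes to A — exactly the situation described above.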
Step 5 of 8
Classifying with k = 7
With k = 7, the algorithm considers the seven nearest neighbours. This larger k value makes the classification more stable and less sensitive to individual outlier points.
Check understanding
What advantage does a larger k provide?
- Faster computation
- More stable predictions
- Worse accuracy
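The stability claim can be demonstrated with a made-up outlier: a lone Class B point planted inside cluster A. With k = 1 the outlier wins; with k = 7 its neighbours outvote it:

```python
import math

def knn_predict(dataset, query, k):
    """Sort by distance, keep the k closest, and majority-vote their labels."""
    neighbours = sorted(dataset, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Cluster A, one stray B point inside it, and cluster B far away (illustrative).
dataset = ([((1.0 + 0.1 * i, 1.0), "A") for i in range(6)]
           + [((1.3, 1.3), "B")]  # outlier sitting inside cluster A
           + [((6.0 + 0.1 * i, 6.0), "B") for i in range(6)])

query = (1.3, 1.28)  # right next to the outlier
print(knn_predict(dataset, query, k=1))  # B — the outlier alone decides
print(knn_predict(dataset, query, k=7))  # A — six surrounding A points outvote it
```

The same query point gets two different answers: k = 1 trusts the outlier completely, while k = 7 averages over enough neighbours that one stray point cannot flip the result.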
Step 6 of 8
Decision boundary with k = 1
The decision boundary shows which regions would be classified as A or B. With k = 1, the boundary is very jagged because it reacts to every single nearby point.
Check understanding
Why is the k = 1 boundary jagged?
- It uses all points
- It reacts to individual points
- It ignores close points
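One way to see the boundary is to classify every cell of a coarse grid. In this sketch (illustrative points, not the tutorial's), a single Class B point placed near cluster A drags the k = 1 boundary toward it:

```python
import math

def knn_predict(dataset, query, k):
    """Classify a query point by majority vote among its k nearest neighbours."""
    neighbours = sorted(dataset, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Cluster A, cluster B, plus one lone B point near A's territory (illustrative).
dataset = [((1.0, 1.0), "A"), ((2.0, 2.0), "A"),
           ((3.0, 1.0), "B"),  # lone B point intruding on cluster A's side
           ((6.0, 6.0), "B"), ((7.0, 5.0), "B"), ((5.0, 7.0), "B")]

# Print the predicted class for each cell of a coarse grid: the A/B pattern
# traces the decision boundary, and the lone point bends it locally.
for y in range(8, -1, -2):
    print("".join(knn_predict(dataset, (x, y), k=1) for x in range(0, 9)))
```

Every grid cell nearer to the lone B point than to any A point is classified B, so the boundary follows that one point instead of the overall cluster shape — that is the jaggedness.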
Step 7 of 8
Decision boundary with k = 7
With k = 7, the boundary becomes much smoother. Instead of following every local variation, it captures the general separation between the two clusters.
Check understanding
What does a smooth boundary indicate?
- Over-fitting to noise
- General pattern recognition
- Random classification
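Smoothness can be checked directly: walk a straight line from one cluster to the other and record the prediction at each step. In this sketch (illustrative points, with a stray B planted between the clusters), k = 7 changes its answer only once along the whole walk:

```python
import math

def knn_predict(dataset, query, k):
    """Classify a query point by majority vote among its k nearest neighbours."""
    neighbours = sorted(dataset, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Clusters around (1, 1) and (6, 6), plus one stray B between them (illustrative).
dataset = ([((1.0 + 0.2 * i, 1.0), "A") for i in range(5)]
           + [((6.0 + 0.2 * i, 6.0), "B") for i in range(5)]
           + [((2.5, 2.5), "B")])  # stray point in the gap between the clusters

# Walk the diagonal from cluster A towards cluster B, predicting at each step.
line = [knn_predict(dataset, (0.5 + 0.25 * t, 0.5 + 0.25 * t), k=7) for t in range(25)]
print("".join(line))  # the label switches from A to B exactly once
```

Even when the walk passes right next to the stray B point, k = 7 still sees five Class A neighbours around it, so the prediction does not flicker — the boundary cuts cleanly between the clusters.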
Step 8 of 8
Comparing k = 1 vs k = 7
A side-by-side comparison shows the key trade-off: a small k is sensitive to local detail (and can overfit), while a large k focuses on broader patterns (and generalizes better). The transition zone between the clusters shows this most clearly.
Check understanding
In the ambiguous middle region, which k is more stable?
- k = 1 (more detail)
- k = 7 (averages neighbours)
- Both are equal
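The trade-off can be made concrete by counting how often the prediction flips along a line through the transition zone. In this sketch (illustrative points, with one stray of each class planted between the clusters), k = 1 carves a separate region around each stray, while k = 7 outvotes them:

```python
import math

def knn_predict(dataset, query, k):
    """Classify a query point by majority vote among its k nearest neighbours."""
    neighbours = sorted(dataset, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Two clusters plus two stray points in the transition zone (illustrative).
dataset = [((1.0, 1.0), "A"), ((1.5, 1.2), "A"), ((1.2, 1.6), "A"), ((2.0, 1.4), "A"),
           ((6.0, 6.0), "B"), ((6.4, 5.8), "B"), ((5.8, 6.3), "B"), ((6.6, 6.5), "B"),
           ((2.8, 2.8), "B"),   # stray B on cluster A's side of the gap
           ((4.3, 4.3), "A")]   # stray A on cluster B's side of the gap

def transitions(k):
    """Scan the diagonal through the transition zone and count label changes."""
    labels = [knn_predict(dataset, (0.5 + 0.25 * t, 0.5 + 0.25 * t), k) for t in range(25)]
    return sum(a != b for a, b in zip(labels, labels[1:]))

print(transitions(1))  # several flips: each stray point owns its own region
print(transitions(7))  # one clean A-to-B change: the strays are outvoted
```

More flips along the same scan line means a more fragmented boundary, which is exactly the instability the small-k side of the trade-off describes.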