K-Nearest Neighbors (KNN) Algorithm Explained with Easy-to-Understand Examples

K-Nearest Neighbors (KNN) is a machine learning algorithm that can be used for classification and regression problems. It's a simple and easy-to-understand algorithm, which makes it a good starting point for learning about machine learning.

The basic idea behind KNN is to find the K closest training examples to a new example, and use the class of the majority of those K examples to predict the class of the new example. In other words, if K=3 and two of the closest examples are class A and one is class B, we would predict that the new example is class A.
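
To make the distance-and-vote procedure concrete, here is a minimal sketch in plain Python. The function name knn_predict and the data layout are illustrative choices, not part of any particular library:

```python
from collections import Counter
import math

def knn_predict(new_point, training_points, labels, k=3):
    """Predict a label for new_point by majority vote among its k nearest neighbors."""
    # Euclidean distance from new_point to every training point
    # (math.dist requires Python 3.8+).
    distances = [math.dist(new_point, point) for point in training_points]
    # Indices of the k training points with the smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Note that Counter.most_common breaks ties by whichever label it counted first, which is one reason an odd K is a popular choice for two-class problems.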

Here's an example to help illustrate how KNN works:

Imagine that we want to classify fruits based on their weight and color. We have a dataset of 10 fruits with their weights, colors, and labels (Apple or Orange):

| Fruit | Weight (grams) | Color | Label |
| --- | --- | --- | --- |
| Fruit 1 | 100 | Red | Apple |
| Fruit 2 | 120 | Red | Apple |
| Fruit 3 | 130 | Red | Apple |
| Fruit 4 | 140 | Orange | Orange |
| Fruit 5 | 150 | Orange | Orange |
| Fruit 6 | 160 | Orange | Orange |
| Fruit 7 | 170 | Orange | Orange |
| Fruit 8 | 180 | Orange | Orange |
| Fruit 9 | 190 | Red | Apple |
| Fruit 10 | 200 | Red | Apple |

We want to predict the label of a new fruit that weighs 165 grams and is orange in color. To do this using KNN, we would follow these steps (a short code sketch applying them comes after the list):

  1. Calculate the distance between the new fruit and each of the 10 fruits in our dataset, using a distance metric such as Euclidean distance.

  2. Sort the distances in ascending order to get the K closest fruits to our new fruit. Let's say we set K=3 for this example.

  3. Count the number of fruits of each label among the K closest fruits. In this case, all three of the closest fruits are Oranges: the nearest two weigh 160 and 170 grams, and the third is a tie between the 150-gram and 180-gram fruits, both Orange.

  4. Predict the label of the new fruit to be the most common label among the K closest fruits. In this case, we would predict that the new fruit is an Orange.
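
Here is the same four-step procedure applied to the fruit table, reusing the knn_predict sketch from earlier. Color is not a number, so it has to be encoded somehow before distances can be computed; the Red=0/Orange=1 scheme below is an arbitrary choice for illustration (in a real pipeline you would also scale the features so that weight does not completely dominate the distance):

```python
# Each fruit is (weight in grams, color code), with Red=0 and Orange=1.
fruits = [
    (100, 0), (120, 0), (130, 0), (140, 1), (150, 1),
    (160, 1), (170, 1), (180, 1), (190, 0), (200, 0),
]
labels = ["Apple", "Apple", "Apple", "Orange", "Orange",
          "Orange", "Orange", "Orange", "Apple", "Apple"]

# The new fruit: 165 grams and orange.
print(knn_predict((165, 1), fruits, labels, k=3))  # -> Orange
```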

That's the basic idea behind KNN! The algorithm is simple, but it can be very effective for certain problems. One drawback is that KNN can be slow and memory-intensive on large datasets: the model stores the entire training set, and each prediction requires computing the distance from the new example to every stored example.
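
In practice you would usually reach for an existing implementation rather than a hand-rolled loop. scikit-learn's KNeighborsClassifier implements the same idea and can use tree-based indexes (a k-d tree or ball tree) to speed up the neighbor search on larger datasets. A quick sketch on the fruit data, assuming scikit-learn is installed:

```python
from sklearn.neighbors import KNeighborsClassifier

# Same encoding as before: (weight in grams, color code) with Red=0, Orange=1.
X = [[100, 0], [120, 0], [130, 0], [140, 1], [150, 1],
     [160, 1], [170, 1], [180, 1], [190, 0], [200, 0]]
y = ["Apple", "Apple", "Apple", "Orange", "Orange",
     "Orange", "Orange", "Orange", "Apple", "Apple"]

model = KNeighborsClassifier(n_neighbors=3)  # n_neighbors is K
model.fit(X, y)
print(model.predict([[165, 1]]))  # -> ['Orange']
```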
