In this article, we are going to understand the absolute fundamentals, along with intuitions of what K-Nearest Neighbours (K-NN) is all about.

### What is K-Nearest Neighbours ?

K-NN, or K-Nearest Neighbours, is a **supervised learning **algorithm that can be used for **Classification** and **Regression** problems.

**What is classification ?** You have 10 bottles of wines. You are supposed to classify them into red and white. This can be done using software, and is known as a **Classification Problem**.

**What is regression ?** Simply put, regression is a technique to predict. The prediction could be the future, the price of a house, the quantity of units being sold today on Amazon. Regression in simple words means to predict in the space of Machine Learning. Don’t let these gurus confuse you!

The term K-Nearest Neighbours means the following:

1. **K **– An integer value that we choose to tell the algorithm

2. **Neareast Neighbours** – We use the value from K above, say for example K=5, and ask our algorithm to tell us the 3 Nearest Neighbours to a certain data point wih coordinate say (X,Y)

### K-NN Classification: A strong intuition

To begin with it is important we touch the topic of K-NN classification, and then once the intuition is understood, we slowly move over to K-NN regression in another post.

To make things clearer in perspective, like I always prefer, the intuition of this algorithm is absolutely essential. Let’s take a look at this image below.

In the above graph we see two categories, **Green** and **Blue**. Now, the ones in Blue when dropped do not break, but the ones in Green, when dropped, break!

Now the question is what if I have one glass with a specific hardness and it is dropped from a certain height. We would like to predict if this glass will **break** or **not break**. Take a look at the image below.

Here, the point on the scatter plot, with a **purple** outline, is to be categorised as a glass which will break or not. Simple ? Maybe yes, maybe no. Let’s find out together 🙂

## The process behind K-NN

The process is simple, and memorise it, will only help you grasp the logic better. Four main steps for you. Ecco qua!

**Step 1**

The problem is question whether a glass of a specific hardness, if dropped from a specific height, will break or not – is placed into the scatter plot with other glasses.

**Step 2**

The distance is measured from this point to **ALL **other points on the scatter plot. The distance is measured using the Euclidean distance formula, which is:

**If the points (x1,y1) and (x2,y2) are in 2-dimensional space, then the Euclidean distance between them is **

**Step 3**

**All the distances are first sorted in ascending order.**

**We then choose K = 5**, which is an number we choose as part of the algorithm. It tells the algorithm to find the K number of nearest neighbours from the **PURPLE** point. In the diagram on the right we have marked the **SHORTEST FIVE DISTANCES** in **RED**, since the value of **K** we have chosen is **5**.

## Step 4

In the image below, I have illustrated the most important steps next:

1. The distances are sorted in ascending order, an the top five distances are chosen since we have selected K=5.

2. The algorithm then checks for the highest frequency of the classes. In this case it is Green.

3. Since the highest frequency is Green, the algorithm decides, ok “**I think that the glass will break since the highest frequency is green!**“

## How do I choose the right value of K?

To select the appropriate value of K, we have to run the KNN algorithm multiple times, with multiple values of K. Whichever value of K gives us the best percentage of accuracy and/or minimum errors, then that is the right choice of K. 🙂

**Quick tips**:

1. With decreasing value of K – predictions become less stable. Example: K=3

2. With increasing value of K – predictions become better, and more stable. This is mainly because of higher voting capacity.

3. Always keep the value of K as an odd number. Helps reduce split decisions.

## Conclusion

K-NN is among the simplest classification algorithm on the market right now. It is important to begin with such simple classification algorithms to understand the absolute basics of Machine Learning. I hope you enjoyed reading this, and in case you have any questions – please ask your doubts, in the comments below.