In this article, we are going to understand the absolute fundamentals, along with intuitions of what K-Nearest Neighbours (K-NN) is all about.
What is K-Nearest Neighbours ?
K-NN, or K-Nearest Neighbours, is a supervised learning algorithm that can be used for Classification and Regression problems.
What is classification ? You have 10 bottles of wines. You are supposed to classify them into red and white. This can be done using software, and is known as a Classification Problem.
What is regression ? Simply put, regression is a technique to predict. The prediction could be the future, the price of a house, the quantity of units being sold today on Amazon. Regression in simple words means to predict in the space of Machine Learning. Don’t let these gurus confuse you!
The term K-Nearest Neighbours means the following: 1. K – An integer value that we choose to tell the algorithm 2. Neareast Neighbours – We use the value from K above, say for example K=5, and ask our algorithm to tell us the 3 Nearest Neighbours to a certain data point wih coordinate say (X,Y)
K-NN Classification: A strong intuition
To begin with it is important we touch the topic of K-NN classification, and then once the intuition is understood, we slowly move over to K-NN regression in another post.
To make things clearer in perspective, like I always prefer, the intuition of this algorithm is absolutely essential. Let’s take a look at this image below.
In the above graph we see two categories, Green and Blue. Now, the ones in Blue when dropped do not break, but the ones in Green, when dropped, break!
Now the question is what if I have one glass with a specific hardness and it is dropped from a certain height. We would like to predict if this glass will break or not break. Take a look at the image below.
Here, the point on the scatter plot, with a purple outline, is to be categorised as a glass which will break or not. Simple ? Maybe yes, maybe no. Let’s find out together 🙂
The process behind K-NN
The process is simple, and memorise it, will only help you grasp the logic better. Four main steps for you. Ecco qua!
The problem is question whether a glass of a specific hardness, if dropped from a specific height, will break or not – is placed into the scatter plot with other glasses.
The distance is measured from this point to ALL other points on the scatter plot. The distance is measured using the Euclidean distance formula, which is:
If the points (x1,y1) and (x2,y2) are in 2-dimensional space, then the Euclidean distance between them is
All the distances are first sorted in ascending order.
We then choose K = 5, which is an number we choose as part of the algorithm. It tells the algorithm to find the K number of nearest neighbours from the PURPLE point. In the diagram on the right we have marked the SHORTEST FIVE DISTANCES in RED, since the value of K we have chosen is 5.
In the image below, I have illustrated the most important steps next: 1. The distances are sorted in ascending order, an the top five distances are chosen since we have selected K=5. 2. The algorithm then checks for the highest frequency of the classes. In this case it is Green. 3. Since the highest frequency is Green, the algorithm decides, ok “I think that the glass will break since the highest frequency is green!“
How do I choose the right value of K?
To select the appropriate value of K, we have to run the KNN algorithm multiple times, with multiple values of K. Whichever value of K gives us the best percentage of accuracy and/or minimum errors, then that is the right choice of K. 🙂
Quick tips: 1. With decreasing value of K – predictions become less stable. Example: K=3 2. With increasing value of K – predictions become better, and more stable. This is mainly because of higher voting capacity. 3. Always keep the value of K as an odd number. Helps reduce split decisions.
K-NN is among the simplest classification algorithm on the market right now. It is important to begin with such simple classification algorithms to understand the absolute basics of Machine Learning. I hope you enjoyed reading this, and in case you have any questions – please ask your doubts, in the comments below.
To understand the impact some disruptions can make, it is important to know how an algorithm such as XGBoost, has pushed the Machine Learning community by a leap. With successful results, and winning in multiple competitions, this algorithm is quite a breakthrough, and uses a trivial, yet essential concept to help find effective solutions.
What is XGBoost ?
XGBoost is an ensemble learning algorithm, and it stands for Extreme Gradient Boosting. At times, it is important to take the reviews of a few people before we order for a pizza, isnt it ? Thus, this insufficiency of validated information is what is at the heart, and root, of XGBoost. Now, what is an ensemble learning algorithm ? Well, to put it simply, we combine the predictive power of multiple results, and the resultant is the aggregated output of several models. For example, if 7 out of 10 suggest that this object is a ball, we will rely on the opinion that ok, this object is a ball.
Did you know that CERN utilises the XGBoost algorithm to classify signals for the Large Hadron Collider ? You may ask why ? Well it’s because of the algorithms construct, scalability, and flexibility that allows it to process close to 3 petabytes (~3000 Terabytes) of data every year. The two common techniques used for statistical models using ensemble learners are Bagging and Boosting.
So, the models that form the ensemble, also called as base learners, could include learning algorithms from the same, or different learning algorithms. There are two predominant techniques used for ensemble learners, which are, Bagging and Boosting. We will discuss Bagging and Boosting in the sub-topic below.
How does XGBoost work?
Before we begin with understanding how XGBoost works, let us understand what are these two essential concepts, Bagging, and Boosting.
Intuition: Image we are in a supermarket, as a family of 5. 1. When three suggest that they want to eat pasta tonight, majority wins. We cook pasta. Winner: Pasta 2. Now, if 4 suggest that today we eat pasta, again majority wins. Winner: Pasta 3. If two want to eat pasta, and the other three want to eat alternative meals such as a kebab, pizza, and burger, respectively, again pasta wins. Winner: Pasta
I don’t want pasta to lose, because it is life. 😀
Now, the point here is that, in bagging, the model accepts inputs, and comes up with multiple decisions. The majority of the output received is what wins, and the model will give us an output by a majority vote – just like we did with pasta.
To give you the right example, this is EXACTLY how decision trees work. They create multiple decisions, and the one which has the highest votes wins.
In boosting, the process is quite different. Here we learn from our mistakes, and thus end up making a strong predictor. The model learns sequentially, and the output of one becomes the input of the second, then the output of the second becomes the input of the third, so on – and so forth. Thus it iterates, and learns from it’s previous iteration.
The models are built sequentially such that each subsequent model aims to reduce the errors of the previous model. Each model learns from its predecessors. Hence, the model that grows next in the sequence will learn from an updated version of the errors. Thus, with every passing iteration of the model, we end up knowing “WHAT WE DON’T WANT“.
Intuition: The illustration below should give you a quick idea of how the algorithm works.
As we can see, the output of the First Model, becomes the input of the Second Model, and so on. With each passing model, the errors of the first become the benchmarks for the consequent one, and keep telling it, “OK, understand me clearly, you are NOT supposed to do this!“
How does XGBoost learn from the errors?
I like to explain things with intuitions, because if you can imagine something, you can reproduce it. Sometimes we may not be able to explain it right away, but little by little, imagining what is really happening, in our little grey organ, over time, we are able to express ourselves with a lot more ease.
So, let us take a look at the below illustrations to understand what do we mean by that XGBoost learns from it’s errors ?
So, the illustration includes the Sun and the Moon. Let’s look at it in detail: 1. Output 1: 3/5 Suns correctly classified – 2/5 incorrectly classified 2. Output 2: 3/5 Suns correctly classified – 2/5 incorrectly classified 3. Output 3: 4/5 Suns correctly classified – 1/5 incorrectly classified – 2/5 Moons incorrectly classified
Now, let’s imagine this intuitively. In the final result, the output of each one of them is overlapped, and we end up with a final result with correct classification of the Sun and the Moon.
This is how, Boosting really works. It finds the intersection of all correct classifications, and gives us a final model with dependable results. Tada!
Why is the word Gradient used in XGBoost (Extreme Gradient Boosting) ?
Extreme Gradient Boosting gets it’s name from Gradient Descent. The loss function that is chosen by us, is minimised by using gradient descent to help us find the right parameters to ensure this loss function is minimum. The aim is to find those parameters where the loss function give us the MINIMUM results. This is how Gradient Boosting reduces the error. With every pasing sequence, the error reduces, all the way until we have found one final predictor which gives us the least difference in the actual and predicted values.
A combination of model, and computational, efficiency has ensured that this model turns out to be the most effective one among many available. It is Fast, Efficient, and Easy to implement.
An intuitive explanation of the Mathematics behind XGBoost
Mathematics – a word that has given many little children, and adults alike, more sleepless nights than Exhorcist.
Well, to make things simple, for you, and me 🙂 I will provide you with a strong intuition of what is really happening when such an algorithm is put to use. The mathematics can be difficult for me to remember and recollect as well, but with an intuition it shall be simple to grasp the essence of the Algorithm.
The general mathematical formula: fm(x)=fm-1(x)+ γmhm(x)
Hence, the first iteration would be: f1(x)=f0(x)+γ1h1(x)
To explain the above, we are going to perform three important steps:
An initial model f0 is defined to predict the target variable y. This model will be associated with a residual (y – f0) – this is known as the ‘Residual‘ which is the difference between the predicted value and the loss function.
A new model h1 is fit to the residuals from the previous step.
Now, f0 and h1 are combined to give f1, the boosted version of f0. To this model h1 we add a constant called γ1 which is the derivedmultiplicative factor.
Therefore, we achieve a general equation which can visualised as the following:
Now that we have seen what the general boosted model looks like, I have made a quick video for you to see what the model actually does. I enjoy learning with intuitions since it let’s me imagine the models, and numbers, in my mind.
Intuition of the Mathematics
Let’s consider the first square below with ‘x‘ which is 1/4 part of the square being the solution and the residue being 3/4 parts of the square, which is something we don’t need as part of our solution.
Now, over a few iterations, we obtain the following results:
Now what happens after this is where things begin to make complete sense. The three results, overlap, and the end result is shown in a short 20 second below.
As we can see through Part 1 to Part 4 of the algorithm breakdown, of what is really happening at the heart of the algorithm. We achieve results MINUS the residues that are computed. The multiplicative factor is also derived using a mathematical formula. Here, for convenience, I have ensured that the intuition is strongly understood. Some simple ideas can create effective solutions.
Being a quick resouce, here is a quick sample code for your perusal. Can be used as you would want. In another blog post, I shall explain the specifics of how can we hyper tune our model for enhanced results.
#Import Packages import xgboost as xgb from sklearn.metrics import mean_squared_error from sklearn.metrics import explained_variance_score
#Initialise the Model, I have used some hyperparameters here xgb = xgb.XGBRegressor(objective =’reg:squarederror’,n_estimators=101,learning_rate=0.2, gamma=0,subsample=0.8,colsample_bytree=0.6, max_depth=43)
#Fit the Model xgb.fit(x_train,y_train)
#Predict and Print predictions = xgb.predict(x_test) print(‘Accuracy Rate:’,round(explained_variance_score(y_test,predictions)*100),’%’)
I have typed this tutorial with utmost excitement, and a drive to share what I have understood. While I implement the same algorithm at work, understanding the fundamentals of algorithms is key to moving a step ahead everyday while we float through this vast ocean of Artificial Intelligence and Machine Learning. In case you have any questions, or doubts, feel free to drop a message here. If you do observe any errors in the above post, do comment your observations. I would like to rectify the error immediately.
You have no idea how long I have waited to have my own portal in order to share information, with one, and all. With a lot of patience, time, and effort, I’d like to begin my journey of being an open book, solving some of your problems, and in that process also learning a lot more from you.
To begin with, I would like to introduce Python to you as a concept I like to call SEQL – Simple Elegant Quick Learn. So instead of SQL, we have SEQL.
Python is a popular programming language. It was created by Guido van Rossum, and released in 1991, and has a wide array of uses, particularly:
Owing to it’s Simple syntax, and an Elegant programming structure, it is Quick to execute and Learn. Over the course of my forthcoming posts, you shall come across a wide range of tutorials, quick solutions, tips, and tricks, that I have documented, learned, and experienced over time.
With my trivial, yet foundational, exposure to Python, I want to document my understanding of programming, and make it as easy for you to grasp the fundamental concepts of programming.
In case you have any doubts, feel free to write down your questions in the comments below. I will try and respond to each one of them.
Before I leave, one has to cut the ribbon by a simple saying….wait for it….”Hello World”.