Machine Learning: XGBoost / Extreme Gradient Boosting

To understand the impact a disruptive idea can have, it helps to look at how an algorithm such as XGBoost has pushed the Machine Learning community forward. With strong results and wins in multiple competitions, this algorithm is quite a breakthrough, and it relies on a simple yet essential concept to find effective solutions.

What is XGBoost ?

XGBoost stands for Extreme Gradient Boosting, and it is an ensemble learning algorithm. At times, it is important to take the reviews of a few people before we order a pizza, isn’t it? A single opinion is rarely enough validated information to rely on, and this idea is at the heart of XGBoost.
Now, what is an ensemble learning algorithm?
Well, to put it simply, we combine the predictive power of multiple models, and the result is the aggregated output of all of them. For example, if 7 out of 10 models suggest that an object is a ball, we go with the opinion that, OK, this object is a ball.
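
To make the "7 out of 10" idea concrete, here is a minimal sketch of a majority vote in Python. The votes are made up for illustration, and majority_vote is a hypothetical helper name, not part of XGBoost.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label that most of the individual models agreed on."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label

# Ten hypothetical models look at the same object:
# 7 of them say "ball", 3 of them say "orange".
votes = ["ball"] * 7 + ["orange"] * 3
print(majority_vote(votes))  # ball
```

The ensemble answers "ball" because that is the opinion shared by most of its members.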

Did you know that CERN utilises the XGBoost algorithm to classify signals from the Large Hadron Collider? You may ask why. Well, it’s because of the algorithm’s design, scalability, and flexibility, which allow it to process close to 3 petabytes (~3,000 terabytes) of data every year.

So, the models that form the ensemble, also called base learners, can come from the same learning algorithm or from different learning algorithms. There are two predominant techniques for building ensemble learners: Bagging and Boosting. We will discuss both in the sub-topic below.

How does XGBoost work?

Before we look at how XGBoost works, let us understand these two essential concepts: Bagging and Boosting.


Bagging

Intuition: Imagine we are in a supermarket, as a family of 5.
1. When three suggest that they want to eat pasta tonight, majority wins. We cook pasta. Winner: Pasta
2. Now, if 4 suggest that today we eat pasta, again majority wins. Winner: Pasta
3. If two want to eat pasta, and the other three want to eat alternative meals such as a kebab, pizza, and burger, respectively, again pasta wins. Winner: Pasta Che Buona!

I don’t want pasta to lose, because it is life. 😀

Now, the point here is that, in bagging, several models are trained independently, each on its own random sample of the data, and each one comes up with its own decision. The decision that receives the majority of the votes is the output, just like we did with pasta. (For numeric predictions, the outputs are averaged instead.)

To give you a concrete example, this is how a random forest works. It builds many decision trees, each tree makes its own decision, and the decision with the most votes wins.
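
Here is a rough sketch of bagging in plain Python, so you can see the voting happen. It assumes a very simple base learner (a one-split "stump") and a made-up toy dataset. The names fit_stump, bagging_fit and bagging_predict are hypothetical helpers, and this is not how any particular library implements it.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a one-split rule on (x, label) pairs with labels 0 or 1.

    Tries every observed x as a threshold, in both orientations, and keeps
    the rule that makes the fewest mistakes on the given points.
    """
    best = None
    for threshold, _ in points:
        for left, right in ((0, 1), (1, 0)):
            mistakes = sum((left if x <= threshold else right) != y
                           for x, y in points)
            if best is None or mistakes < best[0]:
                best = (mistakes, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging_fit(points, n_models=25, seed=0):
    """Train each model on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Every model votes, and the most common answer wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data (made up): small x values are class 0, large x values are class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 1), bagging_predict(models, 9))
```

Each stump sees a slightly different sample, so the individual rules differ a little, but the vote across all 25 of them is stable.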


Boosting

In boosting, the process is quite different. Here we learn from our mistakes, and thus end up with a strong predictor. The models learn sequentially: the errors of the first model become the training target of the second, the errors that remain after the second become the target of the third, and so on. Thus the process iterates, and each model learns from the previous iteration.

The models are built sequentially such that each subsequent model aims to reduce the errors of the previous model. Each model learns from its predecessors. Hence, the model that grows next in the sequence will learn from an updated version of the errors. Thus, with every passing iteration of the model, we end up knowing “WHAT WE DON’T WANT”.
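
The sequential idea can be sketched in a few lines of plain Python. This is a simplified illustration, assuming a squared-error setting, a one-split regression "stump" as the base learner, a fixed learning rate, and made-up data. The helper names are hypothetical, and real XGBoost does considerably more than this.

```python
def mse(ys, predictions):
    """Mean squared error between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

def fit_regression_stump(xs, targets):
    """One-split regressor: predict the mean target on each side of a threshold."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        if not right:  # a split must leave points on both sides
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - (left_mean if x <= t else right_mean)) ** 2
                    for x, r in zip(xs, targets))
        if best is None or error < best[0]:
            best = (error, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def boost(xs, ys, n_rounds=10, learning_rate=0.5):
    """Each round fits a new stump to the errors the ensemble still makes."""
    predictions = [sum(ys) / len(ys)] * len(xs)  # start from the mean of y
    history = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
        history.append(mse(ys, predictions))
    return predictions, history

# Made-up data, roughly y = x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0]
final_predictions, history = boost(xs, ys)
print(round(history[0], 3), round(history[-1], 3))
```

The history list holds the training error after each round, and it shrinks as each new stump corrects part of what the earlier ones got wrong.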

Intuition: The illustration below should give you a quick idea of how the algorithm works.

[Illustration: The iterative process of the XGBoost algorithm]

As we can see, the errors of the First Model become the training target of the Second Model, and so on. With each passing model, the errors of the earlier ones become the benchmark for the subsequent one, and keep telling it, “OK, understand me clearly, you are NOT supposed to do this!”

How does XGBoost learn from the errors?

I like to explain things with intuitions, because if you can imagine something, you can reproduce it. Sometimes we may not be able to explain it right away, but little by little, imagining what is really happening, in our little grey organ, over time, we are able to express ourselves with a lot more ease.

So, let us take a look at the illustration below to understand what we mean when we say that XGBoost learns from its errors. In it, we see how the outputs of various models are essential in deciding the final output.

So, the illustration includes the Sun and the Moon. Let’s look at it in detail:
1. Output 1: 3/5 Suns correctly classified – 2/5 incorrectly classified
2. Output 2: 3/5 Suns correctly classified – 2/5 incorrectly classified
3. Output 3: 4/5 Suns correctly classified – 1/5 incorrectly classified – 2/5 Moons incorrectly classified

Now, let’s imagine this intuitively. In the final result, the outputs of all of them are overlaid, and we end up with a final result that classifies the Sun and the Moon correctly.

This is how Boosting really works. Intuitively, it combines what each model got right; in practice, the final model is a weighted sum of all the individual models, and it gives us dependable results. Tada!

Why is the word Gradient used in XGBoost (Extreme Gradient Boosting) ?

Extreme Gradient Boosting gets its name from Gradient Descent. We choose a loss function, and gradient descent is used to minimise it: each new model is fit in the direction that reduces the loss the most (the negative gradient of the loss with respect to the current predictions). For a squared-error loss, this direction is simply the residual. This is how Gradient Boosting reduces the error. With every passing model in the sequence, the error reduces, until we have a final predictor that gives us the least difference between the actual and predicted values.
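
To see the link between the gradient and the error, here is a small numeric check in Python. It assumes the squared-error loss 0.5 * (y - prediction)^2 and made-up numbers, and it estimates the gradient with a finite difference rather than a formula.

```python
def squared_loss(y, prediction):
    """Squared-error loss for a single example."""
    return 0.5 * (y - prediction) ** 2

def numerical_gradient(y, prediction, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(prediction)."""
    return (squared_loss(y, prediction + eps)
            - squared_loss(y, prediction - eps)) / (2 * eps)

# Made-up example: the actual value is 10, the current prediction is 7.
y, prediction = 10.0, 7.0
residual = y - prediction
negative_gradient = -numerical_gradient(y, prediction)
print(residual, round(negative_gradient, 6))

# One small step along the negative gradient moves the prediction
# towards y, so the loss goes down.
learning_rate = 0.1
updated = prediction + learning_rate * negative_gradient
print(squared_loss(y, prediction), squared_loss(y, updated))
```

For this loss, the negative gradient and the residual are the same number, which is why fitting the next model to the residuals is a gradient descent step.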

A combination of model efficiency and computational efficiency has made this one of the most effective models among the many available. It is fast, efficient, and easy to implement.

An intuitive explanation of the Mathematics behind XGBoost

Mathematics, a word that has given many little children, and adults alike, more sleepless nights than The Exorcist. Absolutely! It is here to scare you.

Well, to make things simple, for you, and me 🙂 I will provide you with a strong intuition of what is really happening when such an algorithm is put to use. The mathematics can be difficult for me to remember and recollect as well, but with an intuition it shall be simple to grasp the essence of the Algorithm.

The general mathematical formula: f_m(x) = f_{m-1}(x) + γ_m h_m(x)

Hence, the first iteration would be: f_1(x) = f_0(x) + γ_1 h_1(x)

To explain the above, we are going to perform three important steps:

  1. An initial model f_0 is defined to predict the target variable y. This model will be associated with a residual (y − f_0), which is the difference between the actual value and the value predicted by f_0.
  2. A new model h_1 is fit to the residuals from the previous step.
  3. Now, f_0 and h_1 are combined to give f_1, the boosted version of f_0. Before it is added, h_1 is multiplied by a constant called γ_1, the derived multiplicative factor.
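
The three steps above can be worked through with a tiny example in Python. Everything here is an assumption made for illustration: the four data points are made up, f0 is simply the mean of y, h1 is a deliberately crude learner that only says "go down" (-1) or "go up" (+1) around a hand-picked split at x = 2.5, and γ1 (gamma1) is chosen to minimise the squared error.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 12.0]

def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Step 1: the initial model f0 predicts the mean of y for every x.
# The residuals are the actual values minus these predictions.
f0 = sum(ys) / len(ys)
residuals = [y - f0 for y in ys]

# Step 2: h1 is fit to the residuals. Here it only captures their direction:
# negative on the left of the (hypothetical) split, positive on the right.
def h1(x):
    return -1.0 if x <= 2.5 else 1.0

h_values = [h1(x) for x in xs]

# Step 3: gamma1 scales h1 before it is added to f0. For squared error,
# the best factor is sum(residual * h) / sum(h * h).
gamma1 = (sum(r * h for r, h in zip(residuals, h_values))
          / sum(h * h for h in h_values))
f1 = [f0 + gamma1 * h for h in h_values]

print(f0, residuals, gamma1, f1)
print(mse(ys, [f0] * len(ys)), mse(ys, f1))
```

With these numbers, f0 is 6.0, the residuals are [-4.0, -2.0, 0.0, 6.0], gamma1 comes out as 3.0, and f1 predicts [3.0, 3.0, 9.0, 9.0], which brings the mean squared error down from 14.0 to 5.0.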

Therefore, we arrive at the general boosted model equation, which can be written as follows: f_m(x) = f_{m-1}(x) + γ_m h_m(x). If you have any doubts, please re-read the text above. Don’t hesitate to ask about them.

Now that we have seen what the general boosted model looks like, I have made a quick video for you to see what the model actually does. I enjoy learning with intuitions since it lets me imagine the models, and numbers, in my mind.

Intuition of the Mathematics

Let’s consider the first square below, marked with ‘x’: 1/4 of the square is the solution, and the remaining 3/4 is the residue, which is something we don’t need as part of our solution.

Now, assume that over a few iterations we obtain the following three results.

What happens after this is where things begin to make complete sense. The three results overlap, and the end result is shown in the short 20-second video below, a visual illustration of what essentially happens during the algorithm.

As we can see through Part 1 to Part 4 of the algorithm breakdown, this is what is really happening at the heart of the algorithm: we achieve the results MINUS the residues that are computed. The multiplicative factor is also derived using a mathematical formula, but here I have focused on making sure the intuition is strongly understood. Some simple ideas can create effective solutions.

Sample Code

Being a quick resource, here is some quick sample code for your perusal. Use it as you see fit. In another blog post, I shall explain the specifics of how we can tune the hyperparameters of our model for enhanced results.


#Import Packages
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.metrics import explained_variance_score

#x_train, y_train, x_test and y_test are assumed to be prepared already

#Initialise the Model, I have used some hyperparameters here
#(named "model" so that the imported xgb module is not overwritten)
model = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=101, learning_rate=0.2, gamma=0, subsample=0.8, colsample_bytree=0.6, max_depth=43)

#Fit the Model
model.fit(x_train, y_train)

#Predict and Print
predictions = model.predict(x_test)
print('Explained Variance:', round(explained_variance_score(y_test, predictions)*100), '%')
print('Mean Squared Error:', mean_squared_error(y_test, predictions))


I have typed this tutorial with utmost excitement, and a drive to share what I have understood. While I implement the same algorithm at work, understanding the fundamentals of algorithms is key to moving a step ahead every day while we float through this vast ocean of Artificial Intelligence and Machine Learning. In case you have any questions, or doubts, feel free to drop a message here. If you do observe any errors in the above post, do comment with your observations. I would like to rectify the error immediately.

Stay tuned for more!

Your guide for Business Jargons

Business journals, books, and guides of all sorts are an integral part of corporate life. Many of them touch on business buzzwords that become an essential part of work and private life. Here, we have compiled a quick list of what we could think of ‘off the top of our heads’ (another piece of business jargon) for you to use.

Use them, and leave your colleagues amazed!

Big Data

The massive collection of structured and unstructured data that is often difficult to process through traditional means. Examples include data collected from mobile devices, web browsing, and voice recordings.

Bleeding Edge

A step beyond “cutting edge”, bleeding edge typically refers to technology that is so new that it is unproven. Also means being ahead of the current trends.

Catch Up

Also known as “check in” and “touch base”: to have a conversation about something. “Let’s touch base next week to finalize the terms.”

Change Agent

Change is often perceived as a good thing in business, but it can also be incredibly bad! A change agent is one who leads the charge, whatever the change may be.

Contextual Marketing

Another marketing buzzword, contextual marketing is aware of its surroundings and placement within a larger form of content. For example, offering a free ebook about how to use Instagram in an article on social media.

Core Competency

In business, core competencies are what a company or person does best.

Corporate Synergy

Refers to coordinating and collaborating effectively within an enterprise. Use this one where appropriate; many users don’t understand the true meaning.


Deck

Deck refers to a PowerPoint presentation, as in a “deck of cards.” Seems like it would be easier to ask for the file, or slides, but “deck” it is.

Deep Dive

Otherwise known as brainstorming, this one is used by professionals, as in “We’re going for a deep dive on the Parson’s account.”


Deliverable

A deliverable is an output of work that is completed, for example a brochure, ad, computer code, or document.


Disruptor

Something that “rocks the boat,” a game changer, a unique product or service whose innovation throws the status quo off kilter.


Freemium

A strategy where the “basic” version of a product is offered free of charge. Extra functionality requires an upgrade to a premium version. Examples include WordPress, MailChimp, and Hootsuite. Also used frequently by mobile games.

Go Live

When something, such as a promotion or website, is released to the public. For example, “The new website goes live next Monday.”

Growth Hacking

Refers to bootstrap marketing strategies utilized by businesses with small budgets, startups, and new businesses. Growth hacking consists of free marketing methods like blogging, social media, SEO and content marketing.


Hyperlocal

The newest of the business buzzwords, hyperlocal search uses GPS data to geographically target audiences and provide location-based advertising. Hyperlocal SEO is optimizing your online content to reflect your location using street address, neighborhood information, proximity to local landmarks, and even longitude and latitude to pinpoint searchers to your physical location.


Ideation

One of the most ridiculous business buzzwords, “ideation” is the process of creating new ideas.


Incentivize

To get someone to buy what you’re selling, you need to offer incentives. This is the word to describe the effort. Can also mean motivating a person to get something done.


Innovator

This one has been beaten to death. Innovators produce new ideas, products or strategies that are “revolutionary.” Everyone is an innovator these days. They used to be called entrepreneurs.

In This Space

A hipper way of saying “here.” Example, “Cryptocurrency is exciting. There’s a lot of opportunity in this space.”


ITL

Executive speak for firing staff. A natural progression after “downsizing” became too much of a bummer, followed by “rightsizing.” ITL means “Invited to Leave.”


Jacking

Jacking is the process of commandeering content for your own marketing purposes. Examples include news-jacking, where writers cover a breaking story to further their personal agenda. Meme-jacking is a corporate takeover of a popular meme to market their product or service.


KPI

Key Performance Indicators, or data used to measure performance. Not exactly a buzzword, it’s actually a longstanding marketing metric.

Low Hanging Fruit 

An obvious, easily attainable win you have to grab. While you always want to grab the brass ring, sometimes you can’t neglect the easy score.

Make A Case

To form a coherent argument for something that is deemed important. “I’ll make a case for that budget increase at the next partners’ meeting.”


Moat

Used in Silicon Valley around presentations and earnings calls. Also used to describe products or services that protect a company from incursions by competitors (see also competitive differentiation). Popularized over a decade ago by none other than Warren Buffett.

Move the Needle

Used in sales and marketing when effort is required to make a noticeable difference.

Outside the Box

Thinking or approaching a task in a different way than it is normally approached.

Pain Points

You’ll often hear talk of “finding your customers’ pain points.” Think of this as a creative way of saying “we want to understand and solve our clients’ problems.”


Pair

To work together on a project. Example, “I’ll pair you and Tom together on the proposal.”


Ping

Once relegated to tech, ping is used by everyone today and means to send a message. Example, “I’ll ping Tom about the new meeting time.”


Pivot

A euphemism for failing. If your product fails, you pivot to a new model or upgrade. If your business model is a joke, you pivot to another model. Popular euphemism used by politicians who are called on a previous stance: after due consideration, they “pivot” to a more popular stance.

Push Back

When you don’t get your way, sometimes with a reason. For example, “the client pushed back on our proposal. They want to find a cheaper way.”


Runway

Typical in the tech industry, a runway is how long a company can last before running out of money. A counterintuitive metaphor, a runway is typically the distance needed to attain flight. In Silicon Valley, it often is a measure of how long before a company “crashes and burns.”


Scope

The work that needs to get done, and what you don’t need to do. Example, “…that option is out of scope.”


Snackable Content

“Snackable content” is a marketing buzzword that is used to delineate an attempt to draw people in with bite-sized bits of text, video or anything that bolsters a brand’s visibility. Offshoots include “making readers hungry for snackable content” or how to “give them a satisfying and speedy feed.”

Sweep The Sheds

Popularized by a 2013 business book offering lessons for success from the New Zealand rugby team, the “All Blacks,” who used brooms to sweep out their own locker room. This business buzzword is a popular euphemism for a humble attention to detail.

Swim Lanes

A swim lane is a column or row in a flowchart. Each lane is devoted to one unit or process within the business.


Transformative

A transformative change is a dramatic or delightful change. Can have both positive and negative connotations; for example, a transformative change can include when a business goes bankrupt.


Unpack

Once you ideate, you need to unpack your ideations.

Value Add

A positive impact you add to a product or company. For example, “did the client respond positively to the value add at the pitch?”


Wheelhouse

In boating, a wheelhouse shelters the person driving the boat. In business, it means a person’s specialty.

Believe it or not, this is just a small sampling! Every industry and profession, from sales and marketing to finance, healthcare and the auto industry, has its own specific jargon and business buzzwords.

Today, whether you’re pitching a new product, or networking in your industry, business buzzwords can give you an aura of being more reputable if you use the right terminology in the proper context.