AdaBoost


This blog post will provide you with a comprehensive overview of AdaBoost, exploring the theory behind this ensemble boosting algorithm and demonstrating its implementation using Python libraries. Dive in to uncover the advantages and disadvantages of AdaBoost, as well as its real-world applications across various domains. With that, enjoy your journey in QDO!

What is AdaBoost

AdaBoost (Adaptive Boosting) is an ensemble learning technique that combines multiple weak classifiers (often decision trees) to create a strong classifier. It works by training the weak classifiers sequentially, giving more weight to misclassified instances at each step so that subsequent classifiers focus more on the harder cases. The final prediction is made by combining the weighted votes of all weak classifiers. AdaBoost is effective at reducing bias and variance, and it’s particularly good for binary classification problems. However, it can be sensitive to noisy data and outliers.

Concepts of AdaBoost

In the forest of AdaBoost, each tree is made of only one node and two leaves.
This kind of tree is called a "stump", and it represents a weak learner in the model because, on its own, it is not great at making accurate decisions.

There are several main concepts in how AdaBoost operates that set this algorithm apart from the others.

  1. It combines weak learners known as stumps
  2. Some stumps get more say in the classification than others
  3. Each stump is created by taking the previous stump's mistakes into account


Let's say we have a Heart Disease dataset like the one below.


We first assign a sample weight to every record (row).


Initially, all the records carry the same weight, 1/(number of records), but the weights shift once the stump is created.


The first stump is created using the attribute with the lowest Gini index.

We first calculate the Total Error made by this stump, which in this context is 1/8 (one out of the eight records is misclassified).
We then calculate the Amount of Say using the formula below:

Amount of Say = 1/2 × ln((1 − Total Error) / Total Error)

With a Total Error of 1/8, the Amount of Say is 1/2 × ln(7) ≈ 0.97.


If the Amount of Say is plotted on a graph with Total Error on the x-axis and Amount of Say on the y-axis, the result looks like this.


A lower Total Error results in a higher Amount of Say, while a higher Total Error results in a lower Amount of Say.
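This relationship can be sketched in a few lines of Python (a minimal sketch; `amount_of_say` is an illustrative helper name, not a library function):

```python
import numpy as np

def amount_of_say(total_error):
    # Amount of Say = 1/2 * ln((1 - Total Error) / Total Error)
    return 0.5 * np.log((1 - total_error) / total_error)

# A stump with Total Error = 1/8 gets a large positive say,
# while a near-random stump (Total Error close to 0.5) gets almost none.
print(amount_of_say(1 / 8))   # large positive value (about 0.97)
print(amount_of_say(0.5))     # 0.0 -- a coin-flip stump gets no say
```

Note that the function blows up as Total Error approaches 0 or 1, which is why implementations clip the error away from those extremes.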

As mentioned earlier, AdaBoost creates the next stump by focusing on the errors of the previous one. Hence, we must increase the sample weight of the records that the stump misclassified, using the formula below:

New Sample Weight = Sample Weight × e^(Amount of Say)



The records that were predicted correctly get a lower weight, using the formula below:

New Sample Weight = Sample Weight × e^(−Amount of Say)


The new weights are then normalized (each weight is divided by the sum of all weights) so that they add up to 1, and they replace the old sample weights.
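Putting the two update rules and the normalization together (a minimal sketch with made-up variable names, using the 1/8 Total Error from the example and assuming the 5th record was the misclassified one):

```python
import numpy as np

n_records = 8
weights = np.full(n_records, 1 / n_records)                    # every record starts at 1/8
misclassified = np.array([False] * 4 + [True] + [False] * 3)   # record 5 was wrong

say = 0.5 * np.log((1 - 1 / 8) / (1 / 8))   # Amount of Say for this stump

# Increase the weight of the misclassified record, decrease the rest.
new_weights = np.where(misclassified,
                       weights * np.exp(say),
                       weights * np.exp(-say))

# Normalize so the weights sum to 1 again.
new_weights /= new_weights.sum()
print(new_weights)   # the misclassified record now carries half the total weight
```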
                                                                                  



Next, a new dataset is created for building the next stump, by sampling records according to their new weights.


Then a random value between 0 and 1 is chosen. Let's say the value chosen is 0.72. Since 0.72 falls within the cumulative-weight range of the 5th record of the old dataset, the 5th record is chosen as the first record of the new dataset. The process repeats (sampling with replacement) until the new dataset is the same size as the old one.
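This resampling step can be sketched as follows (a minimal sketch; the weights are the normalized values from the update step, and `searchsorted` over the cumulative weights finds the record whose range contains each random draw):

```python
import numpy as np

# Normalized weights: the misclassified 5th record holds 1/2, the rest 1/14 each.
weights = np.array([1/14, 1/14, 1/14, 1/14, 1/2, 1/14, 1/14, 1/14])
cumulative = np.cumsum(weights)   # upper edge of each record's range in [0, 1]

rng = np.random.default_rng(42)
draws = rng.random(len(weights))              # one value in [0, 1) per new record
picked = np.searchsorted(cumulative, draws)   # index of the record each draw lands on

# A draw of 0.72 falls in the 5th record's range (0.2857 to 0.7857):
print(np.searchsorted(cumulative, 0.72))      # -> 4 (the 5th record)
```

Because the misclassified record covers half of the interval, roughly half of the new dataset ends up being copies of it, which is exactly how the next stump is forced to focus on it.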


Once the new dataset is created, the sample weights are reset and the entire process iterates for a set number of rounds.

When the model receives a test record, the record is passed through all of the stumps, and the total Amount of Say for each classification is accumulated.




As displayed above, the total Amount of Say for the patient having heart disease is greater than the total Amount of Say for the patient not having heart disease. Hence, the patient is classified as having heart disease.
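The final vote can be sketched like this (a minimal sketch; the stump votes and Amount of Say values below are made-up illustrations, not taken from the dataset):

```python
# Each stump votes for a class and carries its own Amount of Say.
stump_votes = ["has disease", "has disease", "no disease", "has disease"]
stump_says = [0.97, 0.42, 0.65, 0.31]

# Accumulate the Amount of Say per class.
totals = {}
for vote, say in zip(stump_votes, stump_says):
    totals[vote] = totals.get(vote, 0.0) + say

# The class with the larger total wins.
prediction = max(totals, key=totals.get)
print(prediction)   # "has disease" wins, since 0.97 + 0.42 + 0.31 > 0.65
```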

Implementation of AdaBoost in Python

Importing libraries 

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Loading the Iris dataset

iris = load_iris()

Separating the features and the target

X = iris.data
y = iris.target

Splitting the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

Creating the AdaBoost classifier with a decision stump as the weak learner

base_estimator = DecisionTreeClassifier(max_depth=1)
adaboost = AdaBoostClassifier(estimator=base_estimator, n_estimators=30,
learning_rate=0.5, random_state=42)

Training the model

adaboost.fit(X_train, y_train)

Making predictions on the test set

y_pred = adaboost.predict(X_test)

Evaluating the accuracy

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')

Accuracy: 0.9667

Parameters that you can tune in AdaBoost

n_estimators: The number of weak learners (usually decision trees) to be combined. Increasing this value can improve performance but may lead to overfitting.

learning_rate: A weight applied to each classifier’s contribution. A smaller learning rate requires more estimators to achieve the same performance, but it helps in preventing overfitting.

base_estimator: The weak learner to be used (by default, a decision tree with a maximum depth of 1). You can change it to other models, like deeper trees or linear models. Note that recent versions of scikit-learn renamed this parameter to estimator.

algorithm: Specifies the boosting algorithm, either "SAMME" (multiclass) or "SAMME.R" (real boosting, using probabilities from weak learners). "SAMME.R" is usually faster and performs better.

random_state: Controls the randomness for reproducibility.

max_depth, min_samples_split, min_samples_leaf (if using a decision tree as the base estimator): These control the complexity of the individual weak learners.

Advantages and disadvantages of AdaBoost

Advantages

1) High Accuracy

Combines multiple weak classifiers to build a strong model, often achieving high accuracy.

2) Adaptability

Focuses on hard-to-classify instances by giving them more weight, improving performance on challenging data points.

3) Versatile Base Learners

Can use various weak classifiers, though decision trees are most common.

Disadvantages

1) Sensitivity to Noisy Data and Outliers

Since it increases the weight of misclassified points, noisy data or outliers can overly influence the model.

2) Computationally Intensive

With many estimators, training can be slow.

3) Dependency on Weak Learner Performance

If the weak learner is too complex (like deep trees), it can lead to overfitting.


Implementation of AdaBoost in real life

1. Fraud Detection 


Companies like PayPal use AdaBoost to detect fraudulent transactions. It helps identify unusual patterns in real-time by giving more focus to hard-to-classify transactions.

2. Face Detection



Apple has utilized AdaBoost in face detection algorithms, especially in earlier versions of its face recognition technology. It efficiently combines simple classifiers to detect human faces in images.

3. Customer Churn Prediction


Telecom companies like Verizon and e-commerce platforms like Amazon use AdaBoost to predict customer churn. It helps identify users likely to leave the service by analyzing historical data patterns.

Comments

Popular posts from this blog

PRINCIPAL COMPONENT ANALYSIS (PCA)

PRINCIPAL COMPONENT ANALYSIS (PCA) Figure 1: PCA This blogpost will bring to you the concept of principal component analysis which is one of the commonly used descriptive analysis that emphasizes of dimensionality reduction. You will learn how to implement this machine learning model in python, its advantages and disadvantages as well as how companies benefits from this machine learning model. What is PCA PCA is a statistical dimensionality-reducing technique. It takes a large set of variables and transforms them into a smaller set, retaining most of the information in the large set. This can be done by identifying the directions along which the data varies the most. These components are orthogonal to one another, capture the maximum possible variance within the data, and hence form a powerful tool for the simplification of datasets without loss of essential patterns and relationships. Concept of  PCA One of the key concepts behind PCA concerns diminishing the complexity of high-di...

LINEAR REGRESSION

 LINEAR REGRESSION Figure 1: Linear regression figure This blogpost will walk you through the concept of linear regression which is another machine learning model under the regression category of supervised learning. Introducing the parameters that you can turn while applying the logistic regression as well as the factors that play a significant impact upon the performance of the linear regression. What is linear regression Linear regression is a machine learning algorithm that could be used in predictive analysis. From predicting prices of houses to sales forecasting, linear regression is undoubtedly the first choice to many data scientists to implement within the dataset. In short, linear regression involves plotting your data on the graph base on the x and y coordinate and proceed to draw the best fit line upon the graph. The best fit line will be used as a reference to predict the independent variable in the future. However, do you have the skill to conduct a excellent analysis...

DECISION TREE

 DECISION TREE Figure 1: Decision Tree      This blogpost aims to introduce to you regarding to a machine learning model called decision trees. After reading this blogpost, you are able to deepen your knowledge on the concepts of decision trees model, its terminology, pros and cons as well as its application in real life scenarios that lends a hand in solving complex problems thus boosting the living quality of many.  What is decision tree      Imagine you’re wondering through a forest, each path branching off into multiple directions, and you need to make a series of decisions to escape the forest. Now, picture having a map that not only shows you all possible routes but also guides you on the specific conditions you encounter. Decision trees model which applies various splitting criteria's within the branches assists the user in decision making purposes. Compared to regression models which applies complex mathematical formulas like logistic regr...