AdaBoost
This blog post will provide you with a comprehensive overview of AdaBoost, exploring the theory behind this boosting algorithm and demonstrating its implementation using Python libraries. Dive in to uncover the advantages and disadvantages of AdaBoost, as well as its real-world applications across various domains. With that, enjoy your journey in QDO!
What is AdaBoost?
AdaBoost (Adaptive Boosting) is an ensemble learning technique that combines multiple weak classifiers (often decision trees) to create a strong classifier. It works by training the weak classifiers sequentially, giving more weight to misclassified instances at each step so that subsequent classifiers focus more on the harder cases. The final prediction is made by combining the weighted votes of all weak classifiers. AdaBoost is effective at reducing bias and variance, and it’s particularly good for binary classification problems. However, it can be sensitive to noisy data and outliers.
Concepts of AdaBoost
In the forest of AdaBoost, each tree is made up of only one node and two leaves.
This tree is called a "stump", and it represents a weak learner because a single stump is not great at making accurate decisions on its own.
There are a few core ideas in how AdaBoost operates that set this algorithm apart from the others:
- It combines weak learners known as stumps
- Some stumps get more say in the final classification than others
- Each stump is built by taking the errors of the previous stump into account
Let's say we have a Heart Disease dataset, displayed below.
We first add a sample weight column and assign a weight to every record.
Initially, all the records carry the same weight, but these weights will shift once the first stump is created.
The first stump is created using the attribute with the lowest Gini index, as sketched below.
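As a rough sketch of these first two steps (equal starting weights and picking the attribute with the lowest Gini index), here is a small example; the yes/no attribute and label values below are hypothetical stand-ins for the Heart Disease table, not the actual data:

import numpy as np

# Every record starts with the same sample weight (1/8 for eight records)
n_records = 8
sample_weights = np.full(n_records, 1 / n_records)

def gini_impurity(labels):
    # Gini impurity of a single leaf: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_index(feature, labels):
    # Weighted Gini index of a stump that splits on a yes/no attribute
    score = 0.0
    for value in np.unique(feature):
        mask = feature == value
        score += mask.sum() / len(labels) * gini_impurity(labels[mask])
    return score

# Hypothetical attribute (e.g. chest pain) and label (has heart disease)
chest_pain = np.array([1, 1, 1, 1, 0, 0, 0, 0])
has_disease = np.array([1, 1, 1, 0, 1, 0, 0, 0])
print(gini_index(chest_pain, has_disease))  # the attribute with the lowest Gini index becomes the first stump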
We first calculate the total error created by this stump, which in this context is 1/8 (one of the eight records is misclassified). We then calculate the amount of say using the formula below.
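The standard AdaBoost formula for the amount of say is one half times the natural log of (1 - Total Error) / Total Error. A minimal sketch of the calculation for a total error of 1/8:

import numpy as np

total_error = 1 / 8  # one of the eight records is misclassified by the stump

# Amount of say = 0.5 * ln((1 - total_error) / total_error)
amount_of_say = 0.5 * np.log((1 - total_error) / total_error)
print(round(amount_of_say, 3))  # roughly 0.973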
If the amount of say is plotted against the total error, with total error on the x-axis and amount of say on the y-axis, the relationship looks like this: a low total error gives a large amount of say, while a high total error gives a low (or even negative) amount of say.
As mentioned earlier, AdaBoost creates the next stump by focusing on the errors of the previous one. Hence, we increase the sample weight of the records that the stump misclassified, using the formula below.
The records that were predicted correctly are given a lower weight, using the formula below.
The new weights are then normalized so that they sum to 1, and they replace the old sample weights.
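A minimal sketch of the weight update and normalization, using the standard AdaBoost rules (multiply by e^say for misclassified records and by e^-say for correctly classified ones) and assuming a single misclassified record as in the example above:

import numpy as np

n_records = 8
weights = np.full(n_records, 1 / n_records)        # every record starts at 1/8
misclassified = np.array([False] * 7 + [True])     # assume only the last record was wrong
amount_of_say = 0.973

# Increase the weight of the misclassified record, decrease the rest
new_weights = np.where(misclassified,
                       weights * np.exp(amount_of_say),
                       weights * np.exp(-amount_of_say))

# Normalize so that the new weights sum to 1
new_weights /= new_weights.sum()
print(new_weights.round(3))  # the misclassified record now carries about half of the total weight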
Next, a new dataset of the same size is created and used to build the next stump.
A value between 0 and 1 is drawn at random. Let's say the value chosen is 0.72. Since 0.72 falls within the cumulative-weight range of the 5th record in the old dataset, the 5th record becomes the first record of the new dataset. The process repeats until the new dataset is the same size as the old one (see the sketch below).
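A sketch of this weighted resampling step, assuming the hypothetical normalized weights from the sketch above; np.searchsorted on the cumulative weights plays the role of finding which record's range each random value falls into:

import numpy as np

rng = np.random.default_rng(0)
new_weights = np.array([0.0714] * 7 + [0.5])
new_weights /= new_weights.sum()              # make sure the weights sum to exactly 1

cumulative = np.cumsum(new_weights)           # each record owns a slice of the interval [0, 1]
draws = rng.random(len(new_weights))          # one random value between 0 and 1 per new record
chosen = np.searchsorted(cumulative, draws)   # index of the record whose slice contains each draw

print(chosen)  # records with larger weights tend to be drawn more often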
When the model receives a test record, the record is passed through all of the stumps, the amount of say is accumulated for each classification, and the class with the highest total say becomes the final prediction.
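A sketch of how that final vote could be accumulated in code; the function below and its arguments are hypothetical, assuming a list of fitted stumps (each with a predict method) and their amounts of say:

def adaboost_predict(stumps, says, x):
    # Accumulate the amount of say for each predicted class and return the winner
    votes = {}
    for stump, say in zip(stumps, says):
        label = stump.predict(x)[0]            # x is a single record shaped (1, n_features)
        votes[label] = votes.get(label, 0.0) + say
    return max(votes, key=votes.get)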
Implementation of AdaBoost in Python
Importing libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
Loading the dataset
iris = load_iris()
Separating the features and the target
X = iris.data
y = iris.target
Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
Creating the AdaBoost classifier
base_estimator = DecisionTreeClassifier(max_depth=1)
adaboost = AdaBoostClassifier(estimator=base_estimator, n_estimators=30,
                              learning_rate=0.5, random_state=42)
Training the model
adaboost.fit(X_train, y_train)
Making predictions on the test set
y_pred = adaboost.predict(X_test)
Evaluating the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
Accuracy: 0.9667
Parameters that you can tune in AdaBoost
n_estimators: The number of weak learners (usually decision trees) to be combined. Increasing this value can improve performance but may lead to overfitting.
learning_rate: A weight applied to each classifier’s contribution. A smaller learning rate requires more estimators to achieve the same performance, but it helps in preventing overfitting.
base_estimator: The weak learner to be used (by default, a decision tree with a maximum depth of 1). You can change it to other models, like deeper trees or linear models. Note that recent versions of scikit-learn call this parameter estimator, which is why the code above passes estimator=.
algorithm: Specifies the boosting algorithm, either "SAMME" (multiclass) or "SAMME.R" (real boosting, using probabilities from weak learners). "SAMME.R" is usually faster and performs better.
random_state: Controls the randomness for reproducibility.
max_depth, min_samples_split, min_samples_leaf (if using a decision tree as the base estimator): These control the complexity of the individual weak learners.
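As a sketch of tuning several of these parameters at once with scikit-learn's GridSearchCV (assuming a recent scikit-learn where the weak learner argument is called estimator, so its depth is reachable as estimator__max_depth):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for the number of stumps, the learning rate and the stump depth
param_grid = {
    'n_estimators': [25, 50, 100],
    'learning_rate': [0.1, 0.5, 1.0],
    'estimator__max_depth': [1, 2],
}

grid = GridSearchCV(
    AdaBoostClassifier(estimator=DecisionTreeClassifier(), random_state=42),
    param_grid,
    cv=5,
)
grid.fit(X, y)

print(grid.best_params_)
print(f'Best cross-validated accuracy: {grid.best_score_:.4f}')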
Advantages and disadvantages of AdaBoost
Advantages
1) High Accuracy
Combines multiple weak classifiers to build a strong model, often achieving high accuracy.
2) Adaptability
Focuses on hard-to-classify instances by giving them more weight, improving performance on challenging data points.
3) Versatile Base Learners
Can use various weak classifiers, though decision trees are most common.
Disadvantages
1) Sensitivity to Noisy Data and Outliers
Since it increases the weight of misclassified points, noisy data or outliers can overly influence the model.
2) Computationally Intensive
With many estimators, training can be slow.
3) Dependency on Weak Learner Performance
If the weak learner is too complex (like deep trees), it can lead to overfitting.
Implementation of AdaBoost in real life
1. Fraud Detection
Companies like PayPal use AdaBoost to detect fraudulent transactions. It helps identify unusual patterns in real-time by giving more focus to hard-to-classify transactions.
2. Face Detection
Apple has utilized AdaBoost in face detection algorithms, especially in earlier versions of its face recognition technology. It efficiently combines simple classifiers to detect human faces in images.
3. Customer Churn Prediction
Telecom companies like Verizon and e-commerce platforms like Amazon use AdaBoost to predict customer churn. It helps identify users likely to leave the service by analyzing historical data patterns.