
NAIVE BAYES


This blog post will provide you with a comprehensive overview of Naive Bayes, exploring the theory behind this probabilistic algorithm and demonstrating its implementation using Python libraries. Dive in to uncover the advantages and disadvantages of Naive Bayes, as well as its real-world applications across various domains. With that, enjoy your journey in QDO!

WHAT IS NAIVE BAYES



Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem, primarily used for classification tasks. It assumes that features in a dataset are independent of each other, hence the term "naive." Despite this assumption, Naive Bayes performs remarkably well in practice, especially for tasks like spam filtering, sentiment analysis, and document classification. The algorithm calculates the probability of each class given the input features and selects the class with the highest probability as the predicted outcome. Its simplicity, efficiency, and effectiveness make it a popular choice for many real-world applications.
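
For reference, the theorem behind the algorithm can be written as follows, where "class" is the label we want to predict and "features" are the observed inputs:

P(class | features) = P(features | class) × P(class) / P(features)

Since P(features) is the same for every class, the algorithm simply picks the class that maximises P(features | class) × P(class).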


Concept of naive bayes

Scenario

Let's say we have a lot of mail in our inbox and we want to determine whether a given mail is a junk mail or not based on the content within the mail.


Likelihood/Probabilities

We'll start by counting the frequency of each word across the actual (non-junk) mails in our training data.
The probability/likelihood of each word appearing within an actual mail is then calculated for every word.

The same process is repeated for the junk mails.


Next, we calculate the prior probability of a mail being an actual mail or a junk mail within our training dataset.


By multiplying the prior probability of an actual/junk mail by the likelihood of each word within the content of the mail, we obtain a score for each type of mail. The mail is then classified as actual or junk based on which type has the higher score.
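
To make this concrete, here is a minimal sketch of the idea in Python. The word counts and priors below are made-up toy numbers for illustration, not real data or how any particular library implements it:

# Hypothetical word counts from our training mails (toy numbers)
actual_counts = {"dear": 8, "friend": 5, "lunch": 3, "money": 1}
junk_counts = {"dear": 2, "friend": 1, "lunch": 0, "money": 4}

# Priors: the fraction of actual vs junk mails in the training set
p_actual, p_junk = 0.67, 0.33

def score(mail_words, counts, prior):
    total = sum(counts.values())
    s = prior
    for word in mail_words:
        s *= counts.get(word, 0) / total  # likelihood of this word in the class
    return s

mail = ["dear", "friend"]
print("actual:", score(mail, actual_counts, p_actual))
print("junk:", score(mail, junk_counts, p_junk))
# The mail is assigned to whichever class produces the higher score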

Pseudocounts

Suppose the content of a mail is 'Lunch Money Money Money Money', but the word 'Lunch' never appeared in any junk mail in our training data. This will result in the probability of the word 'Lunch' appearing within a junk mail being 0, causing the entire junk-mail score to become 0 and the mail to be instantly classified as an actual mail.



However, it is very evident that the mail is a junk mail, so how does the algorithm deal with situations like this?

The algorithm adds a pseudocount (typically 1) to the count of every word, ensuring that a probability of 0 can never appear and thereby minimising this kind of misclassification. This technique is commonly known as Laplace smoothing.
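
Continuing the toy sketch from above, here is how a pseudocount changes the outcome (again, the counts and priors are made-up numbers for illustration):

# Same hypothetical counts and priors as in the earlier sketch
actual_counts = {"dear": 8, "friend": 5, "lunch": 3, "money": 1}
junk_counts = {"dear": 2, "friend": 1, "lunch": 0, "money": 4}
p_actual, p_junk = 0.67, 0.33

def smoothed_score(mail_words, counts, prior, alpha=1):
    # Add a pseudocount of alpha to every word so no likelihood is ever 0
    vocab_size = len(counts)
    total = sum(counts.values()) + alpha * vocab_size
    s = prior
    for word in mail_words:
        s *= (counts.get(word, 0) + alpha) / total
    return s

mail = ["lunch", "money", "money", "money", "money"]
print("actual:", smoothed_score(mail, actual_counts, p_actual))
print("junk:", smoothed_score(mail, junk_counts, p_junk))
# 'lunch' no longer zeroes out the junk score, and the repeated
# 'money' now pushes the mail towards the junk class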



Extra

Ever wondered where this algorithm got the name 'naive' from? That's because each word is treated independently, regardless of where it appears within the sequence of the sentence.

For example:

The term "Dear Friend" and "Friend Dear" had the same probability value when it comes to calculation be it which word comes first.

Implementation of naive bayes in python


Importing libraries 

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

Loading dataset

iris = load_iris()

Splitting the dataset into testing and training 

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    test_size=0.5)

Applying the model

gnb = GaussianNB()
gnb.fit(X_train, y_train)

Get prediction result

y_pred = gnb.predict(X_test)

Get prediction accuracy

accuracy = accuracy_score(y_test, y_pred)
accuracy

0.9733333333333334
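
Note that train_test_split is called here without a fixed random_state, so the split (and therefore the exact accuracy) will vary slightly from run to run.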

Testing the accuracy of other variants of Naive Bayes

from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
accuracy

0.7066666666666667

from sklearn.naive_bayes import BernoulliNB
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
accuracy

0.26666666666666666

These drops in accuracy are expected: the iris features are continuous measurements, which match the Gaussian assumption of GaussianNB. MultinomialNB is designed for count data, and BernoulliNB binarizes every feature at a threshold of 0 by default, so the strictly positive iris measurements all become 1 and carry no information, leaving the classifier little better than guessing among the three classes (hence an accuracy near 1/3).


Parameters that you can tune in naive bayes

In Naive Bayes, the parameters that can be tuned depend on the specific variant of Naive Bayes used. The main types are Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes. Here are some key parameters for each:

1. Gaussian Naive Bayes (GNB)

Gaussian Naive Bayes is a variant of Naive Bayes used for continuous data, assuming that features follow a Gaussian (normal) distribution. It calculates the likelihood of a feature belonging to a class using the mean and variance of the feature values, making it useful for tasks where numerical data is involved, such as image classification or medical diagnosis. 

  • var_smoothing: Adds a small value to the variance of each feature to handle numerical stability issues, especially when a feature's variance is zero or near-zero. This parameter smooths the likelihood estimates and can be adjusted for better accuracy.

2. Multinomial Naive Bayes (MNB)

Multinomial Naive Bayes is designed for discrete data, often applied to text classification where features represent the frequency or count of words. It calculates probabilities based on the frequency of words within a document and assigns the document to the class with the highest probability.  

  • alpha: This is the smoothing parameter. A small value of alpha means less smoothing, while larger values add more smoothing, which can prevent overfitting when dealing with categorical data, especially when features have zero occurrences in some classes.
  • fit_prior: A boolean parameter that controls whether class prior probabilities are learned from the training data; if set to False, a uniform prior is used.

3. Bernoulli Naive Bayes (BNB)

Bernoulli Naive Bayes is tailored for binary or boolean features, where variables are present or absent. It is commonly used for binary classification tasks and assumes that each feature follows a Bernoulli distribution.

  • alpha: Similar to Multinomial Naive Bayes, alpha is the smoothing parameter that can help with zero-frequency issues in binary feature vectors.
  • binarize: This parameter thresholds the data to binary form, where any feature value greater than the threshold is set to 1, and values below or equal to the threshold are set to 0. It’s useful for data that isn’t already binary.
  • fit_prior: This parameter decides whether to learn class priors from the data or assume equal class probabilities.
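
As a quick illustration, here is a minimal sketch of tuning one of these parameters (var_smoothing for Gaussian Naive Bayes) with a cross-validated grid search on the iris data; the candidate values are arbitrary choices for demonstration:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# Candidate var_smoothing values to try (arbitrary demonstration values)
param_grid = {"var_smoothing": [1e-11, 1e-9, 1e-7, 1e-5]}

# 5-fold cross-validated grid search over the candidates
search = GridSearchCV(GaussianNB(), param_grid, cv=5)
search.fit(iris.data, iris.target)

print(search.best_params_)
print(search.best_score_)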

Advantages and disadvantages of naive bayes

Advantages

  • Works Well with Small Data: The algorithm performs well even with smaller datasets, as it relies on probability estimates rather than large sample sizes.
  • Handles Irrelevant Features: Irrelevant features tend to have similar likelihoods across all classes, so they have little effect on which class ends up with the highest probability.
  • Requires Less Data: Since Naive Bayes works based on probabilities, it doesn’t require as much data as some other algorithms to make accurate predictions.

Disadvantages

  • Assumes Feature Independence: Naive Bayes assumes that all features are independent, which is often unrealistic in real-world datasets and can lead to lower accuracy if the features are highly correlated.
  • Limited to Simple Data Structures: For continuous data, Naive Bayes assumes a normal distribution, which may not always be the case, potentially reducing model performance.
  • Sensitivity to Zero Frequency: If a category in a feature has a zero occurrence in training data, it can assign zero probability to that category during prediction. Smoothing techniques (like Laplace smoothing) are often needed to handle this.

Implementation of naive bayes in real life

Gmail

Gmail uses Naive Bayes classifiers to identify spam based on email content, sender reputation, and other features. The algorithm assigns probabilities to emails being spam or not based on keywords, phrases, and patterns observed in spam emails.


Reuters

Reuters uses Naive Bayes to classify articles based on topics like politics, sports, finance, and technology. This classification helps users quickly find relevant articles and allows Reuters to maintain a structured, searchable archive.

Amazon



Amazon employs Naive Bayes classifiers to categorize reviews as positive, negative, or neutral. By analyzing words, phrases, and sentence structures, Naive Bayes can determine the sentiment and categorize feedback for further analysis or response.

