K-NEAREST NEIGHBOUR (KNN)
Figure 1: KNN
What is KNN
K-nearest neighbours (KNN) is a supervised learning algorithm used for classification and regression. To make a prediction for a new data point, it looks at the k training points closest to it and takes a majority vote of their labels (for classification) or an average of their values (for regression).
Concept of KNN
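The core idea fits in a few lines of plain Python. The following is a minimal from-scratch sketch (not the scikit-learn implementation used later in this post): it predicts the label of a query point by majority vote among its k closest training points.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Distance from the query to every training point (lazy learning:
    # all the work happens here, at prediction time).
    distances = [(math.dist(query, x), label) for x, label in zip(X_train, y_train)]
    # Keep the k closest neighbors.
    k_nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among their labels.
    return Counter(label for _, label in k_nearest).most_common(1)[0][0]

# Toy example: two well-separated clusters in 2-D.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, query=(2, 2), k=3))  # -> "A"
```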
Parameters that you can tune for KNN
Number of Neighbors (k):
- The number of closest neighbors used to make a prediction. The value of k is critical: if k is too small, the model becomes sensitive to noise; if k is too large, it smooths out the predictions too much. Common practice is to try several values of k and use cross-validation to pick the best one, as in the sketch below.
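A minimal sketch of this practice with scikit-learn, using the built-in iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k with 5-fold cross-validation and keep the best.
scores = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k} (mean CV accuracy = {scores[best_k]:.3f})")
```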
Distance Metric:
- It measures the distance between data points.
- Options include:
- Euclidean distance: Straight line distance between two points in Euclidean space
- Manhattan distance: Distance between two points is the sum of the absolute difference of their coordinates
- Minkowski distance: Generalization of Euclidean and Manhattan distances
- Hamming distance: Number of positions at which the corresponding elements of two vectors differ
- Mahalanobis distance: A distance that takes the correlations between variables into account
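To make these options concrete, the sketch below evaluates several of the metrics on a pair of example vectors with scipy.spatial.distance. Note that SciPy's hamming returns the fraction (not the count) of differing positions, and that the Mahalanobis distance additionally needs the inverse covariance matrix of the data:

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.0, 3.0])

print(distance.euclidean(u, v))        # straight-line distance
print(distance.cityblock(u, v))        # Manhattan distance
print(distance.minkowski(u, v, p=3))   # Minkowski with p=3
print(distance.hamming([1, 0, 1], [1, 1, 1]))  # fraction of differing positions

# Mahalanobis needs the inverse covariance matrix (VI) of the data.
data = np.random.default_rng(0).normal(size=(100, 3))
VI = np.linalg.inv(np.cov(data, rowvar=False))
print(distance.mahalanobis(u, v, VI))
```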
Weight Function (Weights):
Determines how the neighbors are weighted in the prediction.
- Options include:
- Uniform: Neighbors are weighted equally.
- Distance: Closer neighbors are given more weight.
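In scikit-learn these correspond to the weights parameter, as in this short sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

# Every neighbor gets one vote.
uniform_knn = KNeighborsClassifier(n_neighbors=5, weights="uniform")

# Votes are weighted by the inverse of the distance, so closer
# neighbors count more.
distance_knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
```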
Algorithm:
- The algorithm used to compute the nearest neighbors.
- Options include:
- Brute-force: Compute distance between every pair of points in a straightforward way.
- Ball Tree: Uses a ball tree, a binary tree of nested hyperspheres, to partition the data points.
- KD Tree: Uses a k-dimensional tree to partition the data points.
- Auto: It automatically selects the best algorithm based on the input dataset.
Leaf Size:
- The number of points at which the tree-based algorithms switch to brute force. It affects the speed and memory required to build the tree structures: larger leaf sizes generally make tree building faster but can slow down query times. The sketch below shows how the algorithm and leaf size options are set.
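A sketch of how these options are passed to scikit-learn (the values shown are illustrative; leaf_size=30 is scikit-learn's default):

```python
from sklearn.neighbors import KNeighborsClassifier

# Explicitly request a KD tree and control when it falls back to brute force.
knn = KNeighborsClassifier(
    n_neighbors=5,
    algorithm="kd_tree",  # or "ball_tree", "brute", "auto"
    leaf_size=30,         # larger: faster tree building, slower queries
)
```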
p (for Minkowski Distance):
- The power parameter for the Minkowski distance metric.
- Options include:
- p=1: Equivalent to Manhattan distance.
- p=2: Equivalent to Euclidean distance.
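A quick numerical check of the two special cases using SciPy:

```python
from scipy.spatial import distance

u, v = [0.0, 0.0], [3.0, 4.0]

# p=1 reduces to the Manhattan distance, p=2 to the Euclidean distance.
print(distance.minkowski(u, v, p=1), distance.cityblock(u, v))  # 7.0 7.0
print(distance.minkowski(u, v, p=2), distance.euclidean(u, v))  # 5.0 5.0
```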
Metric Parameters (metric_params):
- A dictionary of additional keyword arguments for the chosen metric function (for example, the inverse covariance matrix VI required by the Mahalanobis distance in scikit-learn).
n_jobs:
- The number of parallel jobs to run for the neighbors search. Setting n_jobs to -1 uses all available processors, which can speed up the computation.
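Putting the parameters from this section together, a fully specified classifier might look like the following sketch (the particular values are illustrative, not recommendations):

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(
    n_neighbors=7,
    weights="distance",
    algorithm="auto",
    leaf_size=30,
    metric="minkowski",
    p=2,         # Euclidean distance
    n_jobs=-1,   # use all available processors for the neighbor search
)
```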
Measure types
- Types of measures that can be used to determine the "closeness" or similarity between data points.
- Options include:
- MixedMeasures: Combine different measure types for data sets whose attributes are of mixed kinds (e.g., nominal and numerical)
- NominalMeasures: Measure dissimilarity between categorical (nominal) attributes
- NumericalMeasures: Used for continuous or ordinal data, where the mathematical notions of addition and subtraction make sense
- BregmanDivergences: A family of dissimilarity measures derived from convex functions; the squared Euclidean distance and the Kullback-Leibler divergence are special cases
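As a concrete illustration of the last family, here is a small sketch of the general Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for a convex function F; with F(x) = ||x||^2 it reduces to the squared Euclidean distance:

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for a convex function F."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# With F(x) = ||x||^2, the Bregman divergence is the squared Euclidean distance.
F = lambda x: np.dot(x, x)
grad_F = lambda x: 2 * x

p = np.array([1.0, 2.0])
q = np.array([3.0, 0.0])
print(bregman_divergence(F, grad_F, p, q))  # 8.0
print(np.sum((p - q) ** 2))                 # 8.0, same value
```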
Implementation of KNN in Python
Figure 4: Importing dataset
Figure 5: Overview of dataset
Figure 6: Understanding value counts
Figure 7: Separating dependent and independent variables
Figure 8: Importing necessary libraries
Figure 9: LabelEncoder
Figure 10: Splitting training and testing data
Figure 11: Applying the model
Figure 12: Getting Accuracy Score
Figure 13: Accuracy score
Figure 14: Classification Report
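Since the original figures are screenshots of a notebook, the sketch below reconstructs the same steps end to end. The file name dataset.csv and the column name target are placeholders for whatever dataset the notebook used:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Figures 4-6: load the data and inspect it.
df = pd.read_csv("dataset.csv")      # placeholder file name
print(df.head())
print(df["target"].value_counts())   # placeholder target column

# Figure 7: separate independent (X) and dependent (y) variables.
X = df.drop(columns=["target"])
y = df["target"]

# Figure 9: encode string labels as integers.
y = LabelEncoder().fit_transform(y)

# Figure 10: hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Figure 11: fit the model.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Figures 12-14: evaluate.
y_pred = knn.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```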
Advantages and disadvantages of KNN
Advantages
- KNN is simple to understand and implement. It often serves as a baseline model against which more complex algorithms are compared.
- KNN is a lazy learning algorithm: it requires no explicit training phase. The algorithm simply stores the training data and does its computation at prediction time.
- KNN makes no assumptions about the underlying distribution of the data, which makes it versatile and applicable to a wide range of problems.
Disadvantages
- Since KNN compares the query against every training point in the prediction phase, it becomes very slow for large data sets.
- Storing the whole training data set consumes a lot of memory, which can be expensive if the data set is huge.
- As the number of features grows, the concept of distance between points becomes less and less meaningful (the curse of dimensionality), and both accuracy and performance suffer. Feature selection or dimensionality reduction techniques are often needed; the small experiment below illustrates the effect.
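The effect can be seen directly in a quick experiment with random data: as the dimensionality grows, the nearest and farthest points from a query become almost equally far away.

```python
import numpy as np

rng = np.random.default_rng(0)

# As dimensionality grows, the nearest and farthest points from a query
# become almost equally far away, so "nearest" carries less information.
for dim in (2, 10, 100, 1000):
    points = rng.random((1000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"dim={dim:5d}  max/min distance ratio = {dists.max() / dists.min():.2f}")
```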
Application of KNN in real life
Telecommunications

Figure 17: Verizon
Verizon uses KNN to predict customer churn. KNN analyzes customer usage patterns and behaviors to identify those likely to leave, allowing proactive retention strategies.
Automotive
Figure 18: Tesla
Tesla uses KNN in its Autopilot system for object detection and classification. KNN helps in identifying objects on the road, aiding the vehicle's autonomous navigation.
Entertainment