Classification Algorithms in Python

Rohit Raj · Published in Thrive in AI
4 min read · Mar 15, 2022


In my previous article, I demonstrated how to do K-Means clustering in Python using the scikit-learn (sklearn) library. This article shows how to implement basic classification problems using the same library.

In a classification problem, we use the information contained in the data to predict the class of the sample.

First, we create synthetic data on which we will demonstrate our classification algorithms.
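A minimal sketch using sklearn's make_classification; the sample size, feature counts and random seed here are assumed for illustration:

```python
# Generate a synthetic binary classification dataset
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,    # number of samples (illustrative choice)
    n_features=10,     # total number of features
    n_informative=5,   # features that actually carry class signal
    n_classes=2,       # binary classification
    random_state=42,   # reproducibility
)
```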

We can split the data into training and test sets using sklearn.
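For example, holding out 25% of the samples for testing (the split ratio is an illustrative choice):

```python
from sklearn.model_selection import train_test_split

# Reserve 25% of the data as a held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```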

1. Logistic Regression

Logistic regression, despite its name, is a classification model rather than a regression model. It is a simple and efficient method for binary, linear classification problems: it is easy to implement and achieves very good performance on linearly separable classes, which is why it is extensively used for classification in industry.
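A minimal sketch with sklearn's LogisticRegression, reusing the train/test split created above (the raised max_iter is an illustrative choice):

```python
from sklearn.linear_model import LogisticRegression

# Fit a logistic regression model and evaluate accuracy on the test set
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
print("Logistic regression accuracy:", log_reg.score(X_test, y_test))
```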

2. Random Forest Classifier

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

We can implement a random forest classifier just as easily, using the code below.
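A minimal sketch (the number of trees and random seed are illustrative choices):

```python
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest of 100 trees and evaluate on the test set
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
print("Random forest accuracy:", rf_clf.score(X_test, y_test))
```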

3. Gaussian Naive Bayes Classifier

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering. They require a small amount of training data to estimate the necessary parameters.

There are several types of Naive Bayes classifier, depending on the assumed probability distribution of the underlying data. In the Gaussian Naive Bayes classifier, each feature is assumed to follow a Gaussian (normal) distribution.

We can implement a Gaussian Naive Bayes classifier using the code below.
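A minimal sketch using sklearn's GaussianNB with its default settings, again reusing the train/test split from above:

```python
from sklearn.naive_bayes import GaussianNB

# Fit a Gaussian Naive Bayes model and evaluate on the test set
gnb_clf = GaussianNB()
gnb_clf.fit(X_train, y_train)
print("Gaussian Naive Bayes accuracy:", gnb_clf.score(X_test, y_test))
```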

4. Nearest Neighbours Classification

Among the various methods of supervised statistical pattern recognition, the nearest neighbour rule achieves consistently high performance without a priori assumptions about the distributions from which the training examples are drawn. It involves a training set of both positive and negative cases. A new sample is classified by finding the nearest training case; the class of that nearest case then determines the classification of the sample. The k-NN classifier extends this idea by taking the k nearest points and assigning the majority class. It is common to select a small, odd k to avoid ties (typically 1, 3 or 5). Larger values of k help reduce the effect of noisy points in the training set, and the choice of k is often made through cross-validation.

We can implement a nearest neighbour classifier using the following code.
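A minimal sketch using sklearn's KNeighborsClassifier (k = 5 is an illustrative choice):

```python
from sklearn.neighbors import KNeighborsClassifier

# Fit a 5-nearest-neighbours classifier and evaluate on the test set
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(X_train, y_train)
print("k-NN accuracy:", knn_clf.score(X_test, y_test))
```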

5. SVM Classifier

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

Support vector machines are effective in high-dimensional spaces. They remain effective in cases where the number of dimensions is greater than the number of samples.

We can implement an SVM classifier using the following code.
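A minimal sketch using sklearn's SVC (the RBF kernel and C value are illustrative defaults):

```python
from sklearn.svm import SVC

# Fit a support vector classifier with an RBF kernel and evaluate on the test set
svm_clf = SVC(kernel="rbf", C=1.0)
svm_clf.fit(X_train, y_train)
print("SVM accuracy:", svm_clf.score(X_test, y_test))
```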

6. Gradient Boosting Classifier

Gradient boosting classifiers are a group of machine learning algorithms that combine many weak learning models together to create a strong predictive model. Decision trees are usually used when doing gradient boosting.

We can implement a gradient boosting classifier using the following code.
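A minimal sketch using sklearn's GradientBoostingClassifier (the number of boosting stages and learning rate are illustrative choices):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Fit a gradient boosting model and evaluate on the test set
gb_clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=42
)
gb_clf.fit(X_train, y_train)
print("Gradient boosting accuracy:", gb_clf.score(X_test, y_test))
```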

Each of the above classifiers supports the standard sklearn estimator interface, with the following methods: fit, predict, predict_proba and score.
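Illustrated here with the random forest model from above (note that SVC only exposes predict_proba when constructed with probability=True):

```python
# The common sklearn estimator interface, shown on the random forest model
rf_clf.fit(X_train, y_train)              # learn from the training data
y_pred = rf_clf.predict(X_test)           # predict class labels
y_proba = rf_clf.predict_proba(X_test)    # predict class probabilities
accuracy = rf_clf.score(X_test, y_test)   # mean accuracy on the test set
```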

We can choose among these classification models using the algorithm-selection heuristic (flowchart) published by SAS; see source [1].

In high-dimensional spaces, data can more easily be separated linearly and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.

In this article, we learned about the various classification models available in the sklearn library. If you liked my article, please like and subscribe to my newsletter.

Sources

  1. https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/
  2. https://scikit-learn.org/stable/supervised_learning.html#supervised-learning
