
Feature Scaling

Why Should We Do Feature Scaling?

The first question we need to address: why do we need to scale the variables in our dataset? Some machine learning algorithms are sensitive to feature scaling, while others are virtually invariant to it. Let me explain that in more detail.




Gradient Descent Based Algorithms:

Machine learning algorithms like linear regression, logistic regression, and neural networks that use gradient descent as an optimization technique require the data to be scaled. If the features have very different ranges, the gradient updates move at very different rates for each feature, and the optimizer takes far longer to converge; scaling the features to a similar range lets a single learning rate work well for all of the weights.
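To make this concrete, here is a minimal sketch (the dataset, learning rates, and step counts are made up for illustration) of plain batch gradient descent on a two-feature linear regression. With the raw features, the learning rate must be kept tiny so the large-range feature does not diverge, and the small-range feature's weight barely moves; after standardizing, one much larger learning rate converges quickly for both weights.

import numpy as np

# Toy data: feature 1 spans roughly 0-1, feature 2 spans roughly 0-1000
# (made-up values for illustration)
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(0, 1000, 100)])
y = 3 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.1, 100)

def gradient_descent(X, y, lr, steps=1000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

# Unscaled: lr must stay tiny or the large-range feature diverges,
# so the small-range feature's weight barely moves from zero.
print(gradient_descent(X, y, lr=1e-6))

# Standardized: both features are on the same scale, so one learning
# rate suits both weights and convergence is much faster.
# (The learned weights are now in standardized units.)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(gradient_descent(X_scaled, y, lr=0.1))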


Distance-Based Algorithms:

Distance-based algorithms like KNN, K-Means, and SVM are the most affected by the range of the features, because behind the scenes they use distances between data points to determine similarity. A feature with a much larger range dominates the distance calculation, so the other features contribute almost nothing.
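A quick sketch of why the range matters for a distance metric (the feature values and the min/max ranges below are hypothetical): two people who are 30 years apart in age look almost identical to Euclidean distance when income, measured in the tens of thousands, sits in the same vector.

import numpy as np

# Two people described by (age in years, annual income) -- made-up values
a = np.array([25, 80000])
b = np.array([55, 82000])

# Unscaled: the income difference (2000) swamps the age difference (30),
# even though 30 years is a huge gap relative to the age range.
print(np.linalg.norm(a - b))  # ~2000.2

# Min-max scale each feature to [0, 1] using assumed min/max per feature
ranges = np.array([[18, 70], [10000, 200000]])  # hypothetical feature ranges
a_s = (a - ranges[:, 0]) / (ranges[:, 1] - ranges[:, 0])
b_s = (b - ranges[:, 0]) / (ranges[:, 1] - ranges[:, 0])

# Scaled: the age gap now dominates the distance, as it should.
print(np.linalg.norm(a_s - b_s))  # ~0.58, almost all of it from age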

Tree-Based Algorithms:

Tree-based algorithms, on the other hand, are fairly insensitive to the scale of the features. Think about it: a decision tree splits a node on a single feature, choosing the feature and threshold that most increase the homogeneity of the node. That split is not influenced by the scale of the other features, so rescaling has virtually no effect on it. This is what makes tree-based algorithms invariant to the scale of the features!
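To see this invariance in practice, here is a small sketch using scikit-learn (the Iris dataset is just a convenient stand-in): fit the same decision tree on the raw features and on standardized features, then compare the predictions. Because each split only compares one feature against a threshold, rescaling a feature simply rescales its thresholds.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One tree on the raw features, one on standardized features
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# The tree structure is unchanged by the monotonic rescaling, so the
# predictions should match exactly (prints True).
print(np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled)))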


[Image: Scene from Cinema Paradiso]


