Fit vs. Transform in SciKit libraries for Machine Learning

We have seen methods such as fit(), transform(), and fit_transform() in many of scikit-learn's classes. Almost all tutorials, including the ones I've written, simply tell you to use one of these methods without explaining what it does. The obvious question is: what do these methods mean? What does it mean to fit something, and what does it mean to transform something? The transform() method makes some sense, since it transforms the data, but what about fit()? In this post, we'll try to understand the difference between the two.

To better understand these methods, we'll take the Imputer class as an example, because it has all of them. But before we get started, keep in mind that fitting a preprocessing object such as an imputer is different from fitting a whole model. You use an Imputer to handle missing values in a dataset. Imputer gives you easy methods to replace NaNs and blanks with something like the mean of the column or even m...
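Here is a minimal sketch of the fit/transform split. It assumes scikit-learn 0.20 or later, where the imputer lives in sklearn.impute as SimpleImputer (the older Imputer class was removed in 0.22). The toy array and variable names are illustrative assumptions, not part of the original post. fit() only learns the column means and stores them in statistics_, transform() uses those stored means to fill the gaps, and fit_transform() does both in one call.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy training data with missing values (NaN); values are illustrative only.
X_train = np.array([[1.0, 2.0],
                    [np.nan, 4.0],
                    [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")

# fit(): learn the per-column means from the training data.
# Nothing in X_train is modified at this point.
imputer.fit(X_train)
print(imputer.statistics_)  # column means: 2.0 and 3.0

# transform(): replace each NaN with the mean learned during fit().
X_filled = imputer.transform(X_train)
print(X_filled)

# fit_transform(): both steps in a single call, same result.
X_filled_2 = SimpleImputer(strategy="mean").fit_transform(X_train)

# New data is filled with the *training* means, not its own statistics.
X_new = np.array([[np.nan, 10.0]])
X_new_filled = imputer.transform(X_new)
print(X_new_filled)
```

Note the last step: the new row is filled using the means learned from the training data. That is why you typically call fit() (or fit_transform()) on the training set and only transform() on validation or test data.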
What is Standardization?

Standardization is another scaling technique, in which the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resulting distribution has a standard deviation of one. Here's the formula for standardization:

$$X' = \frac{X - \mu}{\sigma}$$

where μ is the mean of the feature values and σ is the standard deviation of the feature values. Note that in this case, the values are not restricted to a particular range.
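To make the formula concrete, here is a small sketch that applies it by hand with NumPy and checks the result against scikit-learn's StandardScaler, which follows the same fit/transform pattern: fit() learns mean_ and scale_, and transform() applies the formula. The sample array is made up for illustration. StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for std(), so the two results should agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: 4 samples, 2 features on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Manual standardization: subtract the per-feature mean (mu) and divide
# by the per-feature population standard deviation (sigma, ddof=0).
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_manual = (X - mu) / sigma

# StandardScaler: fit() learns mean_ and scale_, transform() applies them.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_scaled.std(axis=0))   # approximately 1 for each feature
```

The standardized values here run from about -1.34 to 1.34, so they are not confined to [0, 1] or any other fixed interval.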