Standard Deviation And Variance

  

Standard Deviation:

Standard deviation is a number that describes how spread out the values are.

A low standard deviation means that most of the numbers are close to the mean (average) value.

A high standard deviation means that the values are spread out over a wider range.

Example: This time we have registered the speed of 7 cars:

speed = [86,87,88,86,87,85,86]

The standard deviation is: 0.9

This means that most of the values lie within 0.9 of the mean value, which is 86.4.


Let us do the same with a selection of numbers with a wider range:

speed = [32,111,138,28,59,77,97]

The standard deviation is: 37.85

This means that most of the values lie within 37.85 of the mean value, which is 77.4.

As you can see, a higher standard deviation indicates that the values are spread out over a wider range.

The NumPy module has a function, std(), to calculate the standard deviation:

import numpy

speed = [86,87,88,86,87,85,86]

x = numpy.std(speed)

print(x)

Output: 0.9035079029052513

import numpy

speed = [32,111,138,28,59,77,97]

x = numpy.std(speed)

print(x)

Output: 37.84501153334721
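
The claim that most values lie within one standard deviation of the mean can be checked directly. The sketch below is illustrative only: the helper name share_within_one_std is an assumption made for this example and is not part of NumPy. It reports the fraction of values whose distance from the mean is at most one standard deviation.

```python
import numpy

def share_within_one_std(values):
    """Hypothetical helper: fraction of values within one standard deviation of the mean."""
    data = numpy.asarray(values, dtype=float)
    mean = data.mean()
    std = data.std()  # same calculation as numpy.std(values)
    within = numpy.abs(data - mean) <= std  # True where a value is close to the mean
    return within.mean()  # share of True entries

narrow = [86, 87, 88, 86, 87, 85, 86]
wide = [32, 111, 138, 28, 59, 77, 97]

print(share_within_one_std(narrow))  # 5 of the 7 values
print(share_within_one_std(wide))    # 4 of the 7 values
```

In both lists more than half of the values fall inside that range, even though the range itself is far wider for the second list.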





Variance:

Variance is another number that indicates how spread out the values are.

In fact, if you take the square root of the variance, you get the standard deviation!

Or the other way around, if you multiply the standard deviation by itself, you get the variance!
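
This relationship can be verified with a short sketch, using NumPy's var() and std() functions on the second speed list from above. math.isclose is used instead of == because floating-point results may differ in the last digits.

```python
import math
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

variance = numpy.var(speed)
std_dev = numpy.std(speed)

# The square root of the variance matches the standard deviation,
# and squaring the standard deviation gives back the variance.
print(math.isclose(math.sqrt(variance), std_dev))  # True
print(math.isclose(std_dev ** 2, variance))        # True
```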

To calculate the variance, follow these steps:

1. Find the mean:

(32+111+138+28+59+77+97) / 7 = 77.4

2. For each value: find the difference from the mean:

 32 - 77.4 = -45.4
111 - 77.4 =  33.6
138 - 77.4 =  60.6
 28 - 77.4 = -49.4
 59 - 77.4 = -18.4
 77 - 77.4 =  -0.4
 97 - 77.4 =  19.6

3. For each difference: find the square value:

(-45.4)² = 2061.16
 (33.6)² = 1128.96
 (60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² =  338.56
 (-0.4)² =    0.16
 (19.6)² =  384.16

4. The variance is the average of these squared differences:

(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 = 1432.2
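
The four steps above can be sketched in plain Python, without NumPy. This is a minimal illustration, and the variable names are chosen for this example only. It uses the unrounded mean rather than 77.4, so intermediate values differ slightly from the hand calculation, but the result rounds to the same 1432.2.

```python
speed = [32, 111, 138, 28, 59, 77, 97]

# 1. Find the mean.
mean = sum(speed) / len(speed)

# 2. For each value, find the difference from the mean.
differences = [value - mean for value in speed]

# 3. For each difference, find the square.
squared = [d ** 2 for d in differences]

# 4. The variance is the average of the squared differences.
variance = sum(squared) / len(squared)

print(round(mean, 1))      # 77.4
print(round(variance, 1))  # 1432.2
```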

Luckily, NumPy has a function, var(), to calculate the variance:

import numpy

speed = [32,111,138,28,59,77,97]

x = numpy.var(speed)

print(x)

Output: 1432.2448979591834

