Covariance And Correlation

Covariance

Covariance measures how two features (or two random variables) vary together: whether one tends to be above its mean when the other is above its mean. It can be positive or negative. It is computed from each variable's deviations from its own mean: multiply the paired deviations, add up the products, and divide by n - 1 (for a sample of n pairs). Let's work through an example to make this clearer.

Let a = [1,2,3,4,5,6,7,8,9] and b = [2,4,6,8,10,12,14,16,18]

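First, find the deviation of each value from its mean. Here mean(a) = 5 and mean(b) = 10, so the deviations of a are -4, -3, ..., 4 and the deviations of b are -8, -6, ..., 8.

Next, take the dot product of these two lists of deviations:

(a[0] - mean(a))*(b[0] - mean(b)) + (a[1] - mean(a))*(b[1] - mean(b)) + (a[2] - mean(a))*(b[2] - mean(b)) + …

Finally, divide that sum by n - 1, where n is the number of pairs (here n = 9). For a and b the dot product is 120, so the covariance is 120 / 8 = 15.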
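The steps above, applied to the lists a and b, can be checked with a few lines of plain Python. This is a minimal sketch: the helper name sample_covariance is hypothetical, and it divides by n - 1 (the sample covariance), the same convention used by the covariance() function later in this post.

```python
# Worked example: sample covariance of a and b, step by step
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 8, 10, 12, 14, 16, 18]

def sample_covariance(x, y):
    """Hypothetical helper: sum of products of deviations, divided by n - 1."""
    n = len(x)
    x_mean = sum(x) / n                  # mean of a is 5.0
    y_mean = sum(y) / n                  # mean of b is 10.0
    x_dev = [xi - x_mean for xi in x]    # deviations of a: -4, -3, ..., 4
    y_dev = [yi - y_mean for yi in y]    # deviations of b: -8, -6, ..., 8
    # dot product of the two deviation lists
    dot = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
    return dot / (n - 1)

print(sample_covariance(a, b))  # 15.0
```

The dot product of the deviations is 120, and 120 / 8 = 15. Dividing by n instead of n - 1 would give the population covariance, 120 / 9 ≈ 13.33.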

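So how do we know whether the variables are positively or negatively related?

Look at the sign of each product x*y, where x is a value's deviation from the mean of a and y is the paired deviation from the mean of b:

If x is positive and y is also positive, the product is positive.

If x is negative and y is also negative, the product is positive.

If x is positive and y is negative, the product is negative.

If x is negative and y is positive, the product is negative.

When the positive products dominate the sum, the covariance is positive (the variables tend to move in the same direction); when the negative products dominate, it is negative (one tends to rise as the other falls).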
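A quick way to see these sign rules at work is to compare b with a reversed copy of b. This sketch assumes NumPy; deviation_products and b_reversed are illustrative names, and np.cov (which divides by n - 1 by default) supplies the final covariance.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])
b_reversed = b[::-1]   # 18, 16, ..., 2: falls as a rises

def deviation_products(x, y):
    """Element-wise products of each variable's deviation from its own mean."""
    return (x - x.mean()) * (y - y.mean())

print(deviation_products(a, b))           # every product is >= 0
print(deviation_products(a, b_reversed))  # every product is <= 0

# np.cov returns a 2x2 covariance matrix; entry [0, 1] is cov(x, y)
print(np.cov(a, b)[0, 1])           # 15.0
print(np.cov(a, b_reversed)[0, 1])  # -15.0
```

Reversing b turns every non-zero product negative, so the covariance keeps its size but flips its sign.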

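There is a problem with covariance, though: its magnitude does not tell us how strongly two variables depend on each other, because it depends on the units they are measured in. A covariance of 12 might be tiny if the variables are measured in centimeters but huge if they are measured in miles, and simply converting one variable from meters to centimeters multiplies the covariance by 100. So covariance has no fixed scale for measuring the strength of a relationship. The solution is correlation.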
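The unit problem can be reproduced with made-up data (the numbers below are illustrative, not from any real measurement): converting one variable from meters to centimeters multiplies the covariance by 100 but leaves the correlation coefficient unchanged. NumPy's np.cov and np.corrcoef are used here for brevity.

```python
import numpy as np

# Hypothetical data: the same five lengths recorded in meters and in centimeters,
# paired with some second quantity y.
length_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
length_cm = length_m * 100
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Covariance grows by the same factor of 100 as the unit conversion ...
cov_m = np.cov(length_m, y)[0, 1]
cov_cm = np.cov(length_cm, y)[0, 1]
print(cov_m, cov_cm)    # about 2.5 and 250

# ... while the correlation coefficient does not change.
corr_m = np.corrcoef(length_m, y)[0, 1]
corr_cm = np.corrcoef(length_cm, y)[0, 1]
print(corr_m, corr_cm)  # both about 0.82
```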

Correlation

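Correlation lies between -1 and 1.

A correlation of 1 means a perfect positive linear relationship.

A correlation of 0 means there is no linear relationship (the variables may still be related in a non-linear way).

A correlation of -1 means a perfect negative (inverse) linear relationship.

Correlation = Covariance(a, b) / (standard deviation(a) * standard deviation(b))

Both standard deviations must use the same divisor as the covariance (n - 1 for the sample covariance); otherwise the result can fall outside the range -1 to 1.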
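Plugging a and b from the covariance example into this formula gives a concrete check. The sketch below uses only the standard library; the helper names mean, sample_std and sample_cov are illustrative, and both the covariance and the standard deviations divide by n - 1 so that the divisors cancel.

```python
import math

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 8, 10, 12, 14, 16, 18]

def mean(v):
    return sum(v) / len(v)

def sample_std(v):
    """Standard deviation with the n - 1 divisor, to match the sample covariance."""
    m = mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def sample_cov(x, y):
    """Sum of products of deviations, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

cov_ab = sample_cov(a, b)                           # 15.0
corr_ab = cov_ab / (sample_std(a) * sample_std(b))  # 1.0 up to rounding
print(cov_ab, corr_ab)
```

Since b = 2*a, the two lists have a perfect positive linear relationship: the standard deviations are sqrt(7.5) and sqrt(30), their product is 15, and 15 / 15 = 1.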

Let’s implement Covariance and Correlation in Python

Covariance

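```python
import matplotlib.pyplot as plt
import numpy as np

# Covariance

def diff_mean(X):
    # deviation of each value from the mean of X
    xmean = np.mean(X)
    return [xi - xmean for xi in X]

def covariance(X, Y):
    # dot product of the deviations, divided by n - 1 (sample covariance)
    return np.dot(diff_mean(X), diff_mean(Y)) / (len(X) - 1)

# 100 samples each: X ~ Normal(mean=1, std=5), Y ~ Normal(mean=2, std=10)
X = np.random.normal(1, 5, 100)
Y = np.random.normal(2, 10, 100)

plt.scatter(X, Y)
plt.show()

print(covariance(X, Y))
# Example output from one run: -5.556093998005411
# (the value changes on every run because X and Y are random)
```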
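To sanity-check the hand-written covariance(), its result can be compared with NumPy's np.cov, which also divides by n - 1 by default. This sketch repeats the two functions so it runs on its own; seeding a generator with np.random.default_rng is an added assumption so the numbers are repeatable.

```python
import numpy as np

def diff_mean(X):
    # deviation of each value from the mean of X
    xmean = np.mean(X)
    return [xi - xmean for xi in X]

def covariance(X, Y):
    # dot product of the deviations, divided by n - 1
    return np.dot(diff_mean(X), diff_mean(Y)) / (len(X) - 1)

rng = np.random.default_rng(0)      # seeded only so the run is repeatable
X = rng.normal(1, 5, 100)
Y = rng.normal(2, 10, 100)

ours = covariance(X, Y)
numpy_cov = np.cov(X, Y)[0, 1]      # off-diagonal entry is cov(X, Y)
print(ours, numpy_cov)
print(np.isclose(ours, numpy_cov))  # True
```

Because X and Y are drawn independently, their true covariance is 0; a sample covariance that is non-zero but small relative to 5 * 10 = 50 (the product of their standard deviations) is just sampling noise.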

Correlation

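```python
# Correlation

def correlation(X, Y):
    # ddof=1 makes the standard deviations divide by n - 1,
    # matching the divisor used in covariance()
    xstd = np.std(X, ddof=1)
    ystd = np.std(Y, ddof=1)
    return covariance(X, Y) / (xstd * ystd)

X = np.random.rand(40)
Y = 3*X + 20   # Y is an exact linear function of X

plt.scatter(X, Y)
plt.show()

print(correlation(X, Y))
# 1.0 (up to floating-point rounding)
```

Note the ddof=1: np.std and np.var divide by n by default, while covariance() divides by n - 1. If the two are mixed, the result is inflated by n/(n - 1), which for 40 points is 40/39 ≈ 1.0256, a value above the theoretical maximum of 1. This is not an error we have to live with; using the same divisor everywhere gives exactly 1 (up to rounding) for perfectly linear data. NumPy's corrcoef() function computes the correlation directly: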
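The n/(n - 1) inflation can be measured directly. This sketch assumes NumPy and a seeded generator (an added assumption for repeatability); it divides the sample covariance from np.cov once by population standard deviations (np.std's default ddof=0) and once by sample standard deviations (ddof=1).

```python
import numpy as np

rng = np.random.default_rng(1)   # seeded for repeatability
X = rng.random(40)
Y = 3 * X + 20                   # exact linear function of X

n = len(X)
cov_sample = np.cov(X, Y)[0, 1]  # divides by n - 1

# Mixed divisors: sample covariance over population (ddof=0) standard deviations
mixed = cov_sample / (np.std(X) * np.std(Y))
# Consistent divisors: everything uses n - 1
consistent = cov_sample / (np.std(X, ddof=1) * np.std(Y, ddof=1))

print(mixed, n / (n - 1))  # both about 1.0256
print(consistent)          # about 1.0
```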

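```python
np.corrcoef(X, Y)
# array([[1., 1.],
#        [1., 1.]])
```

It returns a matrix with the correlation of every possible pair of inputs: entry [i, j] is the correlation between the i-th and j-th variables, so the diagonal entries are always 1 and the off-diagonal entries are the correlation between X and Y.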
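Passing more than two variables makes the "every possible pair" layout easier to see. In this sketch (NumPy assumed; b and c are illustrative variables built from the earlier list a), np.corrcoef receives three rows and returns a 3x3 matrix.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = 2 * a      # rises with a
c = b[::-1]    # falls as a rises

# Each row is one variable; entry [i, j] is corr(variable i, variable j)
matrix = np.corrcoef([a, b, c])
print(np.round(matrix, 6))
# diagonal: 1 (each variable with itself)
# a with b: +1, a with c and b with c: -1
```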
