Covariance gives the relationship between two features or two random variables. It tells us how one tends to vary as the other varies, and it can be positive or negative. A simple way to find the covariance is to multiply each value's distance from its mean in one variable by the matching distance in the other variable, and then average those products. Let's discuss this with an example to make it clearer.
Let a = [1,2,3,4,5,6,7,8,9] and b = [2,4,6,8,10,12,14,16,18]
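First we find the distance of each value from the mean of its own list.
Then we take the dot product of the two lists of distances.
The dot product is (a1 - mean(a))*(b1 - mean(b)) + (a2 - mean(a))*(b2 - mean(b)) + (a3 - mean(a))*(b3 - mean(b)) + … Dividing this sum by n - 1, where n is the number of values, gives the covariance.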
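The steps above can be sketched in NumPy for the two lists a and b. This is a minimal illustration, assuming the sample form of covariance (dividing by n - 1); the names da, db, dot and cov_ab are made up for this example.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])

# Distance of each value from the mean of its own list
da = a - a.mean()   # -4, -3, ..., 3, 4
db = b - b.mean()   # -8, -6, ..., 6, 8

# Dot product: the sum of the pairwise products of those distances
dot = np.dot(da, db)

# Sample covariance (assumption: divide by n - 1 rather than n)
cov_ab = dot / (len(a) - 1)

print(dot, cov_ab)  # 120.0 15.0
```

The result is positive because b rises whenever a rises.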
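So how do we know whether the variables are positively or negatively related?
Let's clarify. Take one pair of values, and let x and y be their distances from the means of their own lists. Each product x*y is one term of the dot product:
If x is positive and y is also positive, the product is positive and pushes the covariance up.
If x is negative and y is also negative, the product is again positive.
If x is positive and y is negative, the product is negative and pulls the covariance down.
If x is negative and y is positive, the product is negative.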
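The sign rules above can be checked with a small sketch. The helper sample_cov and the reversed list b_down are hypothetical, introduced only for this illustration.

```python
import numpy as np

def sample_cov(x, y):
    # Dot product of the distances from the mean, divided by n - 1
    return np.dot(x - x.mean(), y - y.mean()) / (len(x) - 1)

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b_up = 2 * a          # rises as a rises
b_down = b_up[::-1]   # falls as a rises

print(sample_cov(a, b_up))    # 15.0: distances share the same sign
print(sample_cov(a, b_down))  # -15.0: distances have opposite signs
```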
There is one problem with covariance: it cannot clearly state how strongly two variables depend on each other, because its value changes with the units of the variables. For example, 12 cm is small where 12 miles is very large, yet both are written as the number 12. So covariance gives no well-defined measure of the strength of dependence. The solution is correlation.
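A small sketch of this unit problem, using made-up height and weight values and a hypothetical sample_cov helper: changing metres to centimetres multiplies the covariance by 100, although the relationship between the variables is unchanged.

```python
import numpy as np

def sample_cov(x, y):
    # Dot product of the distances from the mean, divided by n - 1
    return np.dot(x - x.mean(), y - y.mean()) / (len(x) - 1)

# Made-up data, for illustration only
height_m = np.array([1.5, 1.6, 1.7, 1.8, 1.9])
weight_kg = np.array([50.0, 58.0, 63.0, 72.0, 80.0])
height_cm = height_m * 100   # same heights, different unit

cov_m = sample_cov(height_m, weight_kg)
cov_cm = sample_cov(height_cm, weight_kg)

# The ratio is 100 up to floating-point rounding
print(cov_m, cov_cm, cov_cm / cov_m)
```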
Correlation lies between -1 and 1.
If the correlation is 1, the variables are perfectly positively related.
If the correlation is 0, there is no linear dependence between them.
If the correlation is -1, they are perfectly inversely (negatively) related.
Correlation = Covariance/( standard deviation(a) * standard deviation(b))
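As a quick check of this formula on the lists a and b from the example, here is a minimal sketch. It assumes the sample versions throughout (np.cov divides by n - 1 by default, and ddof=1 makes np.std do the same).

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])

# Sample covariance, taken from the off-diagonal of the 2x2 covariance matrix
cov_ab = np.cov(a, b)[0, 1]

# Standard deviations with the same n - 1 divisor
corr_ab = cov_ab / (np.std(a, ddof=1) * np.std(b, ddof=1))

# Reversing b makes it fall as a rises
cov_rev = np.cov(a, b[::-1])[0, 1]
corr_rev = cov_rev / (np.std(a, ddof=1) * np.std(b, ddof=1))

print(corr_ab, corr_rev)  # close to 1 and -1 respectively
```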
Let’s implement Covariance and Correlation in Python
import matplotlib.pyplot as plt
import numpy as np
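# Covariance
def diff_mean(X):
    # Distance of each value from the mean of X
    xmean = np.mean(X)
    return [xi - xmean for xi in X]

def covariance(X, Y):
    # Dot product of the distances, divided by n - 1 (sample covariance)
    return np.dot(diff_mean(X), diff_mean(Y)) / (len(X) - 1)

# Two independent normal samples: (mean, standard deviation, size)
X = np.random.normal(1, 5, 100)
Y = np.random.normal(2, 10, 100)
plt.scatter(X, Y)
plt.show()
print(covariance(X, Y))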
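# Correlation
def correlation(X, Y):
    # Note: np.var divides by n by default (ddof=0),
    # while covariance() above divides by n - 1
    return covariance(X, Y) / np.sqrt(np.var(X) * np.var(Y))

# Y is an exact linear function of X
X = np.random.rand(40)
Y = 3 * X + 20
plt.scatter(X, Y)
plt.show()
print(correlation(X, Y))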
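The value of correlation from our formula comes out slightly above 1. This is not a rounding error: covariance() divides by n - 1, while np.var divides by n by default, so the result is inflated by a factor of n/(n - 1). Passing ddof=1 to np.var makes the two consistent and brings the value back to 1. Alternatively, we can use NumPy's corrcoef() function, which handles this for us.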
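A minimal sketch of where the overshoot comes from, assuming the cause is the divisor mismatch between the n - 1 covariance and np.var, which divides by n unless ddof=1 is passed. The seeded generator and the names mismatched and matched are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
X = rng.random(40)
Y = 3 * X + 20                   # exact linear relationship
n = len(X)

# Sample covariance with the n - 1 divisor
cov = np.dot(X - X.mean(), Y - Y.mean()) / (n - 1)

# np.var defaults to ddof=0 (divide by n): the ratio is inflated
mismatched = cov / np.sqrt(np.var(X) * np.var(Y))

# ddof=1 uses the same n - 1 divisor as the covariance
matched = cov / np.sqrt(np.var(X, ddof=1) * np.var(Y, ddof=1))

print(mismatched, n / (n - 1))   # both about 1.0256
print(matched)                   # about 1.0
```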
It returns a matrix holding the correlation of every possible pair of the variables passed to it.
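A short sketch of np.corrcoef on three variables. The third variable Z is made up here to show an unrelated column. Each row of the input is treated as one variable, so the result is a 3x3 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random(40)
Y = 3 * X + 20            # exact linear function of X
Z = rng.normal(size=40)   # generated independently of X and Y

# Entry [i, j] is the correlation between variable i and variable j
R = np.corrcoef([X, Y, Z])

print(R.shape)   # (3, 3)
print(R[0, 1])   # correlation of X and Y, about 1.0
print(R[0, 2])   # correlation of X and Z
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only the entries above the diagonal carry new information.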