Random Process and Linear Algebra: Unit II: Two-Dimensional Random Variables

Covariance Correlation and Regression

When two or more random variables are defined on a probability space, it is useful to describe how they vary together, that is, to measure the relationship between them. A common measure of the relationship between two random variables is the covariance. To define the covariance, we need the expected value of a function h(X, Y) of two random variables. The definition simply extends the one used for a function of a single random variable.


i. COVARIANCE

If X and Y are random variables, then the covariance between them is defined as [A.U N/D 2019 (R17) PS]

Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] E[Y]

Note: If X and Y are independent, then E[XY] = E[X] E[Y], and hence Cov(X, Y) = 0. The converse does not hold in general: zero covariance does not imply independence.
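The identity Cov(X, Y) = E[XY] - E[X] E[Y] can be checked on a small discrete example. The following is a minimal sketch, assuming a joint probability mass function stored as a Python dictionary; the helper names `expectation` and `covariance` and the numerical values are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def expectation(pmf, h):
    """E[h(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * h(x, y) for (x, y), p in pmf.items())

def covariance(pmf):
    """Cov(X, Y) = E[XY] - E[X] E[Y]."""
    ex = expectation(pmf, lambda x, y: x)
    ey = expectation(pmf, lambda x, y: y)
    exy = expectation(pmf, lambda x, y: x * y)
    return exy - ex * ey

# Hypothetical marginal distributions (illustrative values only).
px = {0: Fraction(1, 4), 1: Fraction(3, 4)}
py = {1: Fraction(1, 3), 2: Fraction(2, 3)}

# Independent case: the joint pmf is the product of the marginals.
independent = {(x, y): px[x] * py[y] for x in px for y in py}

# Dependent case: all probability sits on the points (0, 0) and (1, 1).
dependent = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(covariance(independent))  # 0
print(covariance(dependent))    # 1/4
```

Exact fractions are used so that the independent case gives exactly 0 rather than a small rounding error.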





ii. CORRELATION ANALYSIS

A distribution involving two variables is known as a bivariate distribution. If these two variables vary such that a change in one variable is accompanied by a change in the other, the variables are said to be correlated. For example, there exists some relationship between the height and weight of a person, the price of a commodity and its demand, rainfall and production of rice, etc. The degree of relationship between the variables under consideration is measured through correlation analysis. The measure of correlation is called the correlation co-efficient or correlation index. Thus, correlation analysis refers to the techniques used in measuring the closeness of the relationship between the variables.

Types of correlation

There are three important ways of classifying correlation viz.,

a. Positive and Negative

b. Simple, partial and multiple

c. Linear and non-linear.

a. Positive and Negative correlation :

If the two variables deviate in the same direction, i.e., if an increase in one variable results in a corresponding increase in the other, or a decrease in one variable results in a corresponding decrease in the other, then the correlation is said to be direct or positive. For example, the correlation between the height and weight of a person and the correlation between rainfall and production of rice are positive.

If the two variables constantly deviate in the opposite directions, i.e., if the increase in one variable results in a corresponding decrease in the other or if the decrease in one variable results in a corresponding increase in the other, the correlation is said to be inverse or negative. The correlation between the price of a commodity and its demand, correlation between the volume and pressure of a perfect gas, etc., are negative.

Correlation is said to be perfect if the deviation in one variable is followed by a corresponding and proportional deviation in the other.

b. Simple, partial and multiple correlation :

If only two variables are considered for correlation analysis, it is called a simple correlation. When three or more variables are studied, it is a problem of either multiple or partial correlation.

In multiple correlation, three or more variables are studied simultaneously. For example, the study of the relationship between the yield of rice per hectare and both the amount of rainfall and the usage of fertilizers is a multiple correlation.

When three or more variables are involved in correlation analysis, the correlation between the dependent variable and only one particular independent variable is called partial correlation. The influence of the other independent variables is excluded.

For example, the yield of rice is related to the application of fertilizers and to the rainfall. In this case, the relation of yield to fertilizer excluding the effect of rainfall, and the relation of yield to rainfall excluding the usage of fertilizer, are partial correlations.

c. Linear and non-linear correlation :

If the amount of change in one variable tends to bear constant ratio to the amount of change in the other variable, then the correlation is said to be linear.

For example, if X and Y satisfy a relation of the form Y = a + bX with constants a and b, the graph of Y against X is a straight line.

A correlation is said to be non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if rainfall is doubled, the production of rice would not necessarily be doubled.

Methods of studying correlation :

The following are some of the methods used for studying correlation.

(i) Scatter diagram method

(ii) Graphic method

(iii) Karl Pearson's co-efficient of correlation

(iv) Rank method

(vi) Method of least squares.

Karl Pearson's co-efficient of correlation :

Let X and Y be given random variables. Karl Pearson's co-efficient of correlation is denoted by rXY or r(X, Y) and defined as

r(X, Y) = Cov(X, Y) / (σX σY)

where σX and σY are the standard deviations of X and Y.

Note that the correlation co-efficient always lies between -1 and +1.

Note: Two random variables with nonzero correlation are said to be correlated.
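For paired observations, the same ratio of covariance to the product of standard deviations can be computed from the data. The sketch below is a minimal illustration in Python; the function name `pearson_r` and all data values are hypothetical and are not taken from the examples in this section.

```python
import math

def pearson_r(xs, ys):
    """Sample version of r = Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data (illustrative values only).
xs = [1, 2, 3, 4, 5]
increasing = [3, 5, 7, 9, 11]   # y = 2x + 1, perfect positive correlation
decreasing = [10, 8, 6, 4, 2]   # y = 12 - 2x, perfect negative correlation
mixed = [2, 1, 4, 3, 5]         # positive but not perfect

print(round(pearson_r(xs, increasing), 4))  # 1.0
print(round(pearson_r(xs, decreasing), 4))  # -1.0
print(round(pearson_r(xs, mixed), 4))       # 0.8
```

The two exactly linear data sets reach the bounds +1 and -1, which matches the description of perfect correlation given earlier.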

iii. RANK CORRELATION

Let us suppose that a group of 'n' individuals is arranged in order of merit or proficiency in possession of two characteristics A and B. These ranks in the two characteristics will, in general, be different. For example, if we consider the relation between intelligence and beauty, it is not necessary that a beautiful individual is intelligent also.

If (Xi, Yi), i = 1, 2, ..., n are the ranks of the individuals in two characteristics A and B respectively, then the rank correlation co-efficient is given by

r = 1 - 6 ∑di² / [n(n² - 1)]

where di = Xi - Yi is the difference between the ranks of the i-th individual. This formula is called Spearman's formula for the rank correlation co-efficient.
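The rank correlation formula r = 1 - 6 ∑di² / [n(n² - 1)] is straightforward to evaluate when no ranks are repeated. The following is a minimal sketch in Python; the function name and the rank lists are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def spearman_rank_correlation(rank_a, rank_b):
    """r = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), valid when no ranks are tied."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))

# Hypothetical ranks of five individuals in characteristics A and B.
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 4, 3, 5]
reversed_ranks = [5, 4, 3, 2, 1]

print(spearman_rank_correlation(ranks_a, ranks_b))         # 4/5
print(spearman_rank_correlation(ranks_a, ranks_a))         # 1
print(spearman_rank_correlation(ranks_a, reversed_ranks))  # -1
```

Identical rankings give 1 and exactly reversed rankings give -1, the two extremes of the co-efficient.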

iv. REPEATED RANKS

If two or more individuals are equal in any classification with respect to characteristic A or B, or if more than one item in the series has the same value, then Spearman's formula for the rank correlation co-efficient breaks down. In this case a common rank is given to the repeated items. This common rank is the average of the ranks these items would have received had they differed slightly from each other, and the next item gets the rank following the ranks already used. As a result, the following correction is made in the correlation formula.

In the correlation formula, we add the factor m(m² - 1)/12 to ∑d², where m is the number of times an item is repeated. This correction factor is to be added for each repeated value.
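The averaging of repeated ranks and the correction m(m² - 1)/12 can be sketched in Python as follows. This is an illustration under stated assumptions: the largest value receives rank 1, the function names are introduced here, and the data are hypothetical rather than taken from the examples in this section.

```python
from collections import Counter
from fractions import Fraction

def average_ranks(values):
    """Rank from largest (rank 1) down; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [Fraction(sum(positions[v]), len(positions[v])) for v in values]

def correction(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(Fraction(m * (m * m - 1), 12)
               for m in Counter(values).values() if m > 1)

def rank_correlation_with_ties(xs, ys):
    """Rank correlation with the repeated-rank correction added to sum(d^2)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + correction(xs) + correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

# Hypothetical scores: the value 20 occurs twice in xs.
xs = [10, 20, 20, 30]
ys = [1, 2, 3, 4]

print(average_ranks(xs))                   # ranks 4, 5/2, 5/2, 1
print(correction(xs))                      # 1/2
print(rank_correlation_with_ties(xs, ys))  # 9/10
```

Here the two tied items would have occupied positions 2 and 3, so each receives the rank 5/2, and one correction term of 2(4 - 1)/12 = 1/2 is added to ∑d².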

Example 2.2.1

Calculate the correlation co-efficient for the following heights (in inches) of fathers X and their sons Y. [A.U. N/D 2004, A/M 2015 (RP) R13]


Solution :



The correlation co-efficient of X and Y is given by,



Example 2.2.2

Find the rank correlation co-efficient from the following data:


Solution :



Example 2.2.3

Ten participants were ranked according to their performance in a musical test by three judges, as given in the following data.


Using rank correlation method, discuss which pair of judges has the nearest approach to common likings of music.

Solution :


The rank correlation co-efficient between X and Y is given by



The rank correlation co-efficient between Y and Z is given by


The rank correlation co-efficient between X and Z is given by


Since the rank correlation coefficient between X and Z is positive and maximum, we conclude that the pair of judges X and Z has the nearest approach to common liking in music.

Example 2.2.4

Obtain the rank correlation coefficient for the following data:


Solution :


In the X series, 75 is repeated twice, occupying the 2nd and 3rd ranks. Therefore the common rank 2.5 (the average of 2 and 3) is given to each 75. Also in the X series, 64 is repeated thrice, occupying the 5th, 6th and 7th ranks.

Therefore the common rank 6 (the average of 5, 6 and 7) is given to each 64.

Similarly, in the Y series, 68 is repeated twice, occupying the 3rd and 4th ranks. Therefore the common rank 3.5 (the average of 3 and 4) is given to each 68.

Correction factors

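In the X series, 75 is repeated twice (m = 2), so the correction factor is m(m² - 1)/12 = 2(4 - 1)/12 = 1/2.

In the X series, 64 is repeated thrice (m = 3), so the correction factor is 3(9 - 1)/12 = 2.

In the Y series, 68 is repeated twice (m = 2), so the correction factor is 2(4 - 1)/12 = 1/2.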




Example 2.2.5

The joint probability mass function of X and Y is given below.


Find the correlation coefficient of (X, Y).

Solution :





Example 2.2.6

Let X and Y be discrete random variables with joint probability function f(x, y) = (x + y)/21, x = 1, 2, 3; y = 1, 2. Find (i) the mean and variance of X and Y, (ii) Cov(X, Y), (iii) the correlation of X and Y. [A.U N/D 2015 R13, CBT A/M 2011], [A.U A/M 2019 (R17) PQT]

Solution :
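One way to check the quantities asked for in this example is to enumerate the six points of the joint probability function directly. The sketch below is illustrative Python, not the textbook's worked solution. It reads the probability function as (x + y)/21, the reading under which the six probabilities sum to 1, and the helper name `expect` is an assumption introduced here.

```python
from fractions import Fraction
import math

# Joint probability function f(x, y) = (x + y)/21 on x = 1, 2, 3 and y = 1, 2.
pmf = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}
total = sum(pmf.values())  # should equal 1 for a valid pmf

def expect(h):
    """E[h(X, Y)] under the joint probability function above."""
    return sum(p * h(x, y) for (x, y), p in pmf.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mean_x ** 2
var_y = expect(lambda x, y: y * y) - mean_y ** 2
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y
r_xy = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(total)            # 1
print(mean_x, var_x)    # 46/21 278/441
print(mean_y, var_y)    # 11/7 12/49
print(cov_xy)           # -2/147
print(round(r_xy, 4))   # -0.0346
```

The covariance is small and negative, so X and Y are only weakly (negatively) correlated.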





Example 2.2.7

Two random variables X and Y have the joint density  Show that Cov(X, Y) = -1/144. [AU, N/D. 2004, M/J 2006, N/D 2010, Tvli A/M 2009, M/J 2010] [A.U. CBT M/J 2010] [A.U N/D 2011] [A.U A/M 2019 (R13) PQT]

Solution :

The marginal density function of X is,


Similarly, the marginal density function of Y is,



Example 2.2.8

Suppose that the two-dimensional random variable (X, Y) has the joint p.d.f.  Obtain the correlation co-efficient between X and Y. Check whether X and Y are independent. [AU, N/D, 2003, 2004] [A.U Tvli M/J 2010] [A.U A/M 2010] [A.U CBT N/D 2011] [A.U N/D 2017 (RP) R-13]

Solution:

The marginal density function of X is given by,


The marginal density function of Y is given by,





Example 2.2.9

Let X be a random variable with p.d.f. f(x) = 1/2, -1 ≤ x ≤ 1, and let Y = X². Prove that the correlation co-efficient between X and Y is zero.

Solution :
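Since Y = X², the covariance is Cov(X, Y) = E[X³] - E[X] E[X²], so only the moments of X are needed. The following minimal Python sketch evaluates them exactly from the integral of x^k / 2 over [-1, 1]; the function name `moment` is an assumption introduced for illustration.

```python
from fractions import Fraction

def moment(k):
    """E[X^k] for the density f(x) = 1/2 on [-1, 1].

    The antiderivative of x^k / 2 is x^(k+1) / (2(k+1)); evaluating it
    at 1 and -1 gives (1 - (-1)^(k+1)) / (2(k+1)).
    """
    return Fraction(1 - (-1) ** (k + 1), 2 * (k + 1))

e_x = moment(1)    # E[X]
e_y = moment(2)    # E[Y] = E[X^2]
e_xy = moment(3)   # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y

var_x = moment(2) - moment(1) ** 2   # Var(X)
var_y = moment(4) - moment(2) ** 2   # Var(Y) = E[X^4] - (E[X^2])^2

print(e_x, e_y, e_xy)   # 0 1/3 0
print(cov_xy)           # 0
print(var_x, var_y)     # 1/3 4/45
```

The odd moments vanish by symmetry, so the covariance is 0 while both variances are positive, and hence the correlation co-efficient is 0 even though Y is completely determined by X.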


Example 2.2.10

Two independent random variables X and Y are defined by,


Show that U = X + Y and V = X - Y are uncorrelated. [AU A/M 2003, N/D 2012, M/J 2013]

Solution :



 

∴ U and V are uncorrelated.

Example 2.2.11

Let (X, Y) be a two-dimensional random variable uniformly distributed over the triangular region R bounded by y = 0, x = 3 and y = 4x/3. Find the correlation coefficient rXY. [A.U.]

Solution:

Since (X, Y) is uniformly distributed, f(x, y) = K, a constant (say).

To find the point of intersection of x = 3 and y = 4x/3: putting x = 3 gives y = 4, so the lines meet at (3, 4).


The marginal density function of X is


Similarly, the marginal density function of Y is


 


∴ the correlation co-efficient rXY is given by,
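As a cross-check on this example, the moments of a uniform distribution over the triangle can be written in closed form. The triangle has vertices (0, 0), (3, 0) and (3, 4), so its area is 6 and the constant density is 1/6. The sketch below is illustrative Python under that setup; the function name `moment` is an assumption introduced here.

```python
from fractions import Fraction
import math

def moment(a, b):
    """E[X^a Y^b] for (X, Y) uniform on the triangle 0 <= y <= 4x/3, 0 <= x <= 3.

    With density 1/6, integrating over y first and then x gives
    (1/6) * (4/3)^(b+1) / (b+1) * 3^(a+b+2) / (a+b+2).
    """
    inner = Fraction(4, 3) ** (b + 1) / (b + 1)
    outer = Fraction(3 ** (a + b + 2), a + b + 2)
    return Fraction(1, 6) * inner * outer

total = moment(0, 0)                     # total probability
mean_x, mean_y = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - mean_x ** 2
var_y = moment(0, 2) - mean_y ** 2
cov_xy = moment(1, 1) - mean_x * mean_y
r_xy = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(total)             # 1
print(mean_x, mean_y)    # 2 4/3
print(var_x, var_y)      # 1/2 8/9
print(cov_xy)            # 1/3
print(round(r_xy, 4))    # 0.5
```

The exact value is rXY = (1/3) / sqrt((1/2)(8/9)) = (1/3) / (2/3) = 1/2.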


Example 2.2.12

Let X1 and X2 be two independent random variables with means 5 and 10 and standard deviations 2 and 3 respectively. Obtain rUV where U = 3X1 + 4X2 and V = 3X1 - X2. [A.U N/D 2019 (R17) PS]

Solution:
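Because X1 and X2 are independent, Cov(X1, X2) = 0, and the covariance of two linear combinations reduces to a sum over the individual variances. The following minimal Python sketch applies that rule to U = 3X1 + 4X2 and V = 3X1 - X2; the helper name `cov_linear` is an assumption introduced for illustration.

```python
import math

# Variances from the given standard deviations 2 and 3.
var1, var2 = 2 ** 2, 3 ** 2

def cov_linear(a, b):
    """Cov(a[0]*X1 + a[1]*X2, b[0]*X1 + b[1]*X2) for independent X1, X2."""
    return a[0] * b[0] * var1 + a[1] * b[1] * var2

u = (3, 4)    # coefficients of U = 3*X1 + 4*X2
v = (3, -1)   # coefficients of V = 3*X1 - X2

cov_uv = cov_linear(u, v)   # 9*4 - 4*9
var_u = cov_linear(u, u)    # 9*4 + 16*9
var_v = cov_linear(v, v)    # 9*4 + 1*9
r_uv = cov_uv / math.sqrt(var_u * var_v)

print(cov_uv, var_u, var_v)  # 0 180 45
print(r_uv)                  # 0.0
```

The means 5 and 10 do not enter the calculation, since covariance and variance are unaffected by shifts in location. Here Cov(U, V) = 36 - 36 = 0, so rUV = 0.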




EXERCISE 2.2

1. For the given data, find the correlation co-efficient between X and Y.


2. Ten students got the following percentage of marks in Economics and Statistics. Calculate the correlation co-efficient.


3. Find the correlation co-efficient between sales and expenses of the following 10 firms.


4. The following marks have been obtained by a class of students in English. Find the correlation co-efficient.


5. Calculate the Karl Pearson's co-efficient of correlation from the following data.


6. Find the correlation co-efficient between X and Y given in table.


7. Find the correlation co-efficient between the height of fathers(X) and height of sons (Y) given below:


8. Find the rank correlation co-efficient of ranks of 8 candidates in Maths and English as given below :


9. Let the random variable X have the marginal density function g(x) = 1, -1/2 < x < 1/2, and let the conditional density of Y be

 Show that the variables X and Y are uncorrelated.

10. Let X and Y be random variables having joint density function  Find rXY.

11. If f(x, y) = (1/8)(6 - x - y), 0 ≤ x ≤ 2, 2 ≤ y ≤ 4, find rXY.
