How to use the COVARIANCE.S function
What is the COVARIANCE.S function?
The COVARIANCE.S function calculates the sample covariance from two different datasets.
Table of Contents
1.Introduction
What is covariance?
Covariance is the average of the products of deviations for each pair in two different datasets. The covariance is positive if greater values in the first data set correspond to greater values in the second data set. The covariance is negative if greater values in the first data set correspond to smaller values in the second data set.
What is the average of the products of deviations for each pair in two different datasets?
The covariance between two datasets is computed by taking each data point, finding its deviation from its respective dataset mean by subtracting the mean, multiplying the two datasets' deviations together for each pair, and averaging these cross-products of deviations.
What is deviation?
In statistics, deviation is a measure of how far each value in a data set lies from the mean (the average of all values). A high deviation means that the values are spread out widely, while a low deviation means that they are clustered closely around the mean.
What is the mean?
It is also known as the average. It is calculated by adding up all the values in the data set and dividing by the number of values.
For example, if you have a data set of 5, 7, 9, 11, and 13, the mean is (5 + 7 + 9 + 11 + 13) / 5 = 9.
How to interpret covariance?
A positive covariance means that the variables tend to increase or decrease together, indicating a positive linear relationship.
A negative covariance means that the variables tend to move in opposite directions, indicating a negative linear relationship.
A zero covariance means that the variables are independent and have no linear relationship.
However, covariance is not a standardized measure and it depends on the scale and units of the variables. It is not easy to compare the covariances of different pairs of variables or interpret the strength of the relationship. A more common and useful measure of linear relationship is the correlation coefficient, which is the normalized version of covariance.
How to calculate normalized version of covariance?
To calculate the normalized version of covariance, which is also known as the correlation coefficient, you need to divide the covariance by the product of the standard deviations of the two variables. The standard deviation is a measure of how much the values in a data set deviate from the mean.
What is the difference between the COVARIANCE.S function and the COVARIANCE.P function?
The COVARIANCE.S function calculates the covariance based on a sample while the COVARIANCE.P function calculates the covariance based on a population.
COVARIANCE.P = Σ(x- x̄)(y- ȳ)/n
COVARIANCE.S = Σ(x- x̄)(y- ȳ)/(n-1)
x̄ - AVERAGE(array1)
ȳ - AVERAGE(array2)
n - sample size
2. COVARIANCE.S Function Syntax
COVARIANCE.S(array1, array2)
3. COVARIANCE.S Function Arguments
array1 | Required. The first data set. |
array2 | Required. The second data set. |
4. COVARIANCE.S Function Example
A city's utility company has collected data on the monthly average temperature and the total energy consumption for the entire year. Use the COVARIANCE.S function to find the relationship between temperature and energy consumption for this sample of data points. Determine if the covariance is positive or negative?
Formula in cell C18:
The COVARIANCE.S function returns a value of -1752.6 for the data on monthly average temperature and total energy consumption for an entire year. This indicates a negative covariance between the two variables.
A negative covariance value suggests an inverse relationship between the two variables. In this case, it means that as the monthly average temperature increases, the total energy consumption tends to decrease, or vice versa.
The size of the covariance value alone does not provide information about the strength of the relationship. However, a larger absolute value generally suggests a stronger inverse relationship compared to a smaller absolute value.
The covariance value of -1752 will have units that are the product of the units of measurement for temperature and energy consumption. For example, if temperature is measured in degrees Celsius and energy consumption is measured in kilowatt-hours (kWh), the covariance value would be in units of (°C × kWh).
The negative covariance value of -1752 suggests that in months with higher average temperatures, the total energy consumption tends to be lower, which aligns with the expected pattern of reduced energy usage for heating purposes during warmer months. In months with lower average temperatures the total energy consumption tends to be higher due to increased heating needs.
To better understand the strength of the relationship it is recommended to calculate the correlation coefficient. It is a standardized measure ranging from -1 to 1 and the correlation coefficient can be calculated by dividing the covariance value by the product of the standard deviations of the two variables.
Use the CORREL function to analyze how strong the correlation between the energy consumption and the temperature is. The correlation coefficient is calculated like this:
COVARIANCE.P(X, Y) / (STDEV.S(X) * STDEV.S(Y)) however, the CORREL function does this for you.
5. COVARIANCE.S Function not working
Text, logical or empty values are ignored, however, 0 (zeros) are included.
The COVARIANCE.S function returns
- #N/A error value if the number of data points in array1 and array2 is not equal.
- #DIV/0! error value if either array1 or array2 is empty.
6. How is the COVARIANCE.S Function calculated
The math formula for calculating the covariance for a sample is:
COVARIANCE.S = Σ(x- x̄)(y- ȳ)/(n-1)
x̄ - AVERAGE(array1)
ȳ - AVERAGE(array2)
n - sample size
Let's consider the following three data points:
x: 2, 4, 6
y: 3, 5, 7
Step 1: Calculate the sample means x̄ and ȳ.
x̄ = (2 + 4 + 6) / 3 = 4
ȳ = (3 + 5 + 7) / 3 = 5
Step 2: Calculate the deviations (x - x̄) and (y - ȳ) for each data point.
Data Point 1:
x - x̄ = 2 - 4 = -2
y - ȳ = 3 - 5 = -2
Data Point 2:
x - x̄ = 4 - 4 = 0
y - ȳ = 5 - 5 = 0
Data Point 3:
x - x̄ = 6 - 4 = 2
y - ȳ = 7 - 5 = 2
Step 3: Calculate the products of the deviations (x - x̄)(y - ȳ) for each data point.
Data Point 1: (x - x̄)(y - ȳ) = (-2)(-2) = 4
Data Point 2: (x - x̄)(y - ȳ) = (0)(0) = 0
Data Point 3: (x - x̄)(y - ȳ) = (2)(2) = 4
Step 4: Sum the products of the deviations.
Σ(x - x̄)(y - ȳ) = 4 + 0 + 4 = 8
Step 5: Divide the sum by (n - 1), where n is the number of data points.
Covariance = Σ(x - x̄)(y - ȳ) / (n - 1)
= 8 / (3 - 1)
= 8 / 2
= 4
The sample covariance for the given three data points (x: 2, 4, 6 and y: 3, 5, 7) is 4.
Functions in 'Statistical' category
The COVARIANCE.S function function is one of 73 functions in the 'Statistical' category.
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
Contact Oscar
You can contact me through this contact form