How to use the CORREL function
What is the CORREL function?
The CORREL function calculates the correlation between two groups of numbers.
1. Introduction
What is the correlation in statistics?
Correlation in statistics refers to the relationship between two variables and how closely they vary or change together.
Correlation measures the association or co-movement between two quantitative variables. The output ranges from -1 to 1. Values close to -1 or 1 indicate strong correlation, while values near 0 indicate weak or no correlation. Positive correlation means the variables move in the same direction; negative correlation means they move in opposite directions.
Correlation is used to identify relationships and make predictions, but it cannot determine cause and effect on its own: correlation does not imply causation.
What is covariance?
Covariance measures how two random variables or datasets vary together by calculating the average product of their deviations from their individual means. A positive value indicates that both tend to be above or below their means at the same time; a negative value indicates that one tends to be above its mean when the other is below.
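As a rough illustration of "the average product of the deviations", here is a minimal Python sketch using the temperature and ice cream figures from Example 1 of this article. The function name population_covariance is made up for this example, and dividing by n (rather than n - 1) is an assumption that gives the population form of covariance.

```python
# Data from Example 1 of this article.
temperature = [97, 79, 70, 87, 72, 92, 76]
ice_cream = [45, 24, 21, 36, 19, 39, 23]


def population_covariance(xs, ys):
    """Average product of the deviations from each mean (divides by n)."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Product of the two deviations for every pair, then the average.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


cov = population_covariance(temperature, ice_cream)
print(round(cov, 2))  # a positive number: both series tend to be above or below their means together
```

Unlike correlation, the size of this number depends on the units of the data, which is why it is usually harder to interpret on its own.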
What are regression relationships?
Regression relationships describe how one variable depends on another. One variable is modeled as a function of the other using a regression line or curve that best represents the predictive relationship between the variables' values.
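To make the idea of a regression line concrete, the following sketch fits a straight line by ordinary least squares to the Example 1 data from this article. The helper name fit_line and the temperature value 85 used for the prediction are assumptions chosen only for illustration.

```python
# Data from Example 1 of this article.
temperature = [97, 79, 70, 87, 72, 92, 76]
ice_cream = [45, 24, 21, 36, 19, 39, 23]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(temperature, ice_cream)

# Hypothetical prediction: expected sales at a temperature of 85.
predicted_sales = slope * 85 + intercept
print(round(slope, 4), round(intercept, 2), round(predicted_sales, 1))
```

The slope says how much the modeled variable changes per unit of the other; the correlation coefficient says how tightly the points follow that line.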
2. Syntax
CORREL(array1, array2)
3. Arguments
array1 | Required. The first set of numbers. |
array2 | Required. The second set of numbers. |
4. Example 1
A company wants to analyze the relationship between the temperature outside and their ice cream sales. The data collected is as follows:
Date | Temperature | Ice cream |
Jun 1 | 97 | 45 |
Jun 2 | 79 | 24 |
Jun 3 | 70 | 21 |
Jun 4 | 87 | 36 |
Jun 5 | 72 | 19 |
Jun 6 | 92 | 39 |
Jun 7 | 76 | 23 |
What is the correlation between ice cream sales and the temperature?
The image above shows the calculation in cell B28:
The first argument contains the first set of numbers, in this example cell range C3:C9. The second set of numbers is in cell range D3:D9. The output in cell B28 is approximately 0.9831.
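The result can be cross-checked outside Excel. The sketch below is a plain Python version of the Pearson correlation coefficient applied to the same seven data points; the function name correl is borrowed from the Excel function for readability and is not an existing Python library call.

```python
import math

temperature = [97, 79, 70, 87, 72, 92, 76]
ice_cream = [45, 24, 21, 36, 19, 39, 23]


def correl(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return s / math.sqrt(ss_x * ss_y)


r = correl(temperature, ice_cream)
print(round(r, 4))  # 0.9831
```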
The guidelines for interpreting the strength of correlation based on the value of the correlation coefficient:
0.0 to 0.3: Weak or no correlation
0.3 to 0.5: Moderate correlation
0.5 to 0.7: Strong correlation
0.7 to 1.0: Very strong correlation
The value in cell B12 shows that there is a very strong correlation between ice cream sales and temperature. The closer the value is to 1, the stronger the positive correlation.
A positive correlation coefficient (above zero) indicates a positive linear relationship between the two variables. This means that as one variable increases, the other variable tends to increase as well.
The chart above shows the ice cream sales on the y-axis and the temperature on the x-axis. If we evaluate the dots on the scatter plot, a clear positive linear relationship between the two variables is apparent.
A strong correlation between two variables does not necessarily mean that one variable causes the other. It simply indicates that the variables tend to move together in a linear fashion.
5. Example 2
A researcher wants to analyze the relationship between the amount of time students spend studying (in hours) and their final exam scores. The data collected is as follows:
Study Time (hours): 2, 4, 6, 3, 5, 7, 8, 6, 4, 9
Final Exam Scores: 65, 90, 75, 79, 70, 85, 79, 87, 76, 96
Is there a significant correlation between study time and final exam scores?
The image shows the calculation in cell E19, which is used to find the correlation coefficient between two sets of data.
The first set of data is contained in the cell range B19:B28, and the second set is in the cell range C19:C28. The output in cell E19 is approximately 0.6015.
The guidelines for interpreting the strength of the correlation based on the value of the correlation coefficient are as follows:
0.0 to 0.3: Weak or no correlation
0.3 to 0.5: Moderate correlation
0.5 to 0.7: Strong correlation
0.7 to 1.0: Very strong correlation
The value in cell E19, which is approximately 0.6015, indicates a strong correlation between the two sets of data.
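The guideline table can be expressed as a small lookup. This is a sketch only: the function name correlation_strength is invented here, and because the ranges in the guidelines share their end points (0.3, 0.5, 0.7), the code assumes that a boundary value belongs to the stronger category. It also assumes the absolute value is what gets classified, so negative coefficients are handled the same way.

```python
def correlation_strength(r):
    """Map a correlation coefficient to the guideline labels used in this article."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must be between -1 and 1")
    size = abs(r)  # assumption: the sign is ignored when judging strength
    if size < 0.3:
        return "Weak or no correlation"
    if size < 0.5:
        return "Moderate correlation"
    if size < 0.7:
        return "Strong correlation"
    return "Very strong correlation"


label = correlation_strength(0.6015)
print(label)  # Strong correlation
```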
A positive correlation coefficient (above zero) suggests a positive linear relationship between the two variables. This means that as one variable increases, the other variable tends to increase as well.
The chart in the image displays the exam score on the y-axis and the number of hours on the x-axis. By evaluating the dots on the scatter plot, a clear positive linear relationship between the two variables is evident.
6. Example 3
A meteorologist wants to study the relationship between the average daily temperature (in Celsius) and the amount of rainfall (in millimeters) for a particular region. The data collected is as follows:
Avg daily temp (°C) | Rainfall (mm) |
22 | 10 |
18 | 15 |
25 | 5 |
20 | 20 |
28 | 8 |
23 | 12 |
26 | 6 |
19 | 18 |
21 | 9 |
20 | 14 |
What is the correlation between temperature and rainfall?
The image above shows the calculation in cell E19:
The first argument contains the first set of numbers, in this example cell range B19:B28. The second set of numbers is in cell range C19:C28. The output in cell E19 is approximately -0.7849.
The guidelines for interpreting the strength of correlation still apply regardless of whether the correlation coefficient is positive or negative. The absolute value of the correlation coefficient determines the strength of the correlation, while the sign (positive or negative) indicates the direction of the relationship.
0.0 to 0.3: Weak or no correlation
0.3 to 0.5: Moderate correlation
0.5 to 0.7: Strong correlation
0.7 to 1.0: Very strong correlation
The value in cell E19 shows that there is a very strong negative correlation between temperature and rainfall. The closer the value is to -1, the stronger the negative correlation.
A negative correlation coefficient (below zero) indicates a negative or inverse linear relationship between the two variables. This means that as one variable increases, the other variable tends to decrease.
The chart above shows the rainfall on the y-axis and the average daily temperature on the x-axis. If we evaluate the dots on the scatter plot, a negative linear relationship between the two variables is clearly visible.
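The split between strength and direction can be sketched in Python with the ten temperature and rainfall pairs from this example. The variable and function names are assumptions made for this illustration; the calculation itself is the standard Pearson formula.

```python
import math

avg_temp_c = [22, 18, 25, 20, 28, 23, 26, 19, 21, 20]
rainfall_mm = [10, 15, 5, 20, 8, 12, 6, 18, 9, 14]


def correl(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return s / math.sqrt(ss_x * ss_y)


r = correl(avg_temp_c, rainfall_mm)

# The sign gives the direction, the absolute value gives the strength.
direction = "negative" if r < 0 else "positive" if r > 0 else "none"
strength = abs(r)
print(round(r, 4), direction, round(strength, 4))
```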
7. Function not working
Text, Boolean values, and empty cells are ignored. This can be both good and bad: it is a problem if you have numbers stored as text and don't realize it, because those values are silently left out of the calculation.
The #N/A error is returned if the arguments don't contain the same number of values, in other words, if the cell ranges are not the same size.
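The effect of a number stored as text can be imitated in Python. The sketch below is a hypothetical analogue of the behavior described above, not Excel's actual implementation: the name correl_like is invented, a Python exception stands in for the #N/A error, and it assumes that a whole pair is dropped when either of its values is text, Boolean, or empty (None).

```python
import math


def correl_like(array1, array2):
    """Rough analogue of CORREL's input handling (an illustration, not Excel's code)."""
    if len(array1) != len(array2):
        # Excel shows #N/A here; this sketch raises an exception instead.
        raise ValueError("arrays must contain the same number of values")

    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    # Assumption: skip the pair when either value is text, Boolean, or empty.
    pairs = [(x, y) for x, y in zip(array1, array2) if is_number(x) and is_number(y)]
    if len(pairs) < 2:
        raise ValueError("need at least two numeric pairs")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("no variation in one of the arrays")
    return s / math.sqrt(ss_x * ss_y)


ice_cream = [45, 24, 21, 36, 19, 39, 23]
clean = [97, 79, 70, 87, 72, 92, 76]
with_text = [97, 79, 70, 87, "72", 92, 76]  # the fifth value is stored as text

r_clean = correl_like(clean, ice_cream)
r_text = correl_like(with_text, ice_cream)  # computed from six pairs only
print(round(r_clean, 4), round(r_text, 4))
```

The two results differ slightly, and nothing warns you about it, which is why it pays to check for numbers stored as text before trusting the output.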
7.1 Troubleshooting the error value
When you encounter an error value in a cell, a warning symbol appears, as displayed in the image above. Press with mouse on it to see a pop-up menu that lets you get more information about the error.
- The first line describes the error if you press with left mouse button on it.
- The second line opens a pane that explains the error in greater detail.
- The third line takes you to the "Evaluate Formula" tool, a dialog box appears allowing you to examine the formula in greater detail.
- The fourth line lets you ignore the error value, meaning the warning icon disappears; however, the error is still in the cell.
- The fifth line lets you edit the formula in the Formula bar.
- The sixth line opens the Excel settings so you can adjust the Error Checking Options.
Here are a few of the most common Excel errors you may encounter.
#NULL! error - This error occurs most often if you use a space character by mistake in a formula. Excel interprets a space character between two ranges as an intersection operator, and if the ranges don't intersect, a #NULL! error is returned. This can also happen when the wrong range operator is used in the formula. To fix this error, double-check that the ranges joined by the intersection operator actually have cells in common.
#SPILL! error - The #SPILL! error occurs only in Excel versions that support dynamic arrays, such as Excel 365. It appears when a dynamic array formula cannot expand because cells below and/or to the right of it are not empty.
#DIV/0! error - This error happens if you try to divide a number by 0 (zero) or by a value that equates to zero, which is not possible mathematically.
#VALUE! error - The #VALUE! error occurs when a formula has a value of the wrong data type, such as text where a number is expected, or when dates are evaluated as text.
#REF! error - The #REF! error happens when a cell reference is invalid. This can happen if a cell that is referenced by a formula is deleted.
#NAME? error - The #NAME? error happens if you misspell a function name or a named range.
#NUM! error - The #NUM! error shows up when you try to use invalid numeric values in formulas, like the square root of a negative number.
#N/A error - The #N/A error happens when a value is not available to a formula or not found in a given cell range, for example in the VLOOKUP or MATCH functions.
#GETTING_DATA error - The #GETTING_DATA error shows while external sources are loading. It can indicate a delay in fetching the data or that the external source is unavailable right now.
7.2 The formula returns an unexpected value
To understand why a formula returns an unexpected value we need to examine the calculation steps in detail. Luckily, Excel has a tool that is really handy in these situations. Here is how to troubleshoot a formula:
- Select the cell containing the formula you want to examine in detail.
- Go to tab “Formulas” on the ribbon.
- Press with left mouse button on "Evaluate Formula" button. A dialog box appears.
The formula appears in a white field inside the dialog box. Underlined expressions are calculations being processed in the next step. The italicized expression is the most recent result. The buttons at the bottom of the dialog box allow you to evaluate the formula in smaller calculations which you control.
- Press with left mouse button on the "Evaluate" button located at the bottom of the dialog box to process the underlined expression.
- Repeat pressing the "Evaluate" button until you have seen all calculations step by step. This allows you to examine the formula in greater detail and hopefully find the culprit.
- Press "Close" button to dismiss the dialog box.
There is also another way to debug formulas: the function key F9. F9 is especially useful if you suspect that a specific part of the formula is the issue. It is then faster than the "Evaluate Formula" tool, since you don't need to go through all calculations to find the issue.
- Enter Edit mode: Double-press with left mouse button on the cell or press F2 to enter Edit mode for the formula.
- Select part of the formula: Highlight the specific part of the formula you want to evaluate. You can select and evaluate any part of the formula that could work as a standalone formula.
- Press F9: This will calculate and display the result of just that selected portion.
- Evaluate step-by-step: You can select and evaluate different parts of the formula to see intermediate results.
- Check for errors: This allows you to pinpoint which part of a complex formula may be causing an error.
The image above shows the cell references on the worksheet while editing the formula in the formula bar. The colors of the cell references correspond to the rectangles (cell ranges) on the worksheet. The CORREL function requires the same number of values in both arrays, which is not the case in this example. We have found what is wrong with the formula.
Tips!
- View actual values: Selecting a cell reference and pressing F9 will show the actual values in those cells.
- Exit safely: Press Esc to exit Edit mode without changing the formula. Don't press Enter, as that would replace the formula part with the calculated value.
- Full recalculation: Pressing F9 outside of Edit mode will recalculate all formulas in the workbook.
Remember to be careful not to accidentally overwrite parts of your formula when using F9. Always exit with Esc rather than Enter to preserve the original formula. However, if you overwrite the formula by mistake, it is not the end of the world. You can “undo” the action by pressing the keyboard shortcut CTRL + Z or the “Undo” button.
7.3 Other errors
Floating-point arithmetic may give inaccurate results in Excel - Article
Floating-point errors are usually very small, often beyond the 15th decimal place, and in most cases don't affect calculations significantly.
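The same effect is easy to reproduce in Python, whose floats are IEEE 754 double-precision numbers, the same format Excel uses for its calculations. This is only an illustration of the general phenomenon; the tolerance passed to math.isclose is an arbitrary choice for this example.

```python
import math

total = 0.1 + 0.2

# The sum is not stored as exactly 0.3, so a strict comparison fails.
exact_match = (total == 0.3)

# Comparing with a small tolerance treats the two values as equal.
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)

# The difference is tiny, far beyond the 15th decimal place.
difference = abs(total - 0.3)

print(exact_match, close_enough, difference)
```

In a worksheet, the usual remedy is similar: round the result to the precision you actually need before comparing it to another value.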
8. How is the function calculated?
The CORREL function calculates the correlation between two groups of numbers. To calculate the correlation follow these steps:
- Calculate the mean of each group of numbers, here named x = "Temp" and y = "Icecream".
  For example:
  Mean of x = x̄ is calculated in cell C10
  Mean of y = ȳ is calculated in cell D10
- For each data point xi and yi, calculate the deviations from the mean.
  Deviation of xi = xi - x̄, calculated in cells E3:E9
  Deviation of yi = yi - ȳ, calculated in cells F3:F9
- Multiply the deviations of each data point pair to get their products.
  For each pair: (xi - x̄) * (yi - ȳ), calculated in cells G3:G9
- Sum all the deviation products.
  S = Σ(xi - x̄)(yi - ȳ), calculated in cell G10
- Calculate the sum of squared deviations of each dataset.
  SSx = Σ(xi - x̄)², calculated in cell H10
  SSy = Σ(yi - ȳ)², calculated in cell I10
- The correlation is the sum of products divided by the square root of the product of the two sums of squares.
  Correlation = S / √(SSx * SSy), calculated in cell I13
These steps calculate the correlation coefficient, and the same number is shown in cell B14, which is the output from the CORREL function based on cells C3:C9 and D3:D9.
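The same walk-through can be followed in Python with the Example 1 data. The comments name the worksheet cells mentioned in this section; the variable names are assumptions chosen to mirror the symbols x̄, ȳ, S, SSx, and SSy.

```python
import math

temp = [97, 79, 70, 87, 72, 92, 76]      # x, cells C3:C9
icecream = [45, 24, 21, 36, 19, 39, 23]  # y, cells D3:D9
n = len(temp)

# Means of both groups (cells C10 and D10).
mean_x = sum(temp) / n
mean_y = sum(icecream) / n

# Deviations from the mean (cells E3:E9 and F3:F9).
dev_x = [x - mean_x for x in temp]
dev_y = [y - mean_y for y in icecream]

# Product of the deviations for each pair (cells G3:G9) and their sum S (cell G10).
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
s = sum(products)

# Sums of squared deviations (cells H10 and I10).
ss_x = sum(dx ** 2 for dx in dev_x)
ss_y = sum(dy ** 2 for dy in dev_y)

# Correlation = S / sqrt(SSx * SSy) (cell I13).
correlation = s / math.sqrt(ss_x * ss_y)
print(round(s, 4), round(ss_x, 4), round(ss_y, 4), round(correlation, 4))
```

Printing the intermediate values makes it easy to compare each step against the corresponding worksheet cell.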
Functions in 'Statistical' category
The CORREL function is one of 73 functions in the 'Statistical' category.