Correlation In Python

Correlation is a statistical measure of the extent to which two or more variables fluctuate together. In Python, the Pearson correlation between two variables can be calculated using the pearsonr() function from the scipy.stats module of the SciPy library.

The pearsonr() function takes two arrays of equal length as input and returns two values: the Pearson correlation coefficient (a value between -1 and 1 that indicates the strength and direction of the linear relationship) and a p-value (the probability of seeing a correlation at least this strong if the two variables were in fact uncorrelated).
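
To show what the coefficient measures, here is a minimal sketch that computes it directly from its definition: the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. The helper name pearson_r is made up for this illustration and is not part of SciPy; it uses only the standard library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration; not a SciPy function.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of every observation from its sample mean
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Numerator: how the deviations vary together
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: scales the result into the range [-1, 1]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when an input is constant")
    return numerator / denominator

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(r)
```

The sketch returns only the coefficient. The p-value needs a statistical distribution, which is one reason to prefer pearsonr() in practice.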

A positive correlation means that as the value of one variable increases, the value of the other tends to increase as well. A negative correlation means that as the value of one variable increases, the value of the other tends to decrease.

Here is an example of how to calculate the correlation between two variables in Python:

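from scipy.stats import pearsonr

# Two samples of equal length; each y value is exactly twice the matching x value
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# pearsonr returns the correlation coefficient and the two-sided p-value
correlation, p_value = pearsonr(x, y)

print("The Pearson correlation coefficient is:", correlation)
print("The p-value is:", p_value)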

Output:

The Pearson correlation coefficient is: 1.0
The p-value is: 0.0

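In this example, the Pearson correlation coefficient is 1, which indicates a perfect positive linear relationship between x and y: every y value is exactly twice the corresponding x value. The p-value is 0, which is below any conventional significance threshold (such as 0.05), so the correlation is statistically significant.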
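
Real data is rarely perfectly linear, so a second sketch may help. It runs pearsonr() on made-up values that rise together without lying on a straight line, then compares the p-value with a significance level. The 0.05 threshold (alpha) is an assumed convention for this example, not something pearsonr() requires.

```python
from scipy.stats import pearsonr

# Made-up illustrative data: y generally rises with x, but not perfectly
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 3, 2, 5, 4, 6, 8, 7]

r, p_value = pearsonr(x, y)

alpha = 0.05  # assumed significance level for this example
print("The Pearson correlation coefficient is:", round(r, 3))
if p_value < alpha:
    print("The correlation is statistically significant at alpha =", alpha)
else:
    print("The correlation is not statistically significant at alpha =", alpha)
```

Here the coefficient is strongly positive but below 1, and the p-value is small but not 0.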
