Getting ready

To prepare for this recipe, do the following:

  1. Open Power BI Desktop.
  2. Use an Enter Data query to create a table called R08_Sales that contains the following data:

Quarter

Year

Sales

1

2013

47

2

2013

49

3

2013

53

4

2013

44

1

2014

43

2

2014

55

3

2014

58

4

2014

48

1

2015

44

2

2015

43

3

2015

58

4

2015

49

  1. Use an Enter Data query to create a second table called R08_Wages that contains the following data:

Quarter

Year

Wages

4

2012

19.25

1

2013

18.5

2

2013

17.75

3

2013

17.7

4

2013

18

1

2014

20

2

2014

18.75

3

2014

18.7

4

2014

18.25

1

2015

17

2

2015

18.75

3

2015

19

4

2015

20.25

1

2016

19

 

 

  1. Use an Enter Data query to create a third table called R08_Estimates that contains the following data:

Quarter

Year

1

2013

2

2013

3

2013

4

2013

1

2014

2

2014

3

2014

4

2014

1

2015

2

2015

3

2015

4

2015

1

2016

2

2016

  1. Create the following column in the R08_Sales table:
YearQuarter = 'R08_Sales'[Year] & 'R08_Sales'[Quarter]
  1. Create the following columns in the R08_Wages table:
YearQuarter = 'R08_Wages'[Year] & 'R08_Wages'[Quarter]

YearQuarterFuture =
IF('R08_Wages'[Quarter] = 4,
'R08_Wages'[Year]+1 & "1",
'R08_Wages'[Year] & 'R08_Wages'[Quarter]+1
)
  1. Create the following columns in the R08_Estimates table:
YearQuarter = 'R08_Estimates'[Year] & 'R08_Estimates'[Quarter]

YearQuarterPrevious =
IF(
'R08_Estimates'[Quarter] = 1,
'R08_Estimates'[Year]-1 & "4",
'R08_Estimates'[Year] & 'R08_Estimates'[Quarter]-1
)

YearQuarter2Previous =
SWITCH(TRUE(),
'R08_Estimates'[Quarter] = 1,'R08_Estimates'[Year]-1 & "3",
'R08_Estimates'[Quarter] = 2,'R08_Estimates'[Year]-1 & "4",
'R08_Estimates'[Year] & 'R08_Estimates'[Quarter]-2
)
  1. Create a relationship between the YearQuarter column in the R08_Sales table and the YearQuarterFuture column in the R08_Wages table. The relationship should be 1:1 and have a Cross filter direction of Both. Ensure that there are no relationships between the R08_Sales table and any other table. Ensure that there are no relationships between the R08_Wages table and any other table.
  2. Ensure that there are no relationships between the R08_Estimates table and any other table.
  1. Create the following measure:
Power BI's Correlation = 
VAR __CORRELATION_TABLE = VALUES('R08_Sales'[YearQuarter])
VAR __COUNT =
COUNTX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('R08_Sales'[Sales]) * SUM('R08_Wages'[Wages]))
)
VAR __SUM_X =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('R08_Sales'[Sales]))
)
VAR __SUM_Y =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('R08_Wages'[Wages]))
)
VAR __SUM_XY =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('R08_Sales'[Sales]) * SUM('R08_Wages'[Wages]) * 1.)
)
VAR __SUM_X2 =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('R08_Sales'[Sales]) ^ 2)
)
VAR __SUM_Y2 =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('R08_Wages'[Wages]) ^ 2)
)
RETURN
DIVIDE(
__COUNT * __SUM_XY - __SUM_X * __SUM_Y * 1.,
SQRT(
(__COUNT * __SUM_X2 - __SUM_X ^ 2)
* (__COUNT * __SUM_Y2 - __SUM_Y ^ 2)
)
)

  1. Create the following additional measure:
Daniil's Correlation Coefficient = 
VAR Correlation_Table =
FILTER (
ADDCOLUMNS (
VALUES ( 'R08_Sales'[YearQuarter]),
"Value_X", CALCULATE ( SUM('R08_Sales'[Sales]) ),
"Value_Y", CALCULATE ( SUM('R08_Wages'[Wages]) )
),
AND (
NOT ( ISBLANK ( [Value_X] ) ),
NOT ( ISBLANK ( [Value_Y] ) )
)
)
VAR Count_Items = COUNTROWS ( Correlation_Table )
VAR Sum_X = SUMX ( Correlation_Table, [Value_X] )
VAR Sum_X2 = SUMX ( Correlation_Table, [Value_X] ^ 2 )
VAR Sum_Y = SUMX ( Correlation_Table, [Value_Y] )
VAR Sum_Y2 = SUMX ( Correlation_Table, [Value_Y] ^ 2 )
VAR Sum_XY = SUMX ( Correlation_Table, [Value_X] * [Value_Y] )
VAR Pearson_Numerator = Count_Items * Sum_XY - Sum_X * Sum_Y
VAR Pearson_Denominator_X = Count_Items * Sum_X2 - Sum_X ^ 2
VAR Pearson_Denominator_Y = Count_Items * Sum_Y2 - Sum_Y ^ 2
VAR Pearson_Denominator = SQRT ( Pearson_Denominator_X * Pearson_Denominator_Y )
RETURN
DIVIDE ( Pearson_Numerator, Pearson_Denominator )
  1. On a Report page, create a Card visualization and place the Power BI's Correlation measure into the Fields area of the visualization. Notice that the visualization is broken or provides a nonsensical number that's not between -1 and 1.
  2. On the same Report page, create a second Card visualization and place the Daniil's Correlation Coefficient measure into the Fields area of the visualization. You should get a result of about .54 (.5362). If you get a value of 1, make sure that you have the measure set to the data type of a Decimal number and not a Whole number.

Here, we can see the issue with Power BI's built-in implementation of Pearson's Correlation Coefficient (Power BI's Correlation): it does not handle correlations between unbalanced tables (tables with different numbers of rows). Also, we can see that by using Daniil's implementation of Pearson's Correlation Coefficient (Daniil's Correlation Coefficient), the relationship between the Sales column in the R08_Sales table and the Wages column in the R08_Wages table does not appear to be very strong (.54).

The data and its setup presents a scenario. The scenario is that a retail business believes that the real, hourly average wage (represented by the R08_Wages table) is a 3-month leading indicator for their sales (represented by the R08_Sales table). A 3-month leading indicator means that when wages go up, the businesses' sales go up 3 months later. The business wishes to test if there is a correlation between real, hourly average wages and their sales. Furthermore, the business wishes to use this correlation, if it exists, to forecast sales for one quarter in the future.

Because we believe that wages are a 3-month leading indicator for sales, we linked the YearQuarter column in the R08_Sales table with the YearQuarterFuture column in the R08_Wages table. This relationship effectively means that we are comparing a particular quarter's sales with the previous quarter's wages.