We start by adding four columns to our main fact table, R06_Table. The first column, Count, is set to a static value of 1 for all rows in the table. This may seem superfluous, but it will make certain later calculations a bit easier and more consistent. The second column, Event, is set to 1 if the employee has reached an end event (termination) and set to 0 if they have not. The Event column is our indicator of whether or not the employee is still employed by the organization. The third column is the number of days of tenure with the organization, Days. In the event that an employee has reached an end event, the Days column is calculated to be the difference between the End Date for the employee and the Hire Date for the employee. Otherwise, the Days column is the difference between the current date as returned by the TODAY function and the Hire Date for the employee. The fourth column, DeptDateKey, provides a key column that will later be used to relate the R06_KM table. The DeptDateKey column is created by using the CONCATENATE function to join the Department column with the Days column, with the latter column being formatted, using the FORMAT function, to always have four digits with leading zeros if necessary.
With operations on our base table, R06_Table, completed, our next step is to construct a new table that will become the basis for our Kaplan-Meier estimator calculations. This is necessary in order to create the running product that the Kaplan-Meier estimator function requires. Because we desire to compare estimations for two distinct segments of our population, we create two similar table variables, __DeptA and __DeptB, and use the UNION function to combine these two table variables into a single table. For both the __DeptA and __DeptB variables, we start by using GENERATESERIES to create a table that contains all the possible days of tenure for an employee from 0 to the maximum number of tenure days listed in the Days column in the R06_Table table. To this table of values returned by the GENERATESERIES function, we add a column, Department, that specifies the appropriate segment of our population as well as a similar DeptDateKey formula as used previously to create a unique key per row of the table. Importantly, because of the use of our population segmentation column, Department, within DeptDateKey, this key remains unique within the R06_KM table, even after a union of the two table variables, __DeptA and __DeptB.
We now have the base table required for our Kaplan-Meier estimator calculations. These Kaplan-Meier calculations consist of five columns. The first column, Count, simply records the number of the segment of the population that has reached the particular tenure milestone, Days. This is accomplished by essentially counting the related records for the segment of the population in our base table, R06_Table, only we use the Count column created previously in the R06_Table table, which is always 1, and use SUMX to sum the values in the column. The next column, Running Count, keeps a running total of the Count column in the R06_KM table. This is essentially a running total of all employees that have a tenure that is less than or equal to the current tenure as represented in the Days column for each row. This is achieved by using the EARLIER function to FILTER the R06_KM table to only the rows that have a tenure, Days, that is less than or equal to the current row's value for the Days column and have the same segmentation, Department, as the current row.
The third column, d(i), stores the number of employees that have exited the segmented population at the tenure denoted by the Days column and is the numerator in our Kaplan-Meier estimator equation. The d(i) column is calculated by using SUMX to sum the Event column of related records in the base table, R06_Table. Recall that the Event column in the R06_Table table contains a 1 if an end event has occurred and 0 if not. Thus, the d(i) column becomes the number of non-surviving members of the population segment for each tenure denoted by the Days column. Our fourth column, n(i), stores the number of employees still surviving within the segmented population at the tenure denoted by the Days column and is the denominator in our Kaplan-Meier estimator equation. The n(i) column is calculated by adding our Count column to the total original population and then subtracting our Running Count column. We can now finally calculate the value of our Kaplan-Meier function at each tenure of Days by creating our fifth column, 1-d(i)/n(i), which simply implements the Kaplan-Meier estimation function using our previous created columns.
Now that we have the Kaplan-Meier estimation function calculated for every possible time (t), our final step is to implement the running product. This is the purpose of the Survivability measure. The Survivability measure is calculated similarly to the Running Total quick measure in Power BI's DAX Quick Measures gallery. Using the CALCULATE function, we return the PRODUCT of the 1-d(i)/n(i) column multiplied across all rows in the R06_KM table that have a value for the Days column that is less than or equal to the current maximum value for the Days column within the current context.