A data-driven approach to understanding employee turnover
Chief Scientist, Culture Amp
Data can help you understand why people are leaving your organization and give you an opportunity to address issues in the future more proactively. By linking your employee feedback data with your churn data you can get a sense of what things might cause people to leave in the future and where problem areas (e.g. departments, functions or demographics) might arise in your organization.
If you want to look at why people are leaving your organization, this four-step approach gives you a data-driven way to understand and address churn.
1. Clean your data and focus on the right people
First, you need to clean your data and ensure that you have the right people in your data set.
The most important thing is to focus on and understand people who left voluntarily and regrettably. These are the people you want to understand better (because you wanted them to stay).
You don’t want to muddy this data with people who left involuntarily, were being performance managed, or were moving countries or retiring. Their data can look very different from that of people who chose to leave of their own volition. In fact, they often look very much like any other person in your company and can therefore confuse your prediction models (see the data at the bottom of this article).
To create your basic data set, the starting point is generally to look at your exit survey results and make sure you have the data you need. This data, or your HRIS, should be able to tell you who has left and at least some indication of whether they were voluntary and therefore potentially a regrettable leaver. No step is more important in this process than cleaning your data of involuntary leavers.
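As a minimal sketch of this cleaning step, assume a hypothetical export in which each leaver record carries `exit_type`, `exit_reason` and `performance_managed` fields. These names and categories are illustrative assumptions; your HRIS or exit survey will have its own.

```python
# Hypothetical leaver records; field names and categories are assumptions.
leavers = [
    {"employee_id": 1, "exit_type": "voluntary", "exit_reason": "new job", "performance_managed": False},
    {"employee_id": 2, "exit_type": "involuntary", "exit_reason": "redundancy", "performance_managed": False},
    {"employee_id": 3, "exit_type": "voluntary", "exit_reason": "retirement", "performance_managed": False},
    {"employee_id": 4, "exit_type": "voluntary", "exit_reason": "moving country", "performance_managed": False},
    {"employee_id": 5, "exit_type": "voluntary", "exit_reason": "career growth", "performance_managed": False},
    {"employee_id": 6, "exit_type": "voluntary", "exit_reason": "new job", "performance_managed": True},
]

# Reasons that are voluntary on paper but not regrettable in the sense used here.
NON_REGRETTABLE_REASONS = {"retirement", "moving country"}


def is_regrettable(record):
    """Keep only people who chose to leave and whom you wanted to keep."""
    return (
        record["exit_type"] == "voluntary"
        and not record["performance_managed"]
        and record["exit_reason"] not in NON_REGRETTABLE_REASONS
    )


regrettable_ids = [r["employee_id"] for r in leavers if is_regrettable(r)]
print(regrettable_ids)
```

Keeping the rule in one small function makes it easy to review with HR before any modelling starts.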
2. Include a broad range of connected data
For those who left voluntarily, you can often see major trends in exit data. Does the exit data show that many people are leaving for similar reasons, or that similar roles or demographic groups are exiting more frequently? You can then connect this data with their previous responses and profiles in your feedback data.
It’s important to find all the information that may be connected to your leaving or churn data. Consider what other data you have about them - their age, their training profile, the promotions they’ve had. Then couple this with their feedback or survey data to look for the connections.
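One way to connect these sources is to join them on a shared employee identifier. The sketch below assumes hypothetical profile and survey fields (`promotions`, `training_hours`, `see_self_in_two_years` and so on) and a set of regrettable leaver ids produced by your cleaning step. None of these names come from a real system.

```python
# Hypothetical HRIS profiles keyed by employee id (all fields are assumptions).
profiles = {
    1: {"age": 34, "promotions": 1, "training_hours": 12},
    2: {"age": 45, "promotions": 0, "training_hours": 3},
    3: {"age": 29, "promotions": 2, "training_hours": 20},
}

# Hypothetical survey answers on a 1-5 agreement scale.
survey = {
    1: {"see_self_in_two_years": 2, "leadership": 3},
    3: {"see_self_in_two_years": 5, "leadership": 4},
}

# Ids of regrettable leavers, as identified when cleaning the data.
regrettable_ids = {1}


def build_rows(profiles, survey, regrettable_ids):
    """Join profile and survey data; skip people with no survey response."""
    rows = []
    for employee_id, profile in profiles.items():
        answers = survey.get(employee_id)
        if answers is None:
            continue  # no feedback to connect for this person
        row = {"employee_id": employee_id, **profile, **answers}
        row["regrettable_leaver"] = employee_id in regrettable_ids
        rows.append(row)
    return rows


rows = build_rows(profiles, survey, regrettable_ids)
for row in rows:
    print(row)
```

Dropping people without a survey response is a design choice made here for simplicity. Depending on your data, non-response may itself be worth examining.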
The feedback data can be quite telling of how an individual was feeling early on. For example, a good predictor of churn is asking people when they think they’ll leave or if they can see themselves in the organization in two years’ time (this is one of our recommended employee engagement questions).
People often tell the truth when they give that feedback in surveys, so if someone says they might leave they’re actually more likely to. This holds true for at least a year.
When looking at your survey data, there are four other specific areas we always suggest considering in your analysis - leadership, learning and development, alignment and salary.
How people perceive leadership can play a significant role in their decision to stay or leave a company. Learning and development is also commonly in the mix. These questions indicate whether someone believed that they had opportunities in the organization to develop their career. Learning and development is often important to retention even in industries where people don’t believe it is, like retail. Survey questions can also give you an indication of whether someone feels in sync with the organization or not.
3. Use the right statistical techniques to identify patterns
To illustrate the statistical challenge, consider that for many companies only 10-20% of people might churn in any given year. Out of that 10-20% you only want to understand those who left voluntarily and regrettably. That’s why you need statistical models that can predict relatively rare outcomes.
Recently we had a large dataset that included thousands of people who’d left an organization. We tried several different methods to look through the surveys - random forests, decision trees, logistic regression and other algorithms - and we found that random forests were the most effective for this type of work.
A random forest is an extension of a decision tree. Essentially it contains multiple decision trees, so instead of finding one tree this technique builds a multitude of trees and combines them to predict an outcome. Random forests are quite good at picking up nonlinear effects and unusual combinations of things that are predictive, although they can be hard to interpret.
Other useful techniques we’ve found are survival analysis and sampling procedures such as the ROSE technique (that stands for Random Over Sampling Examples). These types of procedures boost and adjust your training data for the smaller number of churn cases to help your model.
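To show the idea behind these resampling procedures, here is a minimal sketch of plain random oversampling: duplicating randomly chosen churn cases until the two classes are the same size. This is a simpler stand-in, not the ROSE technique itself, which generates new synthetic examples rather than exact copies. The function name and toy data are assumptions for illustration.

```python
import random


def random_oversample(rows, label_key, seed=0):
    """Duplicate minority-class rows at random until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("need at least one example of each class")
    # Sample with replacement from the minority class to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy training data: 2 regrettable leavers among 20 people (made-up numbers).
train = [{"id": i, "left": i < 2} for i in range(20)]
balanced = random_oversample(train, "left")
n_left = sum(r["left"] for r in balanced)
print(len(balanced), n_left)
```

Resampling like this should only be applied to the training data. Evaluate the model on untouched data so the results reflect the real, imbalanced churn rate.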
These aren't tools or techniques that everyone is familiar with, but our people science team are happy to help people with any questions about using them. However, just looking at simple differences in how the regrettable churn group and the people who stayed responded in previous surveys can give you some powerful insights on its own.
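A simple comparison of this kind can be sketched as follows. It assumes hypothetical survey questions scored on a 1-5 agreement scale, with 4 or 5 counted as a favorable response. The data is made up purely for illustration.

```python
# Hypothetical survey responses on a 1-5 agreement scale (made-up data).
responses = [
    {"regrettable_leaver": True, "leadership": 2, "development": 3},
    {"regrettable_leaver": True, "leadership": 4, "development": 2},
    {"regrettable_leaver": False, "leadership": 4, "development": 5},
    {"regrettable_leaver": False, "leadership": 5, "development": 4},
    {"regrettable_leaver": False, "leadership": 3, "development": 4},
    {"regrettable_leaver": False, "leadership": 4, "development": 2},
]


def percent_favorable(rows, question):
    """Share of respondents answering 4 or 5 (assumed to mean agree or strongly agree)."""
    favorable = sum(1 for r in rows if r[question] >= 4)
    return 100.0 * favorable / len(rows)


leavers = [r for r in responses if r["regrettable_leaver"]]
stayers = [r for r in responses if not r["regrettable_leaver"]]

# Gap in percentage points between stayers and regrettable leavers, per question.
gaps = {
    question: percent_favorable(stayers, question) - percent_favorable(leavers, question)
    for question in ("leadership", "development")
}

# Largest gap first: these questions separate the two groups most clearly.
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

With real data you would want far larger groups than this before reading anything into a gap.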
4. Address solutions at the group level not the individual
Rather than predicting whether a specific individual is going to leave your organization, try to identify and address issues at the group level. For example, if your models suggest a certain role (e.g. Sales Managers or Engineering Managers) is at risk, you might carefully examine retention questions for that group and act accordingly. This data can be very powerful when you find at-risk groups within your organization that you can help.
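A minimal sketch of group-level reporting might aggregate regrettable churn by role and leave out any group too small to report without pointing at individuals. The minimum group size of 5, the field names and the data are all assumptions for illustration.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5  # assumed reporting threshold; choose one that suits your organization


def churn_rate_by_group(rows, group_key, min_size=MIN_GROUP_SIZE):
    """Regrettable churn rate per group, omitting groups too small to report."""
    counts = defaultdict(lambda: [0, 0])  # group -> [leavers, headcount]
    for row in rows:
        counts[row[group_key]][1] += 1
        if row["regrettable_leaver"]:
            counts[row[group_key]][0] += 1
    return {
        group: left / total
        for group, (left, total) in counts.items()
        if total >= min_size
    }


# Made-up data: two groups large enough to report and one that is too small.
rows = (
    [{"role": "Sales Manager", "regrettable_leaver": i < 3} for i in range(10)]
    + [{"role": "Engineering Manager", "regrettable_leaver": i < 1} for i in range(10)]
    + [{"role": "Legal Counsel", "regrettable_leaver": True} for _ in range(2)]
)

rates = churn_rate_by_group(rows, "role")
for role, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{role}: {rate:.0%}")
```

The output only ever names groups, never people, which keeps the analysis aligned with acting at the group level.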
However, the worst thing you can do is start targeting individuals that you predict will leave. If you use data to predictively target individuals you risk getting it very wrong. People may feel targeted or believe that you’ve been looking at their personal data. This has the potential to damage your credibility and that of your feedback process. Individuals who may have chosen to stay may also change their mind as a result. There’s no worse outcome than churn predictions becoming a self-fulfilling prophecy.
By following these four steps, you can identify groups or areas in your organization that are at risk of churning. The analysis will also help you identify why people may choose to leave so you can address these issues head on.
The best predictors of churn are often simple
The best predictors often include the most obvious questions. For example, one of our standard benchmarked questions simply asks people if they can “see themselves still at the company in two years’ time”.
Surprisingly, a lot of people respond very honestly to this somewhat direct question. Across hundreds of companies we’ve found that people who say they can’t see themselves at the company in two years’ time are 136% more likely to leave within the next year. Hence it is often an important contributing predictor in churn models regardless of the statistical techniques we use.
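To make a "percent more likely" figure concrete, the sketch below shows the arithmetic. The headcounts are invented and chosen only so that the result comes out at 136%. They are not the data behind the finding above.

```python
def percent_more_likely(rate_group, rate_baseline):
    """How much higher one leave rate is than another, as a percentage."""
    return (rate_group / rate_baseline - 1.0) * 100.0


# Invented counts for illustration only.
left_if_no = 59 / 250     # 23.6% of people who answered "no" left within a year
left_if_yes = 100 / 1000  # 10.0% of people who answered "yes" left within a year

lift = percent_more_likely(left_if_no, left_if_yes)
print(f"{lift:.0f}% more likely to leave")
```

In other words, "136% more likely" means the leave rate in one group is 2.36 times the rate in the other.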
To illustrate this point, below you can see some real data from an anonymous company showing the percentage of some key groups of people that said they could see themselves at the company in two years’ time.
Overall you can see that about 80% of those who stayed had said they could see themselves still at the company, compared with closer to 45% of regrettable leavers. However, those who were managed out or who retired look somewhat similar to those who stayed. This shows the importance of cleaning your churn data, given the clear differences between regrettable leavers and the other categories.
Percentage of people who said they could see themselves with the company in two years’ time: