Cohorts
What is cohort modeling?
In order to understand your customers at a deeper level, you'll often want to track them on a more granular basis, using time-based cohorts. This is because customers can behave differently at different points in time, for example:
- Retention and expansion rates often follow a pattern depending on how many months someone has been a customer
- Churn rates may differ depending on what time of year a customer joined
- Spend or engagement may be higher in initial months after sign-up, then drop off
- If you have a new onboarding flow, more customers might be retained in the 2nd month (vs. when there was no new onboarding flow)
To forecast on this cohort basis, you need to be able to split your customers into different cohorts, so you can apply the correct assumptions (e.g. churn, retention, etc.) at different points in time.
Cohorts in Causal
In Causal, Cohorts act as a category that reflects the time period of the model. For example, if a monthly model goes from Jan '22 to Dec '22, adding cohorts would add 12 items, one for each month in the model.
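To make the category concrete, here is a minimal Python sketch (not Causal's own implementation) that lists one cohort item per month of a monthly model. The function name `monthly_cohorts` is hypothetical.

```python
from datetime import date


def monthly_cohorts(start: date, end: date) -> list[str]:
    """Return one cohort label per month from start to end, inclusive."""
    labels = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        labels.append(date(year, month, 1).strftime("%b '%y"))
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return labels


# A monthly model running from Jan '22 to Dec '22 gets 12 cohort items.
cohorts = monthly_cohorts(date(2022, 1, 1), date(2022, 12, 1))
print(len(cohorts))
print(cohorts[0], cohorts[-1])
```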
You can access cohorts by explicitly adding the cohort category to the variable or by referencing cohorts in the formula. Below is a simple example of using cohorts:
Cohort of leads that convert into New Customers
Let's break this formula into parts:
Example inputs:
- New sign-ups is 1,000 for our first month and grows at 5%
- Activation of cohort uses relative time, so that the 1st month is 45%, the 2nd is 25%, etc.
Put simply, this formula is saying "Sign-ups multiplied by activation rate %".
What is the "cohort" and "t-cohort" doing?
- By putting cohort as the time modifier of the New sign-ups variable, we are telling Causal to use the new sign-ups for Jan in the Jan cohort, Feb in the Feb cohort, etc.
- By using t-cohort as the time modifier of the Activations variable, we are telling Causal to use the 1st month activation rate (i.e. 45%) for the first month of the cohort (Jan '22 for the Jan '22 cohort), the second activation rate (25%) for the second month of that cohort (Feb '22 for the Jan '22 cohort), and so on.
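The same logic can be sketched outside Causal. The Python below is an illustration only and is not Causal's formula syntax: it multiplies each cohort's sign-ups by the activation rate for that cohort's age, `t - cohort`. Several details are assumptions made for the example: the 5% growth is read as monthly, the model runs for six months, the 3rd month rate of 10% is a placeholder (the example only gives 45% and 25%), and ages without a listed rate are treated as zero.

```python
N_MONTHS = 6  # assumed model length, for illustration only

# New sign-ups: 1,000 in the first month, assumed to grow 5% per month.
new_signups = [1000 * 1.05 ** m for m in range(N_MONTHS)]

# Activation rate by cohort age (relative time). 45% and 25% come from the
# example; 10% for the 3rd month is an assumed placeholder.
activation_by_age = [0.45, 0.25, 0.10]


def activations(cohort: int, t: int) -> float:
    """Activated customers from `cohort` in model month `t` (both 0-based)."""
    age = t - cohort  # plays the role of the "t-cohort" time modifier
    if age < 0 or age >= len(activation_by_age):
        return 0.0  # cohort has not started yet, or no rate is defined
    # Indexing sign-ups by cohort plays the role of the "cohort" modifier.
    return new_signups[cohort] * activation_by_age[age]


# Rows are cohorts, columns are model months.
table = [[activations(c, t) for t in range(N_MONTHS)] for c in range(N_MONTHS)]
for c, row in enumerate(table):
    print(c, [round(v, 1) for v in row])
```

Each row starts with zeros until the cohort's own month, then applies the 1st month rate, the 2nd month rate, and so on, which is the diagonal pattern the two time modifiers produce.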
Deeper dive on t-cohort
Let's consider the Feb '22 cohort, in Mar '22, in a model that begins in January '22:
- cohort is 0 for Jan '22, 1 for Feb '22, etc.
- t is 0 for Jan '22, 1 for Feb '22, 2 for Mar '22, etc.
- For our worked example, t is 2 and cohort is 1. t-cohort will return 1, so I will be applying the 2nd month activation rate (25%). This is correct, as Mar '22 is the 2nd month of my Feb '22 cohort.
- If the month was instead Feb '22, then t-cohort would be 0 (1-1), corresponding to the 1st month activation rate of 45%.
Note: t is a helper variable (also known as timestep or date).
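The arithmetic in this worked example can be checked with a short Python sketch. It is not part of Causal; the helper `month_index` is a hypothetical name for turning a calendar month into a 0-based index relative to the model start.

```python
from datetime import date

MODEL_START = date(2022, 1, 1)    # the model begins in Jan '22
ACTIVATION_BY_AGE = [0.45, 0.25]  # 1st and 2nd month rates from the example


def month_index(month: date, start: date = MODEL_START) -> int:
    """0-based number of months between the model start and `month`."""
    return (month.year - start.year) * 12 + (month.month - start.month)


cohort = month_index(date(2022, 2, 1))  # Feb '22 cohort -> 1
t = month_index(date(2022, 3, 1))       # Mar '22 -> 2
age = t - cohort                        # 2 - 1 = 1, the cohort's 2nd month
print(age, ACTIVATION_BY_AGE[age])      # 1 0.25
```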
Please visit our Causal Community Forum for another example of cohort modelling with monthly active users. This includes an in-depth breakdown of the formula and of exactly what using 'cohort' and 't-cohort' as time modifiers within a formula does.
Importing Cohort Data
You can connect your historic cohort data to Causal via a spreadsheet or directly from your data warehouse. If you go the spreadsheet route, there are two formats that are compatible with Causal: Time-Series format and Transactions format.
The one rule that must be satisfied is that the dates in the Cohort column must fall within the range of dates in the Date column.
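Before importing, this rule can be checked with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, and it reads the range as inclusive of the earliest and latest dates, which the rule itself does not spell out.

```python
from datetime import date


def cohorts_outside_date_range(dates: list[date], cohorts: list[date]) -> list[date]:
    """Return the cohort dates that fall outside [min(dates), max(dates)]."""
    earliest, latest = min(dates), max(dates)
    return sorted({c for c in cohorts if not earliest <= c <= latest})


# Made-up sample values: the July cohort precedes the earliest date.
dates = [date(2018, 8, 1), date(2018, 9, 1), date(2018, 10, 1)]
cohorts = [date(2018, 8, 1), date(2018, 7, 1)]
print(cohorts_outside_date_range(dates, cohorts))  # [datetime.date(2018, 7, 1)]
```

An empty list means every cohort date sits inside the range covered by the dates.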
Time-Series
This is an example of a time-series format spreadsheet for cohorts:
- Each row represents a single variable for a single category item
- The columns can be split into 3 sections: Variable Names, Categories, and Values
  - The first section is just the first column
    - This must contain the Variable Names
    - In the example above, column A contains the variable "Total Billed"; in this case, it was the only variable
  - The second section follows the first column
    - This contains the Categories
    - The first row must have the name of the category, and the rows below may have the name of an item in that category
    - In the example above, column B contains the "Cohort" category; in this case, it was the only category. The cohort names (e.g. August 2018) must be formatted as dates to be recognized by Causal.
  - The third section follows all of the columns in the second section
    - This contains the Values
    - The first row must contain the dates, and the rows below are the values themselves (must be number format, not text)
    - In the example above, this section ranges from column C to column G
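As a rough illustration of this layout, the Python sketch below reads a small time-series sheet exported as CSV and unpivots it into one record per variable, cohort, and date. The numbers are made up, the "Variable" header in the first column and the `%Y-%m-%d` date format are assumptions about the export, and it assumes a single category column as in the example. None of this is Causal's importer.

```python
import csv
import io
from datetime import datetime

# Made-up sample: column A = variable name, column B = "Cohort" category,
# remaining columns = one value per date.
SHEET = """Variable,Cohort,2018-08-01,2018-09-01,2018-10-01
Total Billed,2018-08-01,100,80,60
Total Billed,2018-09-01,0,120,90
"""


def parse_date(text: str):
    """Parse an ISO-style date string (assumed export format)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def unpivot(sheet: str) -> list[dict]:
    """Turn each (row, date column) cell into its own record."""
    rows = list(csv.reader(io.StringIO(sheet)))
    header, body = rows[0], rows[1:]
    value_dates = [parse_date(cell) for cell in header[2:]]  # third section
    records = []
    for row in body:
        variable, cohort = row[0], parse_date(row[1])
        for when, cell in zip(value_dates, row[2:]):
            records.append({"variable": variable, "cohort": cohort,
                            "date": when, "value": float(cell)})
    return records


records = unpivot(SHEET)
print(len(records))  # 6
```

Parsing the cohort names and the header row with the same date parser mirrors the requirement that both be real dates, and `float(cell)` fails loudly if a value was stored as non-numeric text.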
Transactions
This is an example of a transactions format spreadsheet for cohorts:
- The first column must have the name "Date"
- Add a column with the name "Cohort"
  - As above, the cohort names must be formatted as dates to be recognized by Causal
- The other columns can be for additional categories / data items
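A transactions-format sheet can be rolled up per cohort and date in a similar way. The sketch below is illustrative only: the `Amount` column and its values are hypothetical, standing in for whatever additional data columns the sheet carries, and the ISO date strings are an assumption about the export.

```python
import csv
import io
from collections import defaultdict

# Made-up sample: "Date" is the first column, "Cohort" is an added column,
# and "Amount" is a hypothetical data column.
SHEET = """Date,Cohort,Amount
2018-08-01,2018-08-01,100
2018-09-01,2018-08-01,80
2018-09-01,2018-09-01,120
2018-09-01,2018-09-01,30
"""


def total_by_cohort_and_date(sheet: str) -> dict[tuple[str, str], float]:
    """Sum Amount for each (cohort, date) pair."""
    reader = csv.DictReader(io.StringIO(sheet))
    if reader.fieldnames is None or reader.fieldnames[0] != "Date":
        raise ValueError('the first column must be named "Date"')
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in reader:
        totals[(row["Cohort"], row["Date"])] += float(row["Amount"])
    return dict(totals)


totals = total_by_cohort_and_date(SHEET)
print(totals[("2018-09-01", "2018-09-01")])  # 150.0
```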