Managing all the variations of (dbt) metrics

Jan 5, 2024

data-modelling-with-dbt-core

Written by Silja MƤrdla, Senior Analytics Engineer at Bolt.

Semantic layers and metrics will be a big part of how we work with data in the future. Weā€™ve done this before but are now forming a more systematic approach. This is what Iā€™ve been working on in Bolt.

Iā€™ve previously shared how we approach data modelling in general and how to materialise metrics defined in dbt. This time, Iā€™ll share the metrics weā€™ve set up and how weā€™ve set them up.

Simple metrics

Just follow the metric creation documentation to set up simple metrics. They mirror the measures set up in the semantic models. Use generic dbt functionality to include a label, metadata, and a reusable doc block with a description.

Or simply opt for the create_metric: True option directly on the measure.

In general, itā€™s straightforward. Thereā€™s one philosophical question that popped up, though.

Should we measure filters or metrics?

There are two options to calculate the number of orders in specific states. We can do the filtering in the measure:

Or we can have only one measure but multiple different metrics with filters:

I donā€™t know whatā€™s better. I was discussing the difference here, and this is what Derek Willemsen suggested:

I believe this type of filtering should happen in the metrics, not in measures. Both will work, and yes, it might be DRYer to do it in measures sometimes, but once the number of variations/combinations of filters increases, youā€™ll see that itā€™s cleaner to do the filtering in the metrics section.

So now I have all of the filters consistently in metrics.

Derived metrics

Derived metrics are calculated from other metrics, for example:

Again, thereā€™s something you should consider. What do you think happens when metric_1 is null?

Null handling in derived metrics

The answer is that metric_3 will also be null out of the box. This is different from what you expect in most cases.

Thereā€™s a more extended discussion around the topic in MetricFlow issue #764. As I understand, there should be a fill_nulls_with configuration released with dbt 1.7 that will help to deal with this problem manually. Hopefully, the derived metric functionality will be further improved to do this automatically.

Iā€™ve not tested the null handling out yet. In my case, I went with the ugly workaround of redefining the metric.

Donā€™t do this. Iā€™m sharing the example just for the record.

Ratio metrics

Ratio metrics are a particular type of derived metric where one metric is divided by another. When working with these, I discovered that derived and ratio metrics could cause ambiguous queries to be compiled when you query two or more metrics that share components. See the details in Metricflow issue #757. Hereā€™s an example of two metrics that will cause this problem when queried together:

I noticed that both would be better defined as ratio metrics instead of derived, but this doesnā€™t matter here.

The trick to overcome the problem of an ambiguous query is to alias the components in derived metric definitions:

Variations of metrics

When you start using cumulative metrics, you will define some metrics that are variations of the same metric.

Currently, theyā€™re conceptually similar but technically wholly isolated. Thomas Lento hopes this might change soon. See the discussion here.

Thereā€™s another scenario where I found myself creating variations of metrics. It has to do with the time dimensions of a metric. Your semantic model about order might have some different time dimensions:

or you might be working with UTC and local time in parallel:

The semantic layer has been designed to support multiple time dimensions. You can return different results by different MetricFlow queries:

However, MetricFlow doesnā€™t support different time dimensions when querying cumulative metrics. See the details in MetricFlow issue #806. In fact, itā€™ll return completely wrong results if the time dimension is referenced by its name. This query:

will return completely wrong data (instead of the 7-day cumulative, youā€™ll get the simple daily sums).

On the positive side, cumulative metrics work when queried using metric_time, the default time dimension. So, this query will return the correct results:

So, my current, very ugly workaround is to define two separate semantic models:

and have two separate metrics on top of them:

So, to summarise, each metric can have:

  • The ā€œbaseā€ metric: the one that means something to the business;
  • Cumulative versions of the metric;
  • Different metric versions based on different time dimensions (created_timestamp vs finished_timestamp, UTC vs local, etc.).

Keeping different versions of metrics in order

Iā€™ve recently reorganised my metrics in the following manner:

  • 1 .yml file per 1 ā€œbaseā€ metric.
  • 1 .md file per 1 ā€œbaseā€ metric.
  • All the variations of the same metric in that same .yml file.

Also, thereā€™s a naming convention that makes sure the different variations are easy to find:

will return all the variations of the rides_orders_created metric:

and theyā€™re just as easily found from the dbt docs page:

Documentation

Unfortunately, the automatically generated documentation for metrics is useless without the implementation of MetricFlow. The documentation only displays the name, description, and upstream semantic model. There is nothing about the metric definition written in code nor the compiled code one would get when querying the metric.

I developed the following conventions to make the metric documentation more attractive to users.

Metric descriptions combine generic and specific doc blocks

To simplify documentation of a metricā€™s many variations, Iā€™ve created some generic doc blocks that can be combined with the particular doc block for the ā€œbaseā€ metric. This is what a description of a cumulative metric looks like:

where the metrics_cumulative block explains the details of how cumulative metrics work:

Notice that the end of the description leads to the following component: the specific description of the rides_orders_created metric. I will come back to that later.

The final metrics_utc block explains the difference between metric versions working on top of different time dimensions:

Base metric description

The two things that explain the meaning of a metric are the configurations and the compiled query. Iā€™ve decided to include these in the metric description manually. This is how it looks like:

There are 3 mandatory components:

  1. A readable business description of the metric to please the business users;
  2. Link to the specific .yml file that defines this particular metric and itā€™s different versions to expose the metric configurations;
  3. The simplest version of the compiled code is generated by querying the metric without any filtering or grouping to show the compiled query.

Optionally, some metrics have additional details explained at the end of the documentation. This is usually related to business knowledge, links to Confluence documentation, known issues, etc.

Metrics documentation page

Combining the documentation logic described above, this is what the final metric doc looks like:

Itā€™s not ideal, but itā€™s consistent and more valuable than the version that comes out of the box.

Summary

I hope these tips are helpful for other users of dbt metrics. Please share your opinions and any other tips youā€™ve discovered.

I also hope that most of these tips will become useless as the functionality offered by dbt improves over time.

Join us! 

Bolt is a place where you can grow professionally at lightning speed and create a real impact on a global scale.

If you feel inspired by our approach and want to join us on our journey, check out our job board. Weā€™ve hundreds of open roles, from entry-level to top management ā€” each being an exciting opportunity to contribute to making cities for people, not cars.

If youā€™re ready to work in an exciting, dynamic, fast-paced industry and are not afraid of challenges, weā€™re waiting for you!

Download Bolt

Recent posts