Written by Silja Märdla, Senior Analytics Engineer at Bolt.
As users of dbt_metrics know, the dbt_metrics package is being deprecated and replaced with MetricFlow. With this step, the metrics.calculate() macro from the dbt_metrics package is also being deprecated. The current suggestion is to use the --explain parameter in a MetricFlow query to compile the SQL code that you can then paste into your dbt model code.
Because I believe in the power of the Semantic Layer and metrics, I’ve been on a journey to implement and re-implement some metrics for Bolt.
Here is a summary of how MetricFlow queries can currently be used when you’re on dbt Core instead of dbt Cloud.
Before sharing the hacky workaround process I’ve come up with, I’d like to summarise some pain points we’ve felt:
--explain provides a fully compiled query with no {{ ref() }} macros, so the lineage is not kept intact (issue, thread).
There is no way to define (even manually) the dependency of data marts on semantic models and metrics (thread).
There is no straightforward functionality to make sure partition keys are filtered (thread).
That said, here’s what I like about MetricFlow queries (compared to the dbt_metrics functionality):
There is no need to build extra wide fact tables to support the metrics queries (we can simply get the dimension values from dimension tables).
Users can use the mf query functionality to construct queries for their ad hoc analysis.
So, here’s a copy of an internal document written to guide the creation of data marts using MetricFlow queries:
Define semantic models and metrics in yml files;
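To make this concrete, here is a minimal sketch of such a yml file; the orders semantic model, its columns and the order_count metric are hypothetical examples:

```yaml
semantic_models:
  - name: orders
    description: Hypothetical order fact semantic model.
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: metric_time
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: metric_time
        expr: order_created_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_count
        agg: count
        expr: order_id

metrics:
  - name: order_count
    label: Order count
    type: simple
    type_params:
      measure: order_count
```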
List the available metrics using mf list metrics;
Create a comma-separated list of the metrics you wish to include in your model;
Construct your MetricFlow metrics query in the format sketched below, where metric1, metric2 and metric3 are metric names. No spaces are allowed in the list of metric names;
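A minimal sketch of the command; metric1, metric2 and metric3 are placeholders, and the --group-by dimension depends on your semantic models:

```bash
mf query --metrics metric1,metric2,metric3 --group-by metric_time --explain
```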
Test your metrics query by running the same MetricFlow command without --explain and with reasonable date limits;
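For example, with hypothetical metric names and a one-week test window:

```bash
mf query --metrics order_count,gross_revenue --group-by metric_time --start-time 2023-08-01 --end-time 2023-08-07
```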
Add the metrics query to the beginning of your model inside a SQL comment /* ... */;
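For example, with the hypothetical metric from above:

```sql
/*
mf query --metrics order_count --group-by metric_time --explain
*/
```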
Run the MetricFlow query to compile the model SQL code. It will be printed in the terminal output by default. For long queries (1000+ lines), it helps to send the terminal output to a file:
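A sketch of the redirect, again with a hypothetical metric name:

```bash
mf query --metrics order_count --group-by metric_time --explain > compiled_query.sql
```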
Notice that the beginning of the file will need to be cleaned up: it contains the CLI’s status output rather than SQL.
Place the compiled SQL query in your model;
If you need to rename any of the output columns (typically, the dimensions need renaming), place the compiled SQL inside a wrapper query like the one sketched below and modify the SELECT * to do the renaming;
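A minimal sketch of the wrapper; the dimension and metric column names are hypothetical:

```sql
with metrics_query as (

    /* compiled MetricFlow SQL pasted here */

)

select
    metric_time as date_utc,  -- hypothetical dimension rename
    order_count
from metrics_query
```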
Construct a full list of the fact tables your metrics rely on (either from the metric definitions or by reviewing the FROM clauses in the output query);
Add the list to the beginning of your model inside a SQL comment /* ... */;
Find and replace each occurrence of a fully qualified table name with a {{ ref() }} macro, as sketched below, where table_name is the name of the dbt model. You can omit the dbt project name and use the single-argument form, ref('table_name'), but it’s better to be explicit.
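A sketch of the replacement, with a hypothetical schema, project and model name:

```sql
-- before: a fully qualified table reference in the compiled SQL
from analytics.fact_orders

-- after: a two-argument ref() that records the dependency
from {{ ref('my_dbt_project', 'fact_orders') }}
```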
This ensures the model has references to its upstream models and that the dependencies appear in the dbt data lineage graph.
Find and replace each occurrence of the hardcoded data interval start and end dates with Jinja variables, as sketched below. The variables use the data_interval_start and data_interval_end values passed by Airflow. Input tables are partitioned by date; this is how we make sure we filter according to the data interval being processed.
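A sketch of the replacement; the column name and date literals are hypothetical:

```sql
-- before: hardcoded dates from the test run
where metric_time between '2023-08-01' and '2023-08-02'

-- after: variables populated from Airflow's data interval at run time
where metric_time between '{{ var("data_interval_start") }}' and '{{ var("data_interval_end") }}'
```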
Compile the dbt project;
Review the compiled SQL code for the dbt model in your target folder;
Test your model by running the dbt model.
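For the last three steps, assuming a hypothetical model name:

```bash
# compile the project and review the result under target/compiled/
dbt compile --select my_metrics_mart

# run the model, passing the data interval variables that Airflow would normally provide
dbt run --select my_metrics_mart --vars '{data_interval_start: 2023-08-01, data_interval_end: 2023-08-02}'
```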
Join us!
Bolt is a place where you can grow professionally at lightning speed and create a real impact on a global scale.
If you feel inspired by our approach and want to join us on our journey, check out our job board. We have hundreds of open roles, from entry level to top management, each an exciting opportunity to contribute to making cities for people, not cars.
If you’re ready to work in an exciting, dynamic, fast-paced industry and are not afraid of challenges, we’re waiting for you!