Skip to main content

Provide users with meaningful names

Models & Datasets naming

This applies to both model and dataset files:

General rules

  • Naming:
    • File names and model names can reuse the name defined in the dbt project.
    • use_lower_case_and_dash For i.e: Use first_namenot "First Name".
    • Entity name in the name of files should be in plural form
    • If verb is used in the name, they must follow the forms
      • verb (active voice) - entity: map_users_visitors
      • entity name - verb (passive voice): search_results_unnested
  • Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand
    • Avoid too much abbreviation
    • Remove prefixes and suffixes (like dim, facts...)
    • Use Upper Case And No Dash

Staging models

  • Staging models used to produce a final model. Multiple actions can be done here.
  • These tend to be convenient models used for reporting purpose, not for exploration.
  • Format: stg_<some useful names>
  • Example:
    • stg_query_reports
    • stg_dashboards
    • stg_map_holistics_landing_visitors_trials

Assisting models

Assisting models should not be exposed to end-user, commonly perform one specific action.

The general name format is verb (present tense)- entity (plural). Some common verbs:

  • map_: use for identity stitching tasks
  • dedup_: use to deduplicate records
  • unnest_: Use to unnest the nested fields, and use the result to do other things
  • Examples
    • map_exchange_rates
    • map_visitors_trial
    • map_pipedrive_deals_holistics_tenants
    • dedup_trial_submission_emails
    • unnest_holistics_app_global_search_results
    • unnest_trial_submissions_what_you_like
    • unnest_trial_submissions_reporting_data_sources

Another format is entity (plural) - verb (passive voice).

  • For example:
    • visitors_trials_mapped
    • trial_emails_deduped

Cleaned and reporting models

These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques

dim_ models

  • Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
  • Format: dim_<object name in plural>
  • Example:
    • dim_query_reports
    • dim_dashboards.

fact_ models

  • contains information about "transactions" or "interactions".
  • Have measurements, is not necessary a raw, atomic event table
  • Tend to be a long, narrow table
  • Examples:
    • fact_jobs
    • fact_business_metrics_monthly

Field naming & labeling

A model field has two things: Field name and field label.

Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.

Field name

  • Case: use snake_case
  • Name fields consistently and represent values consistently across the model. For example:
    • user vs. customer vs. member vs. account
    • company vs. account vs. organization
    • registration vs. creation
  • In denormalized tables, should try to retain the original name of the table. For example, id, created_at, updated_at field in table users should still be id, created_at, updated_at instead of user_id, user_created_at, user_updated_at
  • In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear: For example, in a derived query_report model (that create a normalized table):
    select
    query_reports.id as query_report_id
    , query_reports.name as query_report_name
    , query_reports.user_id
    , users.name as user_name -- `name` exists in both tables

    from query_reports
    left join users on query_reports.user_id = users.id

Dimensions

Numeric Dimensions

  • General form [aggregation name]_object_number
  • If the aggregated dimension is monhubspot_customer_statusetary: [aggregation name]_value_amount
  • Aggregation name is optional
  • Ratios: x_per_y
  • Percentage: pct_x_over_y

For example:

user_stats: {
dimensions: {
user_id
, user_name
, query_reports_number -- Number of reports belong to this user
, avg_reports_per_day_number
, pct_models_in_dataset_number
, total_revenue_amount
, total_payment_this_monnth_amount
, total_active_days_number
}
, measures: {}
}

Boolean Dimensions

Dimensions that have boolean (TRUE/FALSE) type

  • General form: is_..., has_..., clicked_button_a, created_multi_ds_report
  • Verb tense depends on the context.

Measures

General form: [aggregation name]_object

  • Average: avg_
  • Sum: total_ (Use Total instead of Sum because it sound more natural)
  • Count: count_
  • Ratios: x_per_y
  • Percentage: pct_x_over_y

Example:

user_stats: {
dimensions: {
user_id
, user_name
}
, measures: {
count_created_users: @sql count(users.id)
, count_existing_users: @sql count(users.id if is not deleted)
, avg_activity_days: @sql avg(total_active_days_number)
}
}

Field label in Holistics modeling layer

  • Case: Please Use Capital
  • Field label is what the end-user see when exploring the dataset, so it should be descriptive
  • Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
    • Field jobs.created_at should have label Job Created At
    • Field users.created_at should have label User Created At
    • Field query_reports.created_at should have label Report Created At
Field label in Holistics modeling layer

Let us know what you think about this document :)