Skip to main content

Model & Field Naming Convention

Introduction

How you name your models and fields is, by itself, a type of documentation. A clear naming convention will help both navigation and understanding of the data.

While there are several possible conventions that you can adopt, in this document, we offer some simple conventions for you to get started.

Models & Datasets naming

This applies to both model and dataset files:

General rules

  • Naming:
    • use_lower_case_and_dash For i.e: Use first_namenot "First Name".
    • Entity name in the name of files should be in plural form
    • If verb is used in the name, they must follow the forms
      • verb (active voice) - entity: map_users_visitors
      • entity name - verb (passive voice): search_results_unnested
  • Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand:
    • Avoid too much abbreviation
    • Remove prefixes and suffixes (like dim, facts...)
    • Use Upper Case And No Underscore

Staging models

  • Staging models used to produce a final model. Multiple actions can be done here.
  • These tend to be convenient models used for reporting purpose, not for exploration.
  • Format: stg_<some useful names>
  • Example:
    • stg_query_reports
    • stg_dashboards
    • stg_map_holistics_landing_visitors_trials

Assisting models

Assisting models should not be exposed to end-user, commonly perform one specific action.

The general name format is verb (present tense)- entity (plural). Some common verbs:

  • map_: use for identity stitching tasks
  • dedup_: use to deduplicate records
  • unnest_: Use to unnest the nested fields, and use the result to do other things
  • Examples
    • map_exchange_rates
    • map_visitors_trial
    • map_pipedrive_deals_holistics_tenants
    • dedup_trial_submission_emails
    • unnest_holistics_app_global_search_results
    • unnest_trial_submissions_what_you_like
    • unnest_trial_submissions_reporting_data_sources

Another format is entity (plural) - verb (passive voice).

  • For example:
    • visitors_trials_mapped
    • trial_emails_deduped

Cleaned and reporting models

These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques

dim_ models

  • Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
  • Format: dim_<object name in plural>
  • Example:
    • dim_query_reports
    • dim_dashboards.

fact_ models

  • contains information about "transactions" or "interactions".
  • Have measurements, is not necessary a raw, atomic event table
  • Tend to be a long, narrow table
  • Examples:
    • fact_jobs
    • fact_business_metrics_monthly

Field naming & labeling

A model field has two things: Field name and field label.

Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.

Field name

  • Case: use snake_case
  • Name fields consistently and represent values consistently across the model. For example:
    • user vs. customer vs. member vs. account
    • company vs. account vs. organization
    • registration vs. creation
  • In denormalized tables, should try to retain the original name of the table. For example, id, created_at, updated_at field in table users should still be id, created_at, updated_at instead of user_id, user_created_at, user_updated_at
  • In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear: For example, in a derived query_report model (that create a normalized table):
    select
    query_reports.id as query_report_id
    , query_reports.name as query_report_name
    , query_reports.user_id
    , users.name as user_name -- `name` exists in both tables

    from query_reports
    left join users on query_reports.user_id = users.id

Dimensions

Numeric Dimensions

  • General form [aggregation name]_object_number
  • If the aggregated dimension is monhubspot_customer_statusetary: [aggregation name]_value_amount
  • Aggregation name is optional
  • Ratios: x_per_y
  • Percentage: pct_x_over_y

For example:

user_stats: {
dimensions: {
user_id
, user_name
, query_reports_number -- Number of reports belong to this user
, avg_reports_per_day_number
, pct_models_in_dataset_number
, total_revenue_amount
, total_payment_this_monnth_amount
, total_active_days_number
}
, measures: {}
}

Boolean Dimensions

Dimensions that have boolean (TRUE/FALSE) type

  • General form: is_..., has_..., clicked_button_a, created_multi_ds_report
  • Verb tense depends on the context.

Measures

General form: [aggregation name]_object

  • Average: avg_
  • Sum: total_ (Use Total instead of Sum because it sound more natural)
  • Count: count_
  • Ratios: x_per_y
  • Percentage: pct_x_over_y

Example:

user_stats: {
dimensions: {
user_id
, user_name
}
, measures: {
count_created_users: @sql count(users.id)
, count_existing_users: @sql count(users.id if is not deleted)
, avg_activity_days: @sql avg(total_active_days_number)
}
}

Field label

  • Case: Use Capital
  • Field label is what the end-user see when exploring the dataset, so it should be descriptive
  • Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
    • Field jobs.created_at should have label Job Created At
    • Field users.created_at should have label User Created At
    • Field query_reports.created_at should have label Report Created At
Field label in Holistics modeling layer

Let us know what you think about this document :)