Model & Field Naming Convention
Introduction
How you name your models and fields is, by itself, a type of documentation. A clear naming convention will help both navigation and understanding of the data.
While there are several possible conventions that you can adopt, in this document, we offer some simple conventions for you to get started.
Models & Datasets naming
This applies to both model and dataset files:
General rules
- Naming:
use_lower_case_and_dash
For i.e: Usefirst_name
, not"First Name"
.- Entity name in the name of files should be in plural form
- If verb is used in the name, they must follow the forms
- verb (active voice) - entity:
map_users_visitors
- entity name - verb (passive voice):
search_results_unnested
- verb (active voice) - entity:
- Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand:
- Avoid too much abbreviation
- Remove prefixes and suffixes (like dim, facts...)
Use Upper Case And No Underscore
Staging models
- Staging models used to produce a final model. Multiple actions can be done here.
- These tend to be convenient models used for reporting purpose, not for exploration.
- Format:
stg_<some useful names>
- Example:
stg_query_reports
stg_dashboards
stg_map_holistics_landing_visitors_trials
Assisting models
Assisting models should not be exposed to end-user, commonly perform one specific action.
The general name format is verb (present tense)- entity (plural)
. Some common verbs:
map_
: use for identity stitching tasksdedup_
: use to deduplicate recordsunnest_
: Use to unnest the nested fields, and use the result to do other things- Examples
- map_exchange_rates
- map_visitors_trial
- map_pipedrive_deals_holistics_tenants
- dedup_trial_submission_emails
- unnest_holistics_app_global_search_results
- unnest_trial_submissions_what_you_like
- unnest_trial_submissions_reporting_data_sources
Another format is entity (plural) - verb (passive voice)
.
- For example:
visitors_trials_mapped
trial_emails_deduped
Cleaned and reporting models
These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques
dim_
models
- Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
- Format:
dim_<object name in plural>
- Example:
dim_query_reports
dim_dashboards
.
fact_
models
- contains information about "transactions" or "interactions".
- Have measurements, is not necessary a raw, atomic event table
- Tend to be a long, narrow table
- Examples:
- fact_jobs
- fact_business_metrics_monthly
Field naming & labeling
A model field has two things: Field name and field label.
Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.
Field name
- Case: use
snake_case
- Name fields consistently and represent values consistently across the model. For example:
- user vs. customer vs. member vs. account
- company vs. account vs. organization
- registration vs. creation
- In denormalized tables, should try to retain the original name of the table. For example,
id
,created_at
,updated_at
field in tableusers
should still beid
,created_at
,updated_at
instead ofuser_id
,user_created_at
,user_updated_at
- In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear: For example, in a derived
query_report
model (that create a normalized table):select
query_reports.id as query_report_id
, query_reports.name as query_report_name
, query_reports.user_id
, users.name as user_name -- `name` exists in both tables
from query_reports
left join users on query_reports.user_id = users.id
Dimensions
Numeric Dimensions
- General form
[aggregation name]_object_number
- If the aggregated dimension is monhubspot_customer_statusetary:
[aggregation name]_value_amount
- Aggregation name is optional
- Ratios:
x_per_y
- Percentage:
pct_x_over_y
For example:
user_stats: {
dimensions: {
user_id
, user_name
, query_reports_number -- Number of reports belong to this user
, avg_reports_per_day_number
, pct_models_in_dataset_number
, total_revenue_amount
, total_payment_this_monnth_amount
, total_active_days_number
}
, measures: {}
}
Boolean Dimensions
Dimensions that have boolean (TRUE/FALSE) type
- General form:
is_...
,has_...
,clicked_button_a
,created_multi_ds_report
- Verb tense depends on the context.
Measures
General form: [aggregation name]_object
- Average:
avg_
- Sum:
total_
(Use Total instead of Sum because it sound more natural) - Count:
count_
- Ratios:
x_per_y
- Percentage:
pct_x_over_y
Example:
user_stats: {
dimensions: {
user_id
, user_name
}
, measures: {
count_created_users: @sql count(users.id)
, count_existing_users: @sql count(users.id if is not deleted)
, avg_activity_days: @sql avg(total_active_days_number)
}
}
Field label
- Case:
Use Capital
- Field label is what the end-user see when exploring the dataset, so it should be descriptive
- Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
- Field
jobs.created_at
should have label Job Created At - Field
users.created_at
should have label User Created At - Field
query_reports.created_at
should have label Report Created At
- Field