Model & Field Naming Convention

Introduction

How you name your models and fields is, by itself, a type of documentation. A clear naming convention will help both navigation and understanding of the data.

While there are several possible conventions that you can adopt, in this document, we offer some simple conventions for you to get started.

Models & Datasets naming

This applies to both model and dataset files:

General rules

Naming:
- use_lower_case_and_dash For i.e: Use first_name, not "First Name".
- Entity name in the name of files should be in plural form
- If verb is used in the name, they must follow the forms
  - verb (active voice) - entity: map_users_visitors
  - entity name - verb (passive voice): search_results_unnested
Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand:
- Avoid too much abbreviation
- Remove prefixes and suffixes (like dim, facts...)
- Use Upper Case And No Underscore

Staging models

Staging models used to produce a final model. Multiple actions can be done here.
These tend to be convenient models used for reporting purpose, not for exploration.
Format: stg_<some useful names>
Example:
- stg_query_reports
- stg_dashboards
- stg_map_holistics_landing_visitors_trials

Assisting models

Assisting models should not be exposed to end-user, commonly perform one specific action.

The general name format is verb (present tense)- entity (plural). Some common verbs:

map_: use for identity stitching tasks
dedup_: use to deduplicate records
unnest_: Use to unnest the nested fields, and use the result to do other things
Examples
- map_exchange_rates
- map_visitors_trial
- map_pipedrive_deals_holistics_tenants
- dedup_trial_submission_emails
- unnest_holistics_app_global_search_results
- unnest_trial_submissions_what_you_like
- unnest_trial_submissions_reporting_data_sources

Another format is entity (plural) - verb (passive voice).

For example:
- visitors_trials_mapped
- trial_emails_deduped

Cleaned and reporting models

These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques

`dim_` models

Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
Format: dim_<object name in plural>
Example:
- dim_query_reports
- dim_dashboards.

`fact_` models

contains information about "transactions" or "interactions".
Have measurements, is not necessary a raw, atomic event table
Tend to be a long, narrow table
Examples:
- fact_jobs
- fact_business_metrics_monthly

Field naming & labeling

A model field has two things: Field name and field label.

Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.

Field name

Case: use snake_case
Name fields consistently and represent values consistently across the model. For example:
- user vs. customer vs. member vs. account
- company vs. account vs. organization
- registration vs. creation
In denormalized tables, should try to retain the original name of the table.

For example, id, created_at, updated_at field in table users should still be id, created_at, updated_at instead of user_id, user_created_at, user_updated_at

In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear:

For example, in a derived query_report model (that create a normalized table):

select
	query_reports.id as query_report_id
	, query_reports.name as query_report_name
	, query_reports.user_id
	, users.name as user_name -- `name` exists in both tables

from query_reports 
left join users on query_reports.user_id = users.id

Dimensions

Numeric Dimensions

General form [aggregation name]_object_number
If the aggregated dimension is monhubspot_customer_statusetary: [aggregation name]_value_amount
Aggregation name is optional
Ratios: x_per_y
Percentage: pct_x_over_y

For example:

user_stats: {
	dimensions: {
		user_id
		, user_name
		, query_reports_number -- Number of reports belong to this user
		, avg_reports_per_day_number
		, pct_models_in_dataset_number
		, total_revenue_amount
		, total_payment_this_monnth_amount
		, total_active_days_number
	}
	, measures: {}
}

Boolean Dimensions

Dimensions that have boolean (TRUE/FALSE) type

General form: is_..., has_..., clicked_button_a, created_multi_ds_report
Verb tense depends on the context.

Measures

General form: [aggregation name]_object

Average: avg_
Sum: total_ (Use Total instead of Sum because it sound more natural)
Count: count_
Ratios: x_per_y
Percentage: pct_x_over_y

Example:

user_stats: {
	dimensions: {
		user_id
		, user_name
	}
	, measures: {
			count_created_users: @sql count(users.id)
			, count_existing_users: @sql count(users.id if is not deleted)
			, avg_activity_days: @sql avg(total_active_days_number)
	}
}

Field label

Case: Use Capital
Field label is what the end-user see when exploring the dataset, so it should be descriptive
Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
- Field jobs.created_at should have label Job Created At
- Field users.created_at should have label User Created At
- Field query_reports.created_at should have label Report Created At

Introduction​

Models & Datasets naming​

General rules​

Staging models​

Assisting models​

Cleaned and reporting models​

dim_ models​

fact_ models​

Field naming & labeling​

Field name​

Dimensions​

Measures​

Field label​