Skip to main content

AML vs YAML

AML vs YAML

Introduction

"Aren't all YAML-based analytics modeling languages similar? Why did Holistics have to invent a new modeling language instead of just use YAML?"

Most "Analytics as Code" solutions on the market use YAML as their underlying modeling markup syntax. While this is a popular choice among BI vendors, YAML remains a general-purpose markup language that poses many limitations when it comes to analytics development.

This document explains the limitations of YAML-based modeling language, and how AMQL addresses those limitations.

YAML is ambiguous and error-prone

YAML is designed for human readability, and it tries to be "helpful" by automatically interpreting values. But this "helpfulness" creates a minefield of edge cases and inconsistencies.

Inconsistent type parsing

Inconsistent type parsing

Consider this seemingly innocent example:

versions:
- test # Return 'test' (string)
- 10.5.1 # Return 10.5 (number)
- 1e+5 # Return 10000 (scientific notation)
config:
enabled: yes # String or boolean? Depends on your parser!
release: 2023-05-01 # String or date?

What actually gets parsed for versions? ["test", 10.5, 100000] . The version number becomes a floating-point number, and the product code is interpreted as scientific notation, completely corrupting your intended data structure (all should be strings).

Also, the same YAML document parses differently depending on which parser and version you use. Try parsing this identical YAML in Python vs Node.js. Python returns True (boolean) for the field enabled, whereas NodeJS returns 'yes' (string).

Whitespace indentation nightmares

Furthermore, YAML uses "whitespace indentation" to define nested structures. This makes it really easy to cause error. Imagine working on a big YAML file and accidentally make a small spacing mistake in the code:

dimensions:
- name: id
label: "ID"
- name: email
label: "Email"

Can you spot where the error happens? Not very easy to do, especially with large files.

These ambiguities turn simple mistakes into major debugging headaches, especially in large configuration files.

AMQL provides clear, analytics-first syntax

AMQL eliminates these ambiguities by design with a clean, explicit syntax:

// users.model.aml
Model users {
type: 'table'
label: 'Users'
data_source_name: 'snowflakedb'
table_name: 'public.users'

dimension id {
label: 'ID'
type: 'number'
definition: @sql {{#SOURCE.id}};;
}
dimension email {
label: 'Email'
type: 'text'
definition: @sql {{#SOURCE.email}};;
}
measure user_count {
type: 'number'
label: 'Count Users'
definition: @sql count({{#SOURCE.id}});;
}
}

Notice the difference: analytics objects are first-class citizens in AMQL. Models, dimensions, and measures are native language constructs with explicit types, not generic data structures left to interpretation.

Furthermore, the curly bracket { syntax eliminates indentation ambiguity while creating a programming-friendly language that reads naturally and maps directly to analytics concepts.

YAML is schemaless with no type system

Perhaps the most fundamental limitation of YAML for analytics development is its complete lack of a type system and schema validation. This severely impacts both your analysts' productivity and your analytics projects' reliability, creating cascading problems that grow with your codebase.

No enforcement of required fields

YAML's schemaless nature means there's no built-in way to define what fields are required, what types they should be, or what relationships exist between different parts of your model:

# sales_dashboard.yaml
dashboard:
title: Sales Performance
charts:
- name: Revenue by Quarter
metric: quarterly_revenue
chart_type: bar
- name: Conversion Rate
metric: conversion_rate
**# Oops! chart_type is missing, but YAML doesn't care**

Without schema validation, YAML silently accepts missing required properties. These issues only surface at runtime (typically after production deployment) and often manifest as silent corruptions where charts display convincing but entirely wrong data.

No autocomplete and inline documentations

Furthermore, YAML's schemaless nature creates a development experience that lacks two critical productivity features: autocomplete and inline documentation.

Without code suggestions or inline docs, analysts must memorize property names, valid values, and structure details, or constantly refer to external documentation.

metrics:
- name: revenue_growth
time_grain: quater # Typo for "quarter" goes undetected
# What are valid compare_methods? No way to know without external docs
compare_method: precentage_change # Another typo that YAML happily accepts

This lack of guidance leads to countless wasted hours on trial-and-error, debugging typos, and hunting through documentation.

AMQL has robust type system

AML, on the other hand, is a strongly-typed language (think TypeScript but for analytics) with a powerful built-in static type system. This allows enhancing the developer experience with instant error feedback, smart auto-completion and template suggestions directly within the IDE.

  • Smart autocomplete: Get instant suggestions for valid attribute values
  • Inline documentation that explains the parameters without leaving your editor.
  • Intelligent refactoring: Go to definition, find all references, and track dependencies between dashboards, reports, and metrics
  • Instant error feedback catches mistakes as you type them
  • Compile-time validation that catches errors before deployment

Here are a few examples:

Autocompletion within the analytics IDE: the IDE auto-suggests suitable values for the "type" property of a dimension.

auto complete

Inline docs shown when hovering over code object

amql-inlinedocs.gif

Instantly raise errors when there's error (duplicate dimension).

amql-instant-error.gif

Raising error instantly when required attributes not declared

amql-instant-error-missing-attrs.gif

Jumps to the model's definition code by control-clicking on name.

amql-gotodef.gif

YAML lacks programmability and reusability

YAML's nature as a pure data serialization format means it completely lacks programmability features. This forces analytics teams into inefficient workflows and prevents the kind of code reuse that's standard in modern development.

AML, on the other hand, is a fully programmable language that gives you variables, functions, and composition to build reusable analytics components.

No variables, no functions, no abstractions

Want to define a constant once and reuse it? Want to parameterize your logic? Want to create reusable analytics components? YAML offers no native way to do any of this:

  • No variables to store and reuse values
  • No functions to encapsulate and parameterize logic
  • No template mechanism for reusing structure
  • No inheritance or extension mechanisms
  • No conditional logic or control flow

The result is analytics code that constantly violates the DRY (Don't Repeat Yourself) principle.

Without programming constructs, YAML forces you to duplicate code for similar analytics components:

metrics:
# Define a metric for daily active users
- name: daily_active_users
type: count_distinct
sql: "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date = CURRENT_DATE"

# Want weekly active users? Copy-paste and modify
- name: weekly_active_users
type: count_distinct
sql: "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date >= CURRENT_DATE - 7"

# Monthly active users? Copy-paste again
- name: monthly_active_users
description: "Users active in the last 30 days"
type: count_distinct
sql: "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date >= CURRENT_DATE - 30"

This approach is not only tedious but dangerously error-prone:

  • When business logic changes (like how you define an "active" user), you must find and update every copy of the logic.
  • Also, you can't build derived metrics like "daily active users in Europe" that reference and extend existing metric.

Instead, you're forced to recreate the entire calculation from scratch.

Here is the equivalent version in AML:

AML metric lineage
metric active_users {
label: 'Active Users'
type: 'number'
definition: @aql
count_distinct(events.user_id)
;;
}

metric daily_active_users {
label: 'Daily Active Users'
type: 'number'
definition: @aql
active_users
| where(events.event_date matches @(today))
;;
}

metric weekly_active_users {
label: 'Weekly Active Users'
type: 'number'
definition: @aql
active_users
| where(events.event_date matches @(this week))
;;
}

metric monthly_active_users {
label: 'Monthly Active Users'
type: 'number'
definition: @aql
active_users
| where(events.event_date matches @(this month))
;;
}

When you want to change the definition of active users from count distinct of user id to something else, the derived metrics daily_active_users, weekly_active_users, and monthly_active_users automatically pick up the change. You don't have to find and update every copy of the logic.

This code is more:

  • Composable
  • Reusable
  • Easy to read and maintain
  • Free of parsing ambiguities

YAML + Jinja become a templating trap

Backed into a corner by YAML's limitations, many platforms resort to bolting on external templating engines. DevOps tools like Ansible have grafted string-based templating systems like Jinja onto YAML. dbt, a popular data transformation tool, also added Jinja templating on top of their SQL-based transformation language.

This is how adding Jinja templating to a YAML-based modeling language looks like:

# Data modeling with Jinja templating - a notorious pain point
metrics:
{% for period in ['daily', 'weekly', 'monthly'] %}
{% set days = 1 if period == 'daily' else 7 if period == 'weekly' else 30 %}
- name: {{ period }}_active_users
description: "Users active in the last {{ days }} days"
type: count_distinct
sql: "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date >= CURRENT_DATE - {{ days }}"
{% endfor %}

While this approach might seem clever at first glance, it creates a fundamentally flawed hybrid that combines the worst aspects of both technologies:

  • Multi-layer parsing nightmares: The code must pass through multiple parsing layers (Jinja then YAML), creating cryptic errors that reference generated code rather than your source
  • Runtime-only error detection: Unlike proper programming languages, errors with undefined variables, type mismatches, and logic bugs only surface when templates execute, leaving you blind during development.

This approach isn't programming-it's string manipulation masquerading as logic. Instead of coherent language constructs, you get a fragile chain of text processing that breaks in mysterious ways.

The fact that teams resort to this approach highlights how desperately they need actual programming constructs that YAML simply cannot provide.

AMQL is fully programmable and reusable

AMQL takes a fundamentally different approach by being a true programming language designed specifically for analytics. Where YAML is a data format, AMQL gives you real programming constructs.

AMQL reusable code components composed into multiple metrics

Metric composition: build metrics from metrics

Taking the daily_active_users metric as a base, we can easily build more metrics like dau_in_europe that reference and extend the original logic.

metric dau_in_europe {
label: 'Daily Active Users in Europe'
type: 'number'
definition: @aql
daily_active_users
| where(users.country = 'Europe')
;;
}

metric dau_in_europe_pct {
label: 'EU DAU % of Total'
type: 'number'
definition: @aql
(dau_in_europe / daily_active_users) * 100
;;
}

If you want to change the logic of active_users, do it once, and daily_active_users, dau_in_europe, dau_in_europe_pct all pick it up automatically.

Extend: build on existing models, don't rewrite them

Need a model that's 90% the same as events but with extra fields? extend grafts new properties onto an existing object:

model user_events = events.extend({
dimension user_email {
label: 'User Email'
type: 'text'
definition: @sql {{#SOURCE.users.email}};;
}

metric conversion_rate {
label: 'Conversion Rate'
type: 'number'
definition: @aql
(count_distinct(purchases.user_id) / count_distinct(user_id)) * 100
;;
}
})

In YAML, you'd copy the entire model and add your changes. In AMQL, you declare the delta. The compiler merges, preserves the original's integrity, and catches conflicts at build time.

Func: parameterize logic, not strings

When you need to generate similar analytics objects with different inputs, Func gives you typed, parameterized functions:

Func active_users_for(period: String, threshold: Number) {
metric active_users_last_${period} {
label: 'Active Users (last ${period}, min ${threshold})'
type: 'number'
definition: @aql
active_users
| where(events.event_date matches @(last ${period}))
| where(events.session_count >= threshold)
;;
}
}

active_users_for('7 days', 5) // active_users_last_7_days
active_users_for('30 days', 10) // active_users_last_30_days

Compare to Jinja: no multi-layer parsing (Func → AML, not Jinja → YAML → SQL), no string concatenation bugs, and the IDE autocompletes parameter names because the language understands the function signature.

Constants

const default_currency = 'USD'
const high_value_threshold = 10000

const price_category_sql = @sql
select
case
when price >= high_value_threshold then 'Expensive'
else 'Cheap'
end as price_category
;;

metric high_value_orders {
label: 'High-Value Orders'
type: 'number'
definition: @aql
orders
| where(orders.amount >= high_value_threshold and orders.currency = default_currency)
;;
}

When the business redefines "high value" from $10,000 to $25,000, you change one constant. Every dashboard, every metric, every report that references it updates automatically.

The code sample below ties it all together: models, datasets, and metrics for calculating user conversion rates across multiple time granularities, built from modular components and composed on the spot.

Conclusion

We're not the first to recognize YAML's limitations for analytics. Even Looker, one of the pioneers of analytics-as-code, eventually abandoned their YAML-based LookML 1.0 in favor of creating their own proprietary language (LookML 2.0). As Looker's engineering team explained when announcing the change (see also Google's migration guide): "YAML has a lot of sharp edges... Building a parser with YAML wasn't sustainable... YAML is not a standard designed nor intended for data modeling."

However, while Looker solved YAML's ambiguity and tooling issues, LookML 2.0 remains essentially a markup language. It lacks true programmability features like variables, functions, and composition. AMQL goes further by being a fully programmable language that enables the code reuse and composition that analytics teams desperately need.

We designed AMQL to systematically address every limitation of YAML for analytics development. Where YAML fails due to its nature as a general-purpose data format, AMQL succeeds by being purpose-built for analytics from the ground up through three foundational principles:

  • Clear, unambiguous syntax with first-class analytics concepts that eliminate YAML's interpretation ambiguities
  • Robust static type system that enables intelligent IDE features like autocomplete, inline documentation, and instant error detection
  • Native programmability to enable true code reuse and composition without dangerous workarounds

Your analysts deserve something better than YAML.

tldr: AMQL vs YAML

CategoryYAMLAMQL
Type Safety❌ Schemaless, no type checking
❌ Silent errors at runtime
❌ Ambiguous value interpretation
✅ Strong static type system
✅ Compile-time error detection
✅ Explicit, unambiguous syntax
Developer Experience❌ No autocomplete or inline docs
❌ Manual trial-and-error workflow
❌ Whitespace indentation errors
✅ Smart autocomplete & suggestions
✅ Inline documentation
✅ Instant error feedback
Reusability❌ No variables, functions, or abstractions
❌ Forces code duplication
❌ Dangerous templating workarounds
✅ Variables, functions, and composition
✅ True code reuse and modularity
✅ Native programmability
Analytics Focus❌ General-purpose data format
❌ No understanding of analytics concepts
✅ Purpose-built for analytics
✅ First-class analytics objects
Parsing Consistency❌ Different results across parsers
❌ Version-dependent behavior
❌ "Helpful" but unpredictable interpretation
✅ Consistent, predictable parsing
✅ Analytics-specific language constructs

Bottom Line: YAML is a data serialization format trying to be an analytics language. AMQL is a true programming language designed specifically for analytics development.


Open Markdown
Let us know what you think about this document :)