ruby / haml renderer
The templates in the
web framework use a restricted
"dumb" subset of Haml.
A custom renderer replaced the haml gem.
It parses and renders only the subset grammar,
rejecting everything else at parse time.
Two files, ~1,200 lines total:
lib/haml/subset.rb: parser and renderer
lib/haml/expr.rb: constrained expression evaluator
Why replace the gem
The haml gem evaluates arbitrary Ruby at render time.
Templates can call methods, access constants, assign variables.
The dumb template subset
uses none of this.
A custom renderer enforces the subset by construction:
if the parser has no node type for a construct,
it can't appear in templates.
This removes Ruby eval from the rendering path
and drops a dependency.
The prerequisite was restricting all ~360 templates to the dumb subset first: moving method calls, hash access, and formatting into handlers with Data.define structs. Once every template conformed, a CI linter prevented regressions, and the renderer could be built against a frozen grammar.
Parser
Haml::Subset.new(source, path:) parses source into a tree
at construction time. Lines are classified into node types:
:doctype # !!!
:comment # -# ...
:filter # :javascript, :css
:if # - if expr
:elsif # - elsif expr
:else # - else
:each # - collection.each do |item|
:render # = render "name", key: value
:output # = expr (HTML-escaped)
:raw_output # != expr (raw)
:tag # %tag.class#id{ attrs }
:text # static text
There is no :eval or :ruby node.
A method call, constant reference, or variable assignment
has no node type to parse into, so the parser raises.
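A rough sketch of what classification by construction could look like. The `SubsetSketch` module, its `classify` method, and the regexes are assumptions made for illustration, not the code in lib/haml/subset.rb. The point is only that a line either matches a known node type or raises.

```ruby
# Hypothetical line classifier: every accepted construct has an explicit
# pattern, and a Ruby line that matches none of them raises at parse time.
module SubsetSketch
  PATTERNS = [
    [:doctype,    /\A!!!/],
    [:comment,    /\A-#/],
    [:filter,     /\A:(javascript|css)\z/],
    [:if,         /\A- if /],
    [:elsif,      /\A- elsif /],
    [:else,       /\A- else\z/],
    [:each,       /\A- [\w.]+\.each do \|\w+\|\z/],
    [:render,     /\A= render /],
    [:raw_output, /\A!= /],
    [:output,     /\A= /],
    [:tag,        /\A(?:%\w|\.[-\w]|#[a-zA-Z_])/],
  ].freeze

  def self.classify(stripped, path:, lineno:)
    PATTERNS.each do |type, pattern|
      return type if stripped.match?(pattern)
    end
    # Any other line starting with "-" is Ruby code outside the subset.
    if stripped.start_with?("-")
      raise ArgumentError, "#{path}:#{lineno}: not in the template subset: #{stripped}"
    end
    :text
  end
end
```

With this shape, `- x = 1` has nowhere to go: it matches no pattern and raises, while `= data.name` classifies as `:output`.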
Indentation determines nesting. The parser walks lines at each indent level and recursively parses children:
private def parse(lines, base_indent, from, to)
  nodes = []
  i = from
  while i < to
    line = lines[i]
    stripped = line.lstrip
    indent = line.length - stripped.length
    if stripped == ""
      i += 1
      next
    end
    if indent != base_indent
      raise "#{@path}:#{i + 1}: expected indent #{base_indent}, got #{indent}"
    end
    # Find children (lines with greater indent)
    child_end = i + 1
    while child_end < to
      next_line = lines[child_end]
      next_stripped = next_line.lstrip
      if next_stripped != ""
        if (next_line.length - next_stripped.length) <= indent
          break
        end
      end
      child_end += 1
    end
    node = parse_line(stripped, indent, lines, i + 1, child_end)
    nodes << node
    i = child_end
  end
  nodes
end
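To see the child-range walk on a concrete input, here is a simplified stand-in that uses the same loop but stores the stripped line instead of a typed node. `parse_outline` is a hypothetical name, and the two-space indent step is taken from the `indent + 2` in the parser code.

```ruby
# Simplified stand-in for the parser: same indentation walk, but each node
# is just { line:, children: } rather than a typed node.
def parse_outline(lines, base_indent = 0, from = 0, to = lines.length)
  nodes = []
  i = from
  while i < to
    stripped = lines[i].lstrip
    indent = lines[i].length - stripped.length
    if stripped.empty?
      i += 1
      next
    end
    raise "line #{i + 1}: expected indent #{base_indent}, got #{indent}" if indent != base_indent

    # The child range ends at the next non-blank line indented no deeper.
    child_end = i + 1
    while child_end < to
      nxt = lines[child_end]
      nxt_stripped = nxt.lstrip
      break if !nxt_stripped.empty? && (nxt.length - nxt_stripped.length) <= indent
      child_end += 1
    end

    nodes << { line: stripped, children: parse_outline(lines, indent + 2, i + 1, child_end) }
    i = child_end
  end
  nodes
end

source = <<~HAML
  %ul
    %li
      First

  %p
    Second
HAML

tree = parse_outline(source.lines(chomp: true))
```

Here `tree` has two top-level nodes, `%ul` and `%p`. `First` sits two levels below `%ul`, and the blank line is skipped rather than ending the `%ul` block. A child indented by three spaces would raise, because it matches neither the parent's indent nor the expected child indent.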
The parser extracts tag name, classes, ID, and attributes:
private def parse_tag(stripped, indent, lines, child_from, child_to)
  rest = stripped.dup
  tag_name = "div" # .class and #id shorthand imply a div
  classes = []
  id = nil
  if rest.start_with?("%")
    m = rest.match(/\A%(\w[\w-]*)/)
    # Without this guard, a bare "%" fails with NoMethodError on nil
    raise "#{@path}: invalid tag name: #{stripped}" unless m
    tag_name = m[1]
    rest = rest[m[0].length..]
  end
  while rest.match?(/\A[.#]/)
    if rest.start_with?(".")
      m = rest.match(/\A\.(-?[a-zA-Z_][\w-]*)/)
      raise "#{@path}: invalid class name: #{stripped}" unless m
      classes << m[1]
    else
      m = rest.match(/\A#([a-zA-Z_][\w-]*)/)
      raise "#{@path}: invalid id: #{stripped}" unless m
      id = m[1]
    end
    rest = rest[m[0].length..]
  end
  # Reject inline content: inner content must be on a new line
  rest = rest.strip
  if rest != ""
    raise "#{@path}: inline content on tags is not allowed: #{stripped}"
  end
  children = parse(lines, indent + 2, child_from, child_to)
  { type: :tag, tag: tag_name, classes: classes, id: id,
    children: children }
end
Inline content on tags is banned.
%h1 Title must be written as:
%h1
  Title
This simplifies parsing (every tag's content is children) and makes the structure explicit.
Expression evaluator
Expressions in = field, - if expr, and #{}
interpolation go through Haml::Expr,
a recursive-descent parser with a constrained grammar:
expr ::= or_expr
or_expr ::= and_expr ('||' and_expr)*
and_expr ::= not_expr ('&&' not_expr)*
not_expr ::= '!' not_expr | cmp_expr
cmp_expr ::= primary (('==' | '!=') primary)?
primary ::= STRING | NUMBER | BOOL | NIL | field_access
field_access ::= IDENT ('.' IDENT)*
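A tokenizer for this grammar could be sketched with the standard library's StringScanner. The `[type, value]` token shape and the error message are assumptions; the real tokenizer in lib/haml/expr.rb may look different, and this sketch does not unescape string contents.

```ruby
require "strscan"

# Hypothetical tokenizer for the grammar above. Returns [type, value] pairs.
def tokenize(src)
  scanner = StringScanner.new(src)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (text = scanner.scan(/"(?:[^"\\]|\\.)*"/))
      tokens << [:string, text[1..-2]] # strip the surrounding quotes
    elsif (text = scanner.scan(/\d+/))
      tokens << [:number, text.to_i]
    elsif (text = scanner.scan(/==|!=|&&|\|\||!|\./))
      tokens << [:op, text]
    elsif (text = scanner.scan(/[a-zA-Z_]\w*/))
      tokens << case text
                when "true", "false" then [:bool, text == "true"]
                when "nil" then [:nil, nil]
                else [:ident, text]
                end
    else
      # Characters outside the grammar are rejected before parsing starts.
      raise ArgumentError, "unexpected character at #{scanner.pos}: #{scanner.rest[0]}"
    end
  end
  tokens
end
```

Because the tokenizer has no rule for `+`, `(`, or `[`, an expression such as `a + b` fails here and never reaches the parser.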
Evaluation runs in three stages (tokenize, parse to AST, evaluate):
def self.eval_string(src, ctx)
  tokens = tokenize(src.strip)
  parser = Parser.new(tokens)
  node = parser.parse_expr
  evaluate(node, ctx)
end
The evaluator walks the AST and resolves values against a context object:
def self.evaluate(node, ctx)
  case node[:type]
  when :string then node[:value]
  when :number then node[:value]
  when :bool then node[:value]
  when :nil then nil
  when :field then eval_field(node[:parts], ctx)
  when :cmp
    left = evaluate(node[:left], ctx)
    right = evaluate(node[:right], ctx)
    case node[:op]
    when "==" then left == right
    when "!=" then left != right
    end
  when :and
    evaluate(node[:left], ctx) && evaluate(node[:right], ctx)
  when :or
    evaluate(node[:left], ctx) || evaluate(node[:right], ctx)
  when :not
    !evaluate(node[:operand], ctx)
  end
end
Field access resolves through send:
def self.eval_field(parts, ctx)
  val = ctx.send(parts[0].to_sym)
  i = 1
  while i < parts.length
    val = val.send(parts[i].to_sym)
    i += 1
  end
  val
end
= data.name becomes ctx.send(:data).send(:name),
which works with Data.define structs
and singleton methods on the context.
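A small worked example of that resolution, assuming Ruby 3.2+ for Data.define. The struct fields, the values, and the compact `reduce` form of `eval_field` are made up for illustration.

```ruby
# Hypothetical view struct; handlers would build something like this.
user_type = Data.define(:name, :admin)

# Same left-to-right send chain as the loop above, written with reduce.
def eval_field(parts, ctx)
  parts.reduce(ctx) { |val, part| val.send(part.to_sym) }
end

ctx = Object.new
user = user_type.new(name: "Ada", admin: true)
ctx.define_singleton_method(:data) { user }

name = eval_field(%w[data name], ctx) # same as ctx.send(:data).send(:name)
```

A field the struct does not define, such as `data.email`, raises NoMethodError. Note that `send` ignores method visibility; `public_send` would be the stricter choice if only public readers should be reachable, though the source does not say which trade-off was intended.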
The evaluator also handles hash literals with ** splat
(for tag attributes), array literals, interpolated strings,
and function calls (for ViewHelper methods on the context).
Rendering
The renderer walks the AST, appending HTML to a buffer:
private def render_nodes(nodes, buf, ctx, partial_renderer)
  i = 0
  while i < nodes.length
    node = nodes[i]
    case node[:type]
    when :doctype then buf << "<!DOCTYPE html>\n"
    when :comment then nil
    when :text then buf << Expr.interpolate(node[:text], ctx) << "\n"
    when :output
      buf << escape_val(Expr.eval_string(node[:expr], ctx)) << "\n"
    when :raw_output
      buf << Expr.eval_string(node[:expr], ctx).to_s << "\n"
    when :render
      buf << render_partial_call(node[:expr], ctx, partial_renderer)
    when :tag
      render_tag(node, buf, ctx, partial_renderer)
    when :filter
      render_filter(node, buf, ctx)
    when :if
      chain = [node]
      while i + 1 < nodes.length &&
            (nodes[i + 1][:type] == :elsif || nodes[i + 1][:type] == :else)
        i += 1
        chain << nodes[i]
      end
      render_conditional(chain, buf, ctx, partial_renderer)
    when :each
      render_each(node, buf, ctx, partial_renderer)
    end
    i += 1
  end
end
= expr always HTML-escapes. != expr outputs raw.
Handlers pre-escape anything that needs !=.
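The difference between the two output forms, shown on one value. `escape_val` here is a guess at the helper used by the renderer (stringify, then HTML-escape); the actual implementation is not shown in the source.

```ruby
require "cgi"

# Assumed behavior of the renderer's escape helper: nil becomes "",
# everything else is stringified and HTML-escaped.
def escape_val(value)
  CGI.escapeHTML(value.to_s)
end

title = %(<b>Tom & "Jerry"</b>)

escaped = escape_val(title) # what `= title` would emit
raw     = title.to_s        # what `!= title` would emit
```

`escaped` is `&lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;`, while `raw` is the original markup unchanged, which is why anything reaching `!=` has to be made safe by the handler first.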
Tags emit opening and closing HTML with escaped attributes:
private def render_tag(node, buf, ctx, partial_renderer)
  tag = node[:tag]
  attrs = build_attrs(node, ctx)
  attr_str = attrs.map { |k, v|
    if v == true
      " #{k}"
    else
      " #{k}=\"#{CGI.escapeHTML(v.to_s)}\""
    end
  }.join
  if VOID_ELEMENTS.include?(tag)
    buf << "<#{tag}#{attr_str}>\n"
    return
  end
  if node[:children].any?
    buf << "<#{tag}#{attr_str}>\n"
    render_nodes(node[:children], buf, ctx, partial_renderer)
    buf << "</#{tag}>\n"
  else
    buf << "<#{tag}#{attr_str}></#{tag}>\n"
  end
end
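The attribute formatting can be tried in isolation. The `attrs` hash below stands in for whatever `build_attrs` returns; its contents are invented for the example.

```ruby
require "cgi"

# Stand-in for the hash build_attrs would return for one tag.
attrs = { "class" => "btn primary", "disabled" => true, "title" => %(Say "hi") }

# true renders as a bare boolean attribute; other values are escaped.
attr_str = attrs.map { |k, v|
  v == true ? " #{k}" : " #{k}=\"#{CGI.escapeHTML(v.to_s)}\""
}.join

html = "<button#{attr_str}></button>\n"
```

This yields `<button class="btn primary" disabled title="Say &quot;hi&quot;"></button>`, so a quote inside an attribute value cannot close the attribute early.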
Context
Templates receive data through a context object. Locals become singleton methods:
private def make_context(locals, context: nil)
  env = context || Object.new
  locals.each do |k, v|
    env.define_singleton_method(k) { v }
  end
  env
end
Loop variables clone the context to avoid mutating the parent binding:
private def clone_context(ctx, name, value)
  child = ctx.clone
  child.define_singleton_method(name.to_sym) { value }
  child
end
Without cloning, - items.each do |data| would overwrite
the parent data local for the rest of the template.
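The shadowing behavior can be checked with plain objects. The local names and values below are illustrative; `clone_context` is copied from above without the `private` modifier so it runs at the top level.

```ruby
def clone_context(ctx, name, value)
  child = ctx.clone # clone (unlike dup) copies singleton methods
  child.define_singleton_method(name.to_sym) { value }
  child
end

parent = Object.new
parent.define_singleton_method(:data) { "page data" }
parent.define_singleton_method(:title) { "Home" }

child = clone_context(parent, "data", "loop item")

inside_loop = child.data   # the loop variable shadows the local
still_there = child.title  # other locals remain visible in the clone
after_loop  = parent.data  # the parent context is unchanged
```

`inside_loop` is `"loop item"`, `still_there` is `"Home"`, and `after_loop` is still `"page data"`. This relies on `clone` rather than `dup`, since `dup` would drop the singleton methods that carry the locals.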
The renderer IS the linter
Before the custom renderer, a regex-based linter scanned templates for banned constructs. It was incomplete: regexes can't parse nested expressions, and every new violation pattern needed a new rule.
The custom renderer replaced the linter.
Templates are parsed at boot by cache_all.
A template with a construct outside the subset
crashes the process before it serves a request.
If it parses, it's in the subset. If it's not in the subset, it doesn't parse.
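One possible shape for boot-time parsing, as a sketch only. Only the name `cache_all` comes from the text; the `TemplateCache` class, the `*.haml` glob, and the injected parser callable are assumptions.

```ruby
# Hypothetical template cache. The parser is any callable taking
# (source, path) that returns a parsed template or raises.
class TemplateCache
  def initialize(parser)
    @parser = parser
    @templates = {}
  end

  # Parse every template up front so an invalid one aborts startup
  # instead of failing on the first request that renders it.
  def cache_all(root)
    Dir.glob(File.join(root, "**", "*.haml")).sort.each do |path|
      @templates[path] = @parser.call(File.read(path), path)
    end
    @templates.length
  end

  def fetch(path)
    @templates.fetch(path)
  end
end
```

At boot this might be wired up as `TemplateCache.new(->(src, path) { Haml::Subset.new(src, path: path) }).cache_all("views")`, where `"views"` is a placeholder directory. Any exception raised by the parser propagates out of `cache_all`, which is what stops the process before it serves a request.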