Ohm: The Parsing Toolkit That Finally Makes Grammars Fun
Parsing is one of those problems that sounds simple until you actually try it. You need to turn a string of characters into something structured and meaningful, whether that string is a programming language, a configuration file, a math expression, or some custom notation you just invented for your domain. Most developers reach for regex first, hit a wall when nesting enters the picture, and then stare down the barrel of writing a recursive descent parser by hand. Ohm offers a better path.
ohm-js is a parsing toolkit built on Parsing Expression Grammars (PEGs) that takes one opinionated stance and rides it all the way home: grammars describe syntax, and nothing else. Semantic actions, the code that actually does something with the parsed structure, live in a completely separate layer. This separation sounds like a small thing, but it changes everything about how you think about, write, and reuse parsers.
Why Ohm Is Different
The parsing world is not short on options. PEG.js, Nearley, Chevrotain, and others all have their followings. What makes Ohm stand out is not raw performance or download count but a philosophical clarity that permeates the entire design.
- Grammars are pure. No JavaScript embedded in rule definitions. A grammar in Ohm reads like a formal specification, which means the same grammar can power a parser, a syntax highlighter, a compiler, and an interpreter without changing a single rule.
- Left recursion works. Most PEG parsers choke on left-recursive rules, forcing you into awkward workarounds for something as basic as left-associative operators. Ohm handles it natively.
- Grammars can inherit. Using an object-oriented extension model, you can create a child grammar that inherits from a parent, overriding or adding rules. Build a base language and extend it into dialects.
- Syntactic rules skip whitespace automatically. Rules that start with an uppercase letter handle whitespace for you. Lowercase rules match exactly. This one convention eliminates a staggering amount of tedious boilerplate.
- Zero dependencies. Installing ohm-js pulls in no other packages.
Getting Ohm Into Your Project
Install via npm:
npm install ohm-js
Or with yarn:
yarn add ohm-js
Ohm ships ESM, CommonJS, and browser bundles, so it works in Node, Deno, or directly in a <script> tag. TypeScript type definitions are included out of the box.
Writing Your First Grammar
Hello, Parser
The core workflow in Ohm has three steps: define a grammar, match some input, and apply semantic actions to the result. Here is the simplest version of that:
import * as ohm from 'ohm-js';
const g = ohm.grammar(String.raw`
  Greeting {
    message = "Hello" name
    name = letter+
  }
`);
const match = g.match('Hello World');
console.log(match.succeeded()); // false
console.log(match.failed()); // true
The grammar Greeting has two rules. message expects the literal string "Hello" followed by a name, and name is one or more letters. Because message starts with a lowercase letter, it is a "lexical" rule and matches the input exactly, character for character. That is why the match fails: "Hello World" has a space between the two parts, and nothing in message allows for it. If message were an uppercase Message, the syntactic rule would skip whitespace automatically. Let us fix that:
const g = ohm.grammar(String.raw`
  Greeting {
    Message = "Hello" name
    name = letter+
  }
`);
const match = g.match('Hello World');
console.log(match.succeeded()); // true -- whitespace is handled
That uppercase M on Message is all it takes. No _ = space* boilerplate, no explicit whitespace tokens scattered through every rule.
Building an Arithmetic Evaluator
Grammars become useful when you pair them with semantic actions. Here is a classic example: a calculator that parses and evaluates arithmetic expressions with correct operator precedence.
import * as ohm from 'ohm-js';
const g = ohm.grammar(String.raw`
  Arithmetic {
    Exp = AddExp
    AddExp
      = AddExp "+" MulExp  -- plus
      | AddExp "-" MulExp  -- minus
      | MulExp
    MulExp
      = MulExp "*" PriExp  -- times
      | MulExp "/" PriExp  -- divide
      | PriExp
    PriExp
      = "(" Exp ")"  -- paren
      | number
    number = digit+
  }
`);
const semantics = g.createSemantics().addOperation('eval', {
  Exp(e) { return e.eval(); },
  AddExp(e) { return e.eval(); },
  AddExp_plus(left, _op, right) { return left.eval() + right.eval(); },
  AddExp_minus(left, _op, right) { return left.eval() - right.eval(); },
  MulExp(e) { return e.eval(); },
  MulExp_times(left, _op, right) { return left.eval() * right.eval(); },
  MulExp_divide(left, _op, right) { return left.eval() / right.eval(); },
  PriExp(e) { return e.eval(); },
  PriExp_paren(_open, e, _close) { return e.eval(); },
  number(_digits) { return parseInt(this.sourceString, 10); },
});
const match = g.match('(1 + 2) * 3 + 4');
console.log(semantics(match).eval()); // 13
Notice the -- plus, -- minus suffixes in the grammar. These are case names, and they let you write a dedicated semantic action for each alternative branch. The action AddExp_plus fires only when the plus alternative matches. This is much cleaner than receiving a generic list of children and figuring out which branch was taken.
Also notice the left recursion: AddExp = AddExp "+" MulExp. In most PEG parsers, this would cause an infinite loop. In Ohm, it just works.
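Left recursion is normally fatal for a PEG parser because the rule calls itself at the same position before consuming anything. The standard fix in memoizing parsers is seed growing: record a failure for the rule at that position, evaluate the body once to get a first (seed) match, then keep re-evaluating the body, letting the recursive call reuse the previous result, until the match stops getting longer. The following is a simplified, self-contained illustration of that idea for a hypothetical rule E = E "-" num | num over whitespace-free input; it is not Ohm's implementation, only the core technique.

```javascript
// Simplified seed-growing sketch for the left-recursive rule
//   E = E "-" num | num
// Not Ohm's code; inputs like "8-3-2" with no whitespace.

// Match one or more digits at pos; returns { value, pos } or null.
function parseNum(input, pos) {
  let end = pos;
  while (end < input.length && input[end] >= '0' && input[end] <= '9') end++;
  return end > pos ? { value: Number(input.slice(pos, end)), pos: end } : null;
}

// Apply E at pos; memo maps position -> current best result for E there.
function parseE(input, pos, memo) {
  if (memo.has(pos)) return memo.get(pos);
  memo.set(pos, null); // 1. Plant a failing seed so the recursive call bottoms out.
  let best = null;
  for (;;) {
    const result = evalBody(input, pos, memo);
    // 2. Stop once re-evaluating the body no longer consumes more input.
    if (result === null || (best !== null && result.pos <= best.pos)) break;
    best = result;
    memo.set(pos, best); // 3. Grow the seed, then try the body again.
  }
  return best;
}

// The rule body: ordered choice between E "-" num and num.
function evalBody(input, pos, memo) {
  const left = parseE(input, pos, memo);
  if (left !== null && input[left.pos] === '-') {
    const right = parseNum(input, left.pos + 1);
    if (right !== null) return { value: left.value - right.value, pos: right.pos };
  }
  return parseNum(input, pos);
}

// Succeed only if E consumes the whole input.
function parse(input) {
  const result = parseE(input, 0, new Map());
  return result !== null && result.pos === input.length ? result.value : null;
}

console.log(parse('8-3-2')); // 3, i.e. (8 - 3) - 2
console.log(parse('10'));    // 10
console.log(parse('8-'));    // null
```

Because each growth step uses the previous, shorter match as its left operand, the result comes out left-associative, which is exactly what AddExp and MulExp rely on.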
Parsing a Custom Config Format
Real-world parsing often means dealing with domain-specific formats. Suppose you have a simple key-value configuration language:
import * as ohm from 'ohm-js';
const g = ohm.grammar(String.raw`
  Config {
    Document = Entry+
    Entry = key "=" value ";"
    key = alnum+
    value = stringVal | numberVal | boolVal
    stringVal = "\"" (~"\"" any)* "\""
    numberVal = digit+ ("." digit+)?
    boolVal = "true" | "false"
  }
`);
const semantics = g.createSemantics().addOperation('toObject', {
  // entries is an iteration node (Entry+); map over its children to get one object per Entry.
  Document(entries) { return Object.assign({}, ...entries.children.map((e) => e.toObject())); },
  Entry(key, _eq, value, _semi) { return { [key.toObject()]: value.toObject() }; },
  key(_chars) { return this.sourceString; },
  value(v) { return v.toObject(); },
  stringVal(_q1, chars, _q2) { return chars.sourceString; },
  numberVal(_whole, _dot, _frac) { return parseFloat(this.sourceString); },
  boolVal(_b) { return this.sourceString === 'true'; },
});
const input = `
host = "localhost";
port = 8080;
debug = true;
`;
const match = g.match(input);
console.log(semantics(match).toObject());
// { host: 'localhost', port: 8080, debug: true }
The grammar reads almost like a specification document. Anyone coming to this code for the first time can understand the syntax rules without knowing JavaScript. That is the whole point of separation.
Leveling Up
Grammar Inheritance
One of Ohm's most powerful features is grammar extension. Say you have a base expression language and want to create an extended version with exponentiation:
import * as ohm from 'ohm-js';
const grammars = ohm.grammars(String.raw`
  BaseCalc {
    Exp = AddExp
    AddExp
      = AddExp "+" MulExp  -- plus
      | MulExp
    MulExp
      = MulExp "*" PriExp  -- times
      | PriExp
    PriExp
      = "(" Exp ")"  -- paren
      | number
    number = digit+
  }
  ScientificCalc <: BaseCalc {
    PriExp
      += PriExp "^" number  -- power
  }
`);
const scientific = grammars.ScientificCalc;
const semantics = scientific.createSemantics().addOperation('eval', {
  Exp(e) { return e.eval(); },
  AddExp(e) { return e.eval(); },
  AddExp_plus(left, _op, right) { return left.eval() + right.eval(); },
  MulExp(e) { return e.eval(); },
  MulExp_times(left, _op, right) { return left.eval() * right.eval(); },
  PriExp(e) { return e.eval(); },
  PriExp_paren(_open, e, _close) { return e.eval(); },
  PriExp_power(base, _op, exp) { return Math.pow(base.eval(), exp.eval()); },
  number(_digits) { return parseInt(this.sourceString, 10); },
});
const match = scientific.match('2 ^ 3 + 1');
console.log(semantics(match).eval()); // 9
The <: syntax declares that ScientificCalc inherits from BaseCalc. The += operator adds a new alternative to an existing rule without replacing the original. You can also use := to completely override a parent rule. This composability means you can build a family of related languages from shared building blocks.
Multiple Semantic Operations on One Grammar
Because grammars and semantics are separated, you can define multiple operations for the same grammar. A single arithmetic grammar might have an eval operation for evaluation, a prettyPrint operation for formatting, and a compile operation for code generation:
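// g is the Arithmetic grammar from the evaluator example above.
// A nonterminal node with exactly one child (e.g. Exp wrapping AddExp) passes
// through to that child by default, so only the case actions and number are listed.
const semantics = g.createSemantics();
semantics.addOperation('eval', {
  AddExp_plus(left, _op, right) { return left.eval() + right.eval(); },
  AddExp_minus(left, _op, right) { return left.eval() - right.eval(); },
  MulExp_times(left, _op, right) { return left.eval() * right.eval(); },
  MulExp_divide(left, _op, right) { return left.eval() / right.eval(); },
  PriExp_paren(_open, e, _close) { return e.eval(); },
  number(_digits) { return parseInt(this.sourceString, 10); },
});
semantics.addOperation('prettyPrint', {
  AddExp_plus(left, _op, right) { return `(${left.prettyPrint()} + ${right.prettyPrint()})`; },
  AddExp_minus(left, _op, right) { return `(${left.prettyPrint()} - ${right.prettyPrint()})`; },
  MulExp_times(left, _op, right) { return `(${left.prettyPrint()} * ${right.prettyPrint()})`; },
  MulExp_divide(left, _op, right) { return `(${left.prettyPrint()} / ${right.prettyPrint()})`; },
  // Binary actions already parenthesize, so explicit grouping parens are dropped.
  PriExp_paren(_open, e, _close) { return e.prettyPrint(); },
  number(_digits) { return this.sourceString; },
});
const match = g.match('1 + 2 * 3');
const s = semantics(match);
console.log(s.eval()); // 7
console.log(s.prettyPrint()); // (1 + (2 * 3))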
This is where the separation of concerns truly pays off. In a parser with embedded actions, you would need a completely different grammar or an elaborate visitor pattern to achieve the same reuse.
Incremental Parsing for Editor Integration
If you are building a code editor or an IDE plugin, re-parsing the entire document on every keystroke is wasteful. Ohm provides a Matcher API that supports incremental parsing: when the input changes, it reuses partial results from the previous parse, so re-matching after a small edit costs far less than parsing from scratch:
const matcher = g.matcher();
matcher.setInput('1 + 2');
let result = matcher.match();
console.log(result.succeeded()); // true
matcher.replaceInputRange(4, 5, '3');
result = matcher.match();
console.log(result.succeeded()); // true -- memoized results outside the edited range were reused
Under the hood, this leverages the memoization table from packrat parsing. The Matcher keeps the memo table between calls and invalidates only the entries affected by the edit. For large documents, this can mean the difference between real-time feedback and a noticeable lag.
A Few Things to Know
Ohm uses packrat parsing, which trades memory for speed: by memoizing the result of each rule application at each input position, it avoids re-parsing the same text and keeps parsing time roughly linear in the size of the input. For most use cases, this is a great deal. But if you are parsing megabytes of input or need the absolute fastest throughput, a hand-optimized recursive descent parser or a tool like Chevrotain will likely be faster. The team is actively working on a WebAssembly backend for v18 (currently in beta at 18.0.0-beta.8) that aims to close this performance gap significantly.
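To see where that memory goes, here is a toy illustration of packrat memoization in plain JavaScript, independent of Ohm. It assumes a made-up grammar in which both alternatives of S begin by applying A at the same position: a plain backtracking parser evaluates A twice, while a memoizing parser stores one entry per (rule, position) pair and answers the second request from the table.

```javascript
// Hypothetical toy grammar (not Ohm code):
//   S = A "!" | A "?"
//   A = "a"+
// Rule bodies return the end position of a match, or -1 on failure.
function makeParser(input, useMemo) {
  const memo = new Map(); // "rule@pos" -> end position (or -1)
  let bodyEvaluations = 0;

  const rules = {
    A(pos) {
      let end = pos;
      while (input[end] === 'a') end++;
      return end > pos ? end : -1;
    },
    S(pos) {
      // Both alternatives re-apply A at the same position after backtracking.
      for (const terminator of ['!', '?']) {
        const end = apply('A', pos);
        if (end !== -1 && input[end] === terminator) return end + 1;
      }
      return -1;
    },
  };

  function apply(rule, pos) {
    const key = `${rule}@${pos}`;
    if (useMemo && memo.has(key)) return memo.get(key); // memo hit: no re-parse
    bodyEvaluations++;
    const end = rules[rule](pos);
    if (useMemo) memo.set(key, end); // memory cost: one entry per (rule, pos)
    return end;
  }

  const ok = apply('S', 0) === input.length;
  return { ok, bodyEvaluations, memoEntries: memo.size };
}

console.log(makeParser('aaa?', false)); // { ok: true, bodyEvaluations: 3, memoEntries: 0 }
console.log(makeParser('aaa?', true));  // { ok: true, bodyEvaluations: 2, memoEntries: 2 }
```

In this tiny input the saving is a single evaluation, but with deeply nested alternatives the repeated work compounds, and the memo table is the price paid for avoiding it.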
Error messages on parse failure are generally helpful but not spectacular. You get the rightmost failure position and a description of what was expected, which is usually enough to point users in the right direction. Full error recovery (like continuing to parse after an error) is not yet built in.
The library only parses strings. If you need to work with token streams or binary data, you will need to look elsewhere. And while Ohm supports parameterized rules and built-in helpers like ListOf<elem, sep>, it does not yet support semantic predicates (embedding arbitrary JavaScript conditions directly in grammar rules).
The Grammar Teacher You Wish You Had
Ohm occupies a unique spot in the JavaScript parsing landscape. It is not the fastest parser, and it does not have the largest ecosystem of community grammars. What it has is clarity. The grammar DSL is genuinely pleasant to read and write. The separation of syntax from semantics means your grammars stay portable and your action code stays organized. Grammar inheritance lets you build language families without copy-pasting rules. And the online editor at ohmjs.org gives you a visual window into exactly how your parser processes every character of input, which is invaluable when you are learning or debugging.
With 1.5 million weekly downloads, zero dependencies, solid TypeScript support, and active development on a WASM-powered v18, ohm-js is not a research toy. Shopify uses it in production. The repo includes grammars for full ECMAScript and TypeScript, proving it scales to real language complexity.
If you need to parse something more structured than regex can handle and you want the experience to be enjoyable rather than punishing, Ohm is the toolkit worth reaching for. Define your grammar, wire up your semantics, and let the packrat do the rest.