Building a DSL in PHP, Part 1: The Pipeline

Posted: October 20th, 2010 | Author: Doug | Filed under: PHP Snippet

According to c2, a Domain Specific Language (DSL) is:

… an application programming interface (hosted as either a programming language or executable specification language) whose grammar matches the problem domain, not the base language’s standard grammar.

DSLs are great metaprogramming tools. They let programmers build multiple instances of the same class of system much faster, and they let end users validate and often contribute to the implementation.

This post is the first in a series about how to build a simple DSL in PHP. By the end of the series we’ll have built a compiler for a DSL that represents multiple choice tests. Why bother to build a DSL for multiple choice tests, you may ask? Everyone has taken a multiple choice test at some point in their life, so it’s pretty universal; it’s small enough to build and show all the code within blog posts; and it’s an example of how end users (educators in this case) can use DSLs to define their own applications.

What’s a pipeline?

Some languages, like Ruby and Lisp, have flexible enough grammars to allow DSLs to be directly implemented as methods or macros within the language. With PHP we have no such luxury. If we want a DSL in PHP we have to build our own outside the language by implementing a compiler pipeline like this:

DSL Code ⇨ Tokenizer ⇨ Parser ⇨ Code Generator ⇨ PHP Code

Each step in the pipeline takes input from the previous step and generates output that is used as input to the next step. The input to the pipeline is a string of DSL code and the output of the pipeline is the PHP code representation of that DSL code.

Looking within the pipeline, the tokenizer takes the DSL code and breaks it up into atomic pieces that are consumed by the parser. Each piece the tokenizer generates consists of a token and a lexeme, along with line number and column position information for error messages. The token is a type description, like STRING or NUMBER, and the lexeme is the value of that type, like “my pencil is big and yellow” or 12.67.
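To make the token/lexeme pairing concrete, here is a minimal sketch of what one tokenizer output record could look like. The class name Token, its property names and the line/column values are all made up for illustration; the later posts in the series may represent tokens differently.

```php
<?php
// Hypothetical token record. The names Token, type, lexeme, line and column
// are illustrative assumptions, not the series' actual code.
class Token
{
    public $type;    // token type, e.g. 'STRING' or 'NUMBER'
    public $lexeme;  // the value of that type
    public $line;    // 1-based line where the lexeme starts
    public $column;  // 1-based column where the lexeme starts

    public function __construct($type, $lexeme, $line, $column)
    {
        $this->type = $type;
        $this->lexeme = $lexeme;
        $this->line = $line;
        $this->column = $column;
    }
}

// What a tokenizer might emit for the two examples in the text
// (the positions are invented for the example).
$tokens = array(
    new Token('STRING', 'my pencil is big and yellow', 1, 1),
    new Token('NUMBER', 12.67, 1, 31),
);
```

Keeping the position on every token means the parser never has to look back at the original DSL string to report where something went wrong.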

The parser takes each token/lexeme and validates the DSL syntax while at the same time building an in-memory representation of the DSL code in an abstract syntax tree (AST). Any syntax errors found during this step will terminate the pipeline and return an error message pointing to the line and column where the error was found.
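As a rough sketch of the error-reporting half of that job (not the AST building), a parser could check each token against what the grammar expects and throw when they differ. The exception class DslSyntaxException and the helper expectToken are hypothetical names, and tokens are plain arrays here only to keep the sketch self-contained.

```php
<?php
// Hypothetical exception type that bakes the position into the message.
class DslSyntaxException extends Exception
{
    public function __construct($message, $line, $column)
    {
        parent::__construct("$message at line $line, column $column");
    }
}

// Returns the lexeme if the token has the expected type; otherwise throws,
// which is what terminates the pipeline.
// Assumes tokens are arrays with 'type', 'lexeme', 'line' and 'column' keys.
function expectToken(array $token, $expectedType)
{
    if ($token['type'] !== $expectedType) {
        throw new DslSyntaxException(
            "Expected $expectedType but found {$token['type']}",
            $token['line'],
            $token['column']
        );
    }
    return $token['lexeme'];
}

// Usage: the grammar wants a STRING here but the tokenizer produced a NUMBER.
$message = null;
try {
    expectToken(
        array('type' => 'NUMBER', 'lexeme' => 12.67, 'line' => 3, 'column' => 9),
        'STRING'
    );
} catch (DslSyntaxException $e) {
    $message = $e->getMessage();
}
```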

The final part of the pipeline, the code generator, takes the AST and emits a PHP representation of the DSL code, often calling internal common runtime functions to implement the DSL constructs. To take advantage of PHP opcode caches like APC, it’s a good idea to write the PHP code to an external file, unless your DSL input is constantly changing. You don’t want to incur the cost of compiling your DSL on each PHP script execution.
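One way to avoid recompiling on every request is to compare file modification times and only rerun the pipeline when the DSL source is newer than the generated file. This is a sketch under assumptions: the function name loadCompiledDsl and its parameters are invented, and $compile stands for any callable that turns DSL text into PHP source.

```php
<?php
// Hypothetical helper: recompiles only when the generated PHP file is missing
// or older than the DSL source, then includes the generated file.
function loadCompiledDsl($dslFile, $phpFile, $compile)
{
    if (!file_exists($phpFile) || filemtime($phpFile) < filemtime($dslFile)) {
        // Run the (caller-supplied) compiler and cache its output on disk.
        $phpCode = call_user_func($compile, file_get_contents($dslFile));
        file_put_contents($phpFile, $phpCode, LOCK_EX);
    }
    // Including a real file lets an opcode cache pick it up.
    return include $phpFile;
}
```

Modification times have one-second resolution on many systems, so a DSL file edited within the same second as the last compile would not be picked up by this check.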

The actual PHP code for the pipeline follows the diagram above, using exceptions to jump out of the pipeline flow:

	try {
		$phpCode = DSL::GenerateCode(DSL::Parse(DSL::Tokenize($dslCode)));
	}
	catch (Exception $e) {
		echo $e->getMessage();
	}

We’ll use static class functions to both easily namespace the functions and show that each step in the pipeline uses a clean input/output API and does not need to know the internal state of the previous step in the pipeline.
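A skeleton of such a class might look like the following. Only the three method names come from the pipeline call above; the bodies are placeholder stand-ins (whitespace splitting, a one-node tree, a `var_export` dump) so the sketch runs end to end, and they are not the implementations built later in the series.

```php
<?php
// Skeleton only: method names match the pipeline call, bodies are stand-ins.
class DSL
{
    // DSL source string in, flat list of tokens out.
    public static function Tokenize($dslCode)
    {
        if (trim($dslCode) === '') {
            throw new Exception('Nothing to tokenize');
        }
        $tokens = array();
        foreach (preg_split('/\s+/', trim($dslCode)) as $word) {
            $tokens[] = array('type' => 'WORD', 'lexeme' => $word);
        }
        return $tokens;
    }

    // Token list in, tree out (nested arrays stand in for a real AST).
    public static function Parse(array $tokens)
    {
        return array('node' => 'program', 'children' => $tokens);
    }

    // Tree in, PHP source string out.
    public static function GenerateCode(array $ast)
    {
        $words = array();
        foreach ($ast['children'] as $token) {
            $words[] = $token['lexeme'];
        }
        return '<?php return ' . var_export($words, true) . ';';
    }
}
```

Because each method only sees the return value of the one before it, any stage can be rewritten or tested on its own as long as its input and output shapes stay the same.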

That’s all there is to the pipeline. In future posts we’ll break down each step in the pipeline and build the code needed to implement it. It should be fun!

Next: Part 2 – Choosing the DSL syntax


Comments on “Building a DSL in PHP, Part 1: The Pipeline”

  1. tonyl said at 3:34 pm on October 20th, 2010:

    Interesting, I’ll be following the series. I just started to learn more about compilers and interpreters (more than just using them and a basic understanding of the steps they take) and this will help.


