elmiinated unparse1 from interface. Explain the plan! Note that LOADFUNC needs a terminator in the concrete syntax. EOF , end , $ , anything that won’t ever be mistaken for an instruction.
start with optional , not many (at this stage, the recursion is too much)
other combinator: bracketed?
outcome 4, “use your unparser as evidence” not strong enough. We need to ask for line numbers.
(This module is also available in PDF, in two-column and one-column formats.)
This week you’ll design your assembly language and implement an unparser for it. And you’ll start learning about parsing combinators and about the Universal Forward Translator. The parsing stuff is a lot to swallow, but you’ll have two weeks to assimilate it; your full parser isn’t due until module 4.
The theme for the week is designing concrete syntax that promotes readability.
The professional video, which starts at 28:14, needs a little explanation:
Talk code | Our code |
---|---|
pure | succeed |
(:) | curry op :: |
runParser | produce (roughly) |
some | many1 |
satisfy | sat |
Lab proceeds in three parts. Course staff will keep time and will move people along in lock step.
Now define the parsing combinator for choice:
val <|>: 'a producer * 'a producer -> 'a producer
You may define an ML function using fun , or you may complete this algebraic law:
(p1 <|>p2) ts == .
Plan on writing one case for each form of result from p1 ts .
If you finish before the course staff moves the group to step 7, define the parsing combinator for satisfaction:
val sat : ('a -> bool) -> 'a producer -> 'a producer
Again you may write code or you may complete this law:
sat predicate p ts = .
Again you will have to consider all possible forms of p ts .
val many : 'a producer -> 'a list producer (* zero or more *) val many1 : 'a producer -> 'a list producer (* one or more *) val optional : 'a producer -> 'a option producer (* zero or one *)
Looking ahead to step 8, here’s an example use of many to capture the arguments to a function application in Scheme: 1
the "(" >> curry APPLY exp many exp the ")"
And here’s an example of optional to get the optional else clause in a C if statement:
the "if" >> curry3 C_IF (the "(" >> exp the ")") statement optional (the "else" >> statement)
Complete these algebraic laws:
many p == . many1 p == . optional p == .
val count : int -> 'a producer -> 'a list producer (* exactly N *) count 0 p == . count (n+1) p == .
val : 'a producer * 'b producer -> 'a producer val >> : 'a producer * 'b producer -> 'b producer
In the first combinator, the result from the 'a producer is ignored is used and the result from the 'b producer is ignored ; in the second combinator, the result from the 'b producer is ignored is used and the result from the 'a producer is ignored . Using , , and curry , among other functions, complete the following algebraic laws:$>
p q == . p >> q == .
type reg = int val reg : reg producer (* parses a register number *) val int : int producer (* parses an integer literal *) val string : string producer (* parses a string literal *) val name : string producer (* parses a name *) val the : string -> unit producer (* one token, like a comma or bracket *)
As functions you can use with , I provide these encoders:$>
type opcode = string type instr (* instruction *) val eR0 : opcode -> instr val eR1 : opcode -> reg -> instr val eR2 : opcode -> reg -> reg -> instr val eR3 : opcode -> reg -> reg -> reg -> instr
The parsers and functions above are meant to be combined with , , , <|>, and the other parsing combinators. As an example, here’s a parser that recognizes the assembly-language instruction r3 := r1 + r2 :$>
eR3 "add" reg the ":=" reg the "+" reg
If git gives you trouble, please post on the #git-fu channel in Slack.
> make . > ../../bin/uft Usage: ../../bin/uft - [file] where and are one of these languages: ho! Higher-order vScheme with mutation ho Pure, higher-order vScheme fo Pure, first-order vScheme cl Pure, first-order vScheme with closure and capture forms kn K-Normal form vs VM assembly language vo VM object code
> echo '@ print 33' | uft vs-vo | svm nil
> echo '@ print 33' | uft vs-vs an unknown assembly-code instruction
If you compiled with MLton, you’ll use uft.opt instead of uft .
For the LOADFUNC form, I recommend that you not require persons to count the number of instructions in the instruction list. Instead, try bracketing the instructions in some way.
Function unparse1 requires a ton of case analysis: pattern matching on opcode names. Every instruction your VM accepts must unparse to a readable assembly-language syntax.
Test your unparser by creating a test file unparse.vs . In this file, using my “passthrough parser”, or using your own parser if you have completed step 18, write at least half the instructions that your VM recognizes. (If you use my passthrough parser, you will be limited to instructions in formats R0 through R3 , like halt and print but not loadliteral or check .) Test that these instructions unparse correctly by running
uft vs-vs unparse.vs
and comparing the output with what you expect from your grammar.
The other half of your instructions should be implemented in your unparser, but they can remain untested until next week.
Extend instr with at least one case of concrete syntax in your assembly language. A good case to implement would be concrete syntax for loading a literal value into a register. You’ll add more cases next week.
provide comp150vm hw03 src
provide comp150vm reflection03 REFLECTION
Occasionally I’ll suggest reading that may enrich your view of programming-language implementation. Here are three works on combinator parsing:
Toy parser and unparser for assembly code. You will build a real unparser this week and a real parser next week.
Before looking at this file, look at and understand asm.sml and object-code.sml .
The internal representation of assembly code.
The internal representation of virtual object code, plus a module that unparses this representation to its on-disk form.
The interface to our version of the error monad.
The interface to our library of parsing combinators.
Exports function StringEscapes.quote , which will be useful for unparsing string literals.
Defines function Impossible.impossible , which you’ll call if you ever need to signify an assertion failure.
This file lists, directly or by reference, all the ML code used to build the UFT, except main.sml . It’s worth looking at now because it will give you some sense of the pieces that make up the UFT.
My lexical analyzer for assembly language: it uses a combinator parser to convert a sequence of characters into a sequence of tokens. It may be worth looking at as an example of combinator parsing. And if you are ambitious, you may wish to modify it.
The heart of the assembler. Function Assembler.translate translates an assembly-language program into object code. Right now it’s just a stub: it extracts object-code instructions embedded in assembly language, and if it encounters anything else, it barfs. It will suffice to allow you to assemble code that doesn’t use labels.
Next week you will write a real version of Assembler.translate .
Lists all the languages the UFT will eventually understand.
The Universal Forward Translator. It is given a source language and a target language, and it figures out how to translate the source to the target.
An environment abstraction. You’ll use it next week to save the meanings of assembly-language labels.
File error-module.sml implements the error monad, and producer-functor.sml implements our parsing combinators. You will have learned all the good stuff either in lab (parsing combinators) or preparing for lab (error monad).
File main.sml reads the command line and calls the UFT; main.cm builds it.
File impossible.cm works around an annoyance in Moscow ML’s build process.
File ioutil.sml defines a few convenience functions that I wish were in the Standard ML basis library, but aren’t.
Learning outcomes available for project points:
Learning outcome available for depth points:
Each of the numbered learning outcomes 1 to N is worth one point, and the points can be claimed as follows:
val commaSep : 'a producer -> 'a list producer (* `commaSep p` returns a parser that parser a sequence of zero or more p's separated by commas *)
> echo "r1 := r2 + r3" | uft vs-vs r1 := r2 + r3
Your most interesting design choice will probably be the notation you want to use for global variables. Here are your options: