Something has finally clicked while writing a programming language, and hopefully it stays clicked. I thought I had this feeling when I worked through Crafting Interpreters, but it didn't quite stick. I got part of the logic but not the whole picture.
I watched the scanning portion of the Pikuma course and that seemed to really firm things up. I didn't follow the entire course, but the videos on scanning and the bonus videos on Pratt parsers explained things well enough that I now feel much more at ease writing a parser and a tree-walk interpreter. I still need to work on emitting bytecode and actually running the VM, but that's further down the line.
I think another thing that really helped was that I stuck with JavaScript and ran it directly in the browser. This let me begin to see how I could incorporate the language into my websites rather than have it run on a server. There was something about having it in my browser that I really liked.
The big lesson here was recognizing that the scanner requires four specific functions, and once you have those functions, everything else falls into place.
Advance - Returns the current character and moves the pointer forward
Peek - Looks at the current character without moving anything
Match - Compares the current character to an expected one and, if it matches, moves the pointer
Lookahead - Looks at the character after the current one
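A minimal JavaScript sketch of these four helpers, assuming a simple `Scanner` class with a string and a cursor (the names and shape here are my own, not from any particular course):

```javascript
// A tiny scanner core: `source` is the input string, `pos` the cursor.
class Scanner {
  constructor(source) {
    this.source = source;
    this.pos = 0;
  }

  isAtEnd() {
    return this.pos >= this.source.length;
  }

  // Returns the current character and moves the pointer forward.
  advance() {
    return this.source[this.pos++];
  }

  // Looks at the current character without consuming it.
  peek() {
    return this.isAtEnd() ? "\0" : this.source[this.pos];
  }

  // Consumes the current character only if it equals `expected`.
  match(expected) {
    if (this.isAtEnd() || this.source[this.pos] !== expected) return false;
    this.pos++;
    return true;
  }

  // Looks one character past the current one.
  lookahead() {
    return this.pos + 1 >= this.source.length
      ? "\0"
      : this.source[this.pos + 1];
  }
}
```

Two-character tokens are where `match` earns its keep: after seeing `!`, a single `match("=")` decides between a `!=` token and a bare `!`.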
The Pikuma videos showed these four functions, and I think that really helped solidify things in my mind. Once I could see these functions, scanning the text and returning tokens was straightforward.
At this point I think the scanner is something one could automate, and I can see why programmers write tools to do exactly that. It's quite fun, and I imagine pretty fulfilling, to get a token stream out of a language specification.
Once you have tokenized your source file, the next step is to convert this to a tree. This is the job of the parser.
For parsing, I followed the Crafting Interpreters chapter on parsing. Writing a recursive descent parser was easy, and everything sort of falls out of the grammar of the language. Once again, parsing has a handful of functions that make things drastically easier to hold in your head.
Advance - similar to above
Peek - similar to above
Match - similar to above
Lookahead - similar to above
Previous - Returns the previous token
Expect - Throws an error if the current token isn't the one we expect; otherwise it consumes that token
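The parser versions of these helpers work over tokens instead of characters. A sketch, assuming the scanner produces an array of `{ type, lexeme }` objects ending in an `EOF` token (my own illustrative shape):

```javascript
// Parser helpers over a token stream produced by the scanner.
class Parser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }

  peek() { return this.tokens[this.pos]; }
  previous() { return this.tokens[this.pos - 1]; }
  isAtEnd() { return this.peek().type === "EOF"; }

  // Returns the current token and moves the pointer forward.
  advance() {
    if (!this.isAtEnd()) this.pos++;
    return this.previous();
  }

  // Consumes the current token only if it has the given type.
  match(type) {
    if (this.peek().type !== type) return false;
    this.advance();
    return true;
  }

  // Consumes the expected token, or throws with a helpful message.
  expect(type, message) {
    if (this.peek().type === type) return this.advance();
    throw new Error(message);
  }
}
```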
The key thing to note during parsing is that there are two kinds of things to process: statements and expressions. Statements usually begin with a distinguishing token, after which we do something similar to a regex, matching against a known structure to build the tree for that statement. For expressions, we need to recursively build out the tree, taking into account all the operators and their binding powers. Expressions are where recursive descent does its work.
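The statement side of that split can be sketched like this: dispatch on the leading token, fall through to the expression parser otherwise. The token cursor, node shapes, and the stub expression parser here are all illustrative stand-ins:

```javascript
// Minimal token cursor (illustrative stand-in for a real parser object).
function cursor(tokens) {
  let pos = 0;
  return {
    match: (t) => tokens[pos].type === t && (pos++, true),
    expect: (t, msg) => {
      if (tokens[pos].type !== t) throw new Error(msg);
      return tokens[pos++];
    },
  };
}

// Stand-in expression parser: just consumes a single NUMBER token.
function parseExpression(p) {
  return { kind: "Literal", value: p.expect("NUMBER", "expected expression").lexeme };
}

// Statement dispatch: the leading token picks the statement shape.
function parseStatement(p) {
  if (p.match("PRINT")) {
    const value = parseExpression(p);
    p.expect("SEMICOLON", "expected ; after print value");
    return { kind: "PrintStmt", value };
  }
  // ...other statement keywords (if, while, var) would go here...
  const expr = parseExpression(p);
  p.expect("SEMICOLON", "expected ; after expression");
  return { kind: "ExprStmt", expr };
}
```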
After I got the recursive version working, I went back and got the Pratt parser version working. This was helped quite a bit by first watching the Pikuma videos on Pratt parsers and then reading the Matklad article on them. Forcing myself to re-read the article until I understood how it mapped to the version I had coded gave me a much better understanding.
Matklad consolidates the led and nud functions, whereas seeing them separately helped me understand what was going on. But now that I understand it, I can see why consolidating them is better: it simplifies the code.
The Pratt parser is much more elegant in my eyes: the code is smaller, and I think it's a bit easier to see what's going on and to step through things. However, it is a bit of a mind-bend compared to recursive descent.
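To show how small the Pratt version can get, here is a sketch in the style of the Matklad article, where each infix operator carries a left and right binding power. The token format (numbers and operator strings in a flat array) and the binding-power values are my own choices for illustration:

```javascript
// Each infix operator gets [leftBp, rightBp]; higher binds tighter.
const INFIX_BP = { "+": [1, 2], "-": [1, 2], "*": [3, 4], "/": [3, 4] };

function parseExpr(tokens, pos = { i: 0 }, minBp = 0) {
  let tok = tokens[pos.i++];
  let lhs;
  if (typeof tok === "number") {
    lhs = tok;                       // nud: a literal is its own value
  } else if (tok === "(") {
    lhs = parseExpr(tokens, pos, 0); // reset binding power inside parens
    pos.i++;                         // consume ")"
  } else {
    throw new Error(`unexpected token ${tok}`);
  }
  while (pos.i < tokens.length) {
    const bp = INFIX_BP[tokens[pos.i]];
    if (!bp) break;                  // not an infix operator: stop
    const [leftBp, rightBp] = bp;
    if (leftBp < minBp) break;       // binds too weakly: hand it back up
    const op = tokens[pos.i++];      // led: consume the operator...
    const rhs = parseExpr(tokens, pos, rightBp);
    lhs = [op, lhs, rhs];            // ...and fold it into the tree
  }
  return lhs;
}
```

The whole precedence table lives in `INFIX_BP`, and the single `while` loop replaces the tower of `term`/`factor`/`unary` functions from the recursive descent version.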
I don't understand the table part of the Crafting Interpreters chapter on Pratt parsing. That function table seems to be throwing me; I'll need to spend time figuring out how it maps to the code I've written, as I'm sure it's just a rewrite of what I've already done.
Once you have the AST of the program, the next step is to interpret things.
The biggest departure I made here is one I'm not sure where I picked up: I had each node evaluate itself. Each node of the AST was responsible for evaluating its own data, and this resulted in a pretty straightforward implementation of the interpreter.
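A minimal sketch of this style; the node class names and the supported operators are my own stand-ins:

```javascript
// Each AST node carries its own evaluate(); running the program is
// just a call on the root node.
class Literal {
  constructor(value) { this.value = value; }
  evaluate() { return this.value; }
}

class Binary {
  constructor(op, left, right) {
    this.op = op;
    this.left = left;
    this.right = right;
  }
  evaluate() {
    const l = this.left.evaluate();  // children evaluate themselves
    const r = this.right.evaluate();
    switch (this.op) {
      case "+": return l + r;
      case "-": return l - r;
      case "*": return l * r;
      case "/": return l / r;
      default: throw new Error(`unknown operator ${this.op}`);
    }
  }
}
```

The alternative is a separate visitor that switches on node types; putting `evaluate()` on the node keeps each node's logic next to its data.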
I was a bit surprised at how quickly everything began to work.
This also made it quite easy to add code generation: just as with the interpreter, I could have each node write out its own opcodes to a stack.
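Code generation in this style mirrors `evaluate()`: each node appends its opcodes to a shared list, emitting operands before the operator as a stack machine expects. The node classes and opcode names here are illustrative stand-ins, not a real VM's instruction set:

```javascript
// Each node knows how to emit itself into a flat opcode list.
class LiteralNode {
  constructor(value) { this.value = value; }
  emit(code) { code.push(["PUSH", this.value]); }
}

class BinaryNode {
  constructor(op, left, right) {
    this.op = op;
    this.left = left;
    this.right = right;
  }
  emit(code) {
    this.left.emit(code);   // operands first: stack-machine order
    this.right.emit(code);
    code.push([{ "+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV" }[this.op]]);
  }
}
```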
I'm not sure if this is the right way to do it, but it works, and I did see other examples of this style elsewhere.