:: krowemoh

Monday | 13 JAN 2025


Language Development with Pest

2025-01-04

I'm currently using ChatGPT to learn how to write a programming language. I figured a document would be helpful to look back on.

The plan is to write something that looks like BASIC, using Pest as the parser library. For now it will be interpreted, but the goal is to use inkwell (safe Rust bindings for LLVM) to generate code.

Possible Chapters

  1. Creating the project and getting PRINT to work with numbers and strings

  2. Adding assignments and expressions

  3. Adding binary operations with a Pratt parser

  4. Adding floats to the language

  5. Adding concatenation

  6. Adding conditionals

  7. Adding boolean operators

  8. Adding semicolons to allow multiple statements

Day 1 - Setup and PRINTING

Day 1 was spent getting things set up and building out a language that can print a string.

Day 2 - Assignments and Expressions

Day 2 was spent adding variables and expressions to the language. For now, expressions are only the simplest kind: plain strings and numbers. The variables are stored in a global hashmap.
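A minimal sketch of a global variable store in Rust, assuming a `Value` enum with just numbers and strings and a hypothetical `Interpreter` struct. Neither name comes from the actual project; this only shows the shape of the idea.

```rust
use std::collections::HashMap;

// Hypothetical value type: Day 2 only needs plain numbers and strings.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
    Str(String),
}

// Hypothetical interpreter state: one global scope stored in a HashMap.
struct Interpreter {
    globals: HashMap<String, Value>,
}

impl Interpreter {
    fn new() -> Self {
        Interpreter { globals: HashMap::new() }
    }

    // `X = 1` stores (or overwrites) the value under the variable's name.
    fn assign(&mut self, name: &str, value: Value) {
        self.globals.insert(name.to_string(), value);
    }

    // `PRINT X` looks the name up; None means the variable was never assigned.
    fn lookup(&self, name: &str) -> Option<&Value> {
        self.globals.get(name)
    }
}

fn main() {
    let mut interp = Interpreter::new();
    interp.assign("X", Value::Number(1));
    interp.assign("Y", Value::Str("HELLO".to_string()));
    interp.assign("X", Value::Number(2)); // reassignment overwrites
    println!("{:?}", interp.lookup("X")); // Some(Number(2))
    println!("{:?}", interp.lookup("Z")); // None
}
```

Returning an `Option` leaves the decision about unassigned variables to the caller, which keeps the store itself simple.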

ChatGPT wanted to encode precedence in the grammar itself, whereas the Pest docs recommend Pratt parsers and give some examples. I'll lean on the docs for this step, as they show a working calculator example that I should be able to adapt to my current language. The goal is to be able to do arithmetic in my language.

Day 3 - Binary Operations, Pratt Parser, Floats, Concatenation

After a bit of a gap, day 3 went relatively well. The great thing about using ChatGPT here was that I had a history of everything tied to this project. I would actually love this for Firefox and searching: some sidebar that I could use to trigger new queries or to continue from a jumping-off point. Most of the time it would be useless, as each query is independent, but for this kind of learning it was extremely helpful.

I could look at the code I wrote and see what questions I had asked last. This gave me a quick lay of the land of the codebase and where I was in the process and I could jump back into things.

The actual advancement here is that I have arithmetic working now using the examples from the docs. It was relatively straightforward to add the calculator logic to my existing program and now I can do assignments and math in addition to printing things out.

Pest definitely hides some things from me, which is what I wanted, but it does feel a bit cheaty that I didn't have to understand or figure out the Pratt parser; it's just something I can use. To be fair, the fact that an abstraction exists probably means it's something rote. Still, it's probably good to know and understand. I'll need to come back to this and hand-roll my own scanner and parser, but the goal right now is to get something working.
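For when I come back to hand-roll this, a minimal Pratt parser can be sketched in plain Rust without Pest. This is a standalone sketch (the `Token`, `tokenize`, `expr` and `eval` names are hypothetical) that covers only integer `+ - * /` and parentheses, with no unary minus and no error recovery. Each operator gets a left and a right binding power; making the right power one higher than the left is what makes the operators left-associative.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

// Turn source text into tokens; whitespace and unknown characters are skipped.
fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_ascii_digit() {
            let mut n = 0i64;
            while i < chars.len() && chars[i].is_ascii_digit() {
                n = n * 10 + chars[i].to_digit(10).unwrap() as i64;
                i += 1;
            }
            tokens.push(Token::Num(n));
            continue;
        }
        match c {
            '+' | '-' | '*' | '/' => tokens.push(Token::Op(c)),
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            _ => {}
        }
        i += 1;
    }
    tokens
}

// (left, right) binding powers: higher numbers bind tighter.
fn infix_binding_power(op: char) -> (u8, u8) {
    match op {
        '+' | '-' => (1, 2),
        '*' | '/' => (3, 4),
        _ => panic!("unknown operator {op}"),
    }
}

// Parse and evaluate an expression whose operators bind at least `min_bp`.
fn expr(tokens: &[Token], pos: &mut usize, min_bp: u8) -> i64 {
    // Prefix position: a number or a parenthesised sub-expression.
    let mut lhs = match tokens[*pos] {
        Token::Num(n) => {
            *pos += 1;
            n
        }
        Token::LParen => {
            *pos += 1;
            let v = expr(tokens, pos, 0);
            *pos += 1; // consume ')'
            v
        }
        t => panic!("unexpected token {t:?}"),
    };
    // Infix loop: keep folding in operators that bind tightly enough.
    while let Some(&Token::Op(op)) = tokens.get(*pos) {
        let (l_bp, r_bp) = infix_binding_power(op);
        if l_bp < min_bp {
            break;
        }
        *pos += 1;
        let rhs = expr(tokens, pos, r_bp);
        lhs = match op {
            '+' => lhs + rhs,
            '-' => lhs - rhs,
            '*' => lhs * rhs,
            _ => lhs / rhs,
        };
    }
    lhs
}

fn eval(src: &str) -> i64 {
    let tokens = tokenize(src);
    let mut pos = 0;
    expr(&tokens, &mut pos, 0)
}

fn main() {
    println!("{}", eval("1 + 2 * 3"));   // 7
    println!("{}", eval("(1 + 2) * 3")); // 9
    println!("{}", eval("10 - 4 - 3"));  // 3, i.e. (10 - 4) - 3
}
```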

I think the next step is to add support for floating point numbers, as currently I only have integer math. This will probably require some parsing changes to handle decimals in numbers.

The parsing changes were pretty simple; I got the basic idea from ChatGPT and it made sense. I added a new atom to my grammar that lets numbers be either floats or integers. Testing on the Pest website was great for getting a feel for how to structure things, and I worked out the grammar there.

I then added a new Expression variant called Float and renamed my Number to Integer. Hooking this up to the parser was quick, and soon I could print out floats.
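A plain Rust sketch of the AST change, plus a number scanner that mirrors what the grammar rule has to do. The `scan_number` function is hypothetical and stands in for the Pest rule. The point it illustrates is that Pest's `|` is an ordered choice (PEG semantics), so the float alternative has to be tried before the integer one.

```rust
// The old Number variant becomes Integer and a new Float variant sits
// alongside it. Other expression variants are omitted from this sketch.
#[derive(Debug, PartialEq)]
enum Expression {
    Integer(i64),
    Float(f64),
}

// Scan a number literal at the start of `input`, returning it and the rest.
// As with an ordered choice, the float form (digits '.' digits) is tried
// first; trying the integer form first would match only the "3" of "3.5".
fn scan_number(input: &str) -> Option<(Expression, &str)> {
    let int_len = input.chars().take_while(|c| c.is_ascii_digit()).count();
    if int_len == 0 {
        return None;
    }
    let after_int = &input[int_len..];
    if let Some(frac) = after_int.strip_prefix('.') {
        let frac_len = frac.chars().take_while(|c| c.is_ascii_digit()).count();
        if frac_len > 0 {
            let end = int_len + 1 + frac_len;
            let value: f64 = input[..end].parse().ok()?;
            return Some((Expression::Float(value), &input[end..]));
        }
    }
    // No fractional part: fall back to the integer alternative.
    let value: i64 = input[..int_len].parse().ok()?;
    Some((Expression::Integer(value), after_int))
}

fn main() {
    println!("{:?}", scan_number("3.5 + 1")); // Some((Float(3.5), " + 1"))
    println!("{:?}", scan_number("42"));      // Some((Integer(42), ""))
}
```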

Once floats were printing, the next step was to modify the evaluate function so that it could do type promotion. I was thinking about forcing everything to be floats, but ChatGPT explained that this would be unexpected for most people and wasteful in general. Those were good enough reasons for me, so I went ahead with getting type promotion working. I also made it handle strings the way BASIC does: if a string isn't parseable as a number, the language spits out an error but defaults to using 0.
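A sketch of the string-to-number fallback, assuming a `Value` enum with `Integer`, `Float` and `Str` variants (hypothetical names) and an error printed to stderr. The exact error text is made up for illustration.

```rust
// Hypothetical runtime value type: the three kinds that get mixed in expressions.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

// Coerce a string to a number: try an integer, then a float, and otherwise
// report an error and fall back to 0.
fn coerce_str(s: &str) -> Value {
    if let Ok(i) = s.parse::<i64>() {
        Value::Integer(i)
    } else if let Ok(f) = s.parse::<f64>() {
        Value::Float(f)
    } else {
        eprintln!("Non-numeric value \"{}\" used as a number, using 0", s);
        Value::Integer(0)
    }
}

fn main() {
    println!("{:?}", coerce_str("3"));      // Integer(3)
    println!("{:?}", coerce_str("3.5"));    // Float(3.5)
    println!("{:?}", coerce_str("3.5asd")); // Integer(0), plus a message on stderr
}
```

Note that Rust's `f64` parser also accepts forms like `1e3` and `inf`, so a more faithful BASIC might want a stricter check before falling through to the float case.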

This was the fun part of coding, where you try to think through all the different things that could happen when integers, floats and strings get mixed up. Rust's pattern matching came in pretty clutch here, and I think this enum pattern-matching style is interesting. I'm not a fan of some of the nesting, but it's a good first step.
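One common way to cut down on nested matches is to match on the `(left, right)` pair directly. Below is a minimal sketch of addition with type promotion, assuming a similar hypothetical `Value` enum. Strings are coerced first, silently to 0 when they don't parse, to keep the example short.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

// Minimal coercion for this sketch: integer, then float, else 0.
fn to_numeric(v: Value) -> Value {
    match v {
        Value::Str(s) => s
            .parse::<i64>()
            .map(Value::Integer)
            .or_else(|_| s.parse::<f64>().map(Value::Float))
            .unwrap_or(Value::Integer(0)),
        other => other,
    }
}

// Matching on the pair keeps the promotion rules flat: Integer + Integer
// stays an Integer, and any Float operand promotes the result to Float.
fn add(left: Value, right: Value) -> Value {
    match (to_numeric(left), to_numeric(right)) {
        (Value::Integer(a), Value::Integer(b)) => Value::Integer(a + b),
        (Value::Integer(a), Value::Float(b)) => Value::Float(a as f64 + b),
        (Value::Float(a), Value::Integer(b)) => Value::Float(a + b as f64),
        (Value::Float(a), Value::Float(b)) => Value::Float(a + b),
        _ => unreachable!("to_numeric never returns a string"),
    }
}

fn main() {
    println!("{:?}", add(Value::Integer(1), Value::Integer(3))); // Integer(4)
    println!("{:?}", add(Value::Integer(1), Value::Float(3.5))); // Float(4.5)
    println!("{:?}", add(Value::Str("1".into()), Value::Str("3".into()))); // Integer(4)
}
```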

Now I have a usefulish calculator.

The following code can now be executed:

PRINT "WORLD"

PRINT 1 + 3
PRINT 1 + 3.5

PRINT 1 + "3" / 2
PRINT "1" + "3"

X = "1" + "3.5asd"
PRINT X
 
PRINT 3 / 3.0

The next thing I want to add is string concatenation. It would be handy to be able to do:

PRINT '3 + 4 = ' : 3 + 4

The first step was to add concatenation to the grammar. I gave it the lowest precedence.
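A small plain-Rust sketch (not Pest's own Pratt parser API) of why the lowest precedence matters: with `:` given the smallest binding power, the arithmetic on its right is grouped before the concatenation. The binding-power numbers and function names are hypothetical, tokens are pre-split, and the parser returns an S-expression string so the grouping is visible.

```rust
// Binding powers with ':' (concatenation) lowest.
fn infix_binding_power(op: &str) -> Option<(u8, u8)> {
    match op {
        ":" => Some((1, 2)),
        "+" | "-" => Some((3, 4)),
        "*" | "/" => Some((5, 6)),
        _ => None,
    }
}

// Pratt loop over pre-split tokens, producing "(op lhs rhs)" strings.
fn parse(tokens: &[&str], pos: &mut usize, min_bp: u8) -> String {
    let mut lhs = tokens[*pos].to_string(); // operand: number or string literal
    *pos += 1;
    while let Some(&op) = tokens.get(*pos) {
        let Some((l_bp, r_bp)) = infix_binding_power(op) else { break };
        if l_bp < min_bp {
            break;
        }
        *pos += 1;
        let rhs = parse(tokens, pos, r_bp);
        lhs = format!("({op} {lhs} {rhs})");
    }
    lhs
}

fn group(tokens: &[&str]) -> String {
    let mut pos = 0;
    parse(tokens, &mut pos, 0)
}

fn main() {
    // PRINT '3 + 4 = ' : 3 + 4
    let tokens = ["'3 + 4 = '", ":", "3", "+", "4"];
    println!("{}", group(&tokens)); // (: '3 + 4 = ' (+ 3 4))
}
```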

Once it was added to the grammar, I then added it to the parser.

After that, it was a simple job to add it to the evaluator. I implemented the Display trait so that I could call .to_string() on my Value objects.
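A sketch of the Display-based approach, assuming a hypothetical `Value` enum and a hypothetical `concat` helper. Implementing `fmt::Display` also provides `to_string()` through the standard library's blanket `ToString` impl.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

// Display decides how each kind of value is rendered as text.
impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::Integer(i) => write!(f, "{i}"),
            Value::Float(x) => write!(f, "{x}"),
            Value::Str(s) => write!(f, "{s}"),
        }
    }
}

// Concatenation: both sides are rendered as text and joined into a string.
fn concat(left: &Value, right: &Value) -> Value {
    Value::Str(format!("{left}{right}"))
}

fn main() {
    let out = concat(&Value::Str("3 + 4 = ".into()), &Value::Integer(7));
    println!("{out}"); // 3 + 4 = 7
}
```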

Now I can print and concatenate strings which is great. The next step is to add conditionals.

Day 4 - Conditionals

This was the hardest day so far, and it's also the day I don't fully grok. Adding conditionals to the Pest grammar was straightforward, and my initial implementation of conditionals was simplistic.

I copied what I had done in a previous project: build an array of conditions and blocks, then loop over them to evaluate. I had ChatGPT critique this because I knew it was wrong but didn't know what the actual solution was. It quickly showed me that I could instead make the else clause a list of statements and run the evaluator against that list. An else if then becomes an else clause containing a single if statement, so both else if and else are handled gracefully.

This was neat to see laid out and the implementation got drastically better. I could tell by looking at the code that things were now much cleaner than my own solution.
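A sketch of that shape, with hypothetical names and a plain `bool` standing in for the condition expression: the else branch is just a list of statements, and an else if is an `If` nested inside that list, so the same function runs every branch.

```rust
#[derive(Debug)]
enum Stmt {
    Print(String),
    If {
        condition: bool, // simplified: a real AST would hold an expression
        then_branch: Vec<Stmt>,
        else_branch: Vec<Stmt>,
    },
}

// Executes statements in order, collecting PRINT output for inspection.
fn execute(stmts: &[Stmt], output: &mut Vec<String>) {
    for stmt in stmts {
        match stmt {
            Stmt::Print(text) => output.push(text.clone()),
            Stmt::If { condition, then_branch, else_branch } => {
                // Whichever branch is chosen is run by this same function,
                // so an If nested in the else branch needs no special case.
                let branch = if *condition { then_branch } else { else_branch };
                execute(branch, output);
            }
        }
    }
}

fn main() {
    // Roughly: IF false THEN PRINT "A" ELSE IF true THEN PRINT "B" ELSE PRINT "C"
    let program = vec![Stmt::If {
        condition: false,
        then_branch: vec![Stmt::Print("A".into())],
        else_branch: vec![Stmt::If {
            condition: true,
            then_branch: vec![Stmt::Print("B".into())],
            else_branch: vec![Stmt::Print("C".into())],
        }],
    }];
    let mut output = Vec::new();
    execute(&program, &mut output);
    println!("{output:?}"); // ["B"]
}
```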

The other issue was that I expanded the grammar to handle single-line and multiline if statements by adding optional parts to one rule. ChatGPT critiqued this, pointing out that it was ambiguous and could cause subtle errors around the edge cases. It recommended making things more explicit by creating two new rules. However, I'm lazy, so I've ignored this for now; I think it would require me to mess with the parser again, which I don't really want to do.

Now that conditionals are working, the next step is to add boolean operators.

Day 5 - Goose Chase

Mostly a goose chase today. I wanted to add semicolons so that I could have multiple statements on the same line. I tried it multiple ways, with multiple rewrites, but nothing came out well. Even with AI help, this is one of those things that is probably outside my range and understanding, so ChatGPT isn't actually helpful here.

It has some ideas, but they don't do quite what I want. Trying to add this also made it clear why language development is said to be O(n^2): every new feature can interact with every existing one. Adding semicolons causes interactions with the if statements. The fact that things don't work well means that my grammar isn't capable enough yet.
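One informal way to read the O(n^2) remark (an interpretation, not a formal result): if each of n features can potentially interact with every other, the number of feature pairs to think about is

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = O(n^2)
```

With four features that is 6 pairs; doubling to eight features gives 28.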

I finally got tired of it and reset to the day 4 code. I'll need to study how other grammars handle multiple statements on a single line and how that interacts with if statements. It might be that I'm doing something strange with BASIC.

This day also highlighted that I really want a checkpoint system. I want to say checkpoint and have it automatically make a backup of the directory. It should also prompt me for a comment, and I should then be able to see the checkpoints in the browser.

Day 6 - Boolean operators and multiple statements

I wired up the boolean operators pretty quickly and just as quickly realized that I had a fundamental error somewhere in my code. Expression parsing had been broken since earlier in a way I hadn't caught, and it was finally biting me in the ass.

This is where I dumped almost all of my code into ChatGPT to see if it could figure out where I went wrong and, unfortunately, it had no idea. It swore up and down that my code was fine and that things should be working. I had thought it might have something to do with parentheses, and ChatGPT correctly kept telling me that part of the logic was fine.

It's funny, because I harped on that for a while before letting it go, and ChatGPT was relatively patient.

The ultimate solution involved going back to the Pest calculator tutorial and comparing how the example did things with how I had adapted it. Slowly it dawned on me that I had thought WHITESPACE was just a rule name that was considered good practice, rather than a special rule inside of Pest: when a grammar defines WHITESPACE, Pest implicitly allows it between the elements of every non-atomic rule. I had instead created my own WS rule, as that was shorter to type, and littered my grammar with WS everywhere. Combined with my own shortcut for NEWLINE, this meant I had broken the grammar at a fundamental level.

This was a very silly mistake, and it was entirely my fault. I didn't read the tutorials or documentation carefully enough and had sort of winged my way through. It was just luck that my test cases were passing and that I was able to move from step to step. The great thing here is that I won't be making that mistake again!

I spent some time fixing my grammar and working with Pest rather than against it, and with that everything started falling into place. My boolean operators started working and my if statements became a bit more robust. The single-line and multiline if statements were hacky before but are now more stable. I'm not entirely convinced that it is fundamentally correct, but it is better than before.
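A sketch of evaluating boolean operators over runtime values, assuming (this is not confirmed by the post) that zero and the empty string are false, and that results are represented as the integers 1 and 0. The `Value` enum and the function names are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

// Assumed truthiness rule: zero and the empty string are false, anything
// else is true. A given BASIC dialect may also treat numeric strings such
// as "0" as false, so check this against the language being modelled.
fn is_true(v: &Value) -> bool {
    match v {
        Value::Integer(i) => *i != 0,
        Value::Float(x) => *x != 0.0,
        Value::Str(s) => !s.is_empty(),
    }
}

// Results are represented as the integers 1 (true) and 0 (false).
fn from_bool(b: bool) -> Value {
    Value::Integer(if b { 1 } else { 0 })
}

fn eval_and(l: &Value, r: &Value) -> Value {
    from_bool(is_true(l) && is_true(r))
}

fn eval_or(l: &Value, r: &Value) -> Value {
    from_bool(is_true(l) || is_true(r))
}

fn eval_not(v: &Value) -> Value {
    from_bool(!is_true(v))
}

fn main() {
    let x = Value::Integer(5);
    let empty = Value::Str(String::new());
    println!("{:?}", eval_and(&x, &empty)); // Integer(0)
    println!("{:?}", eval_or(&x, &empty));  // Integer(1)
    println!("{:?}", eval_not(&empty));     // Integer(1)
}
```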

Fixing my grammar also let me add semicolons to the language without much pain. Whereas earlier I had written maybe 3-4 implementations and none worked right, this time I was able to write it in one shot and get a working version. It also worked with conditionals without issue, which was a great sign.

Now that I have booleans working, the next thing to finally focus on is looping. I want to add the regular LOOP statement and the FOR statement. I think these two will be the final steps to making this into an actual programming language.

Day 7