I'm currently using ChatGPT to learn how to write a language. I figure a document would be helpful to look back on.
The plan is to write something that looks like BASIC using Pest as the parser library. For now it will run interpreted but the goal is to use inkwell to generate code.
01. Creating the project and getting PRINT to work with numbers and strings
02. Adding assignments and expressions
03. Adding binary operations with a pratt parser
04. Adding floats to the language
05. Adding concatenation
06. Adding conditionals
07. Adding boolean operators
08. Adding semicolons to allow multiple statements
09. Adding for loops
10. Adding the LOOP statement
Day 1 was spent getting things set up. This day was spent building out a language that can print out a string.
Day 2 was spent adding variables and expressions to the language. For now it is the simplest expressions of plain strings and numbers. The variables are put in a global hashmap.
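As a rough illustration of the day 2 state, here is a minimal sketch of assignments and PRINT backed by a hashmap of variables. The `Value`, `Expr`, and `Statement` types and the `run` function are hypothetical names made up for this sketch, not the project's actual code, and the map is passed in as a parameter rather than being a true global. Reading an unset variable as 0 is also an assumption.

```rust
use std::collections::HashMap;

// Hypothetical runtime value: only plain numbers and strings at this stage.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
    Str(String),
}

// Hypothetical expression AST: literals and variable lookups only.
enum Expr {
    Number(i64),
    Str(String),
    Variable(String),
}

enum Statement {
    Assign(String, Expr),
    Print(Expr),
}

fn eval(expr: &Expr, vars: &HashMap<String, Value>) -> Value {
    match expr {
        Expr::Number(n) => Value::Number(*n),
        Expr::Str(s) => Value::Str(s.clone()),
        // Assumption: an unset variable reads as 0.
        Expr::Variable(name) => vars.get(name).cloned().unwrap_or(Value::Number(0)),
    }
}

// Runs the statements in order and returns what PRINT would have written.
fn run(program: &[Statement], vars: &mut HashMap<String, Value>) -> Vec<String> {
    let mut output = Vec::new();
    for stmt in program {
        match stmt {
            Statement::Assign(name, expr) => {
                let value = eval(expr, vars);
                vars.insert(name.clone(), value);
            }
            Statement::Print(expr) => match eval(expr, vars) {
                Value::Number(n) => output.push(n.to_string()),
                Value::Str(s) => output.push(s),
            },
        }
    }
    output
}
```

Collecting the output in a `Vec<String>` instead of printing directly is only there to make the sketch easy to check.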
ChatGPT wants to use the grammar to do precedence whereas the docs recommend Pratt parsers and give some examples. I'll lean on the docs for this step as they show a working calculator example which I should be able to adapt into my current language. The goal is to be able to do arithmetic in my language.
A bit of a gap but day 3 went relatively well. The great thing about using ChatGPT here was that I had a history of everything tied to this project. I would actually love this for Firefox and searching: some sidebar that I can use to trigger new queries or to continue off of a jumping point. Most of the time it'll be useless as each query is independent but for this learning it was extremely helpful.
I could look at the code I wrote and see what questions I had asked last. This gave me a quick lay of the land of the codebase and where I was in the process and I could jump back into things.
The actual advancement here is that I have arithmetic working now using the examples from the docs. It was relatively straightforward to add the calculator logic to my existing program and now I can do assignments and math in addition to printing things out.
Pest definitely hides some things from me which is what I wanted but it does feel a bit cheaty that I didn't have to understand or figure out the Pratt parser and it's just something I can use. To be fair, the fact that an abstraction exists probably means it is something rote. Still probably good to know and understand. I'll need to come back to this and hand-roll my own scanner and parser but the goal is to get something working right now.
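For a sense of what the Pratt parser abstraction is doing, here is a hand-rolled sketch using only the standard library. It is not pest's implementation and not the project's code: `Token`, `tokenize`, `binding_power`, `Parser`, and `evaluate` are all hypothetical names, it only handles integers, and it evaluates as it parses instead of building an AST. The core idea is that each operator gets a binding power, and the loop only keeps consuming operators that bind tighter than the one that called it.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Plus,
    Minus,
    Star,
    Slash,
    LParen,
    RParen,
}

// Tiny scanner: digits become numbers, anything unrecognised is skipped.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n = 0i64;
            while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                n = n * 10 + d as i64;
                chars.next();
            }
            tokens.push(Token::Num(n));
        } else {
            match c {
                '+' => tokens.push(Token::Plus),
                '-' => tokens.push(Token::Minus),
                '*' => tokens.push(Token::Star),
                '/' => tokens.push(Token::Slash),
                '(' => tokens.push(Token::LParen),
                ')' => tokens.push(Token::RParen),
                _ => {} // whitespace and unknown characters
            }
            chars.next();
        }
    }
    tokens
}

// Higher binding power binds tighter; all operators are left-associative.
fn binding_power(tok: Token) -> Option<u8> {
    match tok {
        Token::Plus | Token::Minus => Some(1),
        Token::Star | Token::Slash => Some(2),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn next(&mut self) -> Option<Token> {
        let tok = self.peek();
        self.pos += 1;
        tok
    }

    // Parses and evaluates an expression whose operators bind tighter than min_bp.
    fn expr(&mut self, min_bp: u8) -> Option<i64> {
        let mut lhs = match self.next()? {
            Token::Num(n) => n,
            Token::Minus => -self.expr(3)?, // prefix minus binds tightest
            Token::LParen => {
                let inner = self.expr(0)?;
                if self.next()? != Token::RParen {
                    return None;
                }
                inner
            }
            _ => return None,
        };
        while let Some(op) = self.peek() {
            let bp = match binding_power(op) {
                Some(bp) if bp > min_bp => bp,
                _ => break,
            };
            self.next();
            let rhs = self.expr(bp)?;
            lhs = match op {
                Token::Plus => lhs + rhs,
                Token::Minus => lhs - rhs,
                Token::Star => lhs * rhs,
                Token::Slash => lhs.checked_div(rhs)?, // None on division by zero
                _ => return None,
            };
        }
        Some(lhs)
    }
}

// Returns None for malformed input, leftover tokens, or division by zero.
fn evaluate(src: &str) -> Option<i64> {
    let mut parser = Parser { tokens: tokenize(src), pos: 0 };
    let value = parser.expr(0)?;
    if parser.peek().is_some() {
        return None;
    }
    Some(value)
}
```

Under these assumptions, `evaluate("1 + 2 * 3")` parses the multiplication first because `*` has the higher binding power.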
I think the next step is to add support for floating point as currently I only have integer math. This is probably going to require some parsing changes to handle having decimals in numbers.
The parsing changes were pretty simple. I was able to get the basic idea from ChatGPT and it made sense. I added a new atom to my grammar that would let numbers be either floats or integers. Testing this on the pest website was great and helped to get a feel for how to structure things. I figured out the grammar by doing it on the website.
I then added a new Expression type called Float and renamed my Number to Integer. This was a quick process to get the parser hooked up. I could then get to the point where I could print out floats.
Once the floats were printing, the next step was to modify the evaluate function so that it could do type promotion. I was thinking about forcing everything to be floats but ChatGPT explained that this would be unexpected for most people and wasteful in general. Good enough reasons for me that I went ahead with getting type promotion working. I also made it handle strings the way it does in BASIC which is if the number isn't parseable, the language spits out an error but defaults to using 0.
This was the fun part of coding where you try to think through all the different things that could happen when you have integers, floats and strings getting mixed up. The Rust pattern matching came in pretty clutch here and I think this enum pattern matching style is interesting. Not a fan of some of the nesting but it's a good first step.
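A minimal sketch of the promotion rules described above, for addition only. The `Value` variants follow the names in the text (Integer, Float, plus a string case), but `coerce`, `as_f64`, and `add` are hypothetical helpers and the exact error message is made up. The assumed rules are: integer plus integer stays an integer, anything involving a float becomes a float, and a string that does not parse as a number reports an error and counts as 0.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

// Turns strings into numbers; Integer and Float pass through unchanged.
fn coerce(v: &Value) -> Value {
    match v {
        Value::Str(s) => {
            let text = s.trim();
            if let Ok(i) = text.parse::<i64>() {
                Value::Integer(i)
            } else if let Ok(f) = text.parse::<f64>() {
                Value::Float(f)
            } else {
                // Assumption: report the problem and carry on with 0.
                eprintln!("error: {:?} is not a number, using 0", s);
                Value::Integer(0)
            }
        }
        other => other.clone(),
    }
}

fn as_f64(v: &Value) -> f64 {
    match v {
        Value::Integer(i) => *i as f64,
        Value::Float(f) => *f,
        Value::Str(_) => 0.0, // not reached once coerce() has run
    }
}

// Integer + Integer stays Integer; any Float promotes the result to Float.
fn add(a: &Value, b: &Value) -> Value {
    match (coerce(a), coerce(b)) {
        (Value::Integer(x), Value::Integer(y)) => Value::Integer(x + y),
        (x, y) => Value::Float(as_f64(&x) + as_f64(&y)),
    }
}
```

The catch-all arm keeps the nesting down, at the cost of the slightly awkward `Str` arm in `as_f64` that can never be hit.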
Now I have a usefulish calculator.
The code below can now be executed:
PRINT "WORLD"
PRINT 1 + 3
PRINT 1 + 3.5
PRINT 1 + "3" / 2
PRINT "1" + "3"
X = "1" + "3.5asd"
PRINT X
PRINT 3 / 3.0
The next thing I want to add is string concatenation. It would be handy to do:
PRINT '3 + 4 = ' : 3 + 4
The first step was to add concatenation to the grammar. I gave it the lowest precedence.
Once it was added to the grammar, I then added it to the parser.
After that, it was a simple job to add it to the evaluator. I implemented the Display trait so that I could do .to_string() on my Value objects.
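A small sketch of what that looks like, assuming a `Value` enum with Integer, Float, and Str variants. The `concat` function is a hypothetical stand-in for whatever the evaluator does for the `:` operator. Implementing `Display` is enough to get `.to_string()` for free, because the standard library provides `ToString` for every type that implements `Display`.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::Integer(i) => write!(f, "{}", i),
            Value::Float(x) => write!(f, "{}", x),
            Value::Str(s) => write!(f, "{}", s),
        }
    }
}

// Concatenation formats both sides as text and joins them into a new string value.
fn concat(left: &Value, right: &Value) -> Value {
    Value::Str(format!("{}{}", left, right))
}
```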
Now I can print and concatenate strings which is great. The next step is to add conditionals.
This was the hardest day so far and also the day that I don't fully grok. Adding conditionals to pest was straightforward and my initial implementation of conditionals was simplistic.
I copied what I did in a previous project where I built an array of conditions and blocks and then looped over them to evaluate them. ChatGPT critiqued this, which was fair: I knew it was wrong but didn't know what the actual solution was. It quickly showed me that I could instead just have the else clause be a list of statements and run the same build step against those statements. This would gracefully handle the else if and else.
This was neat to see laid out and the implementation got drastically better. I could tell by looking at the code that things were now much cleaner than my own solution.
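The shape of that idea can be sketched as follows. This is a hypothetical, simplified AST rather than the project's real one: the condition is a plain `bool` here instead of an expression, and `execute` collects PRINT output into a vector. The point is that an ELSE IF needs no special case, because it is just an else block whose only statement is another If.

```rust
// Hypothetical statement type for this sketch.
#[derive(Debug, Clone)]
enum Statement {
    Print(String),
    If {
        condition: bool, // stand-in for a real condition expression
        then_block: Vec<Statement>,
        else_block: Vec<Statement>, // empty when there is no ELSE
    },
}

// Runs a block; If recurses into whichever branch was chosen.
fn execute(block: &[Statement], output: &mut Vec<String>) {
    for stmt in block {
        match stmt {
            Statement::Print(text) => output.push(text.clone()),
            Statement::If { condition, then_block, else_block } => {
                if *condition {
                    execute(then_block, output);
                } else {
                    execute(else_block, output);
                }
            }
        }
    }
}
```

With this layout, a chain like IF / ELSE IF / ELSE is three levels of nesting in the tree but a single recursive function in the executor.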
The other issue was that I expanded the grammar to handle single line if statements and multiline if statements by adding options. ChatGPT pointed out that this was ambiguous and that there could be subtle errors around the edge cases. It recommended making things more explicit and creating two new rules. I am however lazy so I've ignored this for now. I think it would require me to mess with the parser again which I don't really want to do.
Now that conditionals are working, the next step is to add boolean operators.
Mostly a goose chase today. I wanted to add semicolons so that I could have multiple statements on the same line. I tried to do it multiple ways with multiple rewrites but nothing came out well. Even with AI help, this is one of those things that is probably outside my range and understanding so using ChatGPT isn't actually helpful here.
It has some ideas but it's not doing quite what I want. Trying to add this also made it clear why language development is said to be O(n^2). Adding semicolons causes interactions with the if statements. The fact that things don't work well means that my grammar isn't capable enough yet.
I finally got tired of it and reset back to the day 4 code. I'll need to study how other grammars structure having single lines with multiple statements and how they interact with if statements. I think it might be that I'm doing something strange with BASIC.
This day also highlighted that I really want a checkpoint system. I want to say checkpoint and have it automatically make a backup of the directory. It should also prompt me for a comment and then I should be able to see it in the browser.
I wired up the boolean operators pretty quickly and just as quickly realized that I had a fundamental error somewhere in my code. Expression parsing had been broken from earlier in a way I hadn't caught and it was finally biting me in the ass.
This is where I dumped almost all my code to ChatGPT to see if it could figure out where I went wrong and unfortunately, it had no idea. It was swearing up and down that my code was fine and that things should be working. I had thought maybe it had something to do with parentheses and ChatGPT correctly kept telling me that part of the logic was fine.
It's funny because I harped on that for a bit before I let it go and chatgpt was relatively patient.
The ultimate solution involved going back to the pest tutorial for calculators and seeing how the example did it and how I had adapted it. Slowly it dawned on me that I had thought WHITESPACE was just a variable name that was considered good practice rather than being a special rule name inside of pest. I had created my own WS rule as that was shorter to type and so I had littered my grammar with WS everywhere. This mixed with NEWLINE also getting shortened meant that I had screwed up the fundamental grammar.
This was a very silly thing to do as it was entirely my fault. I didn't read the tutorials or documentation carefully enough and had sort of winged my way through it. It was just by luck my test cases were passing and I was able to move from each step. The great thing here is that I won't be making that mistake again!
I spent some time fixing my grammar and working with pest rather than against it and with that everything started falling into place. My boolean operators started working and my if statements started being a bit more robust. The single line if statement and the multiline if statement were hacky before but are now more stable. I'm not entirely convinced that it is fundamentally correct but it is better now than before.
Fixing my grammar also allowed me to add semicolons to the language without much pain. Whereas earlier I had to write maybe 3-4 implementations and none worked right, this time I was able to write it in one shot and get a working version. This also worked for conditionals without issue which was a great sign.
Now that I have booleans working, the next thing to finally focus on is looping. I want to add the regular LOOP statement and the FOR statement. I think these two will be the final step to making this into an actual programming language.
Now that the grammar has a better base where I'm not actively messing things up, it was much easier to extend it to add FOR loops. The grammar and the parsing step were relatively straightforward with the difficulty coming from the execution step. I have a lot of seemingly junk code and lots of nested pattern matching which I'm still trying to figure out if I can simplify.
However the great news is that I have the FOR loop working and I also added optional STEP and UNTIL clauses. The FOR loop code also has quite a bit of repetition. I've gotten some feedback on it from ChatGPT and the only thing it's really telling me is to make things into functions.
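To make the STEP part concrete, here is a minimal sketch of the counting logic for integers only. `for_values` is a hypothetical helper, not the project's code. It assumes classic BASIC behaviour: the end value is inclusive, a missing STEP means 1, and a negative STEP counts down. UNTIL is not modelled since its exact semantics are not spelled out here, and rejecting a STEP of 0 is also an assumption.

```rust
// Returns every value the loop variable would take, in order.
fn for_values(start: i64, end: i64, step: Option<i64>) -> Result<Vec<i64>, String> {
    let step = step.unwrap_or(1);
    if step == 0 {
        // Assumption: a zero step would loop forever, so refuse it.
        return Err("STEP must not be 0".to_string());
    }
    let mut values = Vec::new();
    let mut i = start;
    // Count up while below the end, or down while above it, depending on the sign.
    while (step > 0 && i <= end) || (step < 0 && i >= end) {
        values.push(i);
        i += step;
    }
    Ok(values)
}
```

A real executor would run the loop body for each value rather than collecting them, but the termination condition is the part that tends to get duplicated across the integer and float cases.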
I was hoping that I was doing things a bit too verbosely but it looks like it's supposed to be that verbose. Though I have a feeling that it's this way because the base started off this way. I'm sure chatgpt would give me better rust code if I had started with better code.
This gets at the difficulty of using ChatGPT as a tutorial engine. I had hoped to learn a bit more or at least have more guidance but so far it hasn't done much of either in the recent sections.
The best thing might be to see some grammars for other languages and to also see how other rust code is written to evaluate an AST.
I'm currently creating custom rules for different parts of a grammar rule instead of trying to make things generic. I'm not sure if this is the right way to go about it but it does click in my head a bit better. I will need to study some grammars and pest specifically to see if this is the right way to do this or if I'm causing myself other issues.
The next step is to add LOOP statements. I had originally wanted to do this on the same day but the FOR loop had taken longer than expected. Hopefully the LOOP statement is a bit more straightforward.
Adding the loop statement was much more straightforward than the for loop as I didn't need to handle both integers and floats. I'm also happier with the way the grammar was written and how straightforward it looks.
The loop code to parse and run the AST also looks good which to me means that I'm on the right track. It does however mean that my for loop handling is problematic because it doesn't have the same logic that the LOOP statement does. I was expecting it to be almost the same but it ended up being pretty different.
I'll need to redo the FOR loop at some point as I think there has to be a better way to do things.
With LOOP out of the way, I have the core parts of a programming language done.
Now the next step is to add subroutines.
I did some clean up today with the Value enum. I added in all the operator methods and ordering methods so that I could now use Value directly without having to use match statements to get the data out. This has drastically cleaned up my code now and I'm much happier with the for loop logic. Originally I had to match all the variations of float and integers and cast things but once I got the operators implemented for my custom enum, I could simplify that logic.
This also simplified my evaluation logic which had the same problem. I really should have done the set up much earlier as the code would have looked much better.
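A sketch of that cleanup using the standard operator traits. To stay short it only covers `+` and ordering, and `Value` here has just Integer and Float variants (the string case is left out), so it is an illustration of the technique rather than the project's enum. `as_f64` and `count_up` are hypothetical helpers. Equality is implemented by hand so that it agrees with the ordering, which means `Integer(2)` and `Float(2.0)` compare equal in this sketch.

```rust
use std::cmp::Ordering;
use std::ops::Add;

#[derive(Debug, Clone, Copy)]
enum Value {
    Integer(i64),
    Float(f64),
}

impl Value {
    fn as_f64(self) -> f64 {
        match self {
            Value::Integer(i) => i as f64,
            Value::Float(f) => f,
        }
    }
}

impl Add for Value {
    type Output = Value;
    // Integer + Integer stays Integer; anything else promotes to Float.
    fn add(self, rhs: Value) -> Value {
        match (self, rhs) {
            (Value::Integer(a), Value::Integer(b)) => Value::Integer(a + b),
            (a, b) => Value::Float(a.as_f64() + b.as_f64()),
        }
    }
}

impl PartialOrd for Value {
    fn partial_cmp(&self, other: &Value) -> Option<Ordering> {
        match (self, other) {
            (Value::Integer(a), Value::Integer(b)) => a.partial_cmp(b),
            (a, b) => a.as_f64().partial_cmp(&b.as_f64()),
        }
    }
}

impl PartialEq for Value {
    // Defined through partial_cmp so that == and < never disagree.
    fn eq(&self, other: &Value) -> bool {
        self.partial_cmp(other) == Some(Ordering::Equal)
    }
}

// With the traits in place, loop logic needs no matching on the variants.
// Assumption: only positive steps are handled; anything else yields no values.
fn count_up(start: Value, end: Value, step: Value) -> Vec<Value> {
    let mut out = Vec::new();
    if !(step > Value::Integer(0)) {
        return out;
    }
    let mut i = start;
    while i <= end {
        out.push(i);
        i = i + step;
    }
    out
}
```

The payoff is in `count_up`: the integer and float cases share one loop, and all the casting lives inside the trait implementations.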
Now the biggest eyesore is the parsing logic which doesn't look quite right to me. Hopefully I learn something that helps with that.
I do have quite a bit of waste going on with clones and also using unreachables a bit too much. I already forced my enum to end up in a specific way so I feel like the match should let it through. I want to match on just Integer and Float as I have removed the Str part by doing a parse. It might be I need to create a new type for this specifically.
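One way the "new type" idea could look, as a sketch rather than a recommendation: a separate `Number` enum that has no string variant, so a match over it is exhaustive without any `unreachable!()`. `Number`, `to_number`, and `multiply` are hypothetical names, and the unparseable-string fallback to 0 follows the behaviour described earlier. Taking `&Value` also avoids cloning just to do arithmetic.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
}

// Numeric-only view of a Value: there is no Str variant to rule out later.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Number {
    Integer(i64),
    Float(f64),
}

impl Value {
    // Borrows self, so no clone is needed to get at the number.
    fn to_number(&self) -> Number {
        match self {
            Value::Integer(i) => Number::Integer(*i),
            Value::Float(f) => Number::Float(*f),
            Value::Str(s) => {
                let text = s.trim();
                if let Ok(i) = text.parse::<i64>() {
                    Number::Integer(i)
                } else if let Ok(f) = text.parse::<f64>() {
                    Number::Float(f)
                } else {
                    // Assumption: report the error and fall back to 0.
                    eprintln!("error: {:?} is not a number, using 0", s);
                    Number::Integer(0)
                }
            }
        }
    }
}

// All four combinations are listed, so the compiler checks exhaustiveness.
fn multiply(a: &Value, b: &Value) -> Value {
    match (a.to_number(), b.to_number()) {
        (Number::Integer(x), Number::Integer(y)) => Value::Integer(x * y),
        (Number::Integer(x), Number::Float(y)) => Value::Float(x as f64 * y),
        (Number::Float(x), Number::Integer(y)) => Value::Float(x * y as f64),
        (Number::Float(x), Number::Float(y)) => Value::Float(x * y),
    }
}
```

The trade-off is one more enum to maintain, in exchange for the type system carrying the fact that the string case has already been dealt with.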
I have a rough idea of how I want to handle the SUBROUTINE statement and the CALL statement. I'm hoping that adding things to the language starts becoming easier as it still feels cumbersome right now.
I tried to use ChatGPT to debug a grammar problem but it was ultimately worse than a rubber duck. It really can't replace having an actual expert help but maybe one day, or at least that seems to be the pitch. I had dumped my grammar and example code but it gave me fixes that made no changes to the code. I'm not sure what it was trying to do but it was giving me stuff that I had already given it which is a bit frustrating.
Ultimately ChatGPT didn't help me much so far. I didn't even use it for the operator logic which I'm certain it could do but for some reason the code didn't look quite right. It was easier to pull up the docs and see the basic example than to try and work out if chatgpt was giving me a real example or if it was giving me something it thought was an example. At least with stackoverflow I can be reasonably sure someone compiled the answer at some point.
I added support for the CALL statement. Slow going as I've taken a large break now as the project is getting boring. I think I have enough of the language now that it's a bit of a draw the rest of the owl moment that I would like to skip.
The code is in a clean enough state that I can jump in and get the context pretty quickly which is nice but it does mean I don't feel the urge to really do anything with this project.
I want to finish the SUBROUTINE statement as that way the CALL statement actually has something to do. Once I finish that, I'll need to think about what I want to do with this project.