:: krowemoh

Saturday | 28 DEC 2024


Rust Lox

2024-12-14

Unfinished and likely to change.

I'm going to build the Lox language interpreter in Rust and document how I go about it. This will follow the A Bytecode Virtual Machine part of Crafting Interpreters fairly closely, with some modifications to suit how I think. I've already done this in C and documented that process, and hopefully I can do it in Rust successfully too. As I've mentioned in other posts, my real reason is to try and implement BASIC well enough to build a real Pick flavor.

The first thing I want to do, which I don't think the book does, is make the bytecode VM part order independent, so that it can be followed without doing the tree-walk interpreter first. The book assumes you have already built the tree-walk interpreter, so it starts a bit in the middle of things.

I'm going to move some of the early material around so that we can start with an empty file and get to something runnable.

Chapter 1 - Installing Rust

The first thing we need to do is install Rust and create our project. I'm assuming you already know enough Rust to be dangerous, but if not, well, good luck.

The install command is:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

You can visit the Install Rust page on the official Rust website for more information.

If Rust was installed properly, you should be able to test it with:

rustc --version

This should show the Rust version, which at the time of writing is:

rustc 1.82.0 (f6e511eec 2024-10-15) (Arch Linux rust 1:1.82.0-2)

When you install Rust, Cargo is also installed. Cargo is Rust's package and project manager: you will create Rust projects with it and add packages with it as well.

If everything went well, we should now have Rust and Cargo ready to go, so let's create our project:

cargo new rlox

This creates a starter Rust project. Let's move into it and see what we got:

cd rlox
ls

We should see the following files inside our newly created Rust project (Cargo.lock and target/ only show up after the first build, so you may not see them yet):

Cargo.lock  Cargo.toml  src/  target/

Cargo.lock - This file maintains a list of all of our direct and transitive dependencies.

Cargo.toml - This tracks our package information, our direct dependencies and any build-specific settings. For example, if we need specific features from a dependency or have a special target that we want to compile for, we would add it here.

src/ - This is where our source code will live.

target/ - This is where compiled files live. Development builds go into target/debug, and release builds (cargo build --release) go into target/release, which is where our final production binaries will come from.

There is also a hidden .gitignore file that was created. It excludes Rust-specific folders like target/ so that we don't accidentally commit binaries to git; they are generated from the source code and can be rebuilt whenever we need them.

We can create these files manually and start our project from an empty folder but using cargo is a quick way to get a basic project going.

As part of the basic project, cargo has already created a src/main.rs file with some content. If you open it up, you will see the following:

fn main() {
   println!("Hello, world!");
}

We can then run our starter project by doing the following:

cargo run 

This should result in the following:

...
Hello, world!

With that we are done. We now have Rust installed, a project started with Cargo and a very simple program compiling. This means that we can finally get started!

Chapter 2 - Arguing

Sorry, I lied. We aren't actually getting started. We are going to spend some time wiring things up so that we have the rlox compiler working the way I want it to work.

rlox is going to be a command line utility and so it should act like any other command line tool.

For example, we want it to give an error if it is run with an incorrect number of arguments, and we want it to have a --help option.

We also want it to be able to compile a file given a file path. It should throw an error if that file doesn't exist or if we don't have permission to read it.

We want it to start a repl when no arguments are given.

There might be a few more things we want the rlox command to do but for now we have somewhere to start.

The first step is to build our Rust project and try passing it arguments:

cargo run -- test.lox

Don't worry about test.lox, it doesn't exist yet.

We use the -- syntax to pass command line arguments through cargo to our rlox binary. We aren't running the binary directly because we are in development and the binary is constantly being rebuilt.

If we really wanted, we could run the binary by doing:

target/debug/rlox test.lox 

However, if we make a change, we would need to recompile it first (with cargo build) and then run it.

By using the cargo command with -- we can do the compile and run in the same step.

Now that we know how to pass arguments to our program, the next step is to actually read those arguments inside it.

Open up src/main.rs and add the following:

use std::{ env, process };

fn main() {
   let args: Vec<String> = env::args().collect();
   
   if args.len() > 2 { 
      eprintln!("Invalid number of arguments, check --help for information.");
      process::exit(64);
   }
   
   println!("{:?}",args);
}

We should then be able to try the following:

cargo run --
cargo run -- test.lox
cargo run -- test.lox test1.lox test2.lox
cargo run -- --help

and get the following responses:

["target/debug/rlox"]
["target/debug/rlox", "test.lox"]
Invalid number of arguments, check --help for information.
["target/debug/rlox", "--help"]

The key thing to note here is that we are using the env and process modules of the Rust standard library. env lets us get the arguments that our command receives, and process lets us exit our command line utility while setting the shell return value.

I'm using the exit codes from sysexits.h, as found on FreeBSD: 64 (EX_USAGE) means the command was used incorrectly.
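For reference, these are the sysexits.h values used in this post, written as Rust constants. The names mirror the C macros; defining them in rlox is only a suggestion for avoiding magic numbers, not something the code above does.

```rust
// sysexits.h exit codes used by rlox, as Rust constants.
// The names copy the C macros; defining them here is optional.
const EX_USAGE: i32 = 64; // command was used incorrectly (bad arguments)
const EX_DATAERR: i32 = 65; // input data was incorrect (used for compile errors)
const EX_NOINPUT: i32 = 66; // input file missing or unreadable
const EX_SOFTWARE: i32 = 70; // internal software error (used for runtime errors)

fn main() {
    println!(
        "usage={} dataerr={} noinput={} software={}",
        EX_USAGE, EX_DATAERR, EX_NOINPUT, EX_SOFTWARE
    );
}
```

With these in place, a call like process::exit(64) could read process::exit(EX_USAGE) instead.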

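We currently only throw an error when args has more than 2 entries. Since args[0] is always the path to the binary itself, this means more than one argument was passed.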

We are using eprintln to send output to stderr.

Now that we have the basic structure down and we can read in the arguments, we can add some argument handling.

The first case to add to our main function is when args has exactly 1 entry, meaning no arguments were passed besides the binary's own path:

   if args.len() == 1 { 
      println!("Starting repl.");
      process::exit(0);
   }

We are going to fill this in a later chapter but for now this is enough. We also exit here with 0 because presumably the repl exits cleanly. We may change this down the line.

The next thing we need to handle is if the --help argument was passed in:

if args[1] == "--help" {
   println!("Usage:");
   println!("rlox - Run the repl");
   println!("rlox <file_path> - Compile and run a file");
   process::exit(0);
}

We are using println! here as we want this to go to stdout. We also set the exit code to 0, as --help is a valid argument and we have handled it properly.
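One way to keep these checks easy to test is to turn the argument handling into a pure function that decides what to do and leaves process::exit to main. This is a minimal sketch; Command and parse_args are hypothetical names, and it assumes the same rules as above: no arguments starts the repl, a lone --help prints usage, any other single argument is a file path, and anything else is a usage error.

```rust
// Hypothetical pure version of the argument handling: it only decides
// what to do, so it can be tested without calling process::exit.
#[derive(Debug, PartialEq)]
enum Command {
    Repl,
    Help,
    RunFile(String),
    Usage, // wrong number of arguments
}

fn parse_args(args: &[String]) -> Command {
    // args[0] is the binary's path, so user arguments start at args[1].
    match args.len() {
        1 => Command::Repl,
        2 if args[1] == "--help" => Command::Help,
        2 => Command::RunFile(args[1].clone()),
        _ => Command::Usage,
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    println!("{:?}", parse_args(&args));
}
```

main can then match on the returned Command and call process::exit with the right code in each arm.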

Now time to handle getting a file path as an argument:

...
use std::path::Path;
...

let file_path = &args[1];

if !Path::new(file_path).exists() {
   eprintln!("File does not exist: {file_path}");
   process::exit(66);
}

We bring in Path from the standard library's std::path module at the top.

We get the file path and then we check to see if the file exists.

If the file doesn't exist, we will print a message to stderr and then we will exit with a specific error code.

If the path was good, we can then move to read the file:

// fs also needs to be imported at the top: use std::{ env, fs, process };
let source = match fs::read_to_string(file_path) {
   Ok(source) => source,
   Err(_) => {
      eprintln!("Failed to read file: {file_path}");
      process::exit(66);
   }
};

Reading the file can still fail even when it exists, for example if we don't have permission to read it, so we let the user know that we failed to read the file.

Once again we would print a message to stderr and we would exit with a specific code.

If the file was good, we now have the contents of the file in a variable called source.
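Since fs::read_to_string already fails with a NotFound error when the file is missing, the exists check and the read could also be collapsed into one step by branching on the error's ErrorKind. This is an alternative sketch, not the approach above; read_source is a hypothetical helper that returns the exit code to use on failure (66, EX_NOINPUT, in every case, as in the original).

```rust
use std::fs;
use std::io::ErrorKind;

// Hypothetical helper: one read, with the error kind deciding the message.
// On failure it returns the exit code main should pass to process::exit.
fn read_source(file_path: &str) -> Result<String, i32> {
    match fs::read_to_string(file_path) {
        Ok(source) => Ok(source),
        Err(err) => {
            match err.kind() {
                ErrorKind::NotFound => eprintln!("File does not exist: {file_path}"),
                ErrorKind::PermissionDenied => eprintln!("Permission denied: {file_path}"),
                _ => eprintln!("Failed to read file: {file_path} ({err})"),
            }
            Err(66) // EX_NOINPUT for every failure, like the original code
        }
    }
}

fn main() {
    match read_source("test.lox") {
        Ok(source) => println!("{source}"),
        Err(code) => std::process::exit(code),
    }
}
```

A side benefit is that there is no window between checking for the file and reading it in which the file could disappear.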

For now we can just print out the contents:

println!("{source}");
process::exit(0);

We also exit our rlox binary setting the exit code to 0.

We can now go back and run the previous cargo commands and we should get some helpful error messages.

If we create a test.lox file and put some content in it, we should also see the contents printed to the screen using our rlox command.

> cargo run --
Starting repl.

> cargo run -- test.lox
x = 1 + 1

> cargo run -- test.lox test1.lox test2.lox
Invalid number of arguments, check --help for information.

> cargo run -- --help
Usage:
...

With that, we are done with some of the boilerplate. If you have read Crafting Interpreters, you will notice that this part isn't covered in the first chapter of part 2 of the book. This is because part 2 starts with building bytecode chunks, and that code is tested by creating bytecode chunks manually.

The setup I have gone over here is instead handled in Chapter 3 of that part, Scanning on Demand.

I really liked the way part 1 started, where you build out an interpreter first and then slowly get into the weeds. I ended up starting this way here too, though I may run into some problems further ahead because of it.

Anyway, now that the basic structure of the project is done and we have a command line tool that works, we can get to some real language development.

Chapter 3 - Scanning

The first thing we will do is create an interpret function which will take in our source code and compile it and then run it.

This will also be where we introduce our second file, a lib.rs file. We will use main.rs as our driver program that handles command line arguments but the core of our language will go into the library file.

Create src/lib.rs and add the following to it:

pub fn interpret(source: String) {
   println!("{source}");
}

We then update main.rs. We first add a use statement to bring in the library, and then replace the println of source with a call to our newly created interpret function:

use rlox::interpret;
...
   interpret(source);
...

We should be able to run our cargo commands and everything should still be the same.

We now have the structure that I will be using for the rest of the project. We could continue to split things out into different files but I find it easier to hold everything in my head when everything is in a single file.

The first thing we'll do is have our interpret function report whether the source code passed in had errors or compiled and ran successfully.

In lib.rs, we want to add the following:

pub enum InterpretResult {
   Ok,
   CompileError,
   RuntimeError,
}

pub fn interpret(source: String) -> InterpretResult {
   println!("{source}");
   return InterpretResult::Ok;
}

We then update main.rs with the following:

use rlox::{ InterpretResult, interpret };

...
   // Every arm exits, so there is no value to bind with let.
   match interpret(source) {
      InterpretResult::Ok => process::exit(0),
      InterpretResult::CompileError => process::exit(65),
      InterpretResult::RuntimeError => process::exit(70),
   }
...

We now set a different exit code based on what we get back from our interpret function.
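If you'd rather keep the exit code mapping next to the enum, one option is a small method on InterpretResult. This is a sketch; exit_code is a hypothetical method name, and the codes are the same 65 (EX_DATAERR) and 70 (EX_SOFTWARE) used above.

```rust
// Same enum as in lib.rs, plus a hypothetical exit_code method so that
// main.rs could call process::exit(interpret(source).exit_code()).
pub enum InterpretResult {
    Ok,
    CompileError,
    RuntimeError,
}

impl InterpretResult {
    pub fn exit_code(&self) -> i32 {
        match self {
            InterpretResult::Ok => 0,
            InterpretResult::CompileError => 65, // EX_DATAERR
            InterpretResult::RuntimeError => 70, // EX_SOFTWARE
        }
    }
}

fn main() {
    let results = [
        InterpretResult::Ok,
        InterpretResult::CompileError,
        InterpretResult::RuntimeError,
    ];
    for result in results {
        println!("{}", result.exit_code());
    }
}
```

The trade-off is that lib.rs then knows about process exit codes, which is arguably the driver's concern, so keeping the match in main.rs is just as reasonable.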

Now back to lib.rs. We have an interpret function that takes in some source code as a string. The first step is to compile this source and then we will run it.

Let's add just the compile step for now:

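pub fn interpret(source: String) -> InterpretResult {
    // Not used yet; the underscore silences the unused variable warning
    // until we add the run step that consumes the bytecode.
    let _bytecode = compile(source);
    return InterpretResult::Ok;
}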

fn compile(source: String) -> Vec<u8> {
   let bytecode: Vec<u8> = vec![];
   println!("{source}");
   return bytecode;
}

Our compile function is going to return a series of bytecode operations, which we will pass to our run function in the future. For now we are using a Vec of u8s, but this may change.

Now we can fill in the first part of our compile function. Before we can really do anything, we need to go through the source character by character and tokenize it.

This means that we need to break on whitespace and operators so that the compiler sees whole words or tokens rather than each character.
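To see why whitespace alone isn't enough, here is a small sketch using the standard library's split_whitespace. words is a hypothetical helper: it happens to work for print 1 + 1, but print 1+1 comes back with 1+1 glued together, which is why the scanner also has to recognise operators.

```rust
// Naive tokenizing on whitespace only, to show where it falls short.
fn words(source: &str) -> Vec<&str> {
    source.split_whitespace().collect()
}

fn main() {
    println!("{:?}", words("print 1 + 1")); // ["print", "1", "+", "1"]
    println!("{:?}", words("print 1+1")); // ["print", "1+1"]
}
```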

This is where the scanner comes in. The scanner will take in the raw source code and create a tokenized form. This tokenized form is then used to generate the bytecode.

We will create a Scanner struct which will hold our scanner logic.

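#[derive(Debug)]
struct Scanner {
    // usize rather than u8: a u8 index would overflow after 255 bytes of source.
    start: usize,   // beginning of the lexeme being scanned
    current: usize, // character currently being looked at
    line: usize,    // current line number, for error messages
    source: String,
}

impl Scanner {
    fn new(source: String) -> Scanner {
        return Scanner {
            start: 0,
            current: 0,
            line: 1,
            source,
        };
    }
}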

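We derive the Debug trait for Scanner so that we can print it out with {:?} if we need to.

We also create an associated new function for the Scanner so that we can initialize it. The start field marks the beginning of the lexeme being scanned, current is the character we are looking at, and line tracks the line number for error reporting.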
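The book builds its scanner on a handful of tiny helpers: isAtEnd, advance and peek. Below is a sketch of how those might look as Rust methods, assuming usize fields (a u8 index would overflow once the source passes 255 bytes) and ASCII source, since it indexes bytes rather than chars. The method names are my translations of the book's, not something this post has defined yet.

```rust
// Sketch of the scanner's helper methods, assuming usize fields and ASCII source.
#[derive(Debug)]
struct Scanner {
    start: usize,
    current: usize,
    line: usize,
    source: String,
}

impl Scanner {
    fn new(source: String) -> Scanner {
        Scanner { start: 0, current: 0, line: 1, source }
    }

    // True once every byte of the source has been consumed.
    fn is_at_end(&self) -> bool {
        self.current >= self.source.len()
    }

    // Consume the current byte and return it.
    fn advance(&mut self) -> u8 {
        let c = self.source.as_bytes()[self.current];
        self.current += 1;
        c
    }

    // Look at the current byte without consuming it; b'\0' at the end.
    fn peek(&self) -> u8 {
        if self.is_at_end() {
            b'\0'
        } else {
            self.source.as_bytes()[self.current]
        }
    }
}

fn main() {
    let mut scanner = Scanner::new("print".to_string());
    while !scanner.is_at_end() {
        print!("{}", scanner.advance() as char);
    }
    println!();
    println!("start={} current={} line={}", scanner.start, scanner.current, scanner.line);
}
```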

Now that we have the scanner, we need to create the Token struct. We want the scanner to return tokens that we will then use to generate bytecode.

To do this, we will create a Token that holds some information about each lexeme. Like the Scanner's start and current fields, a Token will not contain an actual copy of the string; instead it holds the starting position and length of the lexeme within the source.
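As a quick illustration of what storing a starting position and length buys us, a lexeme can be recovered by slicing the source whenever we need it. lexeme is a hypothetical helper, and the offsets are byte offsets, which is fine as long as the source is ASCII.

```rust
// Hypothetical helper: turn a (start, length) pair back into text by
// slicing the source. Offsets are in bytes.
fn lexeme(source: &str, start: usize, length: usize) -> &str {
    &source[start..start + length]
}

fn main() {
    let source = "print 1 + 1";
    println!("{}", lexeme(source, 0, 5)); // print
    println!("{}", lexeme(source, 8, 1)); // +
}
```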

The goal is to scan the following source code:

print 1 + 1

This should give us 4 tokens in total: print, 1, + and 1. In the future we will have this emit bytecode, but for now we will simply print each token as we get it.

We first set up the token types that we want:

#[derive(Debug)]
enum TokenType {
    LeftParen, RightParen,
    LeftBrace, RightBrace,
    Comma, Dot, Minus, Plus,
    SemiColon, Slash, Star,
    
    Bang, BangEqual, 
    Equal, EqualEqual, 
    Greater, GreaterEqual,
    Less, LessEqual,
    
    Identifier, TokenString, Number,
    
    And, Class, Else, False,
    For, Fun, If, Nil, Or,
    Print, Return, Super, This,
    True, Var, While,
    
    Error, EOF
}

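These are all the token types our language needs: single-character punctuation, one- or two-character operators, literals, keywords, and finally Error for scanning errors and EOF for the end of the source. Each part of our language gets its own token type.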
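Keywords don't need their own scanning logic: the scanner can read an identifier and then check whether its text is a reserved word. The book does this with a hand-written trie; a plain match on &str is a simpler alternative, sketched below with TokenType trimmed to just Identifier and the keyword variants. identifier_type is a hypothetical name here.

```rust
// Keyword lookup sketch; TokenType is trimmed to Identifier plus the keywords.
#[derive(Debug, PartialEq)]
enum TokenType {
    Identifier,
    And, Class, Else, False, For, Fun, If, Nil, Or,
    Print, Return, Super, This, True, Var, While,
}

// Returns the keyword's TokenType, or Identifier for any other name.
fn identifier_type(text: &str) -> TokenType {
    match text {
        "and" => TokenType::And,
        "class" => TokenType::Class,
        "else" => TokenType::Else,
        "false" => TokenType::False,
        "for" => TokenType::For,
        "fun" => TokenType::Fun,
        "if" => TokenType::If,
        "nil" => TokenType::Nil,
        "or" => TokenType::Or,
        "print" => TokenType::Print,
        "return" => TokenType::Return,
        "super" => TokenType::Super,
        "this" => TokenType::This,
        "true" => TokenType::True,
        "var" => TokenType::Var,
        "while" => TokenType::While,
        _ => TokenType::Identifier,
    }
}

fn main() {
    for word in ["print", "printer", "while", "x"] {
        println!("{} -> {:?}", word, identifier_type(word));
    }
}
```

Either way the result is the same TokenType; the trie just avoids comparing the text against every keyword.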

The next step is to create our Token object:

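#[derive(Debug)]
struct Token {
    kind: TokenType,
    start: usize,  // byte offset of the lexeme in the source (u8 would overflow)
    length: usize, // length of the lexeme in bytes
    line: usize,   // line the token appeared on, for error messages
}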

Similar to our Scanner, the Token struct only holds indices and lengths, along with the type of token it is.

Now we have a Scanner object that will manage where in the source code it is and we have a Token object that we can generate for each part of the source code.

Now we can write our scanToken function, which will go through the source code and hand back tokens one at a time.
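Here is a minimal sketch of where scan_token (the book's scanToken, in Rust's snake_case) could end up. It is not the finished scanner: it only skips whitespace and handles +, -, *, /, integers, identifiers and the print keyword, with TokenType trimmed to those cases, usize fields and ASCII source assumed. Anything it doesn't recognise becomes an Error token, and it returns an EOF token once the source is used up.

```rust
// Minimal scan_token sketch: whitespace, + - * /, integers, identifiers
// and the `print` keyword only. Assumes usize fields and ASCII source.
#[derive(Debug, PartialEq)]
enum TokenType {
    Plus, Minus, Star, Slash,
    Identifier, Number, Print,
    Error, EOF,
}

#[derive(Debug)]
struct Token {
    kind: TokenType,
    start: usize,
    length: usize,
    line: usize,
}

struct Scanner {
    start: usize,
    current: usize,
    line: usize,
    source: String,
}

impl Scanner {
    fn new(source: String) -> Scanner {
        Scanner { start: 0, current: 0, line: 1, source }
    }
    fn is_at_end(&self) -> bool {
        self.current >= self.source.len()
    }
    // Current byte without consuming it; b'\0' once we run off the end.
    fn peek(&self) -> u8 {
        *self.source.as_bytes().get(self.current).unwrap_or(&b'\0')
    }
    fn advance(&mut self) -> u8 {
        let c = self.peek();
        self.current += 1;
        c
    }
    // Build a token covering source[start..current].
    fn make_token(&self, kind: TokenType) -> Token {
        Token { kind, start: self.start, length: self.current - self.start, line: self.line }
    }
    fn skip_whitespace(&mut self) {
        loop {
            match self.peek() {
                b' ' | b'\t' | b'\r' => { self.advance(); }
                b'\n' => { self.line += 1; self.advance(); }
                _ => return,
            }
        }
    }
    fn scan_token(&mut self) -> Token {
        self.skip_whitespace();
        self.start = self.current;
        if self.is_at_end() {
            return self.make_token(TokenType::EOF);
        }
        let c = self.advance();
        if c.is_ascii_digit() {
            while self.peek().is_ascii_digit() { self.advance(); }
            return self.make_token(TokenType::Number);
        }
        if c.is_ascii_alphabetic() || c == b'_' {
            while self.peek().is_ascii_alphanumeric() || self.peek() == b'_' { self.advance(); }
            let is_print = &self.source[self.start..self.current] == "print";
            return self.make_token(if is_print { TokenType::Print } else { TokenType::Identifier });
        }
        match c {
            b'+' => self.make_token(TokenType::Plus),
            b'-' => self.make_token(TokenType::Minus),
            b'*' => self.make_token(TokenType::Star),
            b'/' => self.make_token(TokenType::Slash),
            _ => self.make_token(TokenType::Error),
        }
    }
}

fn main() {
    let mut scanner = Scanner::new("print 1 + 1".to_string());
    loop {
        let token = scanner.scan_token();
        println!("{:?} start={} length={} line={}", token.kind, token.start, token.length, token.line);
        if token.kind == TokenType::EOF {
            break;
        }
    }
}
```

Running it on print 1 + 1 gives the four tokens from earlier followed by EOF.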