boginka's corner

Writing the Lexer

September 24, 2024

I started writing the lexer yesterday and kept going today. It's not necessarily hard, but there's a lot of code because there are a lot of cases to cover. I can tokenize a simple program that prints out a string; however, I still need to handle reserved keywords and multi-character operators (>=, !=, etc.). Conceptually, there's not much to learn from the process of tokenization. The main gist of it is:

  1. Create a "scanner" to take in a filename as an argument to your program and output a buffer (a string in memory)
  2. Read each character and create a token with it based on type
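
To make those two steps concrete, here is a minimal sketch in C. It is not my actual lexer: the names (read_file, next_token, Token, TokenKind) and the set of token kinds are made up for illustration, and it only handles integers, identifiers, strings, and single-character symbols.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token kinds, for illustration only. */
typedef enum { TOK_NUMBER, TOK_IDENT, TOK_STRING, TOK_SYMBOL, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* points into the source buffer */
    size_t length;     /* number of characters in the token */
} Token;

/* Step 1: read the whole file into a NUL-terminated heap buffer.
   Returns NULL on failure; the caller frees the result. */
char *read_file(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);
    char *buf = malloc((size_t)size + 1);
    if (!buf) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    buf[got] = '\0';
    return buf;
}

/* Step 2: look at the current character and build a token based on its type.
   Advances *cursor past the token that was read. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;   /* skip whitespace */

    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *cursor = p; return t; }

    if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (*p == '"') {
        t.kind = TOK_STRING;
        p++;                                   /* opening quote */
        while (*p != '\0' && *p != '"') p++;
        if (*p == '"') p++;                    /* closing quote, if present */
    } else {
        t.kind = TOK_SYMBOL;                   /* any other single character */
        p++;
    }

    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

A driver would call read_file on the filename from argv, then call next_token in a loop until it returns TOK_EOF. Tokens here point into the buffer instead of copying text, which is one possible design; copying each lexeme works too.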

I've been following Crafting Interpreters and a YouTube series to create the lexer. The videos are a very bare-bones implementation of a language, while Crafting Interpreters gives me the additional details I need to make it fancy. For example, the video series doesn't have keywords or multi-character operators in it. It also only has two types and two operators. On the other hand, Crafting Interpreters suggests adding a line number to each token so that errors can be easily tracked. So, I think the combination is a good foundation for me to rediscover C (the videos help with that) and actually create a well-made lexer (thanks to the book).
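
The extras mentioned above (a line number on each token, multi-character operators, and keywords) could look roughly like this in C. This is a sketch under my own assumptions, not code from the book or the videos: the names (OpToken, scan_operator, match, is_keyword) and the keyword list are hypothetical, and the caller is assumed to track the current line by counting newlines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator kinds, for illustration only. */
typedef enum { OP_GREATER, OP_GREATER_EQUAL, OP_BANG, OP_BANG_EQUAL, OP_NONE } OpKind;

typedef struct {
    OpKind kind;
    int line; /* line where the token starts, used in error messages */
} OpToken;

/* Consumes the next character only if it equals `expected`. */
static int match(const char **cursor, char expected) {
    if (**cursor != expected) return 0;
    (*cursor)++;
    return 1;
}

/* Scans ">", ">=", "!" or "!=" by peeking one character ahead.
   Returns OP_NONE (and consumes nothing) for anything else. */
OpToken scan_operator(const char **cursor, int line) {
    assert(cursor != NULL && *cursor != NULL);
    OpToken t = { OP_NONE, line };
    char c = **cursor;
    if (c == '>') {
        (*cursor)++;
        t.kind = match(cursor, '=') ? OP_GREATER_EQUAL : OP_GREATER;
    } else if (c == '!') {
        (*cursor)++;
        t.kind = match(cursor, '=') ? OP_BANG_EQUAL : OP_BANG;
    }
    return t;
}

/* Hypothetical keyword list; a real language would have its own. */
static const char *const KEYWORDS[] = { "if", "else", "while", "print" };

/* Returns 1 if the identifier text [start, start + length) is a keyword. */
int is_keyword(const char *start, size_t length) {
    size_t count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strlen(KEYWORDS[i]) == length &&
            memcmp(KEYWORDS[i], start, length) == 0) {
            return 1;
        }
    }
    return 0;
}
```

The idea is to scan an identifier first and only then check whether it is a keyword, so that something like "whilex" stays an identifier. A linear search is fine for a handful of keywords; a hash table or trie would be an alternative for larger sets.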

September 26, 2024

Finished the lexer today. There's one bug I have with doubles where the terminal prints out the correct value, but the debugger in Visual Studio does not. It looks like I am not reallocating the double variable properly. I have to do some research about the next steps. I know that forming an AST from these tokens comes next, but I'm unsure how to determine the order in which the tokens go into the tree. I'll read about the method tomorrow, and since I've already determined the grammar for my language, I hope it's not too hard.