boginka's corner

Writing the Parser

November 15, 2024

Finished the conditional statment bodies, and also allowed variables to store expressions. I realized I messed up my lexer and have to specify the difference between a type declaration and the variable name. This will help me allow boolean values into expressions. I don't think I'll do string and characters yet until I know how the next step looks. Tomorrow I'll try to add these booleans to expressions, and then maybe tackle method arguments.

November 14, 2024

Today I added conditional statements ot the parser. I still have to add another layer to allow for 'and' and 'or' blocks. Having my parser be recursive descent is a little bothersome because it needs to have some nodes empty to fit the grammar correctly. I wonder if Pratt parsing or using a stack like in the shunting yard algorithm would also yield the same wastefulness. It's been bothering me a little bit how ineffictient this code is in terms of memory. I am using nested lists to organize my nodes. I wonder if it would be better to store them in a one dimensional list while keeping track how many children each node has so that I can navigate the list. It most certainly would be better for performance, but the trade off is the complexity and difficulty implementing. I am not just scratching the surface of creating a language so my thinking is, is learn how to do it the simple way first, and then if it's something I continue having passion about, then, make it well made. I know that's the logical path but again, it really bothers me how bad this code is. Anyways, I have to still parse if statement bodies, method arguements, expressions, and figure out how to differentiate between the two different method calls I have. I wonder if I can add scope to this layer, or if scope comes into play later on.

October 19, 2024

After a few days of debugging, I was able to track down all the memory leaks. It's amazing how many I had for such a short sample of code (about 72 memory leaks for 20 lines of code). Now, I can continue other parts of the parser like implementing returns, method calls, variable calls, and data types other than ints. I'm a little confused about what the next step is after this. I saw something about a visitor being the next step, but I have to understand better how it fits into the grand scheme of things. Also, I would ideally love if my language was fast. To do this, I would have to take keep my program in the L1 cache. However, I decided that first I'll get the parser to work :D and after 72 memory leaks, I'm a little discouraged about my ability to properly handle memory. But I think after I get my parser to work, then I'll go back in and see if I can try to use the concept of keeping everything together in memory to prevent the os from having to go into different caches to fetch instructions.

October 13, 2024

I took a major step back and went from the start to make sure my lexer doesn't have any memory leaks. Next, I'll rebuild the parser from the ground to make sure that it also has no memory leaks. I think this is what's causing a lot of my issues since it seems like it'll randomly stop working on certain runs. I kind of wish that I wrote this in Java but honestly I'm not a coward and I want my language to be fast so I have to suck it up. Hopefully, I get past this obstacle at a certain point and it'll be easier and easier (I know naive since this is one of the easier parts of writing a language).

October 11, 2024

The parser is slowly but surely coming along. It made me realize how long it's been since I coded in C and I completely forgot about memory management lol. Coding in 3 different programming languages in a day has also began making my head spin. At work we use Java and C# and then writing this parser in C is all becoming a blur of what I'm allowed to do in each language. I've been working top down in creating parsing and I have to take a step back to fix my memory allocation. I downloaded Dr. Memory to help me make sure I have no leaks, and then I also have to observe how memory is allocated. I wonder if it's better to point the program to the location of the values or to duplicate the values into a new spot. For debugging the second approach would probably be better but I think it would be slower. I'm not sure if it's my leaky code or Visual Studios, but the debugger for lists in Visual Studio is not doing me any favors, so I've been tracking the memory itself to make sure I'm not messing anything up. Def a challenge for me.

October 9, 2024

I've been struggling a little bit with how I want to structure my AST nodes and also memory management in C. I decided on implementing a union type that will allow you to have the data structure depending on what kind of AST node you have. However, the debugger on Visual Studio makes this a little difficult to see since it shows all the union options (makes sense since they all point to the same location but still annoying since there's going to be about 30 different node types). The other thing is, I'm not sure how I want to structure methods, should they all be nested under a general program node or should I create a list of methods?

October 2, 2024

The parser is when it seems like it's getting a little bit more complex to write a program. Thankfully, I wrote out the grammar when determining how to structure my language, but I read about how grammar can be ambigious so I have to go back in to double check that it is correct. Also, I have about 20 different definitions in my grammar and all of those have to be translated to code so I can form AST. So while maybe not complex, lots of paths that need to be covered.

The other thing I'm curious about, is different types of trees. I saw someone online use concrete syntax trees, but to my understanding it's better to use AST because it gets rid of the tokens that the computer doesn't really care about, like parenthesis. Also, I saw the choice between parsing top-down or bottom-up. I have to research more about how these differences impact interpretting, when we get to that. Something my grandpa once taught me is "it's better to plan for 5 hours and do for 5 minutes than plan for 5 minutes and do for 5 hours".

UPDATE: seems like top-down is easier to understand/implement but bottom-up accounts for more edge cases and is more performant even though it's harder to set up. Therefore, I will be doing bottom-up parsing.