Writing the Parser
January 18, 2024
I'm moving on to the interpretter, maybe I'll go back and add variable expressions but for the time being, you can't pass in variables in a math expression (ex. a + b) in arguments. you can store it as a variable and then pass it in if you really need to. I just want to move onto something new and since I'm having a hard time determining how to add this, it's not that important. To the interpretter we go!
January 9, 2024
It was a rest day at the gym today so I was able work a little bit on the parser at lunch. I got the variable expressions done but they're still missing parenthesis allowed. So MAYBE i'll fix that but no promises. IL'm going to try to do named arguments later today and then i think it'll be done! Will run my example programs through and make sure they're parsing correctly. Then, the interpretter! I think this is where the hard part comes in so we'll see. I'm giving myself a month to do this so hopefully I can stick to that and not take a month off again.
I was able to get everything done on my one example program but when I ran the other example programs, it seems to not work. It looks like when I have process expressions (for example: result(fibonacci(n - 1) + fibonacci(n - 2));) I guess I'll be finishing this tomorrow sadly :(
January 8, 2024
I can't believe it's been over a month since I've worked on this project. The holidays were super busy with people visiting and staying over. I hope that finishing this project won't take more than a month. Anyways, I got back into the code and added the declared variable parsing. I still have to do the called variable parsing and then the function and helper calls. I also have to add identifiers to be a part of expressions. Tomorrow, I'll try to get the identifiers added to expressions. Also, I started reading about the interpretter and realized that skipping the visitor pattern was not good, so I think I have to do that after the parser! I'm excited to continue this project. It's already kinda in depth with my parser being 500 lines so it would be really cool to get this done and under my belt :)
November 29, 2024
Today I got a type declarations complete. I think the only thing I have left is arguments (and maybe func vs helper calls but i wonder if i can just have them be characterized as one). I'm pretty happy almost being done this part as it took me quite some time. But sadly, I'm not really sure where to go from here so I think it might be more of a struggle moving forward. But either way, I think I can make my repo public after completing this part. I also still struggle with making the data structure linear in the form of a list of nested like I do now. Maybe I'll perform some analysis about the speed and determine where I stand with this appraoch. It kind of sucks that I can only work on this project once like every two weeks or so but whatever - better late than never.
November 15, 2024
Finished the conditional statment bodies, and also allowed variables to store expressions. I realized I messed up my lexer and have to specify the difference between a type declaration and the variable name. This will help me allow boolean values into expressions. I don't think I'll do string and characters yet until I know how the next step looks. Tomorrow I'll try to add these booleans to expressions, and then maybe tackle method arguments.
November 14, 2024
Today I added conditional statements ot the parser. I still have to add another layer to allow for 'and' and 'or' blocks. Having my parser be recursive descent is a little bothersome because it needs to have some nodes empty to fit the grammar correctly. I wonder if Pratt parsing or using a stack like in the shunting yard algorithm would also yield the same wastefulness. It's been bothering me a little bit how ineffictient this code is in terms of memory. I am using nested lists to organize my nodes. I wonder if it would be better to store them in a one dimensional list while keeping track how many children each node has so that I can navigate the list. It most certainly would be better for performance, but the trade off is the complexity and difficulty implementing. I am not just scratching the surface of creating a language so my thinking is, is learn how to do it the simple way first, and then if it's something I continue having passion about, then, make it well made. I know that's the logical path but again, it really bothers me how bad this code is. Anyways, I have to still parse if statement bodies, method arguements, expressions, and figure out how to differentiate between the two different method calls I have. I wonder if I can add scope to this layer, or if scope comes into play later on.
October 19, 2024
After a few days of debugging, I was able to track down all the memory leaks. It's amazing how many I had for such a short sample of code (about 72 memory leaks for 20 lines of code). Now, I can continue other parts of the parser like implementing returns, method calls, variable calls, and data types other than ints. I'm a little confused about what the next step is after this. I saw something about a visitor being the next step, but I have to understand better how it fits into the grand scheme of things. Also, I would ideally love if my language was fast. To do this, I would have to take keep my program in the L1 cache. However, I decided that first I'll get the parser to work :D and after 72 memory leaks, I'm a little discouraged about my ability to properly handle memory. But I think after I get my parser to work, then I'll go back in and see if I can try to use the concept of keeping everything together in memory to prevent the os from having to go into different caches to fetch instructions.
October 13, 2024
I took a major step back and went from the start to make sure my lexer doesn't have any memory leaks. Next, I'll rebuild the parser from the ground to make sure that it also has no memory leaks. I think this is what's causing a lot of my issues since it seems like it'll randomly stop working on certain runs. I kind of wish that I wrote this in Java but honestly I'm not a coward and I want my language to be fast so I have to suck it up. Hopefully, I get past this obstacle at a certain point and it'll be easier and easier (I know naive since this is one of the easier parts of writing a language).
October 11, 2024
The parser is slowly but surely coming along. It made me realize how long it's been since I coded in C and I completely forgot about memory management lol. Coding in 3 different programming languages in a day has also began making my head spin. At work we use Java and C# and then writing this parser in C is all becoming a blur of what I'm allowed to do in each language. I've been working top down in creating parsing and I have to take a step back to fix my memory allocation. I downloaded Dr. Memory to help me make sure I have no leaks, and then I also have to observe how memory is allocated. I wonder if it's better to point the program to the location of the values or to duplicate the values into a new spot. For debugging the second approach would probably be better but I think it would be slower. I'm not sure if it's my leaky code or Visual Studios, but the debugger for lists in Visual Studio is not doing me any favors, so I've been tracking the memory itself to make sure I'm not messing anything up. Def a challenge for me.
October 9, 2024
I've been struggling a little bit with how I want to structure my AST nodes and also memory management in C. I decided on implementing a union type that will allow you to have the data structure depending on what kind of AST node you have. However, the debugger on Visual Studio makes this a little difficult to see since it shows all the union options (makes sense since they all point to the same location but still annoying since there's going to be about 30 different node types). The other thing is, I'm not sure how I want to structure methods, should they all be nested under a general program node or should I create a list of methods?
October 2, 2024
The parser is when it seems like it's getting a little bit more complex to write a program. Thankfully, I wrote out the grammar when determining how to structure my language, but I read about how grammar can be ambigious so I have to go back in to double check that it is correct. Also, I have about 20 different definitions in my grammar and all of those have to be translated to code so I can form AST. So while maybe not complex, lots of paths that need to be covered.
The other thing I'm curious about, is different types of trees. I saw someone online use concrete syntax trees, but to my understanding it's better to use AST because it gets rid of the tokens that the computer doesn't really care about, like parenthesis. Also, I saw the choice between parsing top-down or bottom-up. I have to research more about how these differences impact interpretting, when we get to that. Something my grandpa once taught me is "it's better to plan for 5 hours and do for 5 minutes than plan for 5 minutes and do for 5 hours".
UPDATE: seems like top-down is easier to understand/implement but bottom-up accounts for more edge cases and is more performant even though it's harder to set up. Therefore, I will be doing bottom-up parsing.