Recursive Descent Parser - Java
Published:
A Java program that analyzes the validity of an input files’ lexical syntax for a hypothetical imperative programming language.
Introduction
What is a Recursive Descent Parser? A recursive descent parser has a subprogram for each nonterminal in its grammar. Many of these subprograms are recursive - hence the name - and produces a parse tree in top-down nature.
In this application, I have created a program to analyze the syntax of input programs for a hypothetical imperative programming language. The Extended Backus-Naur Form (EBNF) grammar for this imperative programming language is below:
<program> -> program begin <statement_list> end
<statement_list> -> <statement> {;<statement>}
<statement> -> <assignment_statement> | <if_statement> | <loop_statement>
<assignment_statement> -> <variable> = <expression>
<variable> -> identifier
<expression> -> <term> { (+|-) <term>}
<term> -> <factor> {(* | /) <factor> }
<factor> -> identifier | int_constant | (<expr>)
<if_statement> -> if (<logic_expression>) then <statement>
<logic_expression> -> <variable> (< | >) <variable>
<loop_statement> -> loop (<logic_expression>) <statement>
Where:
- Separators:
|
–> OR{}
–> optional statement with zero or more repetitions()
–> encapsulates an OR selection;
–> statement terminator
- Keywords:
program
begin
end
if
then
loop
- Operators:
+
–> Addition-
–> Subtraction*
–> Multiplication/
–> Division>
–> Greater than<
–> Less than- In this grammar, there are only two logic expressions
>
and<
.
- In this grammar, there are only two logic expressions
- Identifiers:
- A string value that begins with a letter followed by 0 or more letters and/or digits
The nonterminal symbols include:
<program>
<statement_list>
<statement>
<assignment_statement>
<variable>
<if_statement>
<loop_statement>
<expression>
<logic_expression>
<term>
<factor>
<expr>
The program reads in an input test .txt
file and determines whether or not the file contains syntax errors.
Tools & Techniques
- Java
- Eclipse IDE
Implementation
The implementation of the program follows the two steps below:
- Implement a lexical analyzer as a subprogram of the main program. Each time the lexical analyzer is called, it should return the next lexeme and its token code.
- Implement a parser based on the above Extended Backus-Naur Form (EBNF) rules. Create a subprogram for each nonterminal symbol which should parse only sentences that can be generated by the nonterminal.
Test Cases
The following test case files have been included:
BeginMissing.txt
–> The syntax error in this test file is on line 2 where the keywordbegin
should be.
ExtraSemicolon.txt
–> The syntax error in this test file is on line 7 where there should not be a semicolon.
IfMisspelled.txt
–> The syntax error in this test file is on the beginning of line 6, where the keywordif
is misspelled.
MissingThen.txt
–> The syntax error in this test file is at the end of line 4, there should be athen
keyword.
NoErrors.txt
–> There are no syntax errors in this test file.
SemicolonMissing.txt
–> The syntax error in this test file is a missing semicolon at the end of line g
The only file that runs correctly is the NoErrors.txt
file.
Run
To execute the program, follow the instructions below:
- Clone the git repo:
https://github.com/erincameron11/recursive-descent-parser
- Open up the files as a Java project in the IDE of your choice.
- Run the
RecursiveDescentParser.java
file and follow the commands in the console.- _Note: When entering a file name in the console, do not put the file extensions (in this case,
.txt
). For example:NoErrors
is a valid file name entry in the console.
- _Note: When entering a file name in the console, do not put the file extensions (in this case,