D
D
Denis Yakovenko2015-05-05 21:46:13
Parsing
Denis Yakovenko, 2015-05-05 21:46:13

Parser for if-else constructs based on context-free grammar in BNF?

Greetings!
I need to build a parser that would produce a parse tree for an expression like this:

if ( arr[i, j+5, j*j+1] >= 5 + a )
      a = a * b + 1;
else a = 2 * b;

This should be a method that would take a context-free grammar as a parameter and parse the input string according to it, and if the input string does not match the grammar, it would issue an error message.
First, you need to build a grammar in the form of Backus-Naur (BNF). Below is what I got:
Id        = {Letter}{AlphaNumeric}*
Integer   = {Digit}+
          
<Condition> ::= <Condition IF> <Condition ELSE>             

<IF> ::= 'if'

<ELSE> ::= 'else'

<Condition IF>   ::= <IF> '(' <Boolean Exp> ')' <Expression> '=' <Expression> ';'

<Condition ELSE> ::= <ELSE> <Expression> '=' <Expression> ';'


<Expression>   ::= <Expression> '+' <Mult Exp> 
                 | <Expression> '-' <Mult Exp> 
                 | <Mult Exp>
                 
<Array Expression> ::= <Expression>
                    | <Array Expression> ',' <Expression>

<Mult Exp>    ::= <Mult Exp> '*' <Negate Exp> 
                | <Mult Exp> '/' <Negate Exp> 
                | <Negate Exp> 

<Negate Exp>  ::= '-' <Value> 
                | <Value> 

<Value>       ::= ID 
                | Integer
                | '(' <Expression> ')'
                
<BoolOperator> ::= '>'|'<'|'>='|'<='|'=='

<Boolean Exp> ::= <Expression> | ID'['<Array Expression>']' <BoolOperator> <Expression> | ID'['<Array Expression>']'

Actually, the questions are:
1) Is the BNF structured correctly? If not, please help with the construction.
2) How is it more competent to transfer such a grammar task to C #?
3) How to build a parse tree in general? (there is a regular expression that parses all strings of the desired type into an array of tokens).
PS: Third party parsers/lexers cannot be used

Answer the question

In order to leave comments, you need to log in

1 answer(s)
S
so-olitary, 2015-05-14
@yakovenkodenis

1) Yes, you built the BNF correctly - there is nothing supernatural there.
But this Expression := Expression;, generally speaking, is still not true, because. a + b = 5cannot be assigned :
And for my taste:

Expression := Expression | ID=Expression
Block := Expression; | Expression; Block
ArrayExpression := { Block }
BoolExp, как и в С, можно отдельно не выделять, 0 - false, !0 - true

2) You can use a ready-made parser that splits into tokens itself according to the rules you generated (yacc, bison), you can write it manually.
3) How to do it manually:
I used to write a programming language interpreter.
BNF - there is an algorithm - actually how you will parse the file. You can look at your structure, it describes all sorts of options for what you can see in the program file. You read the file and check that you were right.
First, use regular expressions to highlight the "tokens" that you have a "word of the language":
1. (\d)+ - Integer
2. \w (\w | \d )* - identificator
3. + | - | * | / | % | := | < | > | <= | >= | != | == | ( | )  | ... - arithmetic sign
4. ; | [ | ] | { | } | ... - separators

Start parsing :
Introduce the concept of a functional operator, as well as the concept of an ID, which includes a variable name or an array reference, as a single "token" with a numeric value (or Expression) inside [...].
Think of a file as consecutive language instructions.
First you have to understand which one you're in - it's (1) variable declaration, (2) arithmetic expression, (3) conditional statement. How to understand it? Just read the words one by one. for example:
1. if this is an if - then
(a) follows ( Expression )it - this is some kind of arithmetic expression:
The first thing to do with it is to get rid of the brackets and reassemble it into a sequence of operations for execution, bearing in mind the priorities of the operations. The easiest way to do this is by translating into a Polish annotation, this is done using a stack - you can see the algorithm on Wikipedia. Meaning: (a + b) * c + d = +*+abcd.
Although you don't have to execute the program, I still recommend translating into Polish notation - so you can check the correctness of the arithm expression and understand where it ended.
Or you can come up with some rules to understand this, for example: there cannot be 2 characters in a row (except for the unary operator), ... but it's more difficult IMHO.
(b) then comes after him Expression | ArrayExpressions, you parse them too.
(c) could be Else or next statement
And so on.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question