Analytics
Weageoo, 2011-10-08 04:06:54

Extracting lexemes from a mathematical expression (using regexps)?

There is a simple language of mathematical expressions. It contains operators (+-*/ and so on, possibly some made of two or more characters), variables, and numbers (integer and fractional).
The task is to split an input expression (for example, “2 + x1 - 12.5 / ((0.24^y) * coeff)”) into lexemes (tokens); it would also be nice to check its validity.
This is a classic task; in the general case it is solved by writing a lexical analyzer (with validation handled by a parser). For now I am solving it with Regex, like this:

public static IEnumerable<string> TokenizeInfix(string infix)
{
  infix = Regex.Replace(infix, @"[ \t]+", string.Empty);

  var match = Regex.Match(infix, @"[-+*/^%()]|[A-Za-z][A-Za-z0-9]*|[+-]?[0-9]+\.?[0-9]*");

  if (match.Success)
    do yield return match.Value;
    while ((match = match.NextMatch()).Success);
}

However, in this case the user can enter "x 1", and it will be treated as a single variable token "x1". Is it possible to cover all the conditions in one regexp, so that whitespace does not have to be removed beforehand? What is the best way to check the validity of an expression?
At a higher level, another question arises: is it possible to get by with regular expressions alone, or is it better to use a generator like Coco/R (in which case I need help describing the grammar, or maybe someone even has an .ATG file lying around for exactly this case)?


1 answer(s)
Maccimo, 2011-10-08
@Weageoo

However, in this case the user can enter "x 1", and it will be treated as a single variable token "x1". Is it possible to cover all the conditions in one regexp, so that whitespace does not have to be removed beforehand?
By removing whitespace beforehand you do not simplify the task of parsing the expression; on the contrary, you complicate it.
Drop the whitespace removal and instead add whitespace as one more alternative in the
regexp that is currently the second one.
That is:
var match = Regex.Match(infix, @"[-+*/^%()]|[A-Za-z][A-Za-z0-9]*|[+-]?[0-9]+\.?[0-9]*|[ \t]+");

This way tokens no longer get glued together, and whitespace that occurs between significant tokens
can be ignored in subsequent processing.
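A minimal sketch of this suggestion, assuming whitespace matches are simply filtered out while enumerating. The class name `InfixTokenizer` and the choice to throw `FormatException` on a character that no alternative accepts are assumptions of this example, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InfixTokenizer
{
    // Same alternatives as the pattern above, with whitespace as the last one.
    private static readonly Regex TokenRegex = new Regex(
        @"[-+*/^%()]|[A-Za-z][A-Za-z0-9]*|[+-]?[0-9]+\.?[0-9]*|[ \t]+");

    public static IEnumerable<string> TokenizeInfix(string infix)
    {
        int expected = 0; // index at which the next match has to start
        for (Match m = TokenRegex.Match(infix); m.Success; m = m.NextMatch())
        {
            // A gap between matches means a character no alternative accepts.
            if (m.Index != expected)
                throw new FormatException("Unexpected character at position " + expected);
            expected = m.Index + m.Length;

            // Whitespace separates tokens but is not reported itself.
            // Every alternative matches at least one character, so Value[0] is safe.
            if (m.Value[0] != ' ' && m.Value[0] != '\t')
                yield return m.Value;
        }

        // Trailing characters that were never matched are also an error.
        if (expected != infix.Length)
            throw new FormatException("Unexpected character at position " + expected);
    }
}
```

With this version "x 1" yields the two tokens "x" and "1" instead of a single "x1". Because the method is an iterator, the exception is raised only while the result is being enumerated.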
For any more or less serious task it is, of course, worth writing a proper lexer rather than piling up constructions out of Regex. It will be easier to modify, and it will run faster.
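For comparison, a hand-written lexer for the same token set could look roughly like this. It is only a sketch: the names `TokenKind`, `Token`, and `HandLexer` are hypothetical. As an assumption of this example, it is slightly stricter than the regexp above: a sign is always a separate operator token, and a number may not end with a dot.

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Number, Identifier, Operator, LeftParen, RightParen }

public sealed class Token
{
    public TokenKind Kind { get; private set; }
    public string Text { get; private set; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

public static class HandLexer
{
    public static List<Token> Lex(string input)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == ' ' || c == '\t') { i++; continue; } // skip whitespace

            int start = i;
            if (IsDigit(c))
            {
                while (i < input.Length && IsDigit(input[i])) i++;
                // Optional fractional part: a dot followed by at least one digit.
                if (i + 1 < input.Length && input[i] == '.' && IsDigit(input[i + 1]))
                {
                    i++;
                    while (i < input.Length && IsDigit(input[i])) i++;
                }
                tokens.Add(new Token(TokenKind.Number, input.Substring(start, i - start)));
            }
            else if (IsLetter(c))
            {
                while (i < input.Length && (IsLetter(input[i]) || IsDigit(input[i]))) i++;
                tokens.Add(new Token(TokenKind.Identifier, input.Substring(start, i - start)));
            }
            else if (c == '(') { i++; tokens.Add(new Token(TokenKind.LeftParen, "(")); }
            else if (c == ')') { i++; tokens.Add(new Token(TokenKind.RightParen, ")")); }
            else if ("+-*/^%".IndexOf(c) >= 0)
            {
                i++;
                tokens.Add(new Token(TokenKind.Operator, c.ToString()));
            }
            else
            {
                throw new FormatException("Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }

    private static bool IsDigit(char c) { return c >= '0' && c <= '9'; }

    private static bool IsLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
}
```

Each token carries its kind, so later stages do not have to re-inspect the text to tell a number from a variable.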
What is the best way to check the validity of an expression?
...
is it possible to get by with regular expressions alone, or is it better to use a generator like Coco/R
As far as I know, checking the correctness of an arithmetic expression with regexps alone is not possible, at least not with classical regular expressions, because arbitrarily nested parentheses are beyond what a regular language can describe.
You can use a parser generator, or you can write a simple parser of your own that works by recursive descent.
It all depends on whether you need to parse an expression or to figure out how expressions are parsed.
If you are interested in the topic of parsers, compilers, and so on, then you should definitely read “Compilers: Principles, Techniques, and Tools”.
Building an arithmetic parser is covered there as a step-by-step example.
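The recursive descent option mentioned above can be sketched as a small validator over the token strings. This is an illustration under stated assumptions, not a reference implementation: the class name `ExpressionValidator`, the grammar in the comments, and the operator precedence are choices made for this example, and whitespace tokens are assumed to have been dropped already.

```csharp
using System;
using System.Collections.Generic;

public sealed class ExpressionValidator
{
    private readonly IList<string> _tokens;
    private int _pos;

    private ExpressionValidator(IList<string> tokens) { _tokens = tokens; }

    // Returns true when the whole token sequence forms one expression.
    public static bool IsValid(IList<string> tokens)
    {
        var v = new ExpressionValidator(tokens);
        return v.Expr() && v._pos == tokens.Count;
    }

    private string Peek() { return _pos < _tokens.Count ? _tokens[_pos] : null; }

    // Consumes the current token if it equals one of the given options.
    private bool Accept(params string[] options)
    {
        string t = Peek();
        if (t == null || Array.IndexOf(options, t) < 0) return false;
        _pos++;
        return true;
    }

    // Expr := Term (('+' | '-') Term)*
    private bool Expr()
    {
        if (!Term()) return false;
        while (Accept("+", "-"))
            if (!Term()) return false;
        return true;
    }

    // Term := Power (('*' | '/' | '%') Power)*
    private bool Term()
    {
        if (!Power()) return false;
        while (Accept("*", "/", "%"))
            if (!Power()) return false;
        return true;
    }

    // Power := Unary ('^' Power)?
    private bool Power()
    {
        if (!Unary()) return false;
        return !Accept("^") || Power();
    }

    // Unary := ('+' | '-') Unary | Primary
    private bool Unary()
    {
        if (Accept("+", "-")) return Unary();
        return Primary();
    }

    // Primary := number | identifier | '(' Expr ')'
    private bool Primary()
    {
        if (Accept("(")) return Expr() && Accept(")");
        string t = Peek();
        // Numbers start with a digit and identifiers with a letter.
        if (!string.IsNullOrEmpty(t) && char.IsLetterOrDigit(t[0])) { _pos++; return true; }
        return false;
    }
}
```

Each grammar rule becomes one method, and nested parentheses are handled by `Primary` calling back into `Expr`. The token sequence "x", "1" is rejected, because after the variable is parsed a token is left over.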
