Which part of the translator should catch an error like "invalid number format"?

Programming

koi com, 2015-03-30 12:20:38

I am writing a term project: a translator for a high-level language. While writing the lexical analyzer, the question arose of when and where to check an integer literal for validity (for example, whether it starts with a zero, as in 0978). The task of lexical analysis is to extract tokens built from the alphabet of the language, so I do not think it should check the validity of numbers.
Here are the grammar rules that define the pattern for integers:
<number> -> <digit>|<number><digit>
<digit> -> 0|1|2|3|4|5|6|7|8|9
I would welcome suggestions, arguments, links to resources, books, etc.

4 answer(s)

lam0x86, 2015-03-30
@lam0x86

At the stage of lexical analysis, it is not always possible to check the validity of a lexeme. Say the data type int32 (a conventional name for a signed 32-bit integer type) can store a number in the range [-2147483648; 2147483647]. Then the constant expression "2147483648" is considered invalid, while the same digits with a minus sign, "-2147483648", are perfectly valid. Ideally, the lexical analyzer should not know about the sign of the number. Determining whether a minus is unary or binary (or whether the language supports intervals like "10-100" or something else exotic) is the parser's job.
However, basic token validity checks are indeed easier to do in the lexical analyzer. For example, string literals are most often checked for valid escape sequences during lexical analysis. The earlier an error is found, the easier it is for the analyzer to recover from the error state.
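
To illustrate the int32 point, here is a minimal Python sketch, not a definitive implementation. It assumes a toy language with only digit sequences and a minus sign; the names `tokenize` and `parse_signed_literal` are hypothetical. The lexer emits the digits and the minus as separate tokens, and the range check happens in the parser, once the sign is known.

```python
import re

INT32_MAX = 2**31 - 1   # 2147483647
INT32_MIN = -2**31      # -2147483648

# Assumed toy token set: a run of digits or a single minus sign.
TOKEN_RE = re.compile(r"\s*(?:(?P<INT>\d+)|(?P<MINUS>-))")

def tokenize(text):
    """Split text into (kind, lexeme) pairs; a sign is never part of a number."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_signed_literal(tokens):
    """Parse ['-'] INT and range-check the value only after the sign is known."""
    i = 0
    negative = bool(tokens) and tokens[0][0] == "MINUS"
    if negative:
        i = 1
    if i >= len(tokens) or tokens[i][0] != "INT" or i + 1 != len(tokens):
        raise SyntaxError("expected an integer literal")
    value = int(tokens[i][1])
    if negative:
        value = -value
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"literal {value} does not fit in int32")
    return value
```

With this split, `parse_signed_literal(tokenize("-2147483648"))` is accepted, while the same digits without the minus are rejected by the parser rather than by the lexer.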

xmoonlight, 2015-03-30
@xmoonlight

Immediately after receiving <number>

Vapaamies, 2015-03-30
@vapaamies

Where the resulting nonterminal is interpreted as a number.

Mikhail Potanin, 2015-04-09
@potan

The question here is whether invalid numbers are part of the alphabet of the language. If they are not, check for them right there in the lexer.
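
As a rough Python sketch of the "check in the lexer" option: it assumes the language forbids leading zeros (as the 0978 example in the question suggests), and the names `lex_integer` and `LexError` are hypothetical. The lexer first takes the longest run of digits, then rejects the lexeme if it does not match the stricter pattern.

```python
import re

DIGITS_RE = re.compile(r"[0-9]+")             # longest run of digits
VALID_INT_RE = re.compile(r"0|[1-9][0-9]*")   # assumed rule: no leading zeros

class LexError(Exception):
    """Raised when the lexer meets a malformed lexeme."""

def lex_integer(text, pos=0):
    """Return (lexeme, next_pos) for the integer literal starting at pos."""
    m = DIGITS_RE.match(text, pos)
    if m is None:
        raise LexError(f"expected a digit at position {pos}")
    lexeme = m.group()
    # Validate the whole digit run, so "0978" is one bad token, not "0" then "978".
    if VALID_INT_RE.fullmatch(lexeme) is None:
        raise LexError(f"invalid number format: {lexeme!r} at position {pos}")
    return lexeme, m.end()
```

Taking the whole digit run before validating it keeps the error message tied to the literal the user actually wrote.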