Which part of the translator should catch an error like "invalid number format"?
I am writing a term paper: a translator for a high-level language. While writing the lexical analyzer, the question came up of when and where to check an integer literal for validity (for example, whether it starts with a zero, as in 0978). The task of lexical analysis is to extract tokens, which make up the alphabet of the language, so I do not think it should check the validity of numbers.
Here is the grammar rule that defines the pattern for integers:
<number> -> <digit>|<number><digit>
<digit> -> 0|1|2|3|4|5|6|7|8|9
I would welcome suggestions, arguments, and links to resources, books, etc.
At the lexical analysis stage, it is not always possible to check the validity of a lexeme. Say the data type int32 (a conventional name for a signed 32-bit integer type) can store a number in the range [-2147483648; 2147483647]. Then the constant "2147483648" is invalid, while the same digits with a minus sign, "-2147483648", are perfectly valid. Ideally, the lexical analyzer should not know about the sign of a number. Deciding whether a minus is unary or binary (or whether the language supports ranges like "10-100" or something else exotic) is the parser's job.
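To make the int32 point concrete, here is a minimal sketch in Python, not a definitive design. The names `tokenize` and `parse_signed_literal` and the token kinds `INT` and `MINUS` are made up for illustration. The lexer only produces an unsigned magnitude, and the range check runs in the parser once the sign is known.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def tokenize(text):
    """Hypothetical lexer: emits ('MINUS', None) and ('INT', magnitude).
    It never attaches a sign to a number."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "-":
            tokens.append(("MINUS", None))
            i += 1
        elif ch in "0123456789":
            start = i
            while i < len(text) and text[i] in "0123456789":
                i += 1
            tokens.append(("INT", int(text[start:i])))
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens

def parse_signed_literal(tokens):
    """Hypothetical parser rule: literal -> [MINUS] INT.
    The int32 range check happens here, after the sign is resolved."""
    pos = 0
    negative = False
    if pos < len(tokens) and tokens[pos][0] == "MINUS":
        negative = True
        pos += 1
    if pos >= len(tokens) or tokens[pos][0] != "INT":
        raise SyntaxError("expected an integer literal")
    magnitude = tokens[pos][1]
    if pos + 1 != len(tokens):
        raise SyntaxError("unexpected tokens after the literal")
    value = -magnitude if negative else magnitude
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in int32")
    return value
```

With this split, `parse_signed_literal(tokenize("-2147483648"))` is accepted, while the same digits without the minus raise `OverflowError`. A lexer-only check could not tell these two cases apart.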
However, basic token validity checks are indeed easier to do in the lexical analyzer. For example, string literals are most often checked for valid escape sequences during lexical analysis. The earlier an error is found, the easier it is for the analyzer to recover from it.
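As an illustration of checking escapes inside the lexer, here is a small Python sketch. The escape set, the `LexError` class and the `lex_string` function are assumptions made for the example, not part of any particular language definition.

```python
# Assumed escape set for a hypothetical language: \n, \t, \\ and \"
VALID_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

class LexError(Exception):
    """Lexical error that remembers the offset where it was detected."""
    def __init__(self, message, position):
        super().__init__(f"{message} at offset {position}")
        self.position = position

def lex_string(text, start):
    """Lex a double-quoted string starting at text[start].
    Returns (decoded_value, index_after_closing_quote)."""
    if start >= len(text) or text[start] != '"':
        raise LexError("expected opening quote", start)
    chars = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            return "".join(chars), i + 1
        if ch == "\\":
            if i + 1 >= len(text):
                break  # backslash at end of input: unterminated literal
            esc = text[i + 1]
            if esc not in VALID_ESCAPES:
                # Reported right where the bad escape starts.
                raise LexError("invalid escape sequence \\" + esc, i)
            chars.append(VALID_ESCAPES[esc])
            i += 2
        else:
            chars.append(ch)
            i += 1
    raise LexError("unterminated string literal", start)
```

Because the error carries the exact offset of the bad escape, the lexer can report it precisely and, if desired, skip to the closing quote and keep going.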
The question here is whether invalid numbers belong to the alphabet of the language. If they do not, the lexer is exactly the place to check.
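If the language does decide that literals like 0978 are not valid tokens, the check in the lexer can be very small. A sketch in Python, assuming the rule "a lone 0 is fine, any longer literal must not start with 0" (the rule and the function name `lex_integer` are assumptions for illustration):

```python
import re

DIGITS = re.compile(r"[0-9]+")

def lex_integer(text, start=0):
    """Lex an integer literal at text[start].
    Returns (value, index_after_literal)."""
    match = DIGITS.match(text, start)
    if match is None:
        raise ValueError(f"expected a digit at offset {start}")
    lexeme = match.group()
    # Assumed rule: "0" alone is valid, "0978" is not.
    if len(lexeme) > 1 and lexeme[0] == "0":
        raise ValueError(
            f"invalid number format {lexeme!r} at offset {start}: leading zero"
        )
    return int(lexeme), match.end()
```

Note that the whole run of digits is consumed before the check, so "0978" is reported as one bad lexeme instead of being silently split into the tokens 0 and 978.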