How do I switch between states in a lexical analyzer?


dearname, 2014-05-28 21:11:15

C++ / C#


Good evening. I can't figure out how to switch between states in a lexical analyzer. I read one character from the file and determine, say, that it is a number, but there are 3 possible cases, for example:
a=3 - I have to put the identifier "a" into the vector, but since there are no spaces, "a=3" ends up in a single cell of the vector.
ca= 3 - here "ca=" also ends up in one cell.
Only a = 3 works.
How do I keep track of this when I always work with just one character at a time?
Could you at least explain it with a concrete example, i.e. show how the state transitions are carried out in this case? I understand that with each new character we must be in some state, but please show how that looks in code, even on a small example like this. Then I will implement it myself.
The code below reads a file and puts all the words of the file into a vector. Suppose I come across a construct like a=3: how can I handle it so that the vector contains not "a=3" but "a", "=", "3", i.e. this line occupies 3 cells of the vector?

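#include <cctype>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

// One word read from the file and its position.
struct Lexeme {
  int start, end;  // index of the word in the file (start == end here)
  string lexeme;   // text of the word
};

vector<Lexeme> v;

void gToken(ifstream &fin, vector<Lexeme> &v);
void wToken();

int main()
{
  ifstream fin("text.txt");
  if (!fin)
  {
    cerr << "Cannot open text.txt" << endl;
    return EXIT_FAILURE;
  }
  gToken(fin, v);
  fin.close();
  wToken();
  cin.get();  // wait for Enter; portable replacement for system("PAUSE")
  return EXIT_SUCCESS;
}

// Reads whitespace-separated words from fin into v.
// A construct without spaces, such as "a=3", stays one word.
void gToken(ifstream &fin, vector<Lexeme> &v)
{
  int j = 0;  // number of words read so far
  char ch;
  Lexeme l;
  while (fin >> ch)  // >> skips leading whitespace and reads the first character
  {
    l.lexeme = ch;
    // Append characters until whitespace or end of file.
    while (fin.get(ch) && !isspace(static_cast<unsigned char>(ch)))
      l.lexeme += ch;
    j++;
    l.start = j;
    l.end = j;
    v.push_back(l);
  }
}

// Prints every word on its own line.
void wToken()
{
  for (size_t i = 0; i < v.size(); i++)
    cout << v[i].lexeme << endl;
}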


2 answers


Rsa97, 2014-05-28

Start by describing the tokens formally, then build a finite state machine that recognizes them, and finally modify it so that it also computes the token values. Example for the alphabet [a-z0-9=] with identifiers, integers and assignment:
<letter> := [a-z]
<digit> := [0-9]
<equals> := [=]
<identifier> := <letter>(<letter>|<digit>)*
<number> := <digit>(<digit>)*
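The grammar rules map directly onto character tests. A minimal sketch, assuming plain ASCII input; the function names isLetter, isDigit, isIdentifier and isNumber are hypothetical and not part of the answer:

```cpp
#include <string>

// <letter> := [a-z]
bool isLetter(char c) { return c >= 'a' && c <= 'z'; }

// <digit> := [0-9]
bool isDigit(char c) { return c >= '0' && c <= '9'; }

// <identifier> := <letter>(<letter>|<digit>)*
bool isIdentifier(const std::string &s) {
    if (s.empty() || !isLetter(s[0])) return false;  // must start with a letter
    for (char c : s)
        if (!isLetter(c) && !isDigit(c)) return false;
    return true;
}

// <number> := <digit>(<digit>)*
bool isNumber(const std::string &s) {
    if (s.empty()) return false;  // at least one digit is required
    for (char c : s)
        if (!isDigit(c)) return false;
    return true;
}
```

These functions only check a whole string; they do not split "a=3" by themselves, which is what the state machine is for.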
The corresponding state machine:

_    [a-z]  [0-9]  [=]  ¬
s0    s1     s2    ok  end
s1    s1     s1    ok  ok
s2    ok     s2    ok  ok
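The unmodified automaton can be encoded as a lookup table indexed by state and character class. This is a sketch under assumptions: the enum and function names are hypothetical, characters outside [a-z0-9=] are treated like ¬, and because a plain "ok" does not say whether the current character belongs to the token, the code assumes it does only for the one-character token '=' accepted from s0:

```cpp
#include <string>

// Columns of the table: [a-z], [0-9], [=], ¬ (end of input).
enum CharClass { LETTER, DIGIT, EQUALS, END_OF_INPUT };
// Rows s0..s2 plus the two terminal entries "ok" and "end".
enum State { S0, S1, S2, OK, END };

const State table[3][4] = {
    //        [a-z] [0-9] [=]  ¬
    /* s0 */ {S1,   S2,   OK,  END},
    /* s1 */ {S1,   S1,   OK,  OK},
    /* s2 */ {OK,   S2,   OK,  OK},
};

// Assumption: characters outside [a-z0-9=] are treated like end of input.
CharClass classify(const std::string &s, std::size_t pos) {
    if (pos >= s.size()) return END_OF_INPUT;
    char c = s[pos];
    if (c >= 'a' && c <= 'z') return LETTER;
    if (c >= '0' && c <= '9') return DIGIT;
    if (c == '=') return EQUALS;
    return END_OF_INPUT;
}

// Runs the automaton from s0 at pos and returns the length of the token
// found there, or 0 when the machine reaches "end".
std::size_t tokenLength(const std::string &s, std::size_t pos) {
    State state = S0;
    std::size_t i = pos;
    for (;;) {
        State next = table[state][classify(s, i)];
        if (next == END) return 0;
        if (next == OK) {
            // From s0, "ok" is only reached on '=', which is itself the token.
            // From s1/s2 the current character already belongs to the next token.
            return state == S0 ? 1 : i - pos;
        }
        state = next;
        ++i;  // consume the character and stay in the machine
    }
}
```

For "a1=95", tokenLength returns 2 at position 0, 1 at position 2, 2 at position 3 and 0 at position 5. The ambiguity of "ok" is exactly what the modified table removes by writing next explicitly.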

The same machine, modified to compute token values:

_       in == ['a'-'z']                     in == ['0'-'9']                    in == ['=']           ¬
s0    val := in; next; s1                 val := in-'0'; next; s2            next; ret(ASSIGN)  ret(EOT)
s1    val := concat(val, in); next; s1    val := concat(val, in); next; s1   ret(IDENT, val)    ret(IDENT, val)
s2    ret(INTEGER, val)                   val := val*10+in-'0'; next; s2     ret(INTEGER, val)  ret(INTEGER, val)

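Notation: s0 is the initial state, in is the current character, ¬ is the end of input, next advances to the next character in the stream, and ret returns the token type and its value, ending the current call. A cell without next does not consume the current character, so that character becomes the first character of the next token.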
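The modified machine can be written as a switch over the current state, where each call starts in s0 and returns one token. A minimal sketch, assuming the input is held in a std::string; the Lexer class, the Token struct and the use of '\0' to stand for ¬ are illustrative choices, not part of the answer:

```cpp
#include <string>

// Token types named as in the table (EOT = end of text).
enum TokenType { IDENT, INTEGER, ASSIGN, EOT };

struct Token {
    TokenType type;
    std::string text;  // identifier name (IDENT only)
    long value;        // numeric value (INTEGER only)
};

// Hypothetical lexer class: peek() is "in", next() is "next".
class Lexer {
public:
    explicit Lexer(const std::string &src) : src_(src) {}

    // One call = one run of the machine, starting in s0.
    Token nextToken() {
        enum { S0, S1, S2 } state = S0;
        std::string name;
        long num = 0;
        for (;;) {
            char in = peek();  // '\0' stands for ¬ (end of input)
            switch (state) {
            case S0:
                if (isLetter(in)) { name = in; next(); state = S1; }
                else if (isDigit(in)) { num = in - '0'; next(); state = S2; }
                else if (in == '=') { next(); return {ASSIGN, "", 0}; }
                else return {EOT, "", 0};  // assumption: any other character acts as ¬
                break;
            case S1:
                if (isLetter(in) || isDigit(in)) { name += in; next(); }
                else return {IDENT, name, 0};  // current character is not consumed
                break;
            case S2:
                if (isDigit(in)) { num = num * 10 + (in - '0'); next(); }
                else return {INTEGER, "", num};  // current character is not consumed
                break;
            }
        }
    }

private:
    static bool isLetter(char c) { return c >= 'a' && c <= 'z'; }
    static bool isDigit(char c) { return c >= '0' && c <= '9'; }
    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }
    void next() { ++pos_; }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Calling nextToken() repeatedly on "a1=95" yields IDENT "a1", ASSIGN, INTEGER 95 and then EOT on every further call, matching the transition list for that input.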
Sequence of transitions when parsing 'a1=95':
First call:

s0 ('a') -> val := 'a'; next; s1
s1 ('1') -> val := 'a1'; next; s1
s1 ('=') -> ret(IDENT, 'a1')

Second call:

s0 ('=') -> next; ret(ASSIGN)

Third call:

s0 ('9') -> val := 9; next; s2
s2 ('5') -> val := 95; next; s2
s2 (¬) -> ret(INTEGER, 95)

All further calls:

s0 (¬) -> ret(EOT)
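To get exactly what the question asks for (the vector cells "a", "=", "3"), the same states can collect the token text instead of its value. A minimal sketch: the tokenize function is hypothetical, skipping whitespace between tokens is an added assumption (the answer's alphabet has no space), and any character outside [a-z0-9=] is assumed to form a one-character token:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits a line such as "a=3" into {"a", "=", "3"}.
std::vector<std::string> tokenize(const std::string &src) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    auto isLetter = [](char c) { return c >= 'a' && c <= 'z'; };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };
    while (pos < src.size()) {
        char in = src[pos];
        // Assumption: whitespace only separates tokens.
        if (std::isspace(static_cast<unsigned char>(in))) { ++pos; continue; }
        std::size_t start = pos;
        if (isLetter(in)) {         // state s1: identifier
            while (pos < src.size() && (isLetter(src[pos]) || isDigit(src[pos]))) ++pos;
        } else if (isDigit(in)) {   // state s2: integer
            while (pos < src.size() && isDigit(src[pos])) ++pos;
        } else {                    // '=' (or any other single character)
            ++pos;
        }
        out.push_back(src.substr(start, pos - start));
    }
    return out;
}
```

With this, "a=3", "ca= 3" and "a = 3" all split into three cells; each returned string could then be stored in a Lexeme and pushed into the question's vector.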


jcmvbkbc, 2014-05-28

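Take a look at flex: it is a lexical analyzer generator that builds such a state machine for you from a description of the tokens as regular expressions.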