Programming languages
Stas Korostelev, 2015-09-16 02:29:16

How are scripting languages made?

Purely out of curiosity: how are scripting languages made? Take PHP, for example: everyone knows it is written in C, but how exactly? I'd like to write my own language with just a couple of functions, but with my own compiler. And can this be done in C#? :)


7 answer(s)
Stanislav Makarov, 2015-09-16
@Nipheris

Short answer: read the Dragon Book. There is more advanced literature, but everyone starts with this book (our university course on language translation was taught from it).
Long answer: your translator takes a sequence of characters as input (say, UTF-8 text), "understands" it according to your language specification, and emits instructions in another language as output (as a text file or a file in a special format). This "other language" can be the assembly language of some hardware platform (x86_64, ARM, SPARC), in which case the resulting file is a binary for that architecture (more precisely, an object module; the linker then assembles the final binary). This is how C/C++ is compiled, for example. The "other language" can be the language of a virtual machine (LLVM/Java/MSIL bytecode): this is how C/C++ (if via LLVM), Java, Scala, C#, F# and VB are compiled. The "other language" can also be a higher-level language: to avoid the hassle of generating machine code in the early stages of a language's development, people often write a translator that generates C code, which a well-established compiler then turns into a binary. Or, for example, CoffeeScript and TypeScript are translated into JavaScript, because web browsers cannot yet execute anything other than JavaScript.
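As a rough illustration of the "translate to C first" route, here is a minimal Python sketch. The toy language (lines of the form `print <expr>`) and the function name `translate_to_c` are made up for this example; the sketch only checks which characters appear in the expression and leaves real error checking to the C compiler.

```python
import re

# Hypothetical toy language: every line is "print <expr>", where <expr>
# may contain only integers, + - * and parentheses.
LINE = re.compile(r"print\s+([0-9+\-*()\s]+)")


def translate_to_c(source: str) -> str:
    """Translate the toy language into the text of a C program."""
    body = []
    for number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LINE.fullmatch(line)
        if match is None:
            raise SyntaxError(f"line {number}: cannot parse {line!r}")
        expr = match.group(1).strip()
        # The expression is copied through as-is; the C compiler
        # will complain later if it is not well-formed.
        body.append(f'    printf("%d\\n", {expr});')
    lines = ["#include <stdio.h>", "", "int main(void) {", *body, "    return 0;", "}"]
    return "\n".join(lines) + "\n"


print(translate_to_c("print 2 + 3\nprint (1 + 2) * 4\n"))
```

The generated text can then be saved to a file and handed to any C compiler, which does the hard part of producing machine code.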
Of course, you can write an interpreter instead of a compiler: then your program executes instructions in your language directly, without generating any output file. Quite a few systems work this way, such as Node.js. Python does the same if you turn off .pyc file generation (correct me if I'm wrong).
It does not really matter which language you implement the translator in. Moreover, it is considered good practice to implement a language's compiler in that same language; this is called self-hosting. A self-hosted compiler is usually seen as the first sign that a language should be taken seriously. Of course, the first version of the compiler has to be written in an existing language (or bootstrapped, if you are a really hardcore developer).
People have already hit plenty of bumps while parsing input streams, and have devoted their lives and scientific careers to the problem, so many tools exist to help with compiler development. Typically such tools let you describe your language's grammar in a specialized syntax (such as BNF) and then generate lexer and parser code from that description in a language of your choice. These are the modules that perform the initial analysis of the input: they split it into tokens and build an abstract syntax tree (AST). You then add the main part of your compiler on top of them. For example, compilers written in C often use flex together with yacc/bison. There are more elaborate packages that can generate parser code in various languages: ANTLR, GOLD. Or you can write the lexer and parser yourself, especially if you already have a first version of the compiler and are rewriting it in your own language.
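To make the lexer / parser / AST split concrete, here is a hand-written sketch in Python for arithmetic expressions. It is not what flex, bison or ANTLR generate; the grammar, the tuple-based AST and the names `tokenize`, `Parser` and `parse_expression` are assumptions chosen for the example.

```python
def tokenize(text):
    """Lexer: turn the input text into ("num", int) and ("op", char) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif "0" <= ch <= "9":
            end = pos
            while end < len(text) and "0" <= text[end] <= "9":
                end += 1
            tokens.append(("num", int(text[pos:end])))
            pos = end
        elif ch in "+-*/()":
            tokens.append(("op", ch))
            pos += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {pos}")
    return tokens


class Parser:
    """Recursive-descent parser for the grammar:
    expr   := term (("+" | "-") term)*
    term   := factor (("*" | "/") factor)*
    factor := NUMBER | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "eof":
            raise SyntaxError(f"unexpected token {self.peek()}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token[0] == "num":
            return token
        if token == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token}")


def parse_expression(text):
    """Run the lexer and the parser; return the AST as nested tuples."""
    return Parser(tokenize(text)).parse()


print(parse_expression("1 + 2 * 3"))
# prints ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

The rest of a compiler or interpreter then works on the tree rather than on the raw text.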

tsarevfs, 2015-09-16
@tsarevfs

Scripting languages are usually languages that are executed by an interpreter. A compiler converts program code into machine instructions that can run on their own, and that is much more complicated.
Start with something very simple: for example, a program that reads one line of instructions and prints the result. At first, let the line contain only one command. For example:
input: add 2 3
output: 5
Then try to build an expression calculator on top of this. You can start looking into how to do that here .
Once that is done, you can move on.
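The very first step could look roughly like this in Python. Only the `add` command comes from the example above; `sub`, `mul` and the function name `run_line` are additions made up for the sketch.

```python
# Command table: name -> function of two integers.
# Only "add" is from the example; "sub" and "mul" are illustrative extras.
COMMANDS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run_line(line):
    """Execute one line of the form '<command> <int> <int>' and return the result."""
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    name, args = parts[0], parts[1:]
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    if len(args) != 2:
        raise ValueError(f"{name} expects 2 arguments, got {len(args)}")
    a, b = (int(arg) for arg in args)  # int() raises ValueError on bad input
    return COMMANDS[name](a, b)


print(run_line("add 2 3"))  # prints 5
```

Adding a command is then just one more entry in the table, which makes it easy to grow the toy language step by step.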

Puma Thailand, 2015-09-16
@opium

Universities teach a course on programming language theory precisely so that questions like this don't get asked.

Eugene, 2015-09-16
@Onni

The first answer has already said everything and it's hard to add anything, but I would suggest looking at Forth or other stack-based languages.
A stack machine can be written very quickly, literally in half a day. If you really only need a couple of features, this option may be a good fit.
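To show how little a stack machine needs, here is a minimal Forth-like sketch in Python. It is not real Forth: the word set (`+ - * dup drop swap .`) and the function name `run_stack_program` are assumptions for the example, and there is no way to define new words.

```python
def run_stack_program(source):
    """Run a whitespace-separated stack program.

    Returns (stack, output): the final stack and the values emitted by '.'.
    """
    stack = []
    output = []

    def binary(fn):
        # Build a word that pops two operands and pushes fn(a, b).
        def word():
            b = stack.pop()
            a = stack.pop()
            stack.append(fn(a, b))
        return word

    words = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        "dup": lambda: stack.append(stack[-1]),
        "drop": lambda: stack.pop(),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        ".": lambda: output.append(stack.pop()),  # pop and emit the top value
    }

    for token in source.split():
        if token in words:
            words[token]()
        else:
            try:
                stack.append(int(token))  # anything else must be a number
            except ValueError:
                raise ValueError(f"unknown word: {token}") from None
    return stack, output


print(run_stack_program("2 3 + 4 * ."))  # prints ([], [20])
```

There is no parser to speak of: the program is split on whitespace and each token is either a number to push or a word to execute.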

Roman Mirilaczvili, 2015-09-16
@2ord

For educational purposes, you can use compiler generators such as Coco/R (read the documentation!).
In short: you write a description of the language's grammar, and the generator produces scanner and parser source files for you, which you then compile to get the compiler executable.
The site www.ssw.uni-linz.ac.at/Research/Projects/Coco has examples for various programming languages.

dponyatov, 2019-03-12
@dponyatov

To write an interpreter, the key principle to understand is the interpretation of data structures.
The program is represented as a data structure, most often a tree (or graph) of objects. Each node is an object representing some element of the language: a constant, a function, a loop, and so on.
Each element must be able to contain
(a) elements addressable by name (associative array) and
(b) nested elements in a controlled order (array or list).
The interpreter traverses the program tree/graph, and performs actions through object method calls.
For example, for the class Operator -> Plus, the method is defined as add() { return nest[0].add(nest[1]) }, where nest is the list of nested elements (operands).
Some objects can create new executable data structures in the interpreter's memory. Variables are implemented and stored through the attributes of an object (its associative array); for example, the global object stores global variables in its attributes.
There is something along these lines, in deep pre-alpha (a Python interpreter): https://github.com/ponyatov/hico/releases/latest
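The idea of interpreting a tree of objects can be sketched in a few lines of Python. This is not code from the linked project: the class names (`Node`, `Number`, `Name`, `Plus`, `Assign`) and the `attr` / `nest` field names are assumptions that follow the description above.

```python
class Node:
    """A program element with named attributes and ordered nested elements."""

    def __init__(self, value=None):
        self.value = value
        self.attr = {}  # (a) elements addressable by name
        self.nest = []  # (b) nested elements in a controlled order

    def push(self, child):
        self.nest.append(child)
        return self  # returning self allows chained calls

    def eval(self, env):
        return self  # by default a node evaluates to itself


class Number(Node):
    def add(self, other):
        return Number(self.value + other.value)


class Name(Node):
    def eval(self, env):
        # Variable lookup: variables live in the attributes of env.
        return env.attr[self.value]


class Plus(Node):
    def eval(self, env):
        # Evaluate both operands, then delegate to the left operand's add().
        left = self.nest[0].eval(env)
        right = self.nest[1].eval(env)
        return left.add(right)


class Assign(Node):
    def eval(self, env):
        # Store the evaluated operand under this node's name in env.
        result = self.nest[0].eval(env)
        env.attr[self.value] = result
        return result


glob = Node("global")  # the global object holds global variables
Assign("x").push(Number(2)).eval(glob)
program = Plus().push(Name("x")).push(Number(3))
print(program.eval(glob).value)  # prints 5
```

Here the "program" is the tree built by the `push` calls; a parser would normally build the same tree from source text.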

Alejandro Esquire, 2015-09-16
@A1ejandro

At one time I developed my own interpreter (back in DOS Pascal, albeit using OOP) for a system that interfaced a PC network with a mainframe. All of this also ran through a home-made Iola network gateway working together with Novell, even though Iola's developers claimed that such a gateway was impossible to build. My system handled high-speed data entry (typing in data from paper documents) and transferred it all to the mainframe (IBM) in the format it required. In short, my task was to build a system with "visual design of data entry layouts by operators." It worked out: it was enough to "draw" the layout in a text file, indicating the types of the input data, and it immediately "came to life", so you could start typing data into it right away. As a result, data entry became significantly faster than with the old punched-card system. It was an interesting and quite practical experience in developing what you might call a "scripting language" =)
