Running a large number of parallel tasks in separate sandboxes?
Recently I ran into a problem that has me completely absorbed, and I have not yet been able to find an answer to it (there are many different options).
I suggest we brainstorm possible solutions.
So, there is a large enterprise system written in Java. Among other things, it lets users create their own scripts, written in some language, that run in sandboxes and return a result. By design, these scripts are pure functions: they take values, perform calculations on them, and return a result, nothing more. No access to external resources, no side effects. A lot of scripts can be submitted for execution, on the order of hundreds per minute. The question is how to build a reliable sandbox for them.
Let's assume the scripts are written in Jython, Groovy, JRuby, or any other JVM scripting language, since the rest of the system is written in Java.
The immediate problem is that neither the standard JVM tools nor the Windows/Unix process architecture let you build a sandbox inside the parent process in which the main program runs. Restricting access to parts of the JDK with a package whitelist, plus the permission system for network, file, and other access, is all fine, but there is one catch: there is no proper way to stop a script from executing new int[10000000000000] and taking down the whole JVM with an OutOfMemoryError, simply because the heap is shared by all threads of the process and nothing can be done about that (strictly speaking, the only theoretical way is to intercept every memory allocation with a JVM agent and refuse to execute any bytecode that allocates memory via new without first checking how much memory is being requested… this will require writing an agent,
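The agent idea above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: AllocationBudget and ScriptMemoryLimitException are hypothetical names, the bytecode rewriting that would insert the checkArray call in front of each array allocation (for example with a library such as ASM) is not shown, and the 16-byte header is only an estimate. It covers arrays only and counts bytes requested, never bytes later freed by the GC, so it narrows the shared-heap problem rather than solving it.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-script allocation budget. A bytecode-rewriting JVM agent (not shown)
 * would insert a call to checkArray(...) in front of every array allocation in script code.
 * Cumulative only: memory reclaimed by the GC is never credited back.
 */
final class AllocationBudget {
    private static final long ARRAY_HEADER_BYTES = 16; // rough estimate; real layout depends on the JVM

    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    AllocationBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves space for an array, or throws before the JVM ever attempts the allocation. */
    void checkArray(long length, int elementSizeBytes) {
        if (length < 0) {
            throw new NegativeArraySizeException(Long.toString(length));
        }
        long requested;
        try {
            requested = Math.addExact(ARRAY_HEADER_BYTES, Math.multiplyExact(length, (long) elementSizeBytes));
        } catch (ArithmeticException overflow) {
            requested = Long.MAX_VALUE; // treat overflow as "far too much"
        }
        if (requested > limitBytes) {
            throw new ScriptMemoryLimitException(requested, limitBytes);
        }
        long total = usedBytes.addAndGet(requested);
        if (total > limitBytes) {
            usedBytes.addAndGet(-requested); // undo the reservation that did not fit
            throw new ScriptMemoryLimitException(requested, limitBytes);
        }
    }

    long usedBytes() {
        return usedBytes.get();
    }
}

/** Thrown into the script instead of letting it trigger an OutOfMemoryError for the whole JVM. */
final class ScriptMemoryLimitException extends RuntimeException {
    ScriptMemoryLimitException(long requestedBytes, long limitBytes) {
        super("script requested " + requestedBytes + " bytes; budget is " + limitBytes);
    }
}
```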
That is, each script needs its own process. But if so, the JVM is ruled out right away: it was never designed to run one-off scripts, it has a large startup and runtime-initialization overhead (even the client JVM), and each of its processes consumes a lot of memory (partly because every process has to create its own PermGen, into which all classes are loaded... and even Class Data Sharing is unlikely to reduce memory consumption radically, although I still intend to experiment with it here).
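Running one process per script is what gives each script its own heap limit: an OutOfMemoryError in the child cannot exhaust the parent's heap. A minimal sketch of launching such a worker JVM with ProcessBuilder; WorkerLauncher and the worker main class are hypothetical, and the flags (-Xmx, -XX:+UseSerialGC, -XX:TieredStopAtLevel=1, -Xshare:auto) are standard HotSpot options picked to trim per-process footprint and startup time, not measured recommendations.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher: one short-lived worker JVM per script, each with its own heap cap. */
final class WorkerLauncher {
    private final Path javaExecutable;
    private final String classpath;
    private final String mainClass; // hypothetical worker entry point that reads scripts from stdin

    WorkerLauncher(Path javaExecutable, String classpath, String mainClass) {
        this.javaExecutable = javaExecutable;
        this.classpath = classpath;
        this.mainClass = mainClass;
    }

    /** Uses the same Java installation as the parent process. */
    static WorkerLauncher forCurrentJava(String classpath, String mainClass) {
        Path java = Paths.get(System.getProperty("java.home"), "bin", "java");
        return new WorkerLauncher(java, classpath, mainClass);
    }

    /** Builds the child's command line; -Xmx limits only the child, never the parent heap. */
    List<String> command(int maxHeapMb) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExecutable.toString());
        cmd.add("-Xmx" + maxHeapMb + "m");
        cmd.add("-XX:+UseSerialGC");        // single-threaded GC suits small heaps
        cmd.add("-XX:TieredStopAtLevel=1"); // C1 only: faster warm-up for short-lived processes
        cmd.add("-Xshare:auto");            // use a Class Data Sharing archive if one exists
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Starts the worker; its stdin/stdout become the pipes the parent talks over. */
    Process start(int maxHeapMb) throws IOException {
        return new ProcessBuilder(command(maxHeapMb))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // worker stderr shows up in the parent's stderr
                .start();
    }
}
```

Each launch still pays the JVM startup and memory cost described above; the sketch only shows where the per-process heap cap comes from.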
The next option is to write these scripts in something like Python and run them as native processes (on Unix, via fork), on the assumption that such processes are created much faster... The main program can talk to these processes either over TCP or over pipes.
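Whatever language the worker is written in, the pipe option looks much the same from the Java side. A minimal sketch of the parent's end, assuming a hypothetical one-line-request, one-line-response protocol; PipeChannel and the message format are illustrative only, and a real protocol would need escaping or length-prefixed framing for multi-line scripts and results.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

/** Hypothetical one-line-per-message channel over a worker's stdin/stdout pipes. */
final class PipeChannel implements Closeable {
    private final BufferedWriter toWorker;
    private final BufferedReader fromWorker;

    PipeChannel(OutputStream workerStdin, InputStream workerStdout) {
        this.toWorker = new BufferedWriter(new OutputStreamWriter(workerStdin, StandardCharsets.UTF_8));
        this.fromWorker = new BufferedReader(new InputStreamReader(workerStdout, StandardCharsets.UTF_8));
    }

    static PipeChannel of(Process worker) {
        // Process.getOutputStream() feeds the child's stdin; getInputStream() reads its stdout.
        return new PipeChannel(worker.getOutputStream(), worker.getInputStream());
    }

    /** Sends one request line and blocks until the worker answers with one line. */
    String call(String request) throws IOException {
        if (request.indexOf('\n') >= 0 || request.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("requests must be single-line");
        }
        toWorker.write(request);
        toWorker.write('\n');
        toWorker.flush(); // without flushing, the worker may never see the request
        String response = fromWorker.readLine();
        if (response == null) {
            throw new EOFException("worker closed its stdout (crashed or exited)");
        }
        return response;
    }

    @Override
    public void close() throws IOException {
        toWorker.close();
        fromWorker.close();
    }
}
```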
Python is a convenient language for users to write scripts in. The question is how well Python supports sandboxing, compared with the Java Security Manager.
The next option is a language like Erlang, where spawning processes is much cheaper, but the language itself is far more exotic...
So, gentlemen, any ideas? Maybe this question should be posted as a separate topic in the Java blog?
Move the computations to one or more separate, permanently running JVMs, talk to them over any IPC mechanism, execute the scripts there, and monitor the state of those JVMs, restarting them if they crash. If you look at Akka (akka.config.Supervision, for example), you may have to reinvent fewer wheels.
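The monitor-and-restart part of this suggestion can be sketched in plain Java. ScriptWorker and Supervisor are hypothetical names and none of Akka's API is used; the policy shown (replace the worker after any failure and report the failure to the caller rather than retrying the script) is one assumption among several, chosen so that a script that always crashes its worker is not retried forever.

```java
import java.util.function.Supplier;

/** Hypothetical handle to one worker (in practice a child JVM plus its pipes). */
interface ScriptWorker {
    String execute(String script) throws Exception;
    void destroy();
}

/**
 * Minimal restart-on-failure supervisor, loosely in the spirit of Akka supervision
 * but written in plain Java.
 */
final class Supervisor {
    private final Supplier<ScriptWorker> factory;
    private ScriptWorker current;
    private int restarts;

    Supervisor(Supplier<ScriptWorker> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Runs one script; on failure the worker is replaced and the failure is rethrown, not retried. */
    synchronized String run(String script) throws Exception {
        try {
            return current.execute(script);
        } catch (Exception e) {
            // The worker may be broken (e.g. its JVM died of OutOfMemoryError), so discard it.
            current.destroy();
            current = factory.get();
            restarts++;
            throw e;
        }
    }

    synchronized int restarts() {
        return restarts;
    }
}
```

One Supervisor guards one worker and serializes its calls; running several of them side by side would give the parallelism the question asks for.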
With Python, the idea is a bad one altogether; you could put something together in Erlang, but it's tedious and it's unclear what for.
I think the most reliable way to create sandboxes is to spin up containers such as LXC. Try keeping a pool of such containers and handing them out to users on demand. Communication can be done quite simply with, for example, Spring Invoker. I can't say for sure, since I'm not a virtualization specialist, but I'd dig in this direction. For example, look at how cloud hosting providers do it, because for them creating "sandboxes" is one of the most pressing problems.
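The "pool of containers handed out on demand" idea can be sketched as a small pool that is generic over the sandbox type, so the same shape would work for LXC containers or worker JVMs. SandboxPool is a hypothetical name, and starting, resetting, and health-checking the actual containers is left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical pre-warmed pool of sandboxes (containers, worker JVMs, ...), handed out on demand. */
final class SandboxPool<S> {
    private final BlockingQueue<S> idle;

    SandboxPool(int size, Supplier<S> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // start every sandbox up front so requests don't pay startup cost
        }
    }

    /** Borrows a sandbox, or returns null if none becomes free within the timeout. */
    S acquire(long timeout, TimeUnit unit) throws InterruptedException {
        return idle.poll(timeout, unit);
    }

    /** Returns a sandbox to the pool; pass a fresh one instead if the old one may be dirty. */
    void release(S sandbox) {
        if (!idle.offer(sandbox)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    int available() {
        return idle.size();
    }
}
```

Since the scripts are meant to be pure functions, a sandbox could in principle be reused as-is; whether to do that or replace it after each script is a policy decision the sketch leaves to the caller.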
Using the JVM here appeals to me in every respect except one: memory. As I understand it, a JVM launched for this kind of purpose will have far more memory overhead than Python.