Java, multithreading, object initialization and reordering: do you always need to synchronize initialization?


NikKotovski, 2018-08-01 09:36:30

Java


Java creates the reference to an object before the constructor and the initialization blocks run. This means that code inside those blocks can hand the object to another object, and through it to another thread, using the this keyword before all the fields are initialized, if the code is written carelessly: first the reference is passed on, and only then are the fields set. In addition, the Java memory model allows the JVM to reorder instructions, so even careful code that passes the reference only after all the initialization statements does not rule out another thread receiving the reference before the fields are initialized, because the JVM may execute the code that publishes the reference before the initialization has taken effect. In principle, this is all clear.
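A minimal single-threaded sketch of the first case, leaking this from a constructor. The names Widget, Listener and create are hypothetical, chosen only for illustration. The listener stands in for whatever code (possibly on another thread) receives the reference; because it is called before the field is assigned, it observes the default value 0 even though the field is final.

```java
public class ThisEscapeDemo {
    /** Hypothetical callback; real code might pass the object on to another thread. */
    interface Listener {
        void onCreated(Widget w);
    }

    static final class Widget {
        private final int value;

        // UNSAFE: 'this' escapes to the listener before 'value' is assigned.
        Widget(int value, Listener listener) {
            listener.onCreated(this);
            this.value = value;
        }

        // Used by the factory below: only initializes the field.
        private Widget(int value) {
            this.value = value;
        }

        // SAFER: finish construction first, then hand out the reference.
        static Widget create(int value, Listener listener) {
            Widget w = new Widget(value);
            listener.onCreated(w);
            return w;
        }

        int value() {
            return value;
        }
    }

    /** Returns what the listener saw when 'this' leaked from the constructor. */
    public static int seenDuringConstructor(int value) {
        int[] seen = new int[1];
        new Widget(value, w -> seen[0] = w.value());
        return seen[0];
    }

    /** Returns what the listener saw when the factory published the finished object. */
    public static int seenWithFactory(int value) {
        int[] seen = new int[1];
        Widget.create(value, w -> seen[0] = w.value());
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(seenDuringConstructor(42)); // prints 0
        System.out.println(seenWithFactory(42));       // prints 42
    }
}
```

The factory version only hands out the reference after the constructor has returned, so the escape itself is gone; the second problem, reordering when the reference is shared through a plain field, is a separate matter.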
My question is whether a constructor can hand a reference to the object outside before the object is initialized even when we do not use the this keyword, that is, during ordinary initialization. And if so, can this be avoided by writing a method that initializes the object and then returns it? I've scoured the internet for an answer; most sources say that it is possible and that neither way of initializing is thread-safe. The problem is that most of those answers are old, and the synchronization semantics have changed once or twice since then. Moreover, among the more recent notes on this topic I have seen both points of view, and I cannot tell whether the few notes saying this is safe are wrong, or whether the majority is mistaken and simply repeats the old view out of inertia.
Here is a concrete example of what I mean. Suppose we have a variable
MyClass m;
that is available to multiple threads. If the variable is not initialized, one of the threads initializes it:
m = new MyClass();
Can the variable m start pointing to the object before the object is initialized? In other words, can m receive a reference to the object before all of the constructor's code has executed?
Suppose it can. Then the second example: suppose we have a method
MyClass createMyClass() {
MyClass m = new MyClass();
return m;
}
Can this method return an m that is not fully initialized?
I also understand that even if these examples turn out to be thread-safe, some kind of synchronization is still needed; otherwise one thread can create an object and start working with it, then another thread creates a new object, and the first thread's changes are lost. More specifically, suppose we have this code:
volatile MyClass m;
if (m == null) m = createMyClass();
Here the following scenario is possible: one thread reads m and sees null, a second thread reads m and also sees null, and both start creating an object. The thread that created its object first manages to change something in it, and then the lagging thread replaces that object with its own, so the first thread's changes are discarded.
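The interleaving described above can be forced deterministically. In this sketch (class and helper names are illustrative assumptions) a CyclicBarrier makes both threads pass the null check before either assigns m, and a CountDownLatch makes the lagging thread assign only after the first thread has changed its object, so the change is always lost even though m is volatile.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;

public class LostUpdateDemo {
    static class MyClass {
        int counter; // state that the first thread modifies
    }

    static volatile MyClass m;

    static MyClass createMyClass() {
        return new MyClass();
    }

    /** Forces the interleaving from the question and returns the object left in m. */
    public static MyClass run() {
        m = null;
        CyclicBarrier bothSawNull = new CyclicBarrier(2);
        CountDownLatch firstChanged = new CountDownLatch(1);

        Thread first = new Thread(() -> {
            if (m == null) {
                await(bothSawNull);
                MyClass mine = createMyClass();
                m = mine;
                mine.counter++;            // the first thread works with "its" object
                firstChanged.countDown();
            }
        });
        Thread lagging = new Thread(() -> {
            if (m == null) {
                await(bothSawNull);
                await(firstChanged);
                m = createMyClass();       // replaces the object; counter == 1 is lost
            }
        });
        first.start();
        lagging.start();
        join(first);
        join(lagging);
        return m;
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run().counter); // prints 0: the first thread's change was discarded
    }
}
```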
However, depending on the application logic, first, the cost of such an error varies, and second, it may be possible to get by with an atomic variable instead of a full-fledged lock.
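A sketch of the "atomic variable instead of a lock" idea, assuming that building a spare MyClass is cheap and side-effect free. Several threads may still construct a candidate, but AtomicReference.compareAndSet(null, candidate) lets only the first one install it and every caller returns the installed object, so no thread ever replaces an object another thread is already using. compareAndSet has volatile memory semantics, so the installed object is also safely published. The field value and the constant 42 are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class CasInit {
    static final class MyClass {
        final int value;

        MyClass(int value) {
            this.value = value;
        }
    }

    private static final AtomicReference<MyClass> REF = new AtomicReference<>();

    /** Lock-free lazy initialization: at most one candidate is ever installed. */
    public static MyClass getOrCreate() {
        MyClass current = REF.get();
        if (current != null) {
            return current;
        }
        MyClass candidate = new MyClass(42);           // may be built by several threads
        if (REF.compareAndSet(null, candidate)) {
            return candidate;                          // we won: our object is installed
        }
        return REF.get();                              // we lost: use the winner, drop ours
    }

    public static void main(String[] args) throws InterruptedException {
        MyClass[] results = new MyClass[8];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < results.length; i++) {
            final int idx = i;
            Thread t = new Thread(() -> results[idx] = getOrCreate());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        boolean allSame = true;
        for (MyClass r : results) {
            allSame &= r == results[0];
        }
        System.out.println(allSame); // prints true
    }
}
```

The trade-off matches the one in the question: the constructor may run more than once, but changes made to the installed object are never discarded.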
So here is the specific question: in the first two code examples, can the variable m hold a reference to an object whose fields are not fully initialized if m is not volatile? And if so, would volatile help avoid the problem?


2 answer(s)


NikKotovski, 2018-08-02
@NikKotovski

After spending more time reading about the question and analyzing several articles in more detail, I have sorted it out and can answer it myself. The easiest way to work through it is with the Wikipedia article:
https://en.wikipedia.org/wiki/Double-checked_locki...
The article says that
That is, the JVM can indeed let another thread obtain a reference to an object before that thread sees the object's fields initialized, i.e. before the effects of the initialization blocks and the constructor are visible to it. Furthermore
Thus, as the article says, publishing a reference to an object that has been created but not yet initialized, although possible, does not happen often, so code that does not account for it can run for a very long time without errors, which only makes the problem worse.
Now regarding volatile. Declaring the variable volatile does guarantee that other threads will not see a partially initialized object through it. The variable is not locked for other threads; rather, the constructor's writes come before the write to the volatile field in program order, and any thread whose read of that field returns the new reference is guaranteed to also see everything written before that write, including the initialized fields. This means that, in theory, thread-safe initialization of an object can still be done with an atomic variable instead of a lock. But I am not completely sure of this yet; I will look into it more closely soon and should then be able to give a definitive answer.
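For reference, a sketch of the usual Java 5+ form of double-checked locking, where volatile provides exactly the guarantee described above. MyClass and its field value are placeholders; the field is deliberately non-final so that the safety comes from volatile alone.

```java
public class DclDemo {
    static final class MyClass {
        int value; // deliberately not final: visibility here comes from volatile

        MyClass() {
            value = 42;
        }
    }

    // Without volatile, another thread could see a non-null reference whose value is still 0.
    private static volatile MyClass instance;

    public static MyClass getInstance() {
        MyClass local = instance;          // single volatile read on the fast path
        if (local == null) {
            synchronized (DclDemo.class) {
                local = instance;          // re-check under the lock
                if (local == null) {
                    local = new MyClass();
                    // The constructor's writes precede this volatile write in program order,
                    // so any thread whose volatile read returns this reference also sees value == 42.
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getInstance().value);            // prints 42
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```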
As for the question about the method that creates the object: no, from the point of view of the thread that calls it, this method cannot return a partially initialized object. A reference to a partially initialized object may end up in the variable, but execution in this thread will not move on until initialization has finished. So the return statement returns a reference to a fully constructed object, because m is a local variable that lives only within that thread, and no one else can access it before the object is fully initialized.
One more thing follows from all of the above: if you need an object that is guaranteed to be initialized by the time another thread accesses it, but you do not want to use synchronization or volatile and are willing to lose some of the changes made to it, you can simply do this:
if (m == null) {
MyClass n = new MyClass();
m = n;
}
Here the object is first initialized in a local variable, and only then is the reference stored in the shared variable. But, once again, this approach does not protect you from creating several objects and losing changes.
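Under the Java Memory Model, however, copying the reference through a local variable does not by itself order anything for other threads: the JIT may treat MyClass n = new MyClass(); m = n; exactly like m = new MyClass();, so a thread reading m through a data race can in principle still see default field values. The guarantee the JLS does give for racy publication without volatile or locks is the one for final fields (JLS §17.5): if the fields are final and this does not escape the constructor, any thread that sees the reference sees those fields initialized. A sketch, with MyClass's fields x and name invented for illustration:

```java
public class FinalFieldPublication {
    static final class MyClass {
        // All fields final and 'this' never escapes the constructor: per JLS 17.5,
        // a thread that sees a reference to this object sees x and name initialized,
        // even if the reference reached it through a data race.
        final int x;
        final String name;

        MyClass(int x, String name) {
            this.x = x;
            this.name = name;
        }
    }

    // Plain, non-volatile shared field: the publication itself is racy.
    static MyClass m;

    public static MyClass getOrCreate() {
        MyClass local = m;           // read the racy field exactly once
        if (local == null) {
            local = new MyClass(42, "ready");
            m = local;               // racy, but safe for final fields; may still create duplicates
        }
        return local;
    }

    public static void main(String[] args) {
        MyClass obj = getOrCreate();
        System.out.println(obj.x + " " + obj.name); // prints 42 ready
    }
}
```

Reading m only once matters: two racy reads of the same field (one for the check, one for the return) could in theory return different values. As the answer says, several objects may still be created and changes lost.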


Andrey K, 2018-08-01
@kuftachev

Think about it: how could a constructor do only half of its work? It returns a reference to the result of its work to the calling code only once it has finished executing completely. Of course, a careless programmer can break anything, but why would you pass a reference to yourself somewhere else from inside the constructor?
Judging by your example, the question arises: is this about a Singleton? If so, 2 of the 4 ways of implementing this pattern described in Joshua Bloch's book are thread-safe.
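The answer does not say which four approaches it counts, so this is only an illustration: two idioms that Bloch's Effective Java presents as thread-safe are the single-element enum and the lazy initialization holder class. Both rely on the JVM performing class initialization under its own lock (JLS §12.4.2). The names Registry and Config and the value 42 are placeholders.

```java
public class SingletonIdioms {
    // 1) Single-element enum: the JVM creates INSTANCE exactly once, during class initialization.
    public enum Registry {
        INSTANCE;

        private int counter;

        // Creating the singleton safely does not make its mutable state thread-safe.
        public synchronized int next() {
            return ++counter;
        }
    }

    // 2) Lazy initialization holder class: Holder is not initialized until
    //    getInstance() first touches Holder.INSTANCE.
    public static final class Config {
        private final int value;

        private Config() {
            this.value = 42; // placeholder
        }

        private static final class Holder {
            static final Config INSTANCE = new Config();
        }

        public static Config getInstance() {
            return Holder.INSTANCE;
        }

        public int value() {
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(Config.getInstance() == Config.getInstance()); // prints true
        System.out.println(Config.getInstance().value());                // prints 42
        System.out.println(Registry.INSTANCE.next());                    // prints 1
    }
}
```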
P.S. The essence of volatile is that you tell the processor not to cache this value but always to go to memory, so that you work with a fresh value (by the way, I don't know whether on a system with a single Intel processor, which, unlike AMD, has a shared L3 cache, that cache will be used or still only main memory). Without volatile, two cores can work with their own copies of the value for a long time, and the atomicity of writing the value is not guaranteed either: for example, an int is written atomically, but a long is not. So I'm not sure about references: they also seem to be 64 bits, so in theory their writes should not be atomic either, but maybe there is some other protection; otherwise we could create two objects and end up with a reference pointing who knows where.
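For the record, JLS §17.7 settles the question about references: writes and reads of references are always atomic, whether the JVM implements them as 32-bit or 64-bit values. Split ("torn") writes are permitted only for non-volatile long and double, and declaring them volatile makes each read and write atomic. Neither guarantee makes a compound operation such as ++ atomic; for that there are the java.util.concurrent.atomic classes. A small sketch (field names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class AtomicityDemo {
    // Non-volatile long writes may be split into two 32-bit halves (JLS 17.7);
    // volatile makes each individual read and write of this field atomic.
    static volatile long lastUpdatedNanos;

    // Reference reads and writes are always atomic; volatile here is only
    // about visibility and ordering, not about tearing.
    static volatile String status = "init";

    // ++ on a volatile long would still be a racy read-modify-write;
    // AtomicLong.incrementAndGet performs it atomically.
    static final AtomicLong hits = new AtomicLong();

    /** Starts the given number of threads, each recording perThread hits; returns the total. */
    public static long countHits(int threads, int perThread) {
        hits.set(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.incrementAndGet();
                    lastUpdatedNanos = System.nanoTime();
                }
                status = "done";
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 10_000)); // prints 40000
        System.out.println(status);               // prints done
    }
}
```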