Creating a Garbage Collector

DileSoft2011-05-13 00:29:17

OOP

DileSoft, 2011-05-13 00:29:17

The people, does anyone in the Garbage Collection understand? I'm now listening to the author of Phantom OS , he talks about creating a fast GC. The fastest is ref-counting, but it does not work with circular references. And then he returns to the usual GC spider. And then I thought... To quote my comment:

I had a vague idea in my head that I could use the idea of refcounts for cycles. What I mean. I mean that in the object manager, each object must have a history of creation and links associated with it. That is - object A generates object B, generates objects C, which refers to A ... And then we get the last story from the latter: ABCA and immediately see the cycle! If there are many references to an object, their histories are summed up in an array and checked one by one.

I understand correctly that it will be very cheap and will work?

Answer the question

In order to leave comments, you need to log in

7 answer(s)

lesha_penguin, 2011-05-14
@lesha_penguin

Answering the question, I want to share one well-proven solution. I apologize for the “many letters” comment, because I just wanted to write this as a topic, but alas, I don’t have any karma yet ...
Imagine that we have a very, very tricky data structure, the elements of which are a bunch of (perhaps even heterogeneous) objects containing a bunch of references to other objects of this structure. For example, a tricky graph with cycles and other joys. Of course, our real program, in addition to this tricky graph, also works with a bunch of other objects, including, for example, other equally tricky graphs that also need memory, and actually quite a lot.
It seems that the “task for schoolchildren” wrote data structures with all fields for the vertices and edges of the graph, and then everything seems to be simple, “everything is like in a textbook” - allocate memory for them from the heap using malloc / new and put down pointers.
The problem arises when we have worked with this graph and we no longer need it. The seemingly simple task of "taking and freeing memory for all objects", i.e. for each object, call the corresponding free / delete, and one once and only once, in fact, it turns out not just complicated, but in the general case NP-hard. Those. all previous work with this tricky graph may seem like "flowers" in comparison with an attempt to find in what order to go around the elements so as not to forget anyone (because otherwise there is a memory leak) and cause the release of each element only once (because with double release we will violate the logic other threads of our program).
Compared to the problem that suddenly fell on us, the idea of using various tricky algorithms for automatic memory management, reference counting, garbage collection, etc. no longer seems such a crazy idea to us. It is also not always easy to screw some garbage collection library into a stupid one, because not all garbage collection and reference counting algorithms work correctly with circular references.
However, let's look at the root of the problem. The problem of the complexity of element-by-element release as such is generated by the very element-by-element allocation of objects from dynamic memory (heap). By the way, this is in addition to all other problems of dynamic memory such as fragmentation, slowdown in allocation / release operations when dynamic memory is a mess of hundreds of thousands of fragments of different sizes mixed with free elements of different sizes too (namely, such a hellish mess is dynamic memory in STL lovers).
The solution can be simple and elegant - a memory allocator without element-by-element release. Sounds complicated, but the implementation is simple.
The basis of our memory allocator is a chunk. The size of a chunk can be anything, as long as it is greater than the maximum size of the object. The ideal option is knowing (approximately) how much we need, we choose the size of the chunk. If the structure of our graph took more - it's okay, we select one more chunk.
The chunk itself has the following structure:

struct _alloc_pool_mem_chunk{ // чанк пула-аллокатора без поэлементного освобождения
  size_t size;  // размер данных чанка
  size_t curr;  // текущее заполнение чанка
  struct _alloc_pool_mem_chunk*next; // указатель на следующий чанк нашего пула
  uint8_t data[0]; // данные чанка
  };

Allocating memory from such a chunk is elementary:

static void *memchunk_alloc_from_chunk(struct _alloc_pool_mem_chunk*ch,size_t s){
  if(!ch||!s){return 0;} // простая проверка
  if( (ch->size-ch->curr) < s){return 0;} // у нас в чанке свободно (ch->size-ch->curr) байт
  void *rv=&ch->data[ch->curr];
  ch->curr+=s; // сдвигаем на s байт
   return rv;
  }

If there is no room for an object in a chunk, we simply add another chunk:

static struct _alloc_pool_mem_chunk *memchunk_add_chunk(struct _alloc_pool_mem_chunk **chp,size_t chunk_size){
  struct _alloc_pool_mem_chunk*ch=malloc(sizeof(struct  _alloc_pool_mem_chunk)+chunk_size);  // выделяем чанк 
  if(!ch){return 0;}
  ch->curr=0;
  ch->size=chunk_size;
  ch->next=*chp; // сцепляем чанки в связный список
  *chp=ср;
  return ch;
  }

The whole procedure for allocating memory from such an allocator in its simplest form:

void *memchunk_alloc(struct _alloc_pool_mem_chunk **chp,size_t s,size_t chunk_size){
 void *rv; 
 static struct _alloc_pool_mem_chunk *ch=*chp;
 if((rv=memchunk_alloc_from_chunk(ch,s))){return rv;} // выделяем из текущего чанка
 if(!(ch=memchunk_add_chunk(chp,chunk_size))){return 0;} // не удалось добавить чанк-облом
 rv=memchunk_alloc_from_chunk(ch,s);  // выделаем из нового чанка
 return rv;
  }

The function of freeing the entire pool - just bypass all the chunks:

void memchunk_free_all(struct _alloc_pool_mem_chunk **chp){
  struct _alloc_pool_mem_chunk *ch=*chp;
  while(ch){
     struct _alloc_pool_mem_chunk *chn=ch->next;
     free(ch);
     ch=chn;
     }
  }

Advantages:

Extreme ease of implementation. Even a schoolboy is able to write such an allocator accurately the first time in a few minutes. By the way, I did not copy-paste the code above, but simply typed it from memory (there may be mistakes from errors).
low overhead. only 12/24 bytes (x86/x86_64) per chunk.
Does not fragment memory. An essential advantage, especially when there are hundreds of thousands of elements.
Selecting an element is faster, many times faster than malloc/new. (Memory allocation for an element from chunk 3 of the line). Releasing an element is nop at all!
If we suddenly want to threadsafe, we can easily get it just by replacing the ch->curr+=s entry with cmpxchg and adding some lightweight primitive like spinlock to add a chunk.
We have our own fast and lightweight garbage collector with the “free all” command. Walking through a (NB: small) list of chunks is not counting references in millions of objects.
Yes, the control of occupied memory by our tricky structure. We can always quickly tell that for example this graph is 3.5Mb, just by running through the chunks.
Verified by me personally on a dozen highload projects.

Disadvantages:

And they happen only in cases where this allocator is used for unintended purposes :)

PS: I wish all habrausers pleasant experiments.

dborovikov, 2011-05-13
@dborovikov

As far as I know, the main problem with refcounting is to count links atomically. If the object is accessed from different threads, then you will have to synchronize on the counter, and this will become a bottleneck. Refcounting seems to work well only in a single-threaded environment.

pennanth, 2011-05-13
@pennanth

The problem is not in such simple cycles, but in branching ones.
For example, 20 A(i) objects generate their 20 B(j) objects, and those - generate C(k) objects - and B(j).C(k) refers to A(k+j+i % 20) . Will there be a cyclic relationship here?
Those. you need to maintain either a direct graph A(i) -> B(j) -> C(k), but there the problem is that A does not know about C without special magic, or a reverse graph C(k) -> B(j) -> A(i), and here the problem arises - how to find all the Cs that the given A(i) refers to. And, of course, for an object with 3 byte fields, maintaining such lists is extremely expensive - two references to parent objects (each in 64 bits) will already turn out to be more in terms of memory for such an object (taking into account alignment).
There are, of course, effective implementations of trees and hash tables (and other similar structures), but for managing 16-byte pieces of memory, all this power is like from a howitzer to a hummingbird.

shadowjack, 2011-05-13
@shadowjack

Hmm, well, for example, in Perl, the garbage collector, as far as I understand, works on the principle of ref-counter. If you need to make a cyclic structure, then there is the Scalar::Util::weaken function for this, which will decrease the counter value and the memory will be normally freed.
I mean, there is no special black magic when using ref-counter with cyclic structures. eight)

diostm, 2011-05-13
@diostm

The book Thinking in Java describes very well what's what about the work of Garbage Collector'a in Java.
May lead to further reasoning other than "ref-counting".

googol, 2011-05-14
@googol

The Garbage Collection algorithms are best described in this book www.amazon.com/Garbage-Collection-Algorithms-Automatic-Management/dp/0471941484/ref=sr_1_1?s=books&ie=UTF8&qid=1305334759&sr=1-1
Required reading.