GARBAGE COLLECTION ABC
OK, guys! Time to talk about garbage collection (GC). Why do we need it? What problems does it solve? Let's try to answer these questions while taking a closer look at the GC modes in .NET and other modern languages.
By the way, have you ever wondered when the first language to support garbage collection appeared? You may be a little surprised, but it was around 1959, when John McCarthy built it into Lisp. More than 50 years ago, people were already thinking that developers should be freed from managing memory by hand. APL followed in 1964, and later languages supporting garbage collection included Smalltalk (1972), Erlang (1990), Eiffel, and of course C#, as well as any modern language coming out now, like Go. GC is effectively a must-have today.
Interesting fact: according to some research, developers who write code in languages without garbage collection spend up to 40% of their productive time on memory management, which is quite a lot and, most likely, not something management will always appreciate.
WHAT IS GARBAGE COLLECTION
GC (Garbage Collection) is a high-level abstraction that relieves developers of the need to worry about freeing managed memory.
Let's recap the main points about garbage collection. In .NET, garbage collection is based on tracing.
There is the concept of application roots. A root is a storage location containing a reference to a heap-allocated object. Strictly speaking, a root can be any of the following:
References to global objects (these are not allowed in C#, but CIL code does allow global objects).
References to any static objects or static fields.
References to local objects within the application's codebase.
References to object parameters passed to the method.
References to objects awaiting finalization.
Any CPU (central processing unit) registers that reference an object.
During the garbage collection process, the runtime examines objects on the heap to determine whether they are still reachable (i.e., rooted) by the application. To do this, the common language runtime builds object graphs that represent all the objects the application can reach. Also, keep in mind that the garbage collector never graphs the same object twice, which avoids the circular reference counting problems common in COM (Component Object Model) environments.
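The reachability check described above can be sketched with a toy model. This is only an illustration in Python, not the CLR's implementation: the `mark_reachable` function, the `heap` dictionary, and the object names are all hypothetical. It shows why a tracing collector is not confused by cycles: each object is visited at most once, and a cycle that no root can reach is still garbage.

```python
def mark_reachable(roots, references):
    """Return the set of object ids reachable from the roots.

    roots: object ids held by roots (statics, locals, registers, ...).
    references: dict mapping each object id to the ids it references.
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already graphed: the same object is never visited twice
        marked.add(obj)
        stack.extend(references.get(obj, ()))
    return marked


# Hypothetical heap: A and B reference each other and A is rooted;
# C and D also reference each other, but no root leads to them.
heap = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}

live = mark_reachable(roots=["A"], references=heap)
garbage = set(heap) - live

print(sorted(live))     # objects that survive
print(sorted(garbage))  # the unreachable cycle
```

Under reference counting, C and D would keep each other alive forever; here they are simply never marked.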
GARBAGE COLLECTION PHASES
Marking (mark phase).
Cleaning (sweep phase).
Compaction (compact phase).
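The three phases can be sketched on a toy heap. This is a simplified model in Python, not how the .NET runtime lays out memory: the heap is assumed to be a list of slots, references are slot indexes, and the `collect` function and slot names are hypothetical.

```python
def collect(heap, roots):
    """Run mark, sweep, and compact over a toy heap.

    heap: list of slots, each a dict {"name": str, "refs": [slot indexes]}.
    roots: list of slot indexes referenced by roots.
    Returns (compacted_heap, updated_roots).
    """
    # Mark: find every slot reachable from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        index = stack.pop()
        if index in marked:
            continue
        marked.add(index)
        stack.extend(heap[index]["refs"])

    # Sweep: unmarked slots become free space (None).
    swept = [slot if i in marked else None for i, slot in enumerate(heap)]

    # Compact: slide survivors together and record where each one moved.
    forwarding = {}
    survivors = []
    for old_index, slot in enumerate(swept):
        if slot is not None:
            forwarding[old_index] = len(survivors)
            survivors.append(slot)

    # Rewrite references and roots so they point at the new locations.
    compacted = [
        {"name": s["name"], "refs": [forwarding[r] for r in s["refs"]]}
        for s in survivors
    ]
    new_roots = [forwarding[r] for r in roots]
    return compacted, new_roots


heap = [
    {"name": "a", "refs": [2]},
    {"name": "junk1", "refs": [0]},
    {"name": "b", "refs": [0]},
    {"name": "junk2", "refs": [1]},
]
new_heap, new_roots = collect(heap, roots=[0])
print([slot["name"] for slot in new_heap])
```

After the call, only the two reachable objects remain, they sit next to each other, and their references have been updated to the new slot indexes.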
Objects on the heap are also divided into generations: generation 0, generation 1, and generation 2.
Generations 0 and 1 are also called the ephemeral generations. They hold short-lived objects, so collecting them is quick, which keeps the application responsive.
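A rough way to picture generations is the toy model below. It is a sketch in Python under simplifying assumptions, not the CLR's algorithm: the `GenerationalHeap` class is hypothetical, and the set of live objects is passed in by hand instead of being computed by tracing. It shows the two key ideas: collecting generation N also collects the younger generations, and survivors are promoted to the next generation.

```python
class GenerationalHeap:
    """Toy three-generation heap; objects are plain hashable values."""

    MAX_GEN = 2

    def __init__(self):
        self.generations = [set(), set(), set()]

    def allocate(self, obj):
        # New objects always start in generation 0.
        self.generations[0].add(obj)

    def collect(self, gen, live):
        """Collect generations 0..gen; `live` is the set of reachable objects."""
        # Work from oldest to youngest so an object is promoted only once.
        for g in range(gen, -1, -1):
            survivors = self.generations[g] & live
            self.generations[g] = set()
            target = min(g + 1, self.MAX_GEN)
            self.generations[target] |= survivors


h = GenerationalHeap()
h.allocate("config")   # long-lived object
h.allocate("temp1")    # short-lived object

h.collect(0, live={"config"})   # temp1 dies, config moves to generation 1
h.allocate("temp2")
h.collect(0, live={"config"})   # temp2 dies, generation 1 is not touched
h.collect(1, live={"config"})   # config survives again and moves to generation 2

print(h.generations)
```

Most of the work happens in cheap generation 0 collections, while the long-lived object is examined only rarely once it reaches generation 2.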
GARBAGE COLLECTION THEORY
Thinking of designing a garbage collection algorithm? Here are some vital factors you might want to consider:
Program throughput: how much does your algorithm slow the program down? This can be expressed as the percentage of CPU time spent on collection versus useful work.
GC throughput: how much garbage can the collector clear given a fixed amount of CPU time?
Heap overhead: how much additional memory over the theoretical minimum does your collector require? If your algorithm allocates temporary structures whilst collecting, does that make memory usage of your program very spiky?
Pause times: how long does your collector stop the world for?
Pause frequency: how often does your collector stop the world?
Pause distribution: do you tolerate very short pauses combined with very long pauses? Or do you prefer pauses to be a bit longer but consistent?
Allocation performance: is allocating new memory fast, slow, or unpredictable?
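Several of these factors can be estimated from a simple log of pauses. The sketch below, in Python, assumes that stop-the-world pauses are the only cost of collection (real collectors also do concurrent work); the `gc_metrics` function and the sample numbers are hypothetical.

```python
def gc_metrics(pauses_ms, run_time_ms):
    """Summarize a list of stop-the-world pause durations (milliseconds)."""
    total_pause = sum(pauses_ms)
    return {
        # Share of the run spent on useful work rather than in GC pauses.
        "throughput_pct": 100.0 * (run_time_ms - total_pause) / run_time_ms,
        # Pause frequency: how often the world was stopped.
        "pauses_per_second": len(pauses_ms) / (run_time_ms / 1000.0),
        # Pause distribution: a large gap between mean and max means
        # mostly short pauses with occasional long ones.
        "mean_pause_ms": total_pause / len(pauses_ms),
        "max_pause_ms": max(pauses_ms),
    }


# Hypothetical pause log for a one-second run: four short pauses, one long.
pauses = [2, 3, 2, 40, 3]
metrics = gc_metrics(pauses, run_time_ms=1000)
print(metrics)
```

Here throughput looks healthy at 95%, yet a single 40 ms pause dominates the distribution, which is exactly the kind of trade-off a collector design has to weigh.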
Because the design space is so complex, garbage collection remains an active subfield of computer science. New algorithms are continuously proposed and implemented, both by researchers and by industry. Unfortunately, nobody has yet proposed a single algorithm that is a one-stop shop for all cases.
JAVA VS .NET GARBAGE COLLECTORS
First of all, this is more about the difference between the CLR (.NET) GC and the JVM GC than about the languages themselves.
There are some historical differences, largely because .NET was designed with lessons learned from Java's evolution (and from other GC-based platforms). That doesn't mean .NET was somehow superior for including this functionality from the beginning; it simply came out later.
Early JVMs (Java virtual machines) did not have generational garbage collectors, though this feature was swiftly added. It was also realized that a mark-sweep-compact approach would result in much better memory locality, justifying the additional copying overhead.
Active research into GC strategies is ongoing on both platforms (and in open-source implementations).
Sooner or later, we all run into an application that performs poorly for no obvious reason. When analyzing such problems, we often fail to look closely at how the GC works, how it affects the overall operation of the application, and whether the chosen GC mode is optimal for that specific application. Our advice is to answer these questions right away, as they might be the key to your analysis.