Wednesday, June 15, 2011

Demystifying the garbage collector

In my work history I have met a lot of people who were using languages and platforms like PHP, Java, and .NET but had no clue how garbage collectors actually work. The most common answer was: 'Memory is no problem, because we have a garbage collector that frees the resources.' Unfortunately, although it is true that we have garbage collectors, memory leaks are as big a problem as they were in C or C++, for example. To illustrate the problem I have written the PHP snippet below, but it could be in any language.

echo 'Memory usage on script run:' .memory_get_usage() . "\n"; 
//let us create a simple object 
$var = array(); 
for ($i = 1; $i < 10000; $i++) {
   $var[] = 'Hello';
}
//after the array is filled
echo 'Memory usage when object is created:' .memory_get_usage() . "\n";

//memory usage is the same 
//because there was no data cloning, just pointing 
$new_var = $var; 
echo 'Memory usage when new pointer is made:' .memory_get_usage() . "\n";

//unset does not free the resource
//because $var still points to that memory location
unset($new_var);
echo 'Memory usage after one unset:' . memory_get_usage() . "\n";

//the last pointer is released, so the memory can be freed
unset($var);
echo 'Memory usage after all pointers are unset:' . memory_get_usage() . "\n";

The code illustrates the problem. If you are using objects and you assign an instance of an object to a variable, that variable will hold the resource, because you are actually pointing to that object and not cloning it. Therefore the resources used by an object will be collected by the garbage collector only when all the pointers (variables) pointing to that resource are released.
A good article that explains garbage collectors can be found here.
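
The rule described above can also be observed without measuring memory. The following is only a minimal sketch: the class TrackedObject and its static $log are hypothetical names made up for this illustration, and it relies on PHP running a destructor as soon as the last reference to an object is gone.

```php
<?php
// Hypothetical class, used only to record when an object is destroyed.
class TrackedObject
{
    public static $log = array();
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    // PHP calls this once no variable refers to the object any more.
    public function __destruct()
    {
        self::$log[] = $this->name . ' destroyed';
    }
}

$first = new TrackedObject('A');
$second = $first;   // second pointer to the same object, nothing is cloned

unset($first);      // the object survives: $second still points to it
$afterFirstUnset = count(TrackedObject::$log);

unset($second);     // last pointer released, the destructor runs now
$afterSecondUnset = count(TrackedObject::$log);

echo 'Destroyed after first unset: ' . $afterFirstUnset . "\n";
echo 'Destroyed after second unset: ' . $afterSecondUnset . "\n";
```

The first count should stay at zero and the second should become one, which is the same behaviour the memory figures in the snippet above hint at: a forgotten variable, static property, or cache entry that still points to an object is enough to keep it alive.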

UPDATE: This problem has been solved by implementing the algorithm from "Concurrent Cycle Collection in Reference Counted Systems".
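
What cycle collection addresses is the case where objects point at each other, so their reference counts never drop to zero even after every variable is unset. The sketch below is only an illustration under assumptions: it assumes PHP 5.3 or later, where gc_enable() and gc_collect_cycles() are available, and the Node class is a hypothetical name invented for the example.

```php
<?php
// Hypothetical class for illustration only.
class Node
{
    public static $destroyed = 0;
    public $peer = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();

$a = new Node();
$b = new Node();
$a->peer = $b;   // $a points to $b
$b->peer = $a;   // $b points back to $a, forming a cycle

// No variable refers to the objects any more,
// but each one is still referenced by the other.
unset($a, $b);
$beforeCollect = Node::$destroyed;

// Ask the cycle collector to run now instead of waiting for it.
gc_collect_cycles();
$afterCollect = Node::$destroyed;

echo 'Destroyed before collection: ' . $beforeCollect . "\n";
echo 'Destroyed after collection: ' . $afterCollect . "\n";
```

Note that the collector only reclaims objects that are unreachable. An object that is still reachable through a live variable, as in the snippet at the top of this post, stays in memory no matter how often the collector runs.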