Sunday, December 22, 2013

Python Garbage Collection

As an accidental programmer, I often run into very interesting things that professional developers probably use daily, but that I never needed in my own work. One of these things is the "garbage collector". To be precise, I mean the garbage collector in Python.

I like using Python because it is easy to use. It is dynamically typed and lets me work with quite large data structures without allocating memory in advance. I had wondered how this is done, but didn't have time to check it out. After some reading, it looks pretty simple.

Every time Python allocates memory for an object, it starts a reference counter for that object. Every new reference to the object increments the counter. When a reference is removed, the counter is decremented. When it reaches zero, the memory is deallocated.
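
The counting described above can be observed directly. The following is only a small sketch, assuming CPython (the standard implementation); the class name Payload and the variable names are made up for this illustration. Note that sys.getrefcount() reports a slightly higher number than you might expect, because the call itself temporarily holds a reference, so the sketch looks at differences rather than absolute values.

```python
import sys
import weakref

data = [1, 2, 3]
base = sys.getrefcount(data)         # baseline (includes a temporary reference from the call)

alias = data                         # a second name for the same list
after_alias = sys.getrefcount(data)  # one higher than the baseline

del alias                            # remove the extra reference
after_del = sys.getrefcount(data)    # back to the baseline

class Payload(object):
    """Hypothetical class, used only to watch an object disappear."""

obj = Payload()
probe = weakref.ref(obj)             # observes obj without adding to its reference count
del obj                              # last reference removed: the counter reaches zero
freed_immediately = probe() is None  # the weak reference is now dead

print(after_alias - base, after_del - base, freed_immediately)
```

In CPython the object is released as soon as its last reference goes away, with no separate collection step involved.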

Sometimes reference counting alone is not enough, for example when objects refer to each other in a cycle and their counters never drop to zero, so a tool called the garbage collector is needed. Garbage collection is time-consuming, but it can reclaim memory that is no longer needed. By default, a collection is triggered when the number of object allocations minus the number of deallocations exceeds a threshold. In my Python setup it is 700. You can check this as follows (the first returned value is the threshold mentioned above):
 import gc
 # the first value of the returned tuple is the threshold
 print(gc.get_threshold())
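
A reference cycle is the classic case where counting alone cannot free memory, and it is what the collector exists for. Here is a minimal sketch of that situation, assuming CPython; the Node class and the partner attribute are hypothetical names chosen for the example. A weak reference is used as a probe because it does not keep the object alive.

```python
import gc
import weakref

class Node(object):
    """Hypothetical helper class, used only for this illustration."""
    def __init__(self):
        self.partner = None

gc.disable()                 # keep an automatic collection from interfering with the demo
a = Node()
b = Node()
a.partner = b                # a -> b
b.partner = a                # b -> a, which closes the cycle
probe = weakref.ref(a)       # lets us observe the object without keeping it alive

del a, b                     # the names are gone, but each object still references the other
alive_before = probe() is not None

gc.collect()                 # the cycle detector finds and frees the unreachable pair
alive_after = probe() is not None
gc.enable()                  # restore automatic collection

print(alive_before, alive_after)
```

Before the explicit gc.collect() the pair is still in memory even though nothing in the program can reach it; afterwards it is gone.
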
Because garbage collection is expensive in time, it can sometimes be worth running it "by hand", for example after the data preparation phase and before the actual calculations on the data. If a program handles online user actions 24/7, it might be worth running garbage collection once every twenty-four hours, when traffic is lowest.
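
Running the collector "by hand" between two phases might look roughly like the sketch below. The functions prepare_data and heavy_calculation are hypothetical placeholders for the real work, and switching automatic collection off during the calculation is an optional extra, not something every program needs.

```python
import gc

def prepare_data(n):
    """Hypothetical preparation step: builds many short-lived container objects."""
    rows = [{"id": i, "value": i * 2} for i in range(n)]
    return [row["value"] for row in rows]

def heavy_calculation(values):
    """Hypothetical calculation step standing in for the real work."""
    return sum(v * v for v in values)

values = prepare_data(10000)

gc.collect()        # one explicit full collection at a moment we choose
gc.disable()        # optional: no automatic collections during the calculation
try:
    result = heavy_calculation(values)
finally:
    gc.enable()     # always switch automatic collection back on

print(result)
```

gc.disable() only stops the automatic cycle collector; reference counting keeps working, so objects without cycles are still freed as usual.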

Interestingly, this mechanism is based on "how many" objects exist, not on "how large" they are, so it is possible to run out of memory while still being far from the threshold. That is a good moment to analyze the situation for possible tweaks.
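
The "how many, not how large" point can be checked with gc.get_count(), whose first value is the counter that gets compared against the threshold. This is a rough sketch assuming a standard CPython build; gen0_growth is a helper invented for this example, and the exact numbers it prints can differ by a few units between Python versions.

```python
import gc
import sys

def gen0_growth(build):
    """Return (growth of the first gc counter, built object) for build()."""
    was_enabled = gc.isenabled()
    gc.disable()                 # keep an automatic collection from resetting the counter
    try:
        before = gc.get_count()[0]
        obj = build()
        after = gc.get_count()[0]
    finally:
        if was_enabled:
            gc.enable()
    return after - before, obj

# One list with a million slots: a lot of memory, but a single container object.
big_delta, big = gen0_growth(lambda: [0] * 1000000)

# A thousand empty lists: little memory, but many container objects.
small_delta, small = gen0_growth(lambda: [[] for _ in range(1000)])

print("one big list:    counter +%d, %d bytes" % (big_delta, sys.getsizeof(big)))
print("1000 tiny lists: counter +%d" % small_delta)
```

The big list barely moves the counter, while the many tiny lists push it well past the default threshold of 700, even though they occupy far less memory.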

In my work I haven't needed to adjust automatic garbage collection. Luckily, my data and the surrounding objects haven't forced me into deeper research on this interesting problem. But now I'm a little more aware that my programs don't have access to unlimited memory, and of how to handle that ;).