Garbage Collection Implementation Considerations

Francis Glassborow: It is possible that in copy-editing Henrik's submission I may have inadvertently changed the meaning. His English is much better than my German. His grasp of Garbage Collection is also much better than mine. If any specialist in this area would like to take over editing future submissions in this area I am sure all concerned will be grateful.

Abstract

I have been writing compilers for more than 15 years. Six years ago I developed my own programming language. It is a pure functional language with interfaces to C, C++ and Java. This language needs a garbage collector (GC). This is extremely difficult because each language has its own memory management and so the requirements of the GC differ. In this article I will give a global overview of the problems of implicit and explicit GC's. In a future Overload I will write about the GC of my programming language. To give you an easy start I begin with the GC issues of C and C++.

First you must decide if you want to implement the GC in the language itself - implicit - or in an external library -explicit. A second possibility is to use the GC in a tool like a debugger. The differences which are shown in this article are based on the implicit and explicit kinds of GC.

What has to be Done?

The development time is likely to be less if you use implicit GC rather than writing an explicit GC. The reason is that the implicit version is much more specific to the language and the applications written in the language.

The explicit version has many more requirements because you don't know the memory structure of the application on which you want to use the explicit GC. A typical example for a small implicit implementation of GC are C++ constructors and destructors. You can find examples of explicit GC in any GC-Library. Generally you can say that from the technological side it is much better to use an implicit GC and not an explicit one. You can use the implicit GC of a programming language for your application.

Write Once - Use Often

If you write an explicit tool like a debugger which contains GC you have the advantage that you can use the GC in each application which is written in the language the debugger understands. The disadvantage is that there is no certainty that the GC works correctly. In some cases the GC may seem to work incorrectly. In such a situation you cannot determine if it is the GC or the application that is at fault. You have to test the application in several different ways to find out what is going wrong. If you could use an implicit GC you can be sure that the GC works correctly. In addition, in such an application you can also work with an explicit GC. In this way you see what is going wrong because you can compare the results of the implicit GC with the explicit one.

Does it work - or Doesn't it?

We need to fill out the requirements to make the work of the GC useful. That means that you should write your application independent of the GC used. The result is that during the test phase in which you use the GC, it is much easier to find errors. In addition the performance of the application improves. In particular, strings and large numbers must be checked very carefully. In some cases it is necessary to test with GC packages which are specialised for handling strings and large numbers. This technology is also preferred for other language constructs which should handle special kinds of data.

Safe or not Safe ^{[

1

]}

In the most cases an implicit GC is much safer than an explicit one. The reason for that is that an implicit GC can be used on every application that is written in the language that the debugger understands. So the requirements of the GC for the application are much less than for explicit GC. An implicit GC depends on the GC of the programming language being used. Because of this it is much more effective from the point of view of safety to use an implicit GC.

The handling of pointers is much easier with implicit GC than with explicit ones. A very special point for explicit GC is the handling of pointers that are cast. Such problems only happen in object oriented languages and not in structured ones. The GC of an OO language is much more complex. In general it is recommended to use an implicit GC for handling casts and pointers in OO languages. The pointers and casts in a structured language are less complicated and can be handled by any explicit GC. The reason for that is that the possibilities in a structured language for handling pointers and casts are less than in an OO language. Finally there are two points which make GC unsafe. The first is heavily optimised code and the second is threaded code. In the first case it is difficult to locate the problem in the code because the optimisation deletes a lot of code and the behaviour of the management systems changes from unoptimised code to optimised code. The problem in threaded code is to locate the GC in the right place. In the most cases you don't know where the GC happens. There can be two locations. The first is the local one, the second the remote one. It is too difficult to for an explicit GC to handle remote code. Remote code can only handled by an implicit GC which "follows" the code. Local code can handled by both kinds of GC.

Bugs, Bugs, Bugs

In the above sections I have often mentioned the use of GC inside of a debugger. This makes sense because checking an application with a GC is the something like finding a bug. Because of this it makes sense to combine the GC and the functionality of a debugger. Of course if you use very complicated constructs it could be possible that the debugger cannot handle them. The consequence is that the GC doesn't give correct results. If the GC is included in a debugger it depends on the results of it. So you have to decide whether to put the GC in a debugger or in a separate library. The advantage is that in the case of a library you can put the statements of the GC in the code of the application wherever you want. In the case of the debugger you cannot do that. The placement of GC statements in most cases will be automated by the debugger itself. The decision to put the GC in a debugger or not depends on that what you want to do.

Less or More Memory

The biggest difference between implicit and explicit GC is that explicit GC needs much more memory than the implicit one. The disadvantage of the implicit GC in C or C++ is that you can only handle small memory structures. With the explicit one you can handle the real big ones. The reason for that behaviour is clear: The explicit GC needs much more memory because of the handling of large memory areas. The implicit needs less memory because of the less memory it has to handle. A good solution for this problem is for explicit GC to handle large memory in small sections. In this way it can allocate part by part if as necessary. That is it works like an extended implicit GC.

Fast or Slow

The handling of memory is not the only problem. The performance of the application is also affected. Implicit GC is the best solution from that viewpoint. A special case occurs if the application runs in a multithreaded environment. The GC is there much more complex.

Of course using implicit GC results in the disadvantages of the section above but at the moment there is no other solution if the application is speed limited.

A future solution will be to include an explicit GC in a programming language. In this way the explicit GC becomes an implicit ones. The explicit GC becomes part of the programming language ^{[

2

]} }.

Time Enough?

The time factor for an explicit GC is much higher for an implicit one. An allocation process of an implicit GC is about 60% faster than for an explicit one. If you have to use an explicit GC and want to avoid this disadvantage you have to ensure that the compiler supports the explicit GC. A solution could be if virtual memory is provided by the GC. Modern compilers can handle this kind of memory. A much more complex technology is necessary if the GC only works on idle-time. This can not be realised in real-time applications but only in applications which strongly depend on the user interaction.

Which Page Please?

As I mentioned in the last section, virtual memory helps to minimise time. Any kind of dynamic memory should work together with virtual memory. The advantage is that the access time to the disk for local memory paging is much less. This is recommended in any kind of application - whether or not it is real-time software, whether is a small or a big software application. The technology of using/connecting GC with virtual memory is always a good solution. This kind of memory can be used in both implicit and explicit GC systems. Especially the handling of pointers of very large memory sections can be done on VM. In some cases it is possible to allocate a very big memory area with only one pointer by using VM. The final question is where the VM comes from.

There are two possibilities. First you can use real RAM. This is very fast but you need a lot of because the VM needs it as well as the application. A much better solution is to use a part of a hard disk. UNIX systems can change normal hard-disk memory to virtual memory. Then existing VM is much faster on access to the normal hard-disk. The size of the VM depends on the operating system used and on the requirements of the application. In general 128MB is often enough for a normal business application.

^{[

1

]} Henrik seemed to have inverted implicit and explicit in this section. If GC experts think the section is wrong, the error is probably mine. FG

^{[

2

]} GC is done automatically in my own programming language

Abstract

What has to be Done?

Write Once - Use Often

Does it work - or Doesn't it?

Safe or not Safe ^{[

1

]}

Bugs, Bugs, Bugs

Less or More Memory

Fast or Slow

Time Enough?

Which Page Please?

Advertisement

Advertisement

Your Privacy

Garbage Collection Implementation Considerations

Abstract

What has to be Done?

Write Once - Use Often

Does it work - or Doesn't it?

Safe or not Safe [ 1 ]

Bugs, Bugs, Bugs

Less or More Memory

Fast or Slow

Time Enough?

Which Page Please?

Advertisement

Advertisement

Your Privacy

Safe or not Safe ^{[

1

]}