Memory Management
Pascal, by default, provides three kinds of memory management mechanisms for different datatypes and in different contexts: Local Lifetime, Manual Memory Management and Reference Counting.
In the following, these mechanisms are explained in detail, along with some best practices for their use.
Local Lifetime
Local lifetime is the first form of memory management most programmers come into contact with. It is the default way variables and function parameters are handled: the compiler fully manages the lifetime of the variable. Take for example this simple function:
procedure PrintTenth(i: Integer);
var
x: Double;
begin
x := i / 10;
WriteLn(x);
end;
The compiler will create the memory for both the parameter i and the variable x when the function is called. Once the function ends, the memory will be freed. For functions this is usually done through the local program stack: when calling the function, the compiler generates code that pushes the memory for all local variables and parameters onto the stack. This is called a stack frame. Once the function returns, the whole stack frame is popped from the stack, thereby freeing the allocated memory.
Global variables are created on program startup and will be freed when the program is closed.
var
GlobalX: Double;
procedure PrintTenth(i: Integer);
begin
GlobalX := i / 10;
WriteLn(GlobalX);
end;
Here GlobalX, as a global variable is not bound by the lifetime of the function PrintTenth and will be available as long as the program is running.
Lastly there are thread local variables. Those are global variables whose lifetime is bound to a thread. The memory is newly allocated for every thread started within the program, and is freed when its corresponding thread is finished or killed.
threadvar
ThreadedX: Double;
procedure PrintTenth(i: Integer);
begin
ThreadedX := i / 10;
WriteLn(ThreadedX);
end;
So if PrintTenth is called from two different threads, each call uses different memory for its thread-local ThreadedX, while two calls from the same thread use the same memory.
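As a small standalone sketch (assuming the cthreads unit on Unix and a made-up TThread subclass called TTenthThread), the following program shows that each thread works on its own copy of ThreadedX:
uses
  {$IFDEF UNIX}cthreads,{$ENDIF} Classes;

threadvar
  ThreadedX: Double;

type
  TTenthThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TTenthThread.Execute;
begin
  ThreadedX := 4.2;   // only modifies this thread's copy
  WriteLn(ThreadedX);
end;

var
  t1, t2: TTenthThread;
begin
  ThreadedX := 0;     // the main thread's copy
  t1 := TTenthThread.Create(False);
  t2 := TTenthThread.Create(False);
  t1.WaitFor;
  t2.WaitFor;
  t1.Free;
  t2.Free;
  WriteLn(ThreadedX); // still 0: the worker threads never touched this copy
end.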
Dangling Pointers
Local lifetime is a very easy and also very efficient way of memory management. It usually allows quite care-free usage of that memory, as the compiler takes care of all of it.
That said, there is a big limitation: any data can only live as long as its context. This is not an issue for global variables, because by definition they live as long as any code that could access them. But specifically with local variables, it can happen that references to a variable still exist after the function containing it has ended. Such a reference is called a dangling pointer.
Take the following program:
function Dangling: PInteger;
var
x: Integer;
begin
x := 42;
Result := @x;
end;
var
p: PInteger;
begin
p := Dangling;
WriteLn(p^);
end.
The function Dangling returns a pointer to a local variable.
The problem here is that as soon as the function ends, x does not exist anymore, so the pointer returned (and stored in p) points to already freed memory.
These kinds of bugs can be hard to find: because of the nature of the stack, the memory is still around and may not have been reused yet.
In the example above, WriteLn(p^) will still print 42, as nothing has happened yet to overwrite this freshly freed memory.
But if another function is called:
function Dangling: PInteger;
var
x: Integer;
begin
x := 42;
Result := @x;
end;
procedure Smack;
var
x: Integer;
begin
x := -1;
WriteLn('X: ', x);
end;
var
p: PInteger;
begin
p := Dangling;
Smack;
WriteLn(p^);
end.
Now the function Smack has its own stack frame, which overlaps with the memory previously used by x and pointed to by p.
This makes finding such bugs quite hard: the code might work as expected initially, but later, when another function is called, the results are suddenly completely different.
Best Practices
Generally, usage of local lifetime managed variables is very easy and mostly safe. Due to the implementation through the stack it is also very efficient. Local variables should therefore be used whenever possible, and are preferable over the other methods outlined in this article.
That said, the programmer should take a few precautions to avoid having dangling pointers:
- Never return a pointer to a local variable:
- This is the most straightforward guideline to follow. Simply avoid taking the address of any local variable in the Result of a function. This is always an error.
- Avoid taking pointers to local variables:
- While it is quite easy to check whether a function returns a pointer to a local variable through its Result, there are other ways a pointer to a local variable can be used after the function that took that pointer has ended.
- For example, the pointer may be written into a global variable, or stored in an object. A common mistake is to put the pointer into the Tag of an LCL control, e.g. in a ListView:
procedure TForm1.CreateItem(Data: TMyTreeData);
begin
With ListView1.Items.Add do
begin
Caption := Data.Name;
Tag := IntPtr(@Data);
end;
end;
- Because the Tag allows storing a pointer for additional data, using the address of a local variable is a common mistake here. When the function ends, the local variable (or, in this case, the parameter) is destroyed, but the newly created list item persists. If the Tag is accessed later, it dereferences a pointer to memory that is long gone, resulting in a bug.
- The best way to avoid this is to simply never take pointers to local variables if you can avoid it. Passing parameters as var to another function allows referencing them in a safe manner, as the var parameter also only lives as long as the called function (see the sketch after this list).
- Know your lifetimes:
- If you must take the pointer of a local variable, e.g. when you create an object that requires a pointer to a buffer, you must make sure that none of the places where the pointer is used can outlive the function.
- In case of objects this means that the object is freed within the same function that holds the local variable. In case of global variables, the global variable must be reset before the function ends, or it must be guaranteed that the global variable is not touched afterwards.
- This requires extensive knowledge of the code base and can be really hard to track. It should therefore be avoided whenever possible.
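As a minimal sketch (the procedure names are made up for illustration), this is the safe alternative of passing a local variable by var instead of taking its address:
procedure FillWithAnswer(var Value: Integer);
begin
  Value := 42; // the reference is only valid for the duration of this call
end;

procedure UseLocal;
var
  x: Integer;
begin
  FillWithAnswer(x); // no pointer escapes UseLocal, so no dangling pointer can occur
  WriteLn(x);
end;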
Notes
Writeable Constants
One big issue with global variables is the (lack of) scoping. A global variable is, as the name suggests, globally accessible. In order to avoid bugs through misuse, you may want to restrict access to a variable to a certain context, the so-called scope, but without restricting the lifetime to that scope. This can be achieved with writable consts. Those are local variables with a global lifetime: they already exist when the program starts and are not freed until the program ends.
function NextID: Integer;
const
CurrentID: Integer = 0;
begin
Result := CurrentID;
Inc(CurrentID);
end;
This function declares CurrentID as a writable const, meaning it has a global lifetime, even though it is locally defined in NextID. Every call to NextID therefore uses the same memory for CurrentID, so NextID can "remember" the CurrentID from the last call. This allows the function to return a new ID on each consecutive call, always one more than the last time.
Unlike with local variables, because the lifetime of writable consts is that of a global variable, pointers to them can be taken and passed around without creating the dangling pointer problems outlined above.
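As a small illustrative sketch (the CounterRef function is made up), taking the address of a writable const is safe because the memory lives until the program ends:
function CounterRef: PInteger;
const
  Counter: Integer = 0; // writable const: global lifetime, local scope
begin
  Result := @Counter;   // safe, unlike returning the address of a local variable
end;

var
  p: PInteger;
begin
  p := CounterRef;
  p^ := 5;
  WriteLn(CounterRef^); // prints 5: every call sees the same memory
end.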
Management Operators
For most types the local lifetime will just manage the memory. Often a programmer wants to associate code with the construction or destruction of that memory. For example, when a TFileStream is created, not only is the memory allocated, but a file is also opened. Similarly, when a TFileStream is freed, not only is the memory freed, but that file is also closed.
Previously this was only possible for classes, which have constructors and destructors tied to their memory lifetime.
But those classes can only be used with manual memory management or reference counting (through COM interfaces, see below), so this functionality was not available for local lifetime managed variables.
This has changed in FPC 3.2.0 with the introduction of management operators. These allow writing code that will be automatically called by the compiler when the memory is allocated (through the Initialize operator) and when it is freed (through the Finalize operator). See the wiki article for further information.
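As a rough sketch of what this looks like (assuming FPC 3.2.0+ in mode objfpc with advanced records; the TAutoFile record is made up for illustration), the compiler calls Initialize when a variable of the type comes into scope and Finalize when it goes out of scope:
{$mode objfpc}{$modeswitch advancedrecords}
type
  TAutoFile = record
    Handle: TextFile;
    class operator Initialize(var aRec: TAutoFile);
    class operator Finalize(var aRec: TAutoFile);
  end;

class operator TAutoFile.Initialize(var aRec: TAutoFile);
begin
  // called automatically when a TAutoFile variable comes into scope
  AssignFile(aRec.Handle, 'out.txt');
  Rewrite(aRec.Handle);
end;

class operator TAutoFile.Finalize(var aRec: TAutoFile);
begin
  // called automatically when the variable goes out of scope
  CloseFile(aRec.Handle);
end;

procedure WriteGreeting;
var
  f: TAutoFile; // Initialize runs here, Finalize runs when the procedure ends
begin
  WriteLn(f.Handle, 'Hello');
end;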
Manual Memory Management
While local lifetime based memory management is extremely efficient and very safe, it requires the compiler to know exactly what data is required at which point in time. It also explicitly binds the lifetime of the memory to the lifetime of the scope it is in.
While this works great for local variables, it can be a problem for dynamic data, i.e. where the size is not known at compile time (like for lists or arrays), in cases where the type of the data may vary (polymorphism), or where the lifetime is not directly tied to the scope where it is created.
Take for example a linked list: whenever a new item is added to the list, its memory must be allocated and then used by the list. As you may want to use functions for adding, removing or searching the list, the lifetime of each list element cannot be bound to the function that created it, otherwise it would be gone as soon as the add function returns.
For this, dynamic memory allocation, also often called heap allocation, is used. Memory allocated on the heap usually (with the exception of the reference counted types outlined below) must be manually freed. Therefore both the allocation of the memory and the freeing must be done by hand.
While the reasons for requiring dynamic allocation outlined above may seem quite niche, there is a more practical reason why it is needed. The major Object-Oriented Programming (OOP) implementation in Pascal, classes, is by design always dynamic and, unlike other types such as records or Integers, cannot be used as local lifetime variables. So programmers will often find themselves in a situation where dynamic (and manual) memory management is theoretically not necessary, but the design of classes practically forces them to use it anyway.
Dynamic Allocation
There are 3 kinds of memory allocation:
- untyped memory allocations
- typed pointer allocations
- class constructors
Untyped Memory Allocation
This is the rawest form of memory allocation, and is done with the functions GetMem, ReallocMem and FreeMem.
These functions allocate untyped blocks of memory, where all the initialization and finalization must be performed by the user. This can be utilized for some optimizations, such as pre-allocation of large chunks of memory, to avoid many smaller allocations and potential copying of data in the future.
As such they are rather rarely used; instead programmers will most often use one of the allocation methods outlined below.
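A minimal sketch of what such raw allocation looks like:
var
  Buffer: PByte;
begin
  Buffer := GetMem(1024);       // allocate a raw, uninitialized 1 KiB block
  try
    FillChar(Buffer^, 1024, 0); // initialization is entirely the user's job
    Buffer^ := 42;              // interpret the raw bytes however needed
  finally
    FreeMem(Buffer);            // every GetMem needs exactly one FreeMem
  end;
end;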
Typed Memory Allocation
Typed memory allocation allows the programmer to dynamically allocate memory for a specific type. This is done with the New and Dispose functions.
Compared to the untyped memory allocation described above, New and Dispose have knowledge about the data type for which memory is allocated and can therefore do the required initialization and size computations.
Therefore the following code:
var
p: PInteger;
begin
New(p);
p^ := 42;
Dispose(p);
end;
Is equivalent to the following code using raw memory allocation:
var
p: PInteger;
begin
p := GetMem(SizeOf(p^));
Initialize(p^);
p^ := 42;
Finalize(p^);
FreeMem(p);
end;
This initialization and finalization is necessary because composite types such as records or objects contain member fields whose lifetime is bound to the lifetime of the owning record.
New and Dispose will therefore call the Initialize and Finalize operators implicitly (see management operators).
Therefore New and Dispose are basically just a bit of syntactic sugar to clean up the code, avoiding the additional size computation and initialization/finalization code that is required when using raw allocation.
Classes
While New and Dispose are the usual way of implementing dynamic allocation for all base types as well as records and objects, when classes were introduced they got their own mechanism for memory allocation, which ties in directly with the constructor and destructor functions of the class.
The constructor, usually named Create (but it can have other names, such as CreateFmt for exceptions), operates in three steps:
- Allocation of the memory for the class
- Initialization of the allocated memory (equivalent to the Initialize call that is performed by New as described above)
- Execution of the constructor function
The complement to the constructor is the destructor, which should be a virtual/overridden function named Destroy. The destructor performs similar steps to the constructor, just in opposite order (a small sketch follows the list):
- Execution of the destructor function
- Cleanup of the allocated memory (equivalent to the Finalize call that is performed by Dispose as outlined above)
- Freeing of the memory used by the class
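As a minimal, made-up sketch (the TBuffer class is not part of any library), a class with a constructor and a matching, overridden destructor could look like this:
type
  TBuffer = class
  private
    FData: PByte;
  public
    constructor Create(ASize: SizeInt);
    destructor Destroy; override; // overridden so that Free calls the right destructor
  end;

constructor TBuffer.Create(ASize: SizeInt);
begin
  inherited Create;
  FData := GetMem(ASize); // the constructor body acquires additional resources
end;

destructor TBuffer.Destroy;
begin
  FreeMem(FData);         // the destructor body releases those resources again
  inherited Destroy;
end;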
In practice the usage of a class looks like this:
var
sl: TStringList;
begin
sl := TStringList.Create; // Constructor call
sl.Add('Hello');
sl.Add('World');
WriteLn(sl.Text);
sl.Free; // Internally calls the destructor Destroy
end;
Note that usually the destructor Destroy is not called directly, but rather Free, which first checks for nil before calling Destroy.
How sensible it is to call Free instead of Destroy directly is debatable, but it is a common convention that can be seen in most code and will also be used here.
Memory Management Errors
No matter what form of allocation is used, every allocation requires a corresponding free.
So for each GetMem there should be one call to FreeMem to clean up the allocated memory afterwards. For each call to New there should be a corresponding Dispose. And for every class created through its constructor there should be a call to Free, or to the destructor directly.
Memory Leaks
Because programs can get very complex, it can sometimes be quite hard to make sure that all allocated memory is correctly freed. Memory that is allocated but never freed is called a "memory leak". Memory leaks can be isolated occurrences, in which case they are not very harmful. But often memory leaks occur within a piece of code that is called multiple times. Over these calls the leaks add up and can result in a major waste of memory, which can lead to performance loss and potentially even to the computer running out of memory and having to forcefully kill processes or crash completely.
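A minimal, made-up sketch of such a leak: each call to the hypothetical LeakOnce allocates a TStringList that is never freed, so a loop around it steadily eats memory:
uses
  Classes;

procedure LeakOnce;
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  sl.Add('some data');
  // missing sl.Free: the object becomes unreachable here but is never released
end;

var
  i: Integer;
begin
  for i := 1 to 1000000 do
    LeakOnce; // every iteration leaks another TStringList
end.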
Use-After-Free
Even worse than memory leaks are so-called use-after-free bugs: the memory was freed too early, while there is still code that holds a reference to that memory and wants to access it later in the program's execution. The dangling pointers described above are a form of use-after-free.
Use-after-free bugs are considered worse than memory leaks because, while memory leaks can result in performance loss and in the worst case crashes, they do not result in any misbehavior, and they are usually quite easy to find. Use-after-free, on the other hand, can result in unpredictable behavior, and may even result in exploitable vulnerabilities through which an attacker could trick your program into executing malicious code, or circumvent other security mechanisms such as access control.
The reason for this is that memory, as a limited resource, does not go away after it has been freed, but will eventually be reused, resulting in access to other data than was originally intended. An example for this behavior is:
procedure UseAfterFree;
var
sl1, sl2: TStringList;
begin
sl1 := TStringList.Create;
sl1.Add('Hello SL1');
sl1.Free;
sl2 := TStringList.Create;
sl2.Add('Hello SL2');
WriteLn(sl1.Text); // Use of sl1 after it has been freed above
sl2.Free;
end;
Because memory is re-used, in this example sl2 will get the same memory allocated that sl1 previously held.
Therefore when sl1 is accessed after it has been freed, it unknowingly accesses sl2 and prints 'Hello SL2'.
This means that in a sufficiently complex program a use-after-free could potentially access any memory at any time, making the resulting errors hard to predict and hard to catch. Note that in the example above the use-after-free does not throw any exception or error, and produces a completely valid-looking result. This is what makes finding these bugs so difficult, while also giving them the potential to completely alter the behavior of the program by writing data to memory that the code should not be able to touch.
Best Practices
Ownership Model
In order to avoid both memory leaks and use-after-free, it is helpful to structure your memory management around an "ownership model". The core idea is, similar to the compiler managed local memory described above, that the lifetime of any memory is tied to that of the logical owner of that piece of memory. Every time memory is allocated there must be an "owner", some piece of code that is responsible for that memory. This owner must then make sure that the memory is eventually freed (for example, by freeing all the objects it owns when it is itself finished). Also important is that when the owner "dies", no one else has access to that memory anymore.
This ownership model is already provided by many classes in the RTL, FCL and LCL, e.g. in the Component Model, or in the container libraries like Generics.Collections.
Note that in the following we will always use classes as an example, but the same holds true for memory allocated with New/Dispose or GetMem/FreeMem.
The easiest case is that of temporary and local objects; these are essentially equivalent to the local variables whose lifetime is bound to their scope and that are managed by the compiler. E.g. a TStringList to read a config file:
procedure ReadConfigFile(const FileName: String);
var
sl: TStringList;
begin
sl := TStringList.Create;
try
sl.LoadFromFile(FileName);
// Do something with the data in SL
ConfigValue1 := sl.Values['Config1'];
ConfigValue2 := sl.Values['Config2'];
finally
sl.Free;
end;
end;
In this case the owner of sl is the function ReadConfigFile. It is created by that function, and in the end freed by that function. The try-finally ensures that this always happens, no matter whether there is an early exit, an exception or any other form of jump.
Borrowing
To avoid use-after-free, sl must not leave this function: after the finally block, sl should not be used anymore. This also covers passing sl to some other object, which is fine as long as that other object only uses sl while the function is still active:
function EncodeBase64(data: Array of Byte): String;
var
ss: TStringStream;
enc: TBase64EncodingStream;
b: Byte;
begin
ss := TStringStream.Create;
try
enc := TBase64EncodingStream.Create(ss);
try
for b in data do
enc.WriteByte(b);
enc.Flush;
ss.Seek(0, soFromBeginning);
Result := ss.DataString;
finally
enc.Free;
end;
finally
ss.Free;
end;
end;
Here the function owns two objects, enc and ss. Again both are ensured to be freed through try-finally. But enc uses ss, so basically enc is "borrowing" ss, which is owned by the function. It is important to ensure that enc does not use ss after ss is freed. The only way to ensure this is to free enc before ss.
So when another piece of code "borrows" memory owned by a different piece of code, the borrower must "give up" the borrowed memory before it can be freed. When writing object oriented code, most of the time this means freeing the object that borrows the memory before freeing the borrowed memory. In other cases it may mean revoking access to said object, or, to stretch the borrowing metaphor further, having the borrower return the borrowed memory:
var
GlobalLogger: TStream; // Global variable will be our borrower
procedure LogString(str: String);
begin
if Assigned(GlobalLogger) then // If logger is registered use it
begin
GlobalLogger.Write(str[1], Length(str));
GlobalLogger.WriteByte(10); // Newline
end
else // otherwise use writeln
WriteLn(str);
end;
procedure FileLoggerTest;
var
fs: TFileStream;
begin
fs := TFileStream.Create('test.log', fmCreate);
try
// Set global logger to "borrow" fs
GlobalLogger := fs;
LogString('Test log file');
finally
// ensure global logger is reset to revoke access before freeing fs
GlobalLogger := nil;
fs.Free;
end;
LogString('Test log WriteLn');
end;
Here the global variable GlobalLogger, which is used by another function, is the borrower. So before the file stream can be freed by its owner (the FileLoggerTest function), the owner first must make sure that the borrow is revoked, and it does this by resetting the global variable to nil.
Transferral of Ownership
The examples above describe the easy case, where the creator of the memory is also the owner. In some scenarios this is not the case. For example, when adding data to a list, the function that creates the data may just be called to create that piece of data, but the real owner is the list in which the data will be stored even after the creating function is finished.
The easiest example for this is the use of TObjectList, because this class already provides functionality for ownership handling:
uses
classes,
Generics.Collections;
type
TStringListList = specialize TObjectList<TStringList>;
var
list: TStringListList;
begin
list := TStringListList.Create(True); // The parameter AOwnsObjects is True so now the list owns the object
try
list.Add(TStringList.Create); // creates new object that is owned by list
finally
list.Free; // TObjectList with AOwnsObjects will Free all objects it owns when freed
end;
end.
By setting the AOwnsObjects parameter of the constructor of TObjectList to True, we tell the TObjectList that it is the owner of all objects we pass to it. This means that when a new object is added to the list, the list will automatically ensure that the object is freed when the item is deleted or the list is freed. Because ownership is handed over to the list, the function that creates the object does not need to care about freeing its memory afterwards.
TObjectList can not only claim ownership through the Add function, it also provides the Extract method to give up ownership of an element:
var
list: TStringListList;
sl: TStringList;
begin
list := TStringListList.Create(True); // The parameter AOwnsObjects is True so now the list owns the object
try
list.Add(TStringList.Create); // creates new object that is owned by list
sl := list.Extract(list[0]); // Transfer ownership of the object back to the caller
try
// ...
finally
sl.Free; // As sl is now owned by the function, it must free it manually
end;
finally
list.Free; // TObjectList with AOwnsObjects will Free all objects which have not been extracted
end;
end.
Summary
The ownership model is an easy way to structure the code so that it is clear when memory can be used and when it can be freed. By always making sure that every manually allocated piece of memory has exactly one owner, this owner can ensure that the data will be freed. It also helps prevent use-after-free bugs, as every piece of memory can live at most as long as its owner, so by tracking the lifetime of the owner it can be ensured that the memory is still valid.
The ownership model is already implemented in many libraries that are provided by the RTL, FCL or LCL; it forms the basis for the whole Component Model, on which the LCL is based:
procedure TForm1.AddButtonClick(Sender: TObject);
begin
with TButton.Create(Self) do // Setting self as owner for the new button
begin
Caption := 'New Button';
Parent := Self;
Left := 0;
Top := 0;
end;
end;
By setting the owner of the button to Self, this Button is owned by the Form, and when the Form is destroyed, it will also destroy the button.
The ownership model can be summarized by a few rules of thumb (a small code sketch follows the list):
- Always ensure there is one and only one owner
- When used only locally use try-finally to enforce ownership
- Borrows are not allowed to live longer than the owner
- When borrowing a local object, the borrow must be finished within the same try-finally block
- When borrowing between objects, the borrowed object cannot be destroyed before the borrow has ended
- When transferring ownership ensure that the new owner satisfies all the previous rules
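Outside of the component model and container classes, the same rules can be followed by hand. As a small sketch (the TConfig class is made up), an object that creates member objects in its constructor frees them in its destructor, so the members never outlive their owner:
uses
  Classes;

type
  TConfig = class
  private
    FValues: TStringList; // owned by TConfig
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TConfig.Create;
begin
  inherited Create;
  FValues := TStringList.Create; // TConfig becomes the owner
end;

destructor TConfig.Destroy;
begin
  FValues.Free; // the owner frees everything it owns
  inherited Destroy;
end;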
Testing
The only way to truly ensure that there are no use-after-free bugs or memory leaks is through testing. There are multiple levels of testing, but for finding these sorts of bugs, unit tests are usually the best approach. These are very small programs that only test a very small subset of the functionality, e.g. a single class or function. FPC provides the fpcunit framework, which makes it easy to create and manage such test cases.
HeapTrc
In order to detect such memory errors, FPC provides the heaptrc unit. This unit checks whether there is a free for every allocation, and then reports any memory leaks, including where the leaked memory was allocated. In addition, heaptrc also taints memory after it is freed, which helps with finding use-after-free bugs.
Through the keepreleased option it also prevents memory from being reused. While this would result in massive memory consumption in a real application, for small test cases, such as unit tests, it can be used to ensure that a use-after-free will always result in access to this tainted memory.
Taking the example from above, simply adding heaptrc and keepreleased:
uses
HeapTrc, Classes;
procedure UseAfterFree;
var
sl1, sl2: TStringList;
begin
HeapTrc.KeepReleased := True;
sl1 := TStringList.Create;
sl1.Add('Hello SL1');
sl1.Free;
sl2 := TStringList.Create;
sl2.Add('Hello SL2');
WriteLn(sl1.Text); // Use of sl1 after it has been freed above
sl2.Free;
end;
Now because sl1 has been tainted and is not re-used by sl2, the call to sl1.Text will throw an error, making it easy to find and fix.
Valgrind
Another tool to detect memory errors is Valgrind. Unlike heaptrc, which simply adds some additional checks to the GetMem and FreeMem functions, Valgrind is a complete emulator. When executing your code in Valgrind, it checks at every single instruction whether a memory error is present.
This makes Valgrind by far the most powerful tool for finding memory errors, but it comes at a price: Valgrind's emulation is dramatically slower than native execution, making it potentially infeasible to test larger programs. So here, too, it is best used in combination with very small and very specialized unit tests.
Don't Free
There is this quote, which I could not figure out whom to attribute it to:
You cannot use after free if you never call free
As outlined above, use-after-free errors are much worse than memory leaks. In fact, when your program is not going to use much memory in the first place, memory leaks will have nearly no impact on the user at all.
One strategy can therefore be to simply ignore the memory leaks and never free the memory. This strategy is often employed by extremely security critical software with quite limited memory consumption. For example, consider a CGI module for a web service that is started just to serve a single request: as soon as the request is over, the process is killed and all the memory is released by the OS anyway. Therefore memory leaks will not matter. But if it is an externally facing web service, security vulnerabilities that may be introduced by use-after-free could be devastating. So not freeing the memory can be a completely valid strategy to avoid any use-after-free errors.
The problem with this is that classes that need their destructor called (e.g. streams that must flush their buffered content) cannot have their destructor run without calling Free. So in order to not use Free, either no such classes must be used, or a memory manager must be installed that does not actually free the memory. While HeapTrc with keepreleased does exactly that, HeapTrc's other functionality may impact performance too much to make it a viable alternative.
Avoiding Manual Memory Management
Even when following the ownership model strictly and testing a lot, memory management errors will still happen. Manual memory management is hard, and errors can be very hard to find due to their subtlety.
When Mozilla rewrote their CSS engine in Rust, they analyzed the security related bugs that were previously found in that code and published the results in their blog. Of all the bugs they got reported, nearly half were memory management bugs, i.e. use-after-free bugs or memory leaks.
Because memory management is so hard, the simplest solution is often to avoid manual memory management wherever possible. Because classes are designed to require manual memory management most of the time, it may sometimes not be avoidable, but there are a few ways the reliance on manual memory management can be reduced:
- Use records or objects instead of classes where possible:
- These types can utilize the local lifetime management of the compiler and therefore do not require manual memory management
- Use dynamic arrays instead of class-based data structures:
- Dynamic arrays are automatically reference counted (see below), so no manual memory management is necessary. While specialized classes like TList, THashMap, TRBTree, etc. may be more efficient for a certain task, if the data structure does not contain many entries (e.g. a list with just a few dozen entries), this won't make much difference to the user experience, but can result in code that requires no manual memory management (see the sketch after this list).
- Dynamic arrays with just one element can even be used instead of a pointer (e.g. to implement linked lists)
- Use COM interfaces for your classes:
- Like dynamic arrays, interfaces are also reference counted. By writing interfaces for your classes, you can use these classes without needing to care about memory management
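As a small sketch of the dynamic-array approach (using the Insert and Delete intrinsics for dynamic arrays available in FPC 3.2.0 and later), a short list of names can be managed without any Create/Free:
var
  Names: array of String;
begin
  SetLength(Names, 2);                    // dynamic array, reference counted
  Names[0] := 'Alice';
  Names[1] := 'Bob';
  Insert('Carol', Names, Length(Names));  // append an element
  Delete(Names, 0, 1);                    // remove the first element
  WriteLn(Length(Names));                 // prints 2
  // no Free needed: the array is released automatically when it goes out of scope
end;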
Reference Counting
While the ownership principle can help to easily identify when and where any piece of memory needs to be freed, it is not applicable in all cases. Sometimes the architecture requires a piece of memory to be shared by multiple peers, where there is no clear owner, or multiple owners.
Consider an expression tree for the expression (5 + 3) * (5 + 3). This could be represented either as a strict tree:
        *
      /   \
     +     +
    / \   / \
   5   3 5   3
This can be simply put into code, with each node being its own object and having exactly one parent. When a parent node is freed, it frees all of its children.
But, as an optimization, the representation in code could be changed such that equal subtrees are replaced with a reference to the same object:
      *
     / \
     \ /
      +
     / \
    5   3
But now there is a problem: the "+" node has two parents, and thereby no clear owner.
There are multiple solutions to this. For example, all the nodes could be put into one list, which would be the owner of all the nodes, and the nodes would just borrow each other from that list. But a more flexible solution is to simply allow multiple owners.
This can be achieved with so-called reference counting. Here each allocated object has a counter embedded. When a new owner gets access to the object, it notifies the counter and the count is incremented. When an owner does not need the object anymore, it tells the counter that its reference will no longer be used, and the counter is decremented. When the counter hits 0, the memory is freed.
In the example expression tree above, the reference count of '*' would be 1, '+' is referenced twice by '*' so its reference count is 2, and '5' and '3' are referenced exactly once, so their reference count is 1. When the reference to '*' is lost, reducing its counter to 0, it reduces the reference counter of its first child '+' to 1, and then, when releasing its second child, that counter is reduced again to 0. '+' then reduces both its children's reference counters from 1 to 0, freeing them, before being freed itself, and finally '*' is freed.
This results in no memory leaks and, when used consistently, no use-after-free, because the memory can only be used while there are references to it, and as long as there are references the counter cannot be 0.
The major problem is that implementing this manually is extremely error prone, as it is easy to simply forget to increase or decrease the counter. But FPC already implements reference counting, at least for some types, on the language level. Dynamic arrays, strings and COM interfaces are reference counted automatically. Whenever such an object is created, its reference count is set to 1. When it is assigned to a new variable, its reference count is incremented; when a variable pointing to that object is finalized, the reference counter is decremented:
type
TIntegerArray = array of Integer;
procedure PrintArray(arr: TIntegerArray);
var
i: Integer;
begin
// Function is entered parameter arr is in scope, the reference counter of arr is incremented
for i in arr do
Write(i, ', ');
// Function is finished, parameter arr goes out of scope, reference counter of arr is decremented
end;
var
arr, arr2: TIntegerArray;
begin
arr := [1, 2, 3]; // Initialized with refcount 1
arr2 := arr; // Refcount gets incremented to 2
PrintArray(arr); // Function enters -> Increment, function finished -> Decrement
arr := []; // Reference lost -> Decremented back to 1
// End of program, arr2 gets finalized -> Reference drops to 0, the array gets freed
end.
This is why, even though Arrays are dynamically allocated, there is no need to manually free them.
COM Interfaces
While the usage of arrays and strings is straightforward, interfaces are often underutilized for this purpose.
To use interfaces to solve the original problem of representing the expression tree of (5+3)*(5+3), we can simply implement the expressions as classes implementing a common interface, and use that interface everywhere:
type
IExpression = interface
function Compute: Integer;
end;
TMulExpression = class(TInterfacedObject, IExpression)
private
FLeftChild, FRightChild: IExpression;
public
constructor Create(LeftChild, RightChild: IExpression);
function Compute: Integer;
end;
TAddExpression = class(TInterfacedObject, IExpression)
private
FLeftChild, FRightChild: IExpression;
public
constructor Create(LeftChild, RightChild: IExpression);
function Compute: Integer;
end;
TNumExpression = class(TInterfacedObject, IExpression)
private
FValue: Integer;
public
constructor Create(AValue: Integer);
function Compute: Integer;
end;
constructor TMulExpression.Create(LeftChild, RightChild: IExpression);
begin
FLeftChild := LeftChild;
FRightChild := RightChild;
end;
function TMulExpression.Compute: Integer;
begin
Result := FLeftChild.Compute * FRightChild.Compute;
end;
constructor TAddExpression.Create(LeftChild, RightChild: IExpression);
begin
FLeftChild := LeftChild;
FRightChild := RightChild;
end;
function TAddExpression.Compute: Integer;
begin
Result := FLeftChild.Compute + FRightChild.Compute;
end;
constructor TNumExpression.Create(AValue: Integer);
begin
FValue := AValue;
end;
function TNumExpression.Compute: Integer;
begin
Result := FValue;
end;
var
FivePlusThree: IExpression;
TimesExpr: IExpression;
begin
FivePlusThree := TAddExpression.Create(TNumExpression.Create(5), TNumExpression.Create(3));
TimesExpr := TMulExpression.Create(FivePlusThree, FivePlusThree);
WriteLn(TimesExpr.Compute);
end.
This will evaluate the expression tree to 64, without any memory leaks or risk of running into a use-after-free.
Operator Overloading
Another interesting use of COM interfaces to facilitate reference counting is the GMP unit provided by FPC. One issue with manual memory management is that the use of temporary objects in expressions would create memory leaks.
Take the following expression: (a + b) * c.
First (a + b) is computed, which yields a temporary object that is then multiplied with c.
If the object requires dynamic allocation, this temporary (a + b) could never be freed, resulting in a memory leak; this is why operator overloading is not possible with classes.
So in classical, manually managed GMP this computation would be written like this:
var
a, b, c, tmp: mpz_t;
begin
// init a, b, c
mpz_init(tmp); // allocate memory for tmp
try
mpz_add(tmp, a, b); // tmp := (a + b)
mpz_mul(Result, tmp, c); // Result := tmp * c
finally
mpz_clear(tmp); // Free tmp
end;
end;
This is a lot of boilerplate code distracting from the actually quite simple computation. Here is how it looks using the reference counted interfaces:
var
a, b, c: MPInteger; // Reference counted COM interfaces, no temporary needed
begin
// Init a, b, c
Result := (a + b) * c;
end;
So while normal classes cannot use operators, because the creation of temporary objects would result in memory leaks, through the use of COM interfaces this is possible, which can result in much cleaner code.
Circular References
One issue with reference counting is circular references. Assume two objects, A and B: if A holds a reference to B and B holds a reference to A, then their respective reference counters can never drop below 1, because they always reference each other. Especially when representing cyclic graphs, this can actually lead to memory leaks if not resolved properly.
Therefore, while the usage of interfaces can help massively with writing clean and memory safe code, it is not a "silver bullet". The programmer must still always be aware of whether circular references can occur, and if so, they must be resolved manually.
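A minimal sketch of such a cycle (the INode/TNode types are made up): two interfaced objects keep each other alive until one of the references is explicitly cleared:
type
  INode = interface
    procedure SetOther(const ANode: INode);
  end;

  TNode = class(TInterfacedObject, INode)
  private
    FOther: INode;
  public
    procedure SetOther(const ANode: INode);
  end;

procedure TNode.SetOther(const ANode: INode);
begin
  FOther := ANode;
end;

var
  a, b: INode;
begin
  a := TNode.Create;
  b := TNode.Create;
  a.SetOther(b);   // a references b
  b.SetOther(a);   // b references a -> reference cycle
  a.SetOther(nil); // break the cycle manually; without this line both nodes leak
end.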