Memory Management


Pascal, by default, provides three kinds of memory management mechanisms for different datatypes and in different contexts: Local Lifetime, Manual Memory Management and Reference Counting.

In the following, these mechanisms are explained in detail, along with some best practices for their use.

Local Lifetime

Local lifetime is the first form of memory management most programmers come into contact with. It is the default way in which variables and function parameters are handled. Here the compiler fully takes over managing the lifetime of the variable. Take for example this simple function:

procedure PrintTenth(i: Integer);
var
  x: Double;
begin
  x := i / 10;
  WriteLn(x);
end;

The compiler will create the memory for both the parameter i and the variable x when the function is called. Once the function ends, the memory will be freed. For functions this is usually done through the local program stack. When calling the function, the compiler generates code that pushes the memory for all local variables and parameters onto the stack; this is called a stack frame. Once the function returns, the whole stack frame is popped from the stack, thereby freeing the allocated memory.

Global variables are created on program startup and will be freed when the program is closed.

var
  GlobalX: Double;

procedure PrintTenth(i: Integer);
begin
  GlobalX := i / 10;
  WriteLn(GlobalX);
end;

Here GlobalX, being a global variable, is not bound to the lifetime of the function PrintTenth and will be available as long as the program is running.

Lastly there are thread local variables. These are global variables whose lifetime is bound to a thread. The memory is newly allocated for every thread started within the program, and is freed when its corresponding thread finishes or is killed.

threadvar
  ThreadedX: Double;

procedure PrintTenth(i: Integer);
begin
  ThreadedX := i / 10;
  WriteLn(ThreadedX);
end;

So when PrintTenth is called from two different threads, each thread uses its own memory for its thread-local ThreadedX, while two calls from the same thread use the same memory.

Dangling Pointers

Local lifetime is a very easy and very efficient form of memory management. It usually allows for quite care-free usage of that memory, as the compiler takes care of everything.

That said, there is a big limitation: any data can only live as long as its context. This is not an issue for global variables, because by definition they live as long as any code that could access them. But specifically with local variables, it can happen that references to a variable still exist after the function containing it has ended. Such a reference is called a dangling pointer.

Take the following program:

function Dangling: PInteger;
var
  x: Integer;
begin
  x := 42;
  Result := @x;
end;

var
  p: PInteger;
begin
  p := Dangling;
  WriteLn(p^);
end.

The function Dangling returns a pointer to a local variable. The problem is that as soon as the function ends, x does not exist anymore, so the pointer returned (and stored in p) points to already freed memory. These kinds of bugs can be hard to find: because of the nature of the stack, the memory is still around and may not have been reused at this point. In the example above, WriteLn(p^) will still print 42, as nothing has happened to overwrite this freed memory yet. But if another function is called:

function Dangling: PInteger;
var
  x: Integer;
begin
  x := 42;
  Result := @x;
end;

procedure Smack;
var
  x: Integer;
begin
  x := -1;
  WriteLn('X: ', x);
end;

var
  p: PInteger;
begin
  p := Dangling;
  Smack;
  WriteLn(p^);
end.

Now the function Smack has its own stack frame, which overlaps with the memory previously used by x and pointed to by p.

This makes finding such bugs quite hard, as the code might work as expected initially, but later, after another function has been called, the results are suddenly completely different.

Best Practices

Generally, local lifetime managed variables are very easy and mostly safe to use. Due to the implementation through the stack they are also very efficient. It is therefore recommended to use local variables whenever possible, in preference to the other methods outlined in this article.

That said, the programmer should take a few precautions to avoid having dangling pointers:

  • Never return the pointer to a local variable:
This is the most straightforward guideline to follow. Simply avoid taking the address of any local variable in the Result of a function.
This is always an error.
  • Avoid taking pointers from local variables:
While it is quite easy to check whether a function returns a pointer to a local variable through its Result, there are other ways a pointer to a local variable can end up being used after the function that took that pointer has ended.
For example, the pointer may be written into a global variable, or into an object. A common mistake is to put such a pointer into the Tag of an LCL control, e.g. in a ListView:
procedure TForm1.CreateItem(Data: TMyTreeData);
begin
  With ListView1.Items.Add do
  begin
    Caption := Data.Name;
    Tag := IntPtr(@Data);
  end;
end;
Because the Tag allows storing a pointer to additional data, using a local variable here is a common mistake. When the function ends, the local variable (or, in this case, the parameter) is destroyed, but the newly created list item persists. If the Tag is accessed later, it dereferences a pointer to memory that is long gone, resulting in a bug.
The best way to avoid this is to simply never take pointers to local variables if you can avoid it. Passing parameters as var to another function allows referencing them in a safe manner, as the var parameter also only lives as long as the called function (see the sketch after this list).
  • Know your lifetimes:
If you must take the pointer of a local variable, e.g. when you create an object that requires a pointer, for example to a buffer, you must make sure that none of the places where the pointer is used can outlive the function.
In the case of objects this means that the object is freed within the same function that holds the local variable. In the case of global variables, the global variable must be reset before the function ends, or it must be guaranteed that the global variable is not touched afterwards.
This requires extensive knowledge of the code base and can be really hard to track. Therefore it should be avoided whenever possible.
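The second guideline above mentions passing local variables as var parameters instead of taking their address. A minimal sketch of that approach (IncrementTenTimes and Caller are made-up names for illustration):

procedure IncrementTenTimes(var Value: Integer);
var
  i: Integer;
begin
  // The var parameter refers to the caller's variable directly,
  // but this reference cannot outlive the call itself.
  for i := 1 to 10 do
    Inc(Value);
end;

procedure Caller;
var
  x: Integer;
begin
  x := 0;
  IncrementTenTimes(x); // safe: no pointer to x escapes this scope
  WriteLn(x);           // prints 10
end;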

Notes

Writeable Constants

One big issue with global variables is the (lack of) scoping. A global variable is, as the name suggests, globally accessible. In order to avoid bugs through misuse, you may want to restrict access to a variable to a certain context, the so-called scope, but without restricting its lifetime to that scope. This can be achieved with writable consts. These are local variables with a global lifetime, meaning they are allocated like global variables and keep their value between calls; they won't be freed until the program ends.

function NextID: Integer;
const
  CurrentID: Integer = 0;
begin
  Result := CurrentID;
  CurrentID += 1;
end;

This function declares CurrentID as a writable const, meaning it has a global lifetime even though it is locally defined in NextID. Therefore every call to NextID uses the same memory for CurrentID, so NextID can "remember" the CurrentID from the last call. This allows the function to return a new ID, one greater than the previous one, on each consecutive call.

Unlike with local variables, because the lifetime of writable consts is that of a global variable, pointers to them can be taken and passed around without creating the dangling pointer problems outlined above.
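A minimal sketch of this (CounterRef is a made-up function for illustration):

function CounterRef: PInteger;
const
  Counter: Integer = 0; // writable const: local scope, global lifetime
begin
  Result := @Counter; // safe: this memory lives as long as the program
end;

var
  p: PInteger;
begin
  p := CounterRef;
  p^ := 10;      // still valid, even though CounterRef has returned
  WriteLn(p^);   // prints 10
end.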

Management Operators

For most types local lifetime will just manage the memory. Often a programmer wants to associate code with the construction or destruction of that memory. For example, when a TFileStream is created, not only is the memory allocated, but a file is also opened. Similarly, when a TFileStream is freed, not only is the memory freed, but the file is also closed.

Previously this was only possible for classes, which have constructors and destructors tied to their memory lifetime. But classes can only be used with manual memory management or reference counting (through COM interfaces, see below), so this was not available for local lifetime managed variables. This changed in FPC 3.2.0 with the introduction of management operators. These allow writing code that is automatically called by the compiler when the memory is allocated (through the Initialize operator) and when it is freed (through the Finalize operator). See the wiki article for further information.
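A minimal sketch of what such management operators can look like (assuming FPC 3.2.0 or later and the advancedrecords mode switch; TAutoLog is a made-up type for illustration):

{$mode objfpc}{$modeswitch advancedrecords}

type
  TAutoLog = record
    class operator Initialize(var r: TAutoLog);
    class operator Finalize(var r: TAutoLog);
  end;

class operator TAutoLog.Initialize(var r: TAutoLog);
begin
  WriteLn('TAutoLog memory allocated');
end;

class operator TAutoLog.Finalize(var r: TAutoLog);
begin
  WriteLn('TAutoLog memory about to be freed');
end;

procedure UseAutoLog;
var
  l: TAutoLog; // Initialize is called automatically when l comes into scope
begin
  WriteLn('inside UseAutoLog');
end;           // Finalize is called automatically when the procedure returns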

Manual Memory Management

While local lifetime based memory management is extremely efficient and very safe, it requires the compiler to know exactly what data is required at which point in time. It also explicitly binds the lifetime of the memory to the lifetime of the scope it is in.

While this works great for local variables, it can be a problem for dynamic data, i.e. where the size is not known at compile time (as with lists or arrays), in cases where the type of data may vary (polymorphism), or where the lifetime is not directly tied to the scope in which it is created.

Take for example a linked list: whenever a new item is added to the list, its memory must be allocated and then used by the list. As you may want to use functions for adding to, removing from or searching the list, the lifetime of each list element cannot be bound to the function that created it, otherwise it would be gone as soon as the add function returns.

For this, dynamic memory allocation, also often called heap allocation, is used. Memory allocated on the heap usually (with the exception of the reference counted types outlined below) must be freed manually. Therefore both the allocation of the memory and the freeing must be done manually.

While the reasons for requiring dynamic allocation outlined above may seem quite niche, there is a more practical reason why it is needed. Classes, the major Object-Oriented Programming (OOP) construct in Pascal, are by design always dynamically allocated and, unlike other types such as records or Integers, cannot be used as local lifetime variables. So programmers will often find themselves in situations where dynamic (and manual) memory management is theoretically not necessary, but the design of classes practically forces it anyway.

Dynamic Allocation

There are 3 kinds of memory allocation:

  • untyped memory allocations
  • typed pointer allocations
  • class constructors

Untyped Memory Allocation

This is the rawest form of memory allocation, and is done with the functions GetMem, ReallocMem and FreeMem.

These functions allocate untyped blocks of memory, where all initialization and finalization must be performed by the user. This can be utilized for some optimizations, such as pre-allocation of large chunks of memory, to avoid many smaller allocations and potential copying of data later.
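A minimal sketch of such raw allocation (the block sizes here are arbitrary):

var
  Buffer: Pointer;
begin
  Buffer := GetMem(1024);              // allocate a raw 1 KiB block
  try
    FillChar(PByte(Buffer)^, 1024, 0); // raw memory is uninitialized, so clear it
    PByte(Buffer)^ := 42;              // interpret the first byte of the block
    ReAllocMem(Buffer, 2048);          // grow the block, keeping its content
  finally
    FreeMem(Buffer);                   // every GetMem needs a matching FreeMem
  end;
end;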

As such they are rather rarely used; instead programmers will most often use one of the allocation methods outlined below.

Typed Memory Allocation

Typed memory allocation allows the programmer to dynamically allocate memory for a specific type. This is done with the New and Dispose functions.

Compared to the untyped memory allocation described above, New and Dispose have knowledge about the datatype for which memory is allocated and can therefore do the required initialization and size computations.

Therefore the following code:

var
  p: PInteger;
begin
  New(p);
  p^ := 42;
  Dispose(p);
end;

Is equivalent to the following code using raw memory allocation:

var
  p: PInteger;
begin
  p := GetMem(SizeOf(p^));
  Initialize(p^);
  p^ := 42;
  Finalize(p^);
  FreeMem(p);
end;

This initialization and finalization is necessary because composite types such as records or objects contain member fields whose lifetime is bound to the lifetime of the owning record. New and Dispose will therefore call the Initialize and Finalize operators implicitly (see management operators).

Therefore New and Dispose are basically just a bit of syntactic sugar to clean up the code and avoid all the additional size computation and initialization/finalization code that is required when using raw allocation.

Classes

While New and Dispose are the usual way of implementing dynamic allocation for all base types, as well as records and objects, classes got their own mechanism for memory allocation when they were introduced, which ties in directly with the constructor and destructor functions of the class.

The constructor, usually named Create (but it can have other names, such as CreateFmt for exceptions), operates in three steps:

  1. Allocation of the memory for the class
  2. Initialization of the allocated memory (equivalent to the Initialize call that is performed by New as described above)
  3. Execution of the Constructor function

The complement to the constructor is the destructor, which should be a virtual/overridden function named Destroy. The destructor performs similar steps to the constructor, just in the opposite order:

  1. Execution of the Destructor function
  2. Cleanup of the allocated memory (equivalent to the finalize call that is performed by Dispose as outlined above)
  3. Freeing of the memory used by the class

In practice the usage of a class looks like this:

var
  sl: TStringList;
begin
  sl := TStringList.Create; // Constructor call
  sl.Add('Hello');
  sl.Add('World');
  WriteLn(sl.Text);
  sl.Free; // Internally calls the destructor Destroy
end;

Note that usually the destructor Destroy is not called directly, but rather Free, which first checks for nil before calling Destroy. How sensible it is to call Free instead of Destroy directly is debatable, but it is a common convention that can be seen in most code and will also be used here.
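For illustration, Free behaves roughly like this simplified sketch:

procedure TObject.Free;
begin
  // simplified sketch: guard the destructor call against a nil reference
  if Self <> nil then
    Self.Destroy;
end;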

Memory Management Errors

No matter what form of allocation is used, every allocation requires a corresponding free. So for each GetMem there should be one call to FreeMem to clean up the allocated memory afterwards. For each call to New there should be a corresponding Dispose. And for every class instance created through its constructor there should be a call to Free or to the destructor directly.

Memory Leaks

Because programs can get very complex, it can sometimes be quite hard to make sure that all allocated memory is correctly freed. Memory which is allocated but never freed is called a "memory leak". Memory leaks can be one-off occurrences, in which case they are not very harmful. But often memory leaks occur in a piece of code that is called multiple times. Over these calls the leaks add up and can result in a major waste of memory, which can cause performance loss and potentially even lead to the computer running out of memory and having to forcefully kill processes or crash completely.
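A minimal sketch of such a repeated leak (LeakEveryCall is a made-up name; TStringList comes from the Classes unit):

procedure LeakEveryCall;
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  sl.Add('some data');
  // Missing sl.Free: this allocation is never released.
  // Called once, this only wastes a few bytes; called repeatedly
  // (e.g. from a timer or a loop), the leaked memory keeps adding up.
end;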

Use-After-Free

Even worse than memory leaks are so-called use-after-free bugs, where the memory is freed too early, while there is still code that has a reference to that memory and wants to access it later in the program's execution. The dangling pointers described above are a form of use-after-free.

Use-after-free bugs are considered worse than memory leaks, because while memory leaks can result in performance loss and, in the worst case, crashes, they do not cause any misbehavior. Also, memory leaks are usually quite easy to find. Use-after-free, on the other hand, can result in unpredictable behavior and may even result in exploitable vulnerabilities through which an attacker could trick your program into executing malicious code, or into circumventing other security mechanisms such as access control.

The reason for this is that memory, as a limited resource, does not go away after it has been freed, but will eventually be reused, resulting in access to data other than what was originally intended. An example of this behavior:

procedure UseAfterFree;
var
  sl1, sl2: TStringList;
begin
  sl1 := TStringList.Create;
  sl1.Add('Hello SL1');
  sl1.Free;

  sl2 := TStringList.Create;
  sl2.Add('Hello SL2');
  WriteLn(sl1.Text); // Use of sl1 after it has been freed above
  sl2.Free;
end;

Because memory is reused, in this example sl2 will get the same memory allocated that sl1 previously held. Therefore, when accessing sl1 after it has been freed, the code unknowingly accesses sl2 and prints 'Hello SL2'.

This means that in a sufficiently complex program, a use-after-free could potentially access any memory at any time, making the resulting errors hard to predict and hard to catch. Note that in the example above the use-after-free does not throw any exception or error, and produces a completely valid-looking result. This is what makes finding these bugs so difficult, while also giving them the potential to completely alter the behavior of the program by writing data to memory that the code should not be able to touch.

Best Practices

Ownership Model

In order to avoid both memory leaks and use-after-free, it is helpful to structure your memory management around an "ownership model". The core idea is, similar to the compiler-managed local memory described above, that the lifetime of any memory is tied to that of the logical owner of that piece of memory. Every time memory is allocated there must be an "owner", some piece of code that is responsible for that memory. This owner must make sure that the memory is eventually freed (potentially by freeing all the objects it owns when it is itself finished). Also important is that when the owner "dies", no one else has access to that memory anymore.

This ownership model is already provided by many classes in the RTL, FCL and LCL, e.g. in the Component Model, or in the container libraries like Generics.Collections.

Note: in the following we will always use classes as examples, but the same holds true for memory allocated with New/Dispose or GetMem/FreeMem.

The easiest case is that of temporary and local objects; these are essentially equivalent to the local variables whose lifetime is bound to their scope and which are managed by the compiler. E.g. a TStringList to read a config file:

procedure ReadConfigFile(const FileName: String);
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sl.LoadFromFile(FileName);
    // Do something with the data in SL
    ConfigValue1 := sl.Values['Config1'];
    ConfigValue2 := sl.Values['Config2'];
  finally
    sl.Free;
  end;
end;

In this case the owner of sl is the function ReadConfigFile. It is created by that function and, in the end, freed by that function. The try-finally ensures that this will always happen, no matter whether there is an early exit, an exception or any other form of jump.

Borrowing

To avoid use-after-free, sl must not leave this function. After the finally block, sl should not be used anymore. This also covers passing sl to some other object; this is fine as long as that other object can only use sl while the function is still active:

function EncodeBase64(data: Array of Byte): String;
var
  ss: TStringStream;
  enc: TBase64EncodingStream;
  b: Byte;
begin
  ss := TStringStream.Create('');
  try
    enc := TBase64EncodingStream.Create(ss);
    try
      for b in data do
        enc.WriteByte(b);
      enc.Flush;
      ss.Seek(0, soFromBeginning);
      Result := ss.DataString;
    finally
      enc.Free;
    end;
  finally
    ss.Free;
  end;
end;

Here the function owns two objects, enc and ss. Again, both are guaranteed to be freed through try-finally. But enc uses ss, so effectively enc is "borrowing" ss, which is owned by the function. It is important to ensure that enc does not use ss after ss has been freed. The only way to ensure this is to free enc before ss.

So when another piece of code "borrows" memory owned by a different piece of code, the borrower must "give up" the borrowed memory before it can be freed. When writing object-oriented code, most of the time this means freeing the object that borrows the memory before freeing the borrowed memory. In other cases this may mean revoking access to said object or, to stretch the borrowing metaphor further, having the borrower return the borrowed memory:

var
  GlobalLogger: TStream; // Global variable will be our borrower
     
procedure LogString(str: String);
begin
  if Assigned(GlobalLogger) then // If logger is registered use it
  begin
    GlobalLogger.Write(str[1], Length(str));
    GlobalLogger.WriteByte(10); // Newline
  end
  else // otherwise use writeln
    WriteLn(str);
end;
     
procedure FileLoggerTest;
var
  fs: TFileStream;
begin
  fs := TFileStream.Create('test.log', fmCreate);
  try
    // Set global logger to "borrow" fs
    GlobalLogger := fs;
    LogString('Test log file');
  finally
    // ensure global logger is reset to revoke access before freeing fs
    GlobalLogger := nil;
    fs.Free;
  end;
  LogString('Test log WriteLn');
end;

Here the global variable GlobalLogger, which is used by another function, is the borrower. So before the FileStream can be freed by its owner (the FileLoggerTest function), the owner must first make sure that the borrowing is revoked, which it does by resetting the global variable to nil.

Transferral of Ownership

The examples above describe the easy case, where the creator of the memory is also its owner. In some scenarios this may not be the case. For example, when adding data to a list, the function that creates the data may just be called to create that piece of data, but the real owner is the list in which the data will be stored even after the creating function has finished.

The easiest example for this is the use of TObjectList, because this class already provides functionality for ownership handling:

uses
  classes,
  Generics.Collections;
     
type
  TStringListList = specialize TObjectList<TStringList>;
     
var
  list: TStringListList;
begin
  list := TStringListList.Create(True); // The parameter AOwnsObjects is True so now the list owns the object
  try
    list.Add(TStringList.Create); // creates new object that is owned by list
  finally
    list.Free; // TObjectList with AOwnsObjects will Free all objects it owns when freed
  end;
end.

By setting the AOwnsObjects parameter of the constructor of TObjectList to True, we tell the TObjectList that it is the owner of all objects we pass to it. This means that when adding a new object to the list, the list will automatically ensure that the object is freed when the item is deleted or the list itself is freed. Because ownership is handed over to the list, the function that creates the object does not need to care about freeing its memory afterwards.

TObjectList can not only claim ownership through the Add function, but also provides the Extract method to give up ownership of an element:

var
  list: TStringListList;
  sl: TStringList;
begin
  list := TStringListList.Create(True); // The parameter AOwnsObjects is True so now the list owns the object
  try
    list.Add(TStringList.Create); // creates new object that is owned by list
    sl := list.Extract(list[0]); // Transfer ownership of the object back to the caller
    try
      // ...
    finally
      sl.Free; // As sl is now owned by the function, it must free it manually
    end;
  finally
    list.Free; // TObjectList with AOwnsObjects will Free all objects which have not been extracted
  end;
end.

Summary

The ownership model is an easy way to structure code so that it is clear when memory can be used and when it can be freed. By always making sure that every piece of manually allocated memory has exactly one owner, this owner can make sure that the data will be freed. It also helps prevent use-after-free bugs, as every piece of memory can live at most as long as its owner, so by tracking the lifetime of the owner it can be ensured that the memory is still valid.

The ownership model is already implemented in many libraries provided by the RTL, FCL and LCL; it forms the basis of the whole Component Model, on which the LCL is built:

procedure TForm1.AddButtonClick(Sender: TObject);
begin
  with TButton.Create(Self) do // Setting self as owner for the new button
  begin
    Caption := 'New Button';
    Parent := Self;
    Left := 0;
    Top := 0;
  end;
end;

By setting the owner of the button to Self, this Button is owned by the Form, and when the Form is destroyed, it will also destroy the button.

The ownership model can be summarized by a few rules of thumb:

  • Always ensure there is one and only one owner
  • When used only locally use try-finally to enforce ownership
  • Borrows are not allowed to live longer than the owner
    • When borrowing a local object, the borrow must be finished within the same try-finally block
    • When borrowing between objects, the borrowing object cannot be destroyed before the borrow ended
  • When transferring ownership ensure that the new owner satisfies all the previous rules

Testing

The only way to truly ensure that there are no use-after-free bugs or memory leaks is through testing. There are multiple levels of testing, but for finding these sorts of bugs, unit tests are usually the best approach. These are very small programs that test only a very small subset of the functionality, e.g. a single class or function. FPC provides the fpcunit framework, which allows these test cases to be created and managed easily.
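A minimal sketch of what such a test could look like with fpcunit, reusing the NextID function from the writable constants example above (the test names are made up, and details of the runner setup may differ):

program IDTests;

{$mode objfpc}{$H+}

uses
  fpcunit, testregistry, consoletestrunner;

// The NextID function from the writable constants example above
function NextID: Integer;
const
  CurrentID: Integer = 0;
begin
  Result := CurrentID;
  Inc(CurrentID);
end;

type
  TIDTest = class(TTestCase)
  published
    procedure TestNextIDIncrements;
  end;

procedure TIDTest.TestNextIDIncrements;
var
  First: Integer;
begin
  First := NextID;
  // Each call must return a value one higher than the previous one
  AssertEquals('NextID should increase by one', First + 1, NextID);
end;

var
  Runner: TTestRunner;
begin
  RegisterTest(TIDTest);
  Runner := TTestRunner.Create(nil);
  try
    Runner.Initialize;
    Runner.Run;
  finally
    Runner.Free;
  end;
end.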

HeapTrc

In order to detect such memory errors, FPC provides the heaptrc unit. This unit checks whether there is a free for every allocation and reports any memory leaks, including where the memory was allocated. In addition, heaptrc also taints memory after it is freed, which helps with finding use-after-free bugs.

Through the use of the keepreleased option it also prevents memory from being reused. While in a real application this would result in massive memory consumption, for small test cases such as unit tests it can be used to ensure that a use-after-free will always result in access to this tainted memory.

Taking the example from above, simply adding heaptrc and keepreleased:

uses
  HeapTrc, Classes;

procedure UseAfterFree;
var
  sl1, sl2: TStringList;
begin
  HeapTrc.KeepReleased := True;
  sl1 := TStringList.Create;
  sl1.Add('Hello SL1');
  sl1.Free;

  sl2 := TStringList.Create;
  sl2.Add('Hello SL2');
  WriteLn(sl1.Text); // Use of sl1 after it has been freed above
  sl2.Free;
end;

Now, because sl1 has been tainted and its memory is not reused by sl2, the call to sl1.Text will throw an error, making the bug easy to find and fix.

Valgrind

Another tool to detect memory errors is Valgrind. Unlike heaptrc, which simply adds some additional checks to the GetMem and FreeMem functions, Valgrind is a complete emulator. When executing your code in Valgrind, it checks at every single instruction whether a memory error is present.

This makes Valgrind by far the most powerful tool for finding memory errors, but this comes at a price: Valgrind's emulation is orders of magnitude slower than native execution, making it potentially infeasible to test larger programs. So here too it should be used in combination with very small and very specialized unit tests.

Don't Free

There is this quote, whose origin I could not determine:

You cannot use after free if you never call free

As outlined above, use-after-free errors are much worse than memory leaks. In fact, when your program is not going to use much memory in the first place, memory leaks will have nearly no impact on the user at all.

One strategy can therefore be to simply ignore the memory leaks and never free the memory. This strategy is often employed by extremely security-critical software with quite limited memory consumption. For example, a CGI module for a webservice is started just to serve a single request; as soon as the request is over, the process is killed and all the memory is released by the OS anyway, so memory leaks will not matter. But if it is an externally facing webservice, security vulnerabilities introduced by a use-after-free could be devastating. Therefore not freeing the memory can be a completely valid strategy in order to avoid any use-after-free errors.

The problem with this is that classes which rely on their destructor being called (e.g. streams, to flush their buffered content) cannot perform that cleanup if Free is never called. So in order not to use Free, either no such classes may be used, or a memory manager must be installed that does not actually free the memory. While HeapTrc's keepreleased does exactly that, HeapTrc's other functionality may impact performance too much for it to be a viable alternative.

Avoiding Manual Memory Management

Even when following the ownership model strictly and doing a lot of testing, memory management errors are hard to avoid entirely. Manual memory management is hard, and errors can be very difficult to find due to their subtlety.

When Mozilla rewrote their CSS engine in Rust, they analyzed the security-related bugs that had previously been found in that code and published the results in their blog. Of all the bugs reported, nearly half were memory management bugs, i.e. use-after-free or memory leaks.

Because memory management is so hard, the simplest solution is often to avoid manual memory management whenever possible. Because classes are designed to require manual memory management most of the time, it may sometimes not be avoidable, but there are a few ways in which reliance on manual memory management can be reduced:

  • Use records and objects instead of classes where possible:
These types can utilize the local lifetime management of the compiler and do not require manual memory management.
  • Use dynamic arrays instead of pointers and container classes:
Dynamic arrays are automatically reference counted (see below), so no manual memory management is necessary. While specialized classes like TList, THashMap, TRBTree, etc. may be more efficient for a certain task, if the data structure does not contain many entries (e.g. a list with just a few dozen entries), this won't make much difference to the user experience, but can result in code that requires no manual memory management (see the sketch after this list).
Dynamic arrays with just one element can even be used instead of a pointer (e.g. to implement linked lists).
  • Use COM interfaces for your classes:
Like dynamic arrays, interfaces are also reference counted. By writing interfaces for your classes, you can use those classes without needing to care about memory management.
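For instance, a minimal sketch of using a dynamic array as a simple growable list (Append and TIntegerArray are made-up names for illustration):

type
  TIntegerArray = array of Integer;

// Append an element by growing the dynamic array; the array itself is
// reference counted, so it is released automatically when no longer used.
procedure Append(var Arr: TIntegerArray; Value: Integer);
begin
  SetLength(Arr, Length(Arr) + 1);
  Arr[High(Arr)] := Value;
end;

var
  Numbers: TIntegerArray;
  i: Integer;
begin
  for i := 1 to 5 do
    Append(Numbers, i * i);
  WriteLn(Numbers[4]); // prints 25
  // No Free needed: the array is finalized when the program ends.
end.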

Reference Counting

While the ownership principle can help to easily identify when and where any memory needs to be freed, it is not applicable in all cases. Sometimes the architecture requires a piece of memory to be shared by multiple peers, where there is no clear owner, or there are multiple owners.

Consider an expression tree for the expression (5 + 3) * (5 + 3). This could be represented either as a strict tree:

     *
   /   \
  +     +
 / \   / \
5   3 5   3

This can be put into code simply, with each node being its own object and having exactly one parent. When a parent node is freed, it frees all of its children.

But, as an optimization, the representation in code can be changed such that equal subtrees are replaced with a reference to the same object:

     *
   /   \
  + <---)
 / \   
5   3

But now there is a problem, because the "+" node has two parents, and thereby no clear owner.

There are multiple solutions to this. For example, all the nodes could be put into one list, which would be the owner of all the nodes, and the nodes would just borrow each other from that list. But a more flexible solution is to simply allow multiple owners.

This can be achieved with so-called reference counting. For this, each allocated object has a counter embedded. When a new owner gets access to the object, it notifies the counter and the count is incremented. When an owner no longer needs the object, it tells the counter that its reference will not be used anymore, and the counter is decreased. When the counter hits 0, the memory is freed.

In the example expression tree above, the reference count of '*' would be 1; '+' is owned twice by '*', so its reference count is 2; '5' and '3' are each owned exactly once, so their reference count is 1. When the reference to '*' is lost, reducing its counter to 0, it reduces the reference counter of its first child '+' to 1, and then, when releasing its second child, reduces it again to 0. '+' then reduces both its children's reference counters from 1 to 0, freeing them, before being freed itself, and finally '*' gets freed.

This results in no memory leaks and, when used consistently, no use-after-free, because the memory can only be used while there are references to it, in which case the counter cannot be 0.

The major problem is that implementing this manually is extremely error prone, as it is easy to simply forget to decrease or increase the counter. But FPC already implements reference counting, at least for some types, at the language level. Dynamic arrays, strings and COM interfaces are reference counted. Whenever such an object is created, its reference count is set to 1. When it is assigned to a new variable, its reference count is incremented; when a variable pointing to that object is finalized, the reference counter is decremented:

type
  TIntegerArray = array of Integer;

procedure PrintArray(arr: TIntegerArray);
var
  i: Integer;
begin
  // Function is entered parameter arr is in scope, the reference counter of arr is incremented
  for i in arr do
    Write(i, ', ');
  // Function is finished, parameter arr goes out of scope, reference counter of arr is decremented
end;

var
  arr, arr2: TIntegerArray;
begin
  arr := [1, 2, 3]; // Initialized with refcount 1
  arr2 := arr; // Refcount gets incremented to 2
  PrintArray(arr); // Function enters -> Increment, function finished -> Decrement
  arr := []; // Reference lost -> decremented back to 1
  // End of program, arr2 gets finalized -> reference drops to 0, the array gets freed
end.

This is why, even though Arrays are dynamically allocated, there is no need to manually free them.

COM Interfaces

While the usage of arrays and strings is straightforward, interfaces are often underutilized for this purpose.

To use interfaces to solve the original problem of representing the expression tree of (5+3)*(5+3), we can simply implement the expressions as classes implementing a common interface:

type
  IExpression = interface
    function Compute: Integer;
  end;

  TMulExpression = class(TInterfacedObject, IExpression)
  private
    FLeftChild, FRightChild: IExpression;
  public
    constructor Create(LeftChild, RightChild: IExpression);

    function Compute: Integer;
  end;

  TAddExpression = class(TInterfacedObject, IExpression)
  private
    FLeftChild, FRightChild: IExpression;
  public
    constructor Create(LeftChild, RightChild: IExpression);

    function Compute: Integer;
  end;

  TNumExpression = class(TInterfacedObject, IExpression)
  private
    FValue: Integer;
  public
    constructor Create(AValue: Integer);

    function Compute: Integer;
  end;

constructor TMulExpression.Create(LeftChild, RightChild: IExpression);
begin
  FLeftChild := LeftChild;
  FRightChild := RightChild;
end;

function TMulExpression.Compute: Integer;
begin
  Result := FLeftChild.Compute * FRightChild.Compute;
end;

constructor TAddExpression.Create(LeftChild, RightChild: IExpression);
begin
  FLeftChild := LeftChild;
  FRightChild := RightChild;
end;

function TAddExpression.Compute: Integer;
begin
  Result := FLeftChild.Compute + FRightChild.Compute;
end;

constructor TNumExpression.Create(AValue: Integer);
begin
  FValue := AValue;
end;

function TNumExpression.Compute: Integer;
begin
  Result := FValue;
end;

var
  FivePlusThree: IExpression;
  TimesExpr: IExpression;
begin
  FivePlusThree := TAddExpression.Create(TNumExpression.Create(5), TNumExpression.Create(3));
  TimesExpr := TMulExpression.Create(FivePlusThree, FivePlusThree);

  WriteLn(TimesExpr.Compute);
end.

This will evaluate the expression tree to 64, without any memory leaks or risk of running into a use-after-free.

Operator Overloading

Another interesting use of COM interfaces to facilitate reference counting is the GMP unit provided with FPC. One issue with manual memory management is that the use of any temporary object would create a memory leak.

Take the following expression: (a + b) * c. First (a + b) is computed, which gives a temporary object that is then multiplied with c. If the object requires dynamic allocation, this temporary (a + b) could never be freed, resulting in a memory leak; this is why operator overloading is not possible with classes. So in classical, manually managed GMP this computation would be written like this:

var
  a, b, c, tmp: mpz_t;
begin
  // init a, b, c 
  mpz_init(tmp); // allocate memory for tmp
  try
    mpz_add(tmp, a, b); // tmp := (a + b)
    mpz_mul(Result, tmp, c); // Result := tmp * c
  finally
    mpz_clear(tmp); // Free tmp
  end;  
end;

This is a lot of boilerplate code distracting from the actually quite simple computation. Here is how it looks using the reference counted interfaces:

var
  a, b, c, tmp: MPInteger; // Reference counted COM interfaces
begin
  // Init a, b, c 
  Result := (a + b) * c;
end;

So while normal classes cannot use operators, because the creation of temporary objects would result in memory leaks, through the use of COM interfaces this is possible, which can result in much cleaner code.

Circular References

One issue with reference counting is circular references. Assume two objects, A and B: if A holds a reference to B and B holds a reference to A, their respective reference counters can never drop below 1, because they always reference each other. Especially when representing cyclic graphs, this can actually lead to memory leaks if not resolved properly.

Therefore, while the usage of interfaces can help massively with writing clean and memory safe code, it is not a "silver bullet". The programmer must still always be aware of whether circular references can occur, and if so, they must be resolved manually.
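As a minimal sketch of such a cycle and of resolving it manually (INode and TNode are made-up names for illustration):

type
  INode = interface
    procedure SetOther(const ANode: INode);
  end;

  TNode = class(TInterfacedObject, INode)
  private
    FOther: INode; // counted reference to the other node
  public
    procedure SetOther(const ANode: INode);
    destructor Destroy; override;
  end;

procedure TNode.SetOther(const ANode: INode);
begin
  FOther := ANode;
end;

destructor TNode.Destroy;
begin
  WriteLn('node destroyed');
  inherited Destroy;
end;

var
  a, b: INode;
begin
  a := TNode.Create;
  b := TNode.Create;
  a.SetOther(b);
  b.SetOther(a);   // cycle: a keeps b alive and b keeps a alive
  // Without the next line neither destructor would ever run,
  // because each node holds a counted reference to the other.
  a.SetOther(nil); // break the cycle manually
  a := nil;
  b := nil;        // both nodes are destroyed now
end.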