AnsiString

From Lazarus wiki

AnsiString is a variable-length string data type. It can store characters that have a size of one Byte.

implementation

In FPC an AnsiString is implemented as a pointer. It is a managed data type. As such it is initialized with nil as soon as it enters the scope. Memory for the character sequence is dynamically allocated and freed.
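The initialization described above can be observed with a minimal sketch like the following. The program name initDemo and the function startsAsNil are made up for illustration, and the typecast to pointer is only used to peek at the reference. The compiler may emit a hint that s does not seem to be initialized, which is exactly the situation being examined.

```pascal
program initDemo;
{$mode objFPC}{$longStrings on}

{ Hypothetical helper: reports whether a fresh local ansiString is nil. }
function startsAsNil: boolean;
var
  s: ansiString;
begin
  // a managed local variable is initialized before the first statement runs
  startsAsNil := pointer(s) = nil;
end;

begin
  writeLn(startsAsNil);  // TRUE
end.
```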

An AnsiString points to the first character. This facilitates interfacing to libraries or foreign functions expecting pChar strings. For that purpose, the character sequence of an AnsiString is always followed by a null Byte. In Pascal, this terminating null Byte has no significance for the string’s value (including its length).

A non-empty AnsiString is always preceded by some management data, stored directly before the first character. These are

  • a code page
  • the size of a character
  • a reference count
  • the length of the string.
  field                    bytes
  code page                253 233
  maximum character size   0   1
  reference count          0   0   0   1
  length                   0   0   0   3
  payload                  'F' 'o' 'o'    (the pointer points here, at 'F')
  complimentary Null       #0
AnsiString memory layout sample (32-bit platform)

In Pascal, only the length field determines where a string ends; the terminating null Byte does not. Consequently, an AnsiString may contain #0 characters.
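The difference between the Pascal view and the pChar view of the same string can be sketched as follows. The program name nullDemo is made up for illustration. The example assumes the strLen function from the strings unit, which counts characters up to the first null Byte.

```pascal
program nullDemo;
{$mode objFPC}{$longStrings on}
uses
  strings;
var
  s: ansiString;
begin
  // a string value with an embedded #0 character
  s := 'Foo' + #0 + 'Bar';
  // Pascal consults the length field
  writeLn(length(s));         // 7
  // a consumer of pChar strings stops at the first null Byte
  writeLn(strLen(pChar(s)));  // 3
end.
```

Passing such a string to a foreign function that expects a pChar therefore silently truncates it at the embedded #0.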

An AnsiString can furthermore be associated with a code page (since FPC 3.0.0).
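As a small sketch of the code page association, the following program inspects a UTF8String, which the RTL declares as an AnsiString tied to the UTF-8 code page. The program name codePageDemo is made up. The example assumes the stringCodePage and stringElementSize functions and the CP_UTF8 constant of the system unit, as available since FPC 3.0.0.

```pascal
program codePageDemo;
{$mode objFPC}{$longStrings on}
var
  u: UTF8String;
begin
  u := 'Foo';
  // the code page field of a non-empty UTF8String
  writeLn(stringCodePage(u) = CP_UTF8);  // TRUE
  // the character size field, in Bytes
  writeLn(stringElementSize(u));         // 1
end.
```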

application

The data type AnsiString can be used like any other string data type. You may assign string literals to an AnsiString variable as normal. String values can be compared (=) just as usual. The entire pointer-characteristic is transparent.
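A minimal sketch of this transparency, with a made-up program name valueDemo: the two variables are built in different ways, yet the comparison looks at the string values and not at the underlying pointers.

```pascal
program valueDemo;
{$mode objFPC}{$longStrings on}
var
  a, b: ansiString;
begin
  a := 'Foo';       // assigned from a literal
  b := 'Fo';
  b := b + 'o';     // assembled at run time
  writeLn(a = b);   // TRUE, the values are compared
  writeLn(length(b));  // 3
end.
```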

Characters in AnsiString have a 1-based index. myAnsiString[1] refers to the first character.

Note: The linear character index is only guaranteed to work for strings that have a maximum character size of 1. That means that using an integer index on, for example, a UTF-8 encoded string (not exclusively containing ASCII characters) will produce erroneous results.
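The indexing rules can be illustrated with the following sketch (the program name indexDemo is made up). To stay independent of the source file encoding, the UTF-8 sequence for the single character ä (U+00E4) is assembled Byte by Byte.

```pascal
program indexDemo;
{$mode objFPC}{$longStrings on}
var
  s: ansiString;
begin
  s := 'Foo';
  writeLn(s[1]);       // F, indices start at 1

  // UTF-8 encoding of the single character U+00E4: two Bytes
  setLength(s, 2);
  s[1] := chr($C3);
  s[2] := chr($A4);
  writeLn(length(s));  // 2, although it is one character
  writeLn(ord(s[1]));  // 195, merely the first Byte of the character
end.
```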

The length function, and for that matter also high, will return a string’s length by examining the length data field.

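Because an AnsiString is essentially a pointer, copying strings of this type is fast: only the reference is copied and the reference count is increased. Modifying a string that is shared in this way may trigger a copy-on-write (COW), that is, the character sequence is duplicated before it is changed.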
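The following sketch makes the sharing and the subsequent copy visible by comparing the raw references. The program name cowDemo is made up, and the typecasts to pointer serve illustration only. The string is created with stringOfChar so that it lives in dynamically allocated memory.

```pascal
program cowDemo;
{$mode objFPC}{$longStrings on}
var
  a, b: ansiString;
begin
  a := stringOfChar('x', 3);
  b := a;                            // copies the reference only
  writeLn(pointer(a) = pointer(b));  // TRUE, same character sequence

  b[1] := 'y';                       // modification of a shared string
  writeLn(pointer(a) = pointer(b));  // FALSE, b received its own copy
  writeLn(a);                        // xxx
  writeLn(b);                        // yxx
end.
```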

caveats

  • The compiler directive {$longStrings on} (or {$H+}) aliases string (without a specified length) to AnsiString.
  • AnsiString as a managed data type introduces a certain overhead. See the section “Avoiding implicit try finally” for more explanations.
  • The sizeOf value of an AnsiString variable is merely the size of a pointer.
  • Assigning an empty string '' to an AnsiString variable will in fact assign nil to the variable and, if the reference count hits zero, release the underlying memory (if any was previously allocated at all). Empty strings are not stored as described above.
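Some of the caveats above can be checked with a short sketch like this one. The program name caveatDemo is made up, and the typecast to pointer is for illustration only.

```pascal
program caveatDemo;
{$mode objFPC}{$longStrings on}
var
  // aliased to ansiString because of {$longStrings on}
  s: string;
begin
  s := stringOfChar('x', 1000);
  // sizeOf reports the reference, not the character sequence
  writeLn(sizeOf(s) = sizeOf(pointer));  // TRUE
  writeLn(length(s));                    // 1000

  // the empty string is represented by nil
  s := '';
  writeLn(pointer(s) = nil);             // TRUE
end.
```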

see also