Amongst Our Weaponry

By Steve Love

Overload, 29(162):4-11, April 2021


Records in C# v9.0 and .Net 5.0. Steve Love explains a new way to define types in C#.

The release of .Net 5.0 in November 2020 was a major upgrade, bringing .Net Core and .Net Framework together under a single banner with a number of improvements, updates, and fixes. The release also represents an update to the C# language, and .Net 5.0 brings C# to version 9.0. One of the flagship features of C# v9.0 is the record, a new way to create user-defined types.

C# has supported classes and structs since it was introduced in the early 2000s, and the general concept of user-defined types goes back to the 1960s. A reasonable question to ask, then, is why does C# need a new way to create user-defined types?

In this article we’ll look at what records are useful for, and how to use them in our own code. We’ll contrast them with classes and structs, the other main ways of creating user-defined types in C#, with some common use-cases and examples. We will also look at some of the performance characteristics of records at a high level.

Let’s begin with some examples of their main features.

What is a record?

Records are a light-weight way to define a type that has value semantics for the purposes of comparing two variables for equality. Here is the full definition of a simple record type:

  public record Point(int X, int Y);

The Point type defined here is almost as simple as it could be. The record definition itself hides the fact that a Point has two int properties named X and Y, and a constructor taking two int parameters to initialize those property values. The syntax shown here is known as a positional record. The X and Y parameters in the record declaration correspond to properties of the same name and type in the Point record. How we create an instance of a Point and access its properties will be familiar to any C# developer:

  var coord = new Point(2, 3);
  Assert.That(coord.X, Is.EqualTo(2));
  Assert.That(coord.Y, Is.EqualTo(3));

Those generated properties have no set accessors, so Point is immutable. (They have init accessors instead, which permit assignment only while an instance is being created.) Once we’ve created an instance of Point, its value can never be changed. This is in keeping with the common recommendation to make all value types, and value-like types, immutable. Some familiar examples include string, which is a class but has value-like semantics, and System.DateTime, which is a struct.
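
As a rough guide to what the positional syntax saves us, the following sketch shows a longhand record that behaves much like Point. The name PointLonghand is made up for this illustration, and the code is only an approximation of what the compiler produces, not its exact output.

```csharp
using System;

var coord = new PointLonghand(2, 3);
Console.WriteLine(coord.X);   // 2
Console.WriteLine(coord.Y);   // 3

// Uncommenting the next line gives a compile-time error, because X
// can only be assigned while the object is being initialized.
// coord.X = 5;

// Hypothetical longhand approximation of:
//   public record Point(int X, int Y);
public record PointLonghand
{
    public PointLonghand(int x, int y) => (X, Y) = (x, y);

    // init accessors allow assignment during construction only.
    public int X { get; init; }
    public int Y { get; init; }

    // Supports deconstruction into separate variables.
    public void Deconstruct(out int x, out int y) => (x, y) = (X, Y);
}
```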

Equality

When we compare two record instances to determine if they’re equal, their values are compared. For two Point variables, this means that both the X and Y properties match. This is similar behaviour to comparing two struct instances, but differs from most class types. We use reference variables to manipulate class instances on the heap. Two references compare equal if they refer to the same instance in memory.

The following code shows two Point values being compared for equality in different ways. Firstly, they’re compared using == which gives a value-based comparison. Next, they’re compared using ReferenceEquals, a method defined on the object base class that returns true if two variables refer to the same instance. Note that deliberately performing a reference comparison gives a different result from a direct equality check:

  public record Point(int X, int Y);
  var coord = new Point(2, 3);
  var copy = new Point(2, 3);
  Assert.That(coord == copy, Is.True);
  Assert.That(ReferenceEquals(coord, copy),
    Is.False);

Two instances of the Point record compare equal if all of their respective properties compare equal, irrespective of whether they refer to the same object in memory. This is a defining characteristic of value types. More generally, two record instances compare equal if they are the same type and all their properties also compare equal.
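
One practical consequence of value-based equality is that records work well as keys in hashed collections, because the compiler also generates a GetHashCode that is consistent with Equals. The following is a minimal sketch; the variable name visited is just for illustration.

```csharp
using System;
using System.Collections.Generic;

var visited = new HashSet<Point>
{
    new Point(2, 3),
    new Point(2, 3),   // equal to the first, so it is not added again
    new Point(4, 5)
};

Console.WriteLine(visited.Count);                       // 2
Console.WriteLine(visited.Contains(new Point(4, 5)));   // True

// Equal values have equal hash codes.
Console.WriteLine(
    new Point(2, 3).GetHashCode() == new Point(2, 3).GetHashCode());   // True

public record Point(int X, int Y);
```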

Copying

Records are in fact reference types. They are allocated on the heap, are subject to garbage collection, and under normal circumstances are copied by reference. If we assign one Point variable to another, as shown below, we have two references to the same Point instance:

  var coord = new Point(2, 3);
  var copy = coord;
  Assert.That(ReferenceEquals(coord, copy),
    Is.True);

There are times when we need to copy a value, but change only some of its properties. Records have an associated feature called non-destructive mutation which allows us to create a new instance from an existing one, but with some altered properties. When we assign one record variable to another, we can add a with clause to the assignment, as shown here:

  var coord = new Point(2, 3);
  var copy = coord with { Y = 30 };

In this example, copy is an independent instance of Point. It’s a copy of the original coord record, except that it has a different value for the Y property. The X property of copy is taken from the corresponding X property of coord, and is unchanged in copy. Again, we can confirm this with a few tests:

  Assert.That(coord.Y, Is.EqualTo(3));
  Assert.That(copy.Y, Is.EqualTo(30));
  Assert.That(coord.X, Is.EqualTo(copy.X));
  Assert.That(ReferenceEquals(coord, copy),
    Is.False);

Since there are only two properties in this example, the benefit of using the with syntax in this way isn’t immediately obvious. For records with several properties, however, this approach may be significantly more compact than the alternative of creating a new instance, and passing in a mixture of values and properties from an existing record to the constructor.
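
To illustrate the difference, here is a sketch using a hypothetical Address record with four properties. The record and its values are invented for this example.

```csharp
using System;

var home = new Address("1 High Street", "Bristol", "BS1 1AA", "UK");

// Copying via the constructor means repeating every unchanged property.
var viaConstructor =
    new Address(home.Street, home.City, "BS1 2BB", home.Country);

// The with expression names only the property that changes.
var viaWith = home with { Postcode = "BS1 2BB" };

Console.WriteLine(viaConstructor == viaWith);   // True
Console.WriteLine(home.Postcode);               // BS1 1AA (original unchanged)

// Hypothetical record with more properties than Point.
public record Address(
    string Street, string City, string Postcode, string Country);
```

If a property were later added to Address, the with expression would not need to change, whereas the constructor call would.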

Deconstruction

We’ve examined construction of records so far, so in the interest of balance let’s have a look at deconstruction. This is the process of capturing the component properties of a record into individual variables, like this:

  var coord = new Point(12, 34);
  var (x, y) = coord;
  Assert.That(x, Is.EqualTo(coord.X));
  Assert.That(y, Is.EqualTo(coord.Y));

Here, the coord variable is being deconstructed into the named variables x and y. We probably wouldn’t use deconstruction directly like this; it’s more useful when we call a method that returns a record instance but we want separate variables. Note that the names of the individual variables can be different to the property names in the record. We can use any valid variable name for those variables.

Sometimes we don’t need to capture all the components because we’re only interested in a subset of the values. Instead of creating a variable that is never used, we can use the underscore as a placeholder like this:

  public Point ParsePoint(string coordinate)
  {
    // ...
  }
  var (_, height) = ParsePoint("3,2");
  Assert.That(height, Is.EqualTo(2));

In this example, only the second component of the record – the Y property – is copied to the named height variable. The placeholder, known as a discard, tells the compiler to ignore the X property of the Point record. For records with more than two properties we can use the underscore identifier to discard multiple values.
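
As a brief sketch of discarding more than one value, consider a hypothetical three-property record named Colour (not part of the Point examples):

```csharp
using System;

var shade = new Colour(10, 20, 30);

// Keep only the last component.
var (_, _, blue) = shade;
Console.WriteLine(blue);   // 30

// Keep only the first component.
var (red, _, _) = shade;
Console.WriteLine(red);    // 10

// Hypothetical record used only for this illustration.
public record Colour(int Red, int Green, int Blue);
```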

String representation

Record types have a built-in consistent string representation available using the ToString method. This method is available to all types, since it’s defined on the object base class. However, unless we override it for ourselves in classes and structs, it returns just the name of the type, qualified with its namespace.

Calling ToString on a record instance, however, returns the unqualified type name along with the names and values of all of the properties, like this:

  var coord = new Point(12, 34);
  Console.WriteLine(coord);

This gives the output:

  Point { X = 12, Y = 34 }

The Console.WriteLine method calls ToString to obtain the representation. We might also use string interpolation to embellish the output with the variable name like this:

  Console.WriteLine($"{nameof(coord)} = {coord}");

Giving the output:

  coord = Point { X = 12, Y = 34 }

Logging is the obvious use for this representation, but there are other potential benefits: the output has a regular format that is easy to parse in order to re-create an object. Records don’t provide that facility for us, however; we have to write it ourselves.
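
A minimal sketch of such a parser for Point might use a regular expression. The PointParser class is hypothetical and handles only the exact format shown above; it is an illustration, not a robust implementation.

```csharp
using System;
using System.Text.RegularExpressions;

var original = new Point(12, 34);
var text = original.ToString();          // "Point { X = 12, Y = 34 }"
var parsed = PointParser.Parse(text);

Console.WriteLine(parsed == original);   // True

public record Point(int X, int Y);

// Hypothetical helper: records do not generate anything like this.
public static class PointParser
{
    private static readonly Regex Format =
        new Regex(@"^Point \{ X = (-?\d+), Y = (-?\d+) \}$");

    public static Point Parse(string text)
    {
        var match = Format.Match(text);
        if (!match.Success)
            throw new FormatException($"Not a Point: '{text}'");

        return new Point(
            int.Parse(match.Groups[1].Value),
            int.Parse(match.Groups[2].Value));
    }
}
```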

Inheritance

We can inherit one record type from another in exactly the same way as we can with classes, and the semantics are broadly the same as for class inheritance. For example, we can use a base-class reference to a derived record, and we can cast from one type to another. In the following example, we derive a Point3d record from our Point record:

  public record Point(int X, int Y);
  public record Point3d(int X, int Y, int Z) :
    Point(X, Y);
  var coord3d = new Point3d(2, 3, 4);
  var coord = (Point)coord3d;
  Point point = new Point3d(2, 3, 4);
  var point3d = point as Point3d;

If we attempt to cast from a base type to a more derived type when the conversion isn’t valid, we get an InvalidCastException, just as with classes:

  var coord = new Point(2, 3);
  Assert.That(() => (Point3d)coord,
    Throws.TypeOf<InvalidCastException>());

Also in exactly the same way as we would with a derived class, we can use an instance of a derived record as an argument to a method expecting a base class record, as shown here:

  public double LineDistance(Point a, Point b)
  {
    // ...
  }
  var pointa = new Point3d(2, 5, 4);
  var pointb = new Point3d(6, 8, 4);
  Assert.That(LineDistance(pointa, pointb),
    Is.EqualTo(5));

Records can only inherit from other records, so we can’t inherit a record type from a class, nor can a class derive from a record.

We can seal a record to prevent it from being inherited. Once again, the syntax is identical to that we’d use for a class, as shown here:

  public sealed record Speed(double Amount);

It’s common for value types to be sealed when they’re modelled as classes. The built-in string class is a case in point, and all struct types are implicitly sealed. Conceptually, values are fine-grained bits of information, often representing transient and even trivial bits of data. Values differ from entities most clearly in that value types place great importance on their state rather than their identity.

Value semantics

When we say that value types place great importance on their state rather than their identity, what we really mean is that we determine if two values are equal according to the value they represent. This differs from entity types where they compare equal if they represent the same instance in memory. This latter behaviour is often called reference semantics, where we can have more than one reference to an object instance.

Struct instances are all independent of each other. They have true value semantics in that we can’t generally have two variables representing the same instance. Structs are copied by value, so when we assign one to another we get a whole new distinct instance. However, since two values compare equal if they have the same state, it makes no difference that they’re distinct instances.

Records live in a middle ground. Under the covers, records are really classes and so instances are copied by reference. When we assign one record variable to another, we get a new reference to the same instance in memory, unless we explicitly ask for a copy using the with syntax.

However, records are value-like in that when we compare them for equality, it’s their state that’s compared, not their identity.

This behaviour of comparing values instead of identities is much the same as for the string type. string is a class, and so is a reference type. Strings are copied by reference, so the contents of a string variable aren’t usually copied. However, strings have value-like behaviour for the purposes of equality comparisons. The string class overrides the Equals method, and implements the IEquatable<string> interface, which defines a type-specific overload for the Equals method.

String also has an operator== definition, which overrides the behaviour of the built-in comparison with ==. When we compare two string variables using either the Equals method or using ==, we’re determining if the two strings have the same value, whether or not they refer to the same string instance.
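
The following short sketch demonstrates this. It constructs the strings with new so that the two variables refer to separate instances rather than to a single shared literal.

```csharp
using System;

var first = new string('a', 3);    // "aaa"
var second = new string('a', 3);   // a separate "aaa" instance

// Value comparisons: the contents match.
Console.WriteLine(first == second);        // True
Console.WriteLine(first.Equals(second));   // True

// Identity comparisons: they are different objects.
Console.WriteLine(object.ReferenceEquals(first, second));   // False
Console.WriteLine((object)first == (object)second);         // False
```

The last line shows that casting both operands to object bypasses string’s own operator== and falls back to the built-in reference comparison.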

Equal by value

We can emulate value-based comparison in our own classes by overriding the Equals(object?) method, implementing IEquatable<T> for our type, and by providing both operator== and operator!=. There are some subtleties and potential pitfalls to be aware of in those implementations, including the need to handle null values correctly, and making sure we correctly handle any possible base class implementations. If we were to implement our Point type as a class, it might look something like Listing 1.

public class Point : IEquatable<Point>
{
  public Point(int x, int y)
    => (X, Y) = (x, y);
  public int X { get; }
  public int Y { get; }
  public bool Equals(Point? other)
     => !ReferenceEquals(other, null) &&
        GetType() == other.GetType() &&
        X == other.X && Y == other.Y;
  public override bool Equals(object? obj)
     => Equals(obj as Point);
  public override int GetHashCode()
     => HashCode.Combine(X, Y);
  public static bool operator==(Point? left,
    Point? right)
    => left?.Equals(right) ??
       ReferenceEquals(right, null);
  public static bool operator!=(Point? left, 
    Point? right)
    => !(left == right);
}
			
Listing 1

Note that if we override the Equals(object?) method, we also need to override GetHashCode. If we only override one or the other, we’ll get a warning from the compiler. The reason it’s important is that two objects that compare equal should also have equal hash codes. If we fail to observe this rule, we risk being unable to find objects that are used as keys in collections that depend on hash codes for lookup, such as Dictionary and HashSet.

The overridden GetHashCode method in the class shown above might not be the most efficient implementation, but it does guarantee that if two instances of Point are equal, they will also definitely have the same hash code.

With record types, the compiler provides the implementations for each of those members. The code generated by the compiler takes all of the fields declared in the record into account to provide a value-based equality comparison. When we create our own record types, we’re freed from the need to provide all of this boilerplate code just to be able to compare the values of two variables.

What about structs?

Instead of using a class, we can also model our own types using a struct. All structs derive implicitly from the System.ValueType class which provides the necessary overrides to give structs value semantics when we compare them for equality. In addition to the Equals method, ValueType also overrides GetHashCode in a way that ensures that equal instances have matching hash codes.

We might therefore choose to model our Point type as a struct like Listing 2.

public struct Point
{
  public Point(int x, int y)
    => (X, Y) = (x, y);
  public int X { get; }
  public int Y { get; }
}
			
Listing 2

This is significantly simpler than our class definition for Point, and only a little more verbose than the record version. There are limitations to structs, however.

The first thing to note is that we can’t compare two struct instances with == unless we provide our own implementation of operator==. The implementation of that is straightforward enough, however, and with a matching operator!= it looks very much like the version for the class implementation (Listing 3).

public static bool operator==(Point? left, 
  Point? right)
    => left?.Equals(right) ?? !right.HasValue;
public static bool operator!=(Point? left, 
  Point? right)
    => !(left == right);
			
Listing 3

Struct instances can’t normally be null, but our implementations of == and != here also cater for nullable Point values.

Much more significant are the implementations of the Equals and GetHashCode methods provided by the ValueType base class. Those implementations must cater for every possible struct type, and must therefore be very general. Structs can contain any number of fields, and there is no restriction on the types of those fields. How, then, can the base class implementation work correctly in all cases?

ValueType implementations

For GetHashCode, the answer is straightforward. The hash code for a value is calculated from the first non-null field in the struct. If there are no non-null fields, the hash code is 0. This has the correct behaviour in that any two equal values will always have the same hash code. It’s not necessarily the most efficient implementation, because two values can differ in all their other fields, but will have the same hash code if just the first fields are equal. This might slow down lookups requiring hash codes when we have large numbers of values to be compared.

The Equals method needs to be a bit more sophisticated, because comparing only the first field will not be correct in all cases. To determine if two values are equal, all the fields must be compared. In order for this to work for any value type, the implementation of ValueType.Equals uses reflection to discover the fields, and compares the two values by calling Equals on each field. See [Tepliakov] for more information on how Equals and GetHashCode are implemented.

Reflection is a wonderfully powerful tool used in a variety of circumstances, but one thing it most certainly is not is fast. Fortunately, there are optimizations that remove both the need for reflection and the restriction of calculating hash codes from only the first field. In fact, our Point struct would most likely benefit from this optimization because it has two int fields.

Where a struct has only built-in integral type fields, the Equals method can perform a simple bit-wise comparison of two values, and GetHashCode uses bit-masks and bit-shifting on the raw memory representation to calculate a hash code very quickly.

The optimization gets disabled in a wide variety of relatively common cases, however. If a struct contains any field that’s a reference, a floating-point value, or itself provides an override for either the Equals or GetHashCode methods, the slower algorithm must be used.

For the incorrigibly curious, the reference implementation of ValueType.Equals can be found in [Equals]. The key optimization is the call to CanCompareBits, and for the gory details (in C++), see [DotNetCoreRuntime].

The bottom line here really is that we need to override both Equals and GetHashCode for struct types if we need to be sure about the performance of the implementation. These methods are generated for record types by the compiler. There is no base-class implementation that needs to cater for every possible combination of fields. The code is injected directly into a record, almost exactly as if we’d hand-written it ourselves.
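
As a sketch of what those overrides might look like, here is a hypothetical Reading struct. Its double and string fields are exactly the kind that would prevent the default ValueType implementations from using the bit-wise optimization. The type and its members are invented for this illustration.

```csharp
using System;

var a = new Reading(1.5, "C");
var b = new Reading(1.5, "C");

Console.WriteLine(a.Equals(b));                          // True
Console.WriteLine(a.GetHashCode() == b.GetHashCode());   // True

// Hypothetical struct with a floating-point field and a reference field.
public readonly struct Reading : IEquatable<Reading>
{
    public Reading(double amount, string unit)
        => (Amount, Unit) = (amount, unit);

    public double Amount { get; }
    public string Unit { get; }

    // Type-specific comparison: no boxing and no reflection.
    public bool Equals(Reading other)
        => Amount.Equals(other.Amount) && Unit == other.Unit;

    public override bool Equals(object? obj)
        => obj is Reading other && Equals(other);

    // Combines every field, not just the first.
    public override int GetHashCode()
        => HashCode.Combine(Amount, Unit);
}
```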

All structs are implicitly sealed, which means implementing equality for a struct is relatively straightforward. Records can inherit from other records, and this makes implementing equality more complicated. To see exactly why that is, let’s look at a naïve implementation for a derived class.

Equality and inheritance

Earlier we saw a class called Point that had an override of the Equals method taking an object parameter, and a type-specific overload of Equals. Here is the Point class again, along with a Point3d class that inherits from it (see Listing 4).

public class Point : IEquatable<Point>
{
  public Point(int x, int y)
    => (X, Y) = (x, y);
  public int X { get; }
  public int Y { get; }
  public bool Equals(Point? other)
    => !ReferenceEquals(other, null) &&
       GetType() == other.GetType() &&
       X == other.X && Y == other.Y;
  public override bool Equals(object? obj)
    => Equals(obj as Point);
  // ...
}
public class Point3d : Point, IEquatable<Point3d>
{
  public Point3d(int x, int y, int z)
    : base(x, y) => Z = z;
  public int Z { get; }
  public bool Equals(Point3d? other)
    => !ReferenceEquals(other, null) &&
       Z == other.Z && base.Equals(other);
  public override bool Equals(object? obj)
    => Equals(obj as Point3d);
  // ...
}
			
Listing 4

The implementation of the IEquatable interface in each of these classes, that is the Equals method taking a Point or Point3d rather than object, follows Microsoft’s advice on correctly defining equality for a class as shown in [MSDN2015]. For brevity, they’re not exactly the same, but they are equivalent to those shown online.

The key points here are that the derived class determines that the properties specific to it are equal, and if they are, it defers to the base class to perform its own comparison. The base class checks that both values being compared are exactly the same type before also comparing its individual properties. The type check is required to catch the following comparison:

  var point = new Point(2, 3);
  var point3d = new Point3d(2, 3, 4);
  Assert.That(point.Equals(point3d), Is.False);

Here we’re comparing a Point variable with an instance of Point3d. The Equals method actually being used here is the one defined on the Point base class. The point3d variable will be implicitly converted to a Point. The comparison fails the type check in Point.Equals because the run-time types of the two objects being compared aren’t exactly the same.

Even though the X and Y properties match in both objects, the two objects don’t have the same value. A Point3d instance has an extra property named Z that will not be considered by the base class Equals method.

We wouldn’t usually directly assign a derived type to a base class reference like this. It would more usually occur when we call a method taking parameters of the base class type.

Base class comparisons

Standing in for a real method taking parameters of Point type in this example is a simple method named AreEqual (Listing 5).

bool AreEqual(Point left, Point right)
{
  return left.Equals(right);
}
var p1 = new Point3d(2, 3, 1);
var p2 = new Point3d(2, 3, 500);

Assert.That(p1.Equals(p2), Is.False);
Assert.That(AreEqual(p1, p2), Is.False);
			
Listing 5

In this example, we create two Point3d instances that differ in their Z property. We confirm they do indeed compare not equal when we call the Equals method. On the last line we call the AreEqual method, which takes two parameters of the base class type.

This test fails because the call to AreEqual actually returns True. This time, both objects are exactly the same type, and neither one is null. More than that, their X and Y properties both match. However, the comparison of Z properties never happens when the objects are compared using their base class type.

If we change the AreEqual method to take object parameters instead of Point, the test will pass, because object.Equals is a virtual method call. However, in keeping with the advice given on MSDN [MSDN2015], the type-specific overload of Equals is not virtual. When we use a Point variable to call the Equals method, the Point implementation will be called, irrespective of whether the variable actually refers to a more derived type.

We can resolve this problem by making Point.Equals virtual, and adding an override for it to the Point3d class. There are some subtleties to doing this, however, and it’s very easy to get wrong.

Records, as we noted earlier, can inherit from other records as long as the base record isn’t sealed. Moreover, records behave correctly with inheritance and don’t exhibit the problems demonstrated here. The key is in how equality is implemented for records.

Compiler-generated Equals

The code generated by the compiler to implement equality diverges from that recommended in [MSDN2015] – quite rightly, since that implementation isn’t sufficient, as we’ve demonstrated. Let’s begin with the base type Point. Again, for the sake of brevity, the code in Listing 6 isn’t exactly the same as that created by the compiler, but it’s equivalent.

public class Point : IEquatable<Point>
{
  public Point(int x, int y)
    => (X, Y) = (x, y);
  public int X { get; }
  public int Y { get; }

  protected virtual Type EqualityContract
    => typeof(Point);

  public virtual bool Equals(Point? other)
    => !ReferenceEquals(other, null) &&
       EqualityContract == other.EqualityContract
       && X == other.X && Y == other.Y;
  public override bool Equals(object? obj)
    => Equals(obj as Point);
  // ...
}
			
Listing 6

There are two things of note here. The first is the synthesized EqualityContract property. This is used in the Equals method to confirm that both the invoking object and the argument are exactly the same type. It replaces the call to object.GetType for this purpose.

The GetType method is available to any type, but it’s a non-virtual method that involves a native system call. The EqualityContract property is virtual, but makes use of the typeof operator which is evaluated at compile time. The result of both GetType and EqualityContract under these circumstances is identical, but EqualityContract uses information available to the compiler, whereas GetType calculates the required Type to return at run time.

The second thing to note is that the type-specific implementation of the Equals method is itself virtual. The importance of this becomes apparent when we look at the equivalent code in the derived Point3d class.

Inheriting Equals

Listing 7 is the equivalent code for Point3d that derives from the Point type.

public class Point3d : Point, IEquatable<Point3d>
{
  public Point3d(int x, int y, int z)
    : base(x, y) => Z = z;
  public int Z { get; }

  protected override Type EqualityContract
    => typeof(Point3d);

  public sealed override bool Equals(Point? other)
    => Equals((object?)other);
  public virtual bool Equals(Point3d? other)
    => base.Equals(other as Point) && Z == other.Z;
  public override bool Equals(object? obj)
    => Equals(obj as Point3d);
  // ...
}
			
Listing 7

Not only does Point3d provide its own type-specific implementation for the IEquatable interface, it also overrides the base class’s type-specific Equals. The override invokes the Equals method taking object? as its argument. This in turn resolves to the Point3d.Equals(object?) method, which attempts to cast its parameter to a Point3d.

We should also note that the override of Equals(Point?) is sealed in the Point3d class. This means that if we were to inherit from Point3d – for the sake of the argument let’s call the new type Point4d – that more derived type cannot override that method. Sealing a method has the effect of preventing a derived type from further customising its implementation, but the method is still available for more derived types to call. Our potential Point4d type could still override the Equals(Point3d?) method, however.

Testing Equals for records

There are other minor differences between our Point.Equals implementation and that shown previously, but the main point is that if we were to model our Point and Point3d types as classes, there is quite a lot of boilerplate we need to provide in order for equality to work correctly.

Using records to model these types saves a great deal of code that would otherwise have to not only be written, but tested too. We previously saw a test for equality for our original class implementation of Point and Point3d that failed. Here it is once more:

  bool AreEqual(Point left, Point right)
  {
    return left.Equals(right);
  }
  var p1 = new Point3d(2, 3, 1);
  var p2 = new Point3d(2, 3, 500);
  
  Assert.That(p1.Equals(p2), Is.False);
  Assert.That(AreEqual(p1, p2), Is.False);

Where Point3d is a record that inherits from a Point record, this test now passes. There is more than just equality to consider when we inherit from a value type, however.
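
For readers who want to try this without a test framework, the following is a self-contained console sketch of the same check using records. It also repeats the earlier comparison between a Point and a Point3d whose X and Y match.

```csharp
using System;

// Compares through the base type, as in Listing 5.
static bool AreEqual(Point left, Point right) => left.Equals(right);

var p1 = new Point3d(2, 3, 1);
var p2 = new Point3d(2, 3, 500);

Console.WriteLine(p1.Equals(p2));      // False
Console.WriteLine(AreEqual(p1, p2));   // False: Z is still compared

// Two Point3d values with the same state are equal via the base type too.
Console.WriteLine(
    AreEqual(new Point3d(2, 3, 4), new Point3d(2, 3, 4)));   // True

// A Point never equals a Point3d, even when X and Y match.
Console.WriteLine(
    new Point(2, 3).Equals(new Point3d(2, 3, 4)));           // False

public record Point(int X, int Y);
public record Point3d(int X, int Y, int Z) : Point(X, Y);
```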

Style over substitutability

Although the compiler generates code to correctly perform an equality comparison for records that inherit from one another, it can’t generate code for any of the other operations we might need to implement. For example, if we wanted to implement the IComparable interface for our Point and Point3d types, we’d have to implement it ourselves.

Would it make sense for us to compare a Point3d instance to determine if it was less than an instance of a Point? What about the other way around? What compromises might we have to make?
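
To make one such compromise concrete, here is a sketch in which Point implements IComparable<Point> by hand. The ordering (X first, then Y) is an arbitrary choice for this illustration. Point3d inherits the comparison, which knows nothing about Z.

```csharp
using System;

Point a = new Point3d(2, 3, 1);
Point b = new Point3d(2, 3, 500);

Console.WriteLine(a.CompareTo(b));   // 0: the ordering cannot see Z
Console.WriteLine(a.Equals(b));      // False: equality does compare Z

public record Point(int X, int Y) : IComparable<Point>
{
    // Hand-written ordering (an assumption for this sketch).
    public int CompareTo(Point? other)
    {
        if (other is null) return 1;
        var byX = X.CompareTo(other.X);
        return byX != 0 ? byX : Y.CompareTo(other.Y);
    }
}

public record Point3d(int X, int Y, int Z) : Point(X, Y);
```

Here the two values compare as neither less than nor greater than each other, yet they are not equal, which is the kind of inconsistency we would have to resolve ourselves.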

Inheritance and virtual methods work well for entity types where we want to customize or embellish the behaviour of a base class. We also get the benefit of substitutability between the base type and derived type. An instance of a derived type can be used anywhere a base type reference is needed. This allows us to write code in terms of a base type that can be used seamlessly by objects that inherit from that base type.

Entities are the higher-order objects in our designs. They usually represent the persistent information about a system, and the processing of that information in collaboration with other entities. Identity is often important for entities, because we often need to use a specific instance. By contrast, values place no importance on identity. One value is as good as any other value with the same state.

The benefits of inheritance are much less clear for values, which is the reason that structs don’t – indeed cannot – take part in inheritance relationships. It’s also the reason that value-like classes such as string are sealed. Substitutability doesn’t work so well for values; it’s not fair to say that a Point3d is substitutable for a Point because they have different values, and the value is what really matters for a value type.

Has-A versus IS-A

Inheritance is commonly employed to re-use the characteristics of a type and build on it. When we derive a type from a non-abstract base, such as when inheriting Point3d from Point, we’re really inheriting the implementation. Substitutability between types works best when the implementation doesn’t matter. What we really want is to represent the same interface.

More formally the distinction is between class inheritance and type inheritance. By deriving a Point3d from a Point we’re using class inheritance. In order to make it work correctly, we must alter the interface.

However, a much simpler solution would be to discard the inheritance altogether, and simply have Point3d contain an instance of a Point. We get all the benefits of re-using the implementation of Point, but have none of the difficulties of substitutability. Furthermore, we’d make both classes sealed and the implementations of both would be more straightforward. Perhaps even better, we make them structs instead of classes.

Consider the struct in Listing 8.

public readonly struct Point3d : IEquatable<Point3d>
{
  public Point3d(int x, int y, int z)
    => (xy, this.z) = (new Point(x, y), z);
  public int X => xy.X;
  public int Y => xy.Y;
  public int Z => z;
  public bool Equals(Point3d other)
    => xy.Equals(other.xy) && z == other.z;
  public override bool Equals(object? obj)
    => obj is Point3d other && Equals(other);
  public override int GetHashCode()
    => HashCode.Combine(xy, z);
  public static bool operator==(Point3d? left,
     Point3d? right)
    => left?.Equals(right) ?? !right.HasValue;
  public static bool operator!=(Point3d? left,
     Point3d? right)
    => !(left == right);
  private readonly Point xy;
  private readonly int z;
}
			
Listing 8

Here we have a Point3d type modelled as a struct that contains an instance of a Point as a field. We have no need to consider the case where a base class parameter might really be a Point3d because that’s not possible. The only overridden methods are those necessary to provide the basic equality and hash code calculations from the object base class.

We can’t use an instance of Point3d anywhere that a Point is needed. We might provide an explicit conversion – or projection – to a Point that could be used to invoke a method expecting Point variables. In all other respects, the behaviour of this struct matches all the expected behaviour from a Point3d that inherits from a Point.
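
A sketch of such an explicit conversion follows. It uses trimmed versions of the Point and Point3d structs (the equality members are omitted for brevity), and the ManhattanLength method is a hypothetical stand-in for any method expecting a Point.

```csharp
using System;

// Hypothetical method that expects a Point.
static int ManhattanLength(Point p) => Math.Abs(p.X) + Math.Abs(p.Y);

var position = new Point3d(2, 3, 4);

// The conversion has to be requested explicitly; Z is dropped.
Console.WriteLine(ManhattanLength((Point)position));   // 5

public readonly struct Point
{
    public Point(int x, int y) => (X, Y) = (x, y);
    public int X { get; }
    public int Y { get; }
}

// Trimmed version of the struct in Listing 8, plus the projection.
public readonly struct Point3d
{
    public Point3d(int x, int y, int z)
        => (xy, this.z) = (new Point(x, y), z);

    public int X => xy.X;
    public int Y => xy.Y;
    public int Z => z;

    // Projects the value onto its two-dimensional part.
    public static explicit operator Point(Point3d value) => value.xy;

    private readonly Point xy;
    private readonly int z;
}
```

Making the operator explicit rather than implicit keeps the loss of the Z component visible at the call site.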

The one possible objection to this is that structs are copied and passed by value, whereas records are copied and passed by reference. Since a Point3d contains an instance of another struct, we might expect its performance to suffer as a result of needing to copy the whole instance rather than just the reference.

As with all such questions, we must invoke the wisdom, or at least the objectivity, of a performance profiler.

Performance of structs and records

Our Point3d struct doesn’t do much other than being a value. Similarly, the most important aspect of the record equivalent is its value. Therefore the most obvious thing to compare between the two is how equality is implemented. Just as important as the Equals implementation is the GetHashCode method. We should, then, measure the performance characteristics of both methods.

One simple way to do that is to employ a HashSet, which will use GetHashCode to determine where to look for a key, and then use Equals to determine an exact match. A hash set is a unique collection of keys, so a useful test would be to attempt to introduce duplicate keys so that we can be sure a full lookup of a value takes place.
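To make that lookup sequence visible, here is a small sketch using a hypothetical CountingKey type (not part of the article's code) that counts how often the hash set calls GetHashCode and Equals. It assumes the default equality comparer, which picks up the IEquatable implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that counts how often the hash set consults it.
public sealed class CountingKey : IEquatable<CountingKey>
{
    public static int HashCalls;
    public static int EqualsCalls;

    private readonly int value;
    public CountingKey(int value) => this.value = value;

    public bool Equals(CountingKey? other)
    {
        EqualsCalls++;
        return other is not null && value == other.value;
    }

    public override bool Equals(object? obj) => Equals(obj as CountingKey);

    public override int GetHashCode()
    {
        HashCalls++;
        return value;
    }
}

public static class HashSetProbe
{
    // Adds the same value twice and reports whether the duplicate was accepted.
    public static bool AddDuplicate()
    {
        var set = new HashSet<CountingKey>();
        set.Add(new CountingKey(42));        // first key: hashed, nothing to compare against
        return set.Add(new CountingKey(42)); // duplicate: hashed, then compared with Equals
    }
}
```

HashSetProbe.AddDuplicate() returns false because the second key is rejected, and afterwards both counters are non-zero: the duplicate forces the set to go beyond the hash code and call Equals, which is exactly why the test data is full of duplicates.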

The following simple test creates a list of Point3d objects, and we deliberately introduce duplicate values. We use the source list to populate a HashSet using the ToHashSet method, which simply discards any values that have already been added to the collection. (See Listing 9.)

const int N = 50000000;
const int Filter = 10000;

var source = Enumerable.Range(0, N)
  .Select(i => new Point3d(0, 0, i % Filter))
  .ToList();
var unique = source.ToHashSet();

Assert.That(unique.Count, Is.EqualTo(Filter));
Assert.That(unique.Contains(new Point3d
  (0, 0, Filter - 1)), Is.True);
			
Listing 9

The number of elements is intentionally very large in order to scale up the relative cost of each method call and make the differences observable. All the following results were obtained by profiling a test using the dotTrace profiler from JetBrains (https://www.jetbrains.com/profiler/) using a straightforward wall-clock time report. In each case, the test was profiled using a Release build.

Profile results

Figure 1 contains the results from running this test using our Point3d record, which inherits from a Point record, alongside the results of profiling the same test using our Point3d struct, which contains an instance of a Point struct.

Records
► 5.52%   Hashset_of_records  •  5,622 ms  •  TestRecords.Hashset_of_records()
  ► 3.00%   ToList  •  3,055 ms  •  System.Linq.Enumerable.ToList(IEnumerable)
    2.51%   ToHashSet  •  2,554 ms  •  System.Linq.Enumerable.ToHashSet(IEnumerable)
    ► 0.47%   Equals  •  478 ms  •  Point3d.Equals(Point3d)
    ► 0.35%   GetHashCode  •  357 ms  •  Point3d.GetHashCode()

Structs
► 2.95%   Hashset_of_structs  •  3,002 ms  •  TestStructs.Hashset_of_structs()
    2.28%   ToHashSet  •  2,325 ms  •  System.Linq.Enumerable.ToHashSet(IEnumerable)
    ► 0.09%   GetHashCode  •  94 ms  •  Point.GetHashCode()
  ► 0.66%   ToList  •  677 ms  •  System.Linq.Enumerable.ToList(IEnumerable)
			
Figure 1

The headline time shows that the test using structs took not much more than half the time of the test using records. Note that the ToHashSet call is somewhat slower for records, but calls to Equals and GetHashCode are much slower than for structs. In fact, the cost of Equals for the struct type doesn’t even register, which means the JIT compiler probably inlined the code.

The Equals method for records is relatively expensive owing to the number of virtual calls it makes, in this case to the EqualityContract property.

The remainder of the time difference between the struct and record versions is most likely down to the fact that the struct instances are copied by value, but for the records, only the references are copied from the source to the hash set. The difference of ~200ms is negligible really, considering the huge number of elements we were using.

However, copying by value versus copying by reference has another, less obvious implication, which goes some way towards explaining the significant difference in the cost of the call to ToList.

The impact of the managed heap

Records are reference types, allocated on the heap, and are subject to garbage collection in the same way that class instances are. We deliberately introduced duplicate values in our source list, and when the ToHashSet method discards those duplicates they become unreachable, and so are eligible for garbage collection. Struct instances are never individually garbage collected; they simply go out of scope when they’re no longer needed.

Adding such a large number of elements to the list would certainly put some pressure on memory, and very likely use up enough space to cause several garbage collections. We can see this by digging into the ToList call (see Figure 2).

3.00%   ToList  •  3,055 ms  •  System.Linq.Enumerable.ToList(IEnumerable)
  2.76%   <Hashset_of_records>b__13_0  •  2,809 ms  •  <Hashset_of_records>b__13_0(Int32)
    1.60%   [Garbage collection]  •  1,633 ms
    <0.01%   [Thread suspended]  •  5.8 ms
  ► <0.01%   Point3d..ctor  •  5.7 ms  •  Point3d..ctor(Int32, Int32, Int32)
Figure 2

The cost of the garbage collection here isn’t that of objects actually being collected; it’s most likely the cost of tracing references to each object to determine whether they can be collected.

In fact, since we’re putting so much pressure on memory here, it’s likely that even the discarded objects stay in memory for much longer than necessary because they’ll survive successive garbage collections caused by the huge number of memory requests being made.

All of which demonstrates that while copying objects by reference might be cheaper than copying by value, the associated cost of inhabiting the managed heap can offset that benefit and even overwhelm it.
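A profiler is the right tool for this, but a crude check is also possible in code. The sketch below counts generation 0 collections while building a list of records and a list of structs. RecordPoint3d and StructPoint3d are hypothetical stand-ins for the two Point3d variants, and GcProbe is invented for illustration. The counts are process-wide and depend on the runtime and its GC settings, so treat them as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the article's two Point3d variants.
public sealed record RecordPoint3d(int X, int Y, int Z);

public readonly struct StructPoint3d
{
    public StructPoint3d(int x, int y, int z) => (X, Y, Z) = (x, y, z);
    public int X { get; }
    public int Y { get; }
    public int Z { get; }
}

public static class GcProbe
{
    // Runs the work and returns how many generation 0 collections happened meanwhile.
    public static int Gen0CollectionsDuring(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        return GC.CollectionCount(0) - before;
    }

    // Builds one list of each kind and reports the collection counts side by side.
    public static (int Records, int Structs) Compare(int count)
    {
        int records = Gen0CollectionsDuring(() =>
        {
            // One heap object per element, plus the list's backing array.
            List<RecordPoint3d> list = Enumerable.Range(0, count)
                .Select(i => new RecordPoint3d(0, 0, i % 10000))
                .ToList();
            GC.KeepAlive(list);
        });

        int structs = Gen0CollectionsDuring(() =>
        {
            // Elements are stored inline in the list's backing array.
            List<StructPoint3d> list = Enumerable.Range(0, count)
                .Select(i => new StructPoint3d(0, 0, i % 10000))
                .ToList();
            GC.KeepAlive(list);
        });

        return (records, structs);
    }
}
```

Calling GcProbe.Compare with a large count gives a pair of numbers to compare. The list of structs still allocates its backing array on the heap as it grows, so its count need not be zero, but it creates far fewer individual objects for the collector to trace.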

Summary

The new record types in C# v9.0 provide us with a very compact way of defining value-like types without the need to manually write all the boilerplate code to perform equality correctly. The syntax we’ve explored in this article relates to positional records, which is the most compact representation that allows the compiler the greatest flexibility to generate code on our behalf.

We can choose to write our own version of almost any of the methods generated by the compiler if we wish. The exceptions to this are that we can’t provide our own operator== or operator!=. If we want to customize the behaviour of equality, we need to write our own type-specific Equals method for the type. The compiler-generated operator== just forwards to the Equals method anyway.

Any method we write ourselves prevents the compiler from synthesizing its own version; it simply uses the version we provide.
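As a minimal sketch of that customization, the hypothetical record below (CaseInsensitiveTag is not from the article) supplies its own type-specific Equals and a matching GetHashCode. The record is sealed, so the Equals method does not need to be virtual.

```csharp
using System;

// Hypothetical record whose equality ignores case.
public sealed record CaseInsensitiveTag(string Value)
{
    // Our own type-specific Equals replaces the one the compiler would synthesize.
    public bool Equals(CaseInsensitiveTag? other)
        => other is not null
           && string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);

    // Kept consistent with Equals: tags that compare equal share a hash code.
    public override int GetHashCode()
        => StringComparer.OrdinalIgnoreCase.GetHashCode(Value);
}
```

With this in place, new CaseInsensitiveTag("Ada") == new CaseInsensitiveTag("ADA") is true, because the compiler-generated operator== forwards to the Equals method we wrote.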

However, since the compiler provides efficient and correct implementations for each of those methods, there seems to be little benefit in writing our own. If we feel the need to have more control over equality, we may as well just use a struct. Where we just need a simple representation of a value, records work very well and the associated facility of non-destructive mutation with the with keyword is a very useful way of handling those values.

Just because we can inherit one record from another doesn’t mean that we should. Values in general make poor parents, and so records, like structs and other value-like types such as string, should be sealed to prevent further derivation.

We also need to understand that records really are classes under the hood; when we create a record type, the compiler injects a class definition for us. Records are therefore reference types, and so live on the managed heap. This means they are garbage collected, and we might therefore consider using a struct anyway if we’re very sensitive to performance.
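The reference-type nature of records is easy to confirm. The sketch below redeclares a sealed Point record so that it stands alone; the RecordIdentity helper is invented for illustration.

```csharp
using System;

public sealed record Point(int X, int Y);

// Invented helper that probes how a record behaves as a reference type.
public static class RecordIdentity
{
    // Records compile to classes, so they are not value types.
    public static bool IsReferenceType => !typeof(Point).IsValueType;

    // Assigning a record variable copies the reference, not the value.
    public static bool AssignmentSharesInstance()
    {
        var original = new Point(2, 3);
        var alias = original;
        return ReferenceEquals(original, alias);
    }

    // A with expression allocates a new instance that compares equal to the original.
    public static bool WithCreatesEqualCopy()
    {
        var original = new Point(2, 3);
        var copy = original with { };
        return original == copy && !ReferenceEquals(original, copy);
    }
}
```

All three members return true: a record variable holds a reference to a heap object, and value semantics apply only to how two instances are compared.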

An overview of records in C# v9.0, and more detail on what methods the compiler provides can be found at [MSDN2020].

References

[Equals] https://referencesource.microsoft.com/#mscorlib/system/valuetype.cs,22

[DotNetCoreRuntime] https://github.com/dotnet/runtime/blob/01116d4e145d17adefc1237d55b1e3574919b1c1/src/coreclr/vm/comutilnative.cpp#L1738

[MSDN2015] https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/how-to-define-value-equality-for-a-type

[MSDN2020] https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-9#record-types

[Tepliakov] https://devblogs.microsoft.com/premier-developer/performance-implications-of-default-struct-equality-in-c/

Steve Love has been a professional programmer for over 20 years and is still finding new ways to be lazy.