Modernization of Legacy Arrays: Replacing CArray with std::vector

Many codebases still use mature libraries, such as Microsoft Foundation Classes. Stuart Bergen explains how and why he moved to using modern standard C++ tools instead.

Our scientific desktop application relies on the Microsoft Foundation Class (MFC) library for both user interface (UI) and computational tasks. MFC, introduced in 1992, encapsulates portions of the Windows API within object-oriented C++ classes and is currently maintained [Wikipedia-1]. Despite its age, our users enjoy the interactive, straightforward and visually appealing experience that MFC provides. As a result, there has been little incentive to replace complex UI-related code with alternatives.

Computational tasks using the MFC library show signs of aging compared to standard C++ containers. The primary MFC container class, CArray, closely resembles vector in functionality [MS-1], [CPP-1]. Both offer C-like arrays that dynamically grow and shrink, reserve contiguous space on the heap, and use zero-based indexing. However, a key difference is that CArray lacks iterator support. We previously used boost::iterator_facade [Abrahams06] to supplement basic iterator support, but this only goes so far. The codebase was updated to prepare for future development and embrace modern practices, including the adoption of C++20 ranges. This effort resulted in simpler, more familiar code, performance and scalability improvements, and an enhanced debugging experience.

In this article, methods to replace the MFC container class CArray with vector are proposed. These practical techniques emerged from a real-world refactoring effort on a commercial software project. First, we present class method conversion techniques suitable for direct substitution. In cases where direct substitutions are not feasible, standalone replacement functions are offered. Next, we consider the array index, public inheritance of the CArray class, array resize, array length, and MFC use of CArray. Finally, we suggest further modernizations and draw conclusions.

Class method conversions

The modernization effort mainly focused on replacing CArray class methods and operators with their vector equivalents. Table 1 lists vector replacements for the public CArray class interface.

| CArray | vector | Purpose |
|---|---|---|
| `a()` | `a()` | Default constructor |
| `a.Add(val)` | `a.push_back(val)` | Adds an element to the end of the array; grows the array by 1 |
| `a.Append(b)` | `Append(a,b)` | Appends another array to the array; grows the array if necessary |
| `a.Copy(b)` | `a = b` | Copies another array to the array; grows the array if necessary |
| `a.ElementAt(i)`, `a.GetAt(i)` | `a.at(i)` | Returns a reference to an array element at the specified index |
| `a.FreeExtra()` | `a.shrink_to_fit()` | Frees all unused memory above the current index upper bound |
| `a.GetCount()`, `a.GetSize()` | `a.size()`, `GetSize(a)` | Returns the number of elements in the array; the `vector` replacements yield unsigned and signed integral types, respectively |
| `a.GetData()` | `a.data()` | Provides direct pointer access to the underlying contiguous storage |
| `a.GetUpperBound()` | `GetSize(a)-1` | Returns the largest valid index; returns -1 when the array is empty |
| `a.InsertAt(i,val)` | `InsertAt(a,i,val)` | Inserts an element at the specified index; grows the array as needed |
| `a.IsEmpty()` | `a.empty()` | Determines whether the array is empty |
| `a.RemoveAll()` | `a.clear()` | Removes all elements from the array |
| `a.RemoveAt(i,n)` | `RemoveAt(a,i,n)` | Removes one or more elements starting at the specified index |
| `a.SetAt(i,val)` | `a.at(i) = val` | Sets the value at a given index; the array is not allowed to grow |
| `a.SetAtGrow(i,val)` | `SetAtGrow(a,i,val)` | Sets the value at a given index; grows the array if necessary |
| `a.SetSize(n)` | `a.resize(n)` | Sets the number of elements in the array; allocates memory if necessary |
| `a.SetSize(n,m)` | `SetSize(a,n,m)` | Sets the number of elements in the array and the storage “grow by” increment; allocates memory if necessary |
| `a[i]` | `a[i]` | Returns a reference to an array element at the specified index |

vector replacements for the public CArray class interface

Table 1

The table served as a helpful reference during the refactoring effort. Template arguments are omitted for brevity. Notation: a and b represent CArray/vector objects, i represents an index, n and m represent array lengths or element counts, and val represents an array value. Entries in the vector column written as free-function calls, such as Append(a,b) or GetSize(a), are the standalone replacement functions of Listing 1. Full method signatures can be found in the CArray and vector references [MS-1], [CPP-1]. The CArray class possesses only a default constructor; its copy constructor and copy assignment operator are deleted.
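To illustrate how Table 1 is applied in practice, the following sketch shows a small converted function. The function name SumAndReset and the element type double are hypothetical, chosen only for the example; each statement carries a comment naming the CArray call it replaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: appends two readings, returns the total of all
// elements, then empties the array. Comments show the original CArray calls.
double SumAndReset(std::vector<double>& a) {
  a.push_back(1.5);                            // was: a.Add(1.5);
  a.push_back(2.5);                            // was: a.Add(2.5);
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)   // was: i < a.GetSize()
    sum += a.at(i);                            // was: a.GetAt(i)
  a.clear();                                   // was: a.RemoveAll();
  return sum;
}
```

Most conversions are one-for-one substitutions of this kind; only the rows that name a standalone function need extra support code.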

Standalone replacement functions in Listing 1 (a C++17 compiler is required, for if constexpr and std::is_unsigned_v) serve as direct replacements when no vector equivalent is available. Function ValidAt validates an index of any integral type (signed or unsigned). All other standalone functions employ unsigned integral types for indexing, array lengths, and element counts to avoid confusion. If necessary, these functions could be modified to support signed integral types similarly to ValidAt. The RemoveAt function does not validate or assert the array index or element count, so use it with caution. Conversely, functions InsertAt and SetAtGrow perform index validation, removing the need for assertion checks.

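#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_signed_v, std::is_unsigned_v
#include <vector>

// True if i is a valid index into a. Accepts signed
// and unsigned integral index types.
template <typename T, typename I>
bool ValidAt(const std::vector<T>& a, I i) {
  if constexpr (std::is_unsigned_v<I>)
    return i < a.size();
  else if constexpr (std::is_signed_v<I>)
    // i >= 0 is evaluated first, so a negative i
    // short-circuits before the mixed comparison.
    return i >= 0 && i < a.size();
  else
    return false;
}
// Replaces a.Append(b). a and b must be distinct:
// inserting a vector's own range into itself is
// undefined behaviour.
template <typename T>
auto Append(std::vector<T>& a,
            const std::vector<T>& b) {
  return a.insert(a.end(), b.begin(), b.end());
}
// Replaces a.InsertAt(i, val). An index at or past
// the end grows the array, default-constructing
// any intervening elements.
template <typename T>
auto InsertAt(std::vector<T>& a, std::size_t i,
              const T& val) {
  if (ValidAt(a, i))
    return a.insert(a.begin() + i, val);
  a.resize(i + 1);
  a.at(i) = val;
  return a.begin() + i;
}
// Replaces a.SetAtGrow(i, val).
template <typename T>
auto SetAtGrow(std::vector<T>& a, std::size_t i,
               const T& val) {
  if (!ValidAt(a, i)) a.resize(i + 1);
  a.at(i) = val;
  return a.begin() + i;
}
// Replaces a.RemoveAt(i, n). No validation: the
// caller must ensure i + n <= a.size().
template <typename T>
auto RemoveAt(std::vector<T>& a, std::size_t i,
              std::size_t n = 1) {
  return a.erase(a.begin() + i,
                 a.begin() + i + n);
}
// Replaces a.SetSize(n, m); m is applied as a
// capacity hint via reserve.
template <typename T>
void SetSize(std::vector<T>& a, std::size_t n,
             std::size_t m) {
  a.reserve(m);
  a.resize(n);
}
// Signed length for arithmetic with signed indexes.
// __int64 is MSVC-specific; assumes a.size() does
// not exceed LLONG_MAX.
template <typename T>
__int64 GetSize(const std::vector<T>& a) {
  return static_cast<__int64>(a.size());
}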

Listing 1

The standalone functions that modify the array (Append, InsertAt, SetAtGrow and RemoveAt) return iterators, mirroring the corresponding vector methods. The CArray methods they replace return void, except for CArray::Append, which returns a signed integral index of the first appended element. Function Append instead returns an iterator pointing to the first appended element.
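Call sites that used the index returned by CArray::Append can recover it from the iterator. A minimal sketch, assuming a hypothetical helper named AppendIndex (not part of Listing 1) that measures the returned iterator's offset with std::distance and uses the portable std::int64_t in place of __int64:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: appends b to a and returns the index of the first
// appended element as a signed value, in the spirit of CArray::Append.
// a and b must be distinct objects.
template <typename T>
std::int64_t AppendIndex(std::vector<T>& a, const std::vector<T>& b) {
  // insert returns an iterator to the first inserted element
  // (or to the insertion point, a.end(), when b is empty).
  auto first = a.insert(a.end(), b.begin(), b.end());
  return static_cast<std::int64_t>(std::distance(a.begin(), first));
}
```

For an empty b, the result equals the original length of a, which is also where a non-empty append would have started.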

The GetSize function casts the return value of vector::size to a 64-bit signed integral value. Its primary use case is for arithmetic operations involving signed integral types. The assumption is that the vector length will not exceed std::numeric_limits<__int64>::max(), which equals LLONG_MAX or 9,223,372,036,854,775,807. This is a safe assumption for our desktop application.
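Where this assumption should be checked rather than merely documented, a debug assertion can be added. The sketch below uses a hypothetical GetSizeChecked and the portable std::int64_t in place of the MSVC-specific __int64 (both are 64-bit signed types):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical variant of GetSize: in debug builds, asserts that the length
// fits in a signed 64-bit integer before casting.
template <typename T>
std::int64_t GetSizeChecked(const std::vector<T>& a) {
  assert(a.size() <= static_cast<std::uint64_t>(
                         std::numeric_limits<std::int64_t>::max()));
  return static_cast<std::int64_t>(a.size());
}
```

Because the result is signed, expressions such as GetSizeChecked(a) - 1 yield -1 for an empty vector instead of wrapping around.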

Array index

Both container classes assume that indexes are valid (within bounds). CArray utilizes a signed integral type for indexes, while vector uses an unsigned integral type. For our 64-bit builds, this equates to the types long long (or __int64 on Windows) and vector::size_type/std::size_t (or unsigned __int64 on Windows), respectively.

Index refactoring involves replacing the signed integral type __int64 with the unsigned integral type unsigned __int64. The value ranges for these types are approximately [-9.223E18, 9.223E18] and [0, 18.446E18], respectively [MS-2]. The unsigned type can safely represent any non-negative signed value. However, it’s essential to handle values less than 0 carefully to avoid integer wraparound (often loosely called overflow) [Wikipedia-2]: converting a signed value to an unsigned type yields the value modulo $2^N$, where $N$ is the width of the unsigned type in bits. For example, converting the 32-bit signed value -1 to a 32-bit unsigned type yields $2^{32} - 1 = 4{,}294{,}967{,}295$.
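The conversion rule can be confirmed at compile time. This sketch uses the fixed-width types from <cstdint> as portable stand-ins for the Windows-specific type names:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Converting a negative signed value to an unsigned type is well defined:
// the result is the value modulo 2^N, where N is the unsigned type's width.
constexpr std::uint32_t kWrapped32 =
    static_cast<std::uint32_t>(std::int32_t{-1});
constexpr std::uint64_t kWrapped64 =
    static_cast<std::uint64_t>(std::int64_t{-1});

static_assert(kWrapped32 == 4294967295u, "2^32 - 1");
static_assert(kWrapped64 == std::numeric_limits<std::uint64_t>::max(),
              "2^64 - 1");
```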

Array element access without bounds checking is provided by operator[] for both classes. This operator is commonly used, and no modifications are needed. For access with bounds checking, replace the CArray::GetAt method with vector::at. Out-of-bounds accesses with CArray::GetAt assert in Debug builds, while vector::at throws std::out_of_range. Handlers for std::out_of_range can be added in specific areas where needed, for example to log errors.
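A minimal sketch of such a handler, using a hypothetical GetOrDefault that logs to std::cerr and returns a caller-supplied fallback; real code would route the message to the application’s own logging facility:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: bounds-checked read that logs and returns a fallback
// instead of letting std::out_of_range propagate to the caller.
double GetOrDefault(const std::vector<double>& a, std::size_t i,
                    double fallback) {
  try {
    return a.at(i);  // throws std::out_of_range when i >= a.size()
  } catch (const std::out_of_range& e) {
    std::cerr << "index " << i << " out of range: " << e.what() << '\n';
    return fallback;
  }
}
```

Catching at a narrow boundary like this keeps the exception local; elsewhere it can be left to propagate.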

The CArray::GetUpperBound method is problematic because it returns -1 for empty arrays. To avoid negative indexes, this functionality was removed from the codebase. Instead, vector::empty was used for empty-array checks, and vector::size for upper-bound calculations. Statements of the form a[a.GetUpperBound()] were replaced with a.back(), which, like the original, requires a non-empty array. Table 1 lists a signed replacement, GetSize(a)-1, to be used only if absolutely necessary.
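A typical before-and-after conversion is sketched below. The function name LastOr and its fallback parameter are hypothetical; the point is that the empty check takes over the role of comparing against -1:

```cpp
#include <cassert>
#include <vector>

// Hypothetical example. Was:
//   if (a.GetUpperBound() >= 0) last = a[a.GetUpperBound()];
// empty() replaces the comparison against -1, and back() replaces
// indexing with the upper bound.
int LastOr(const std::vector<int>& a, int fallback) {
  return a.empty() ? fallback : a.back();
}
```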

When working with indexes, it’s advisable to convert them to an unsigned integral type whenever possible. Doing so keeps values non-negative and helps avoid errors that arise from mixing signed and unsigned types (which we explore later).

Public inheritance of CArray

The CArray class publicly inherits from the principal MFC base class, CObject, as follows:

  template <class T, class ARG = const T&>
  class CArray : public CObject

where T specifies the type of objects stored in the array, and ARG specifies the argument type used to access objects stored in the array. The base class CObject provides services such as serialization support, run-time class information, and dump diagnostics output [MS-3]. CArray and CObject have virtual destructors that permit “is-a” use cases of the form:

  template <class T, class ARG = const T&>
  class CDerived : public CArray<T, ARG>

which were encountered in the codebase. Since vector has a non-virtual destructor, it’s essential to explore alternative approaches as recommended by Scott Meyers in Effective C++ [Meyers05].

Our codebase did not use any of the CObject services mentioned above, permitting a straightforward refactor using public composition of the form:

  template <class T>
  struct CDerived {
    std::vector<T> v;
  };

Call sites are updated by adding v (inside member functions) or .v (elsewhere) to reach the array. Some use cases arguably gain readability, such as transforming:

  s += (*this)[i].GetString();

into:

  s += v[i].GetString();
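A fuller sketch of the composition approach follows. The names CLabelList and Join are hypothetical, and std::string stands in for CString so the example compiles without MFC:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class that previously derived from CArray<CString>. The array
// is now a public data member, so nothing relies on inheriting from a
// container with a non-virtual destructor.
struct CLabelList {
  std::vector<std::string> v;

  // Inside member functions, (*this)[i] becomes v[i].
  std::string Join() const {
    std::string s;
    for (std::size_t i = 0; i < v.size(); ++i)
      s += v[i];
    return s;
  }
};
```

From outside the class, former calls such as list.Add(x) become list.v.push_back(x).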

The following refactor can be used when CObject services are employed:

  template <class T>
  struct CDerived : public CObject {
    std::vector<T> v;
  };

I would like to mention the Stack Overflow post ‘Thou shalt not inherit from std::vector’ [StackOverflow]. It is worth reading and considering the various perspectives. There are recommendations for both public and private inheritance with caveats (no new data members), along with nuanced discussions about undefined behaviour. Public composition is a safe choice for our simple use cases [Meyers05].

Array resize

Modernizing the array resizing code involves replacing CArray::SetSize with vector::resize. From the CArray reference:

Most methods that resize a CArray object or add elements to it use memcpy_s to move elements. This is a problem because memcpy_s is not compatible with any objects that require the constructor to be called. If the items in the CArray are not compatible with memcpy_s, you must create a new CArray of the appropriate size. You must then use CArray::Copy and CArray::SetAt to populate the new array because those methods use an assignment operator instead of memcpy_s.

Conversely, when vector reallocates, it relocates existing elements with their move constructor if that constructor is noexcept or if the type cannot be copied; otherwise, it falls back to the copy constructor. This choice follows the utility function std::move_if_noexcept [CPP-2]. We encountered compilation errors of the following form when calling vector::resize:

  error C2280: 'BlockFile::BlockFile(const BlockFile &)': attempting to reference a deleted function

Here BlockFile’s copy constructor is intentionally deleted. Interestingly, switching to vector exposed a programming flaw in the original code. CArray::SetSize should not have been making copies of BlockFile via memcpy_s. We were able to precompute the number of BlockFile objects needed, enabling the straightforward fix:

  std::vector<BlockFile> bFile(n);

which uses BlockFile’s default constructor and maintains the deleted copy constructor.
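The following sketch shows these rules with two hypothetical stand-in types rather than our real BlockFile. A type whose copy constructor is deleted and which has no move constructor cannot be resized, yet a vector of it can still be constructed with a count; a move-only type with a noexcept move constructor can be resized freely:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for BlockFile: copying is deleted and, because a
// copy constructor is user-declared, no move constructor is generated.
struct NonMovable {
  NonMovable() = default;
  NonMovable(const NonMovable&) = delete;
  NonMovable& operator=(const NonMovable&) = delete;
};

// Hypothetical move-only type: vector can relocate it by moving.
struct MoveOnly {
  MoveOnly() = default;
  MoveOnly(const MoveOnly&) = delete;
  MoveOnly(MoveOnly&&) noexcept = default;
  MoveOnly& operator=(MoveOnly&&) noexcept = default;
};

static_assert(!std::is_move_constructible_v<NonMovable>,
              "resize() would not compile for this type");
static_assert(std::is_nothrow_move_constructible_v<MoveOnly>,
              "reallocation can use the move constructor");

std::size_t ResizeDemo() {
  std::vector<NonMovable> fixed(3);  // OK: only default construction needed
  std::vector<MoveOnly> growable;
  growable.resize(3);
  growable.resize(10);               // reallocates, moving existing elements
  return fixed.size() + growable.size();
}
```

Adding a noexcept move constructor to such a type would be another way to make resize usable, where moving is semantically safe.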

Array length: mixed signedness issues

Modernizing the array length reporting code involves replacing CArray::GetSize with vector::size or GetSize. It is recommended to use vector::size and GetSize for unsigned and signed integral types, respectively.

Many potential pitfalls arise from mixing different integral types in arithmetic and comparison operations, which can result in unexpected behaviour [Wikipedia-3]. Specifically, the unmodified codebase expects indexes of signed integral type, while vector::size returns an unsigned integral type. According to the usual arithmetic conversions, operations involving different integral types are performed in a common type [CPP-3]. When the unsigned operand’s type has the same or higher rank than the signed operand’s type (as with int or __int64 compared against std::size_t in our 64-bit builds), the unsigned type serves as the common type.

In most cases, indexes were never negative, as in the common for loop:

  for (int i = 0; i < a.size(); i++)

Here, the subexpression (i < a.size()) works as intended, converting i to an unsigned integral type with no wraparound.
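Where an index can be made unsigned outright, loops can match vector::size exactly. A minimal sketch with hypothetical function names; the reverse loop is the case most prone to wraparound, because a condition like i >= 0 is always true for an unsigned i:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward loop: index and size share the unsigned type std::size_t.
int SumForward(const std::vector<int>& a) {
  int sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i];
  return sum;
}

// Reverse loop: compare i > 0, then decrement, so the body only sees indexes
// a.size() - 1 down to 0. A loop starting at a.size() - 1 and testing
// i >= 0 never terminates correctly for an unsigned i, and a.size() - 1
// itself wraps around for an empty vector.
std::vector<int> Reversed(const std::vector<int>& a) {
  std::vector<int> out;
  out.reserve(a.size());
  for (std::size_t i = a.size(); i-- > 0;) out.push_back(a[i]);
  return out;
}
```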

Now let’s examine some modified statements representative of real-world conditionals found in if, while, and for statements, where i is a signed integral type that can assume negative values:

bool bBelowUpper1 = (i < a.size());
bool bBelowUpper2 = (i <= a.size() - 1);
bool bAboveLimit = (a.size() > 1);
bool bInsideRange = (i >= 0 && i < a.size());
bool bOutsideRange = (i < 0 || i >= a.size());

Line 1 fails because it exhibits wraparound when i is negative. The LHS of operator< is converted to an unsigned integral type to match the RHS.

Line 2 fails with two problems. First, the subexpression (a.size() - 1) exhibits wraparound when a.size() is 0. Second, the LHS of operator<= exhibits wraparound when i is negative for the same reason as Line 1.

Line 3 appears similar to Line 1, but it’s actually fine: the constant 1 is non-negative, so converting it to an unsigned integral type causes no wraparound. This is mentioned for awareness purposes, as these cases tend to look alike after a few hundred instances.

Line 4 works as expected when i is negative. The subexpression (i >= 0) compares signed values, so it evaluates false as intended, and operator&& short-circuits without evaluating (i < a.size()). Even if that subexpression were evaluated, wraparound would turn i into a huge unsigned value, so it would also evaluate false. For non-negative i, both subexpressions work as intended, including for large arrays. In fact, this technique is employed in the signed integral branch of ValidAt.

Line 5 also works as expected, being the logical negation of Line 4. When i is negative, (i < 0) evaluates true and operator|| short-circuits; even if (i >= a.size()) were evaluated, wraparound would make it true as well, which happens to agree with the intended result. The main point of this discussion is to be extra careful when mixing types. It’s easy to become confused with seemingly simple statements.
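To make the behaviour concrete, the sketch below evaluates the five statements for a given signed index. Checks and Evaluate are hypothetical names, and long long stands in for __int64 so the example builds with any compiler; expect sign-comparison warnings, which are exactly the point:

```cpp
#include <cassert>
#include <vector>

// Results of the five conditionals discussed above.
struct Checks {
  bool belowUpper1, belowUpper2, aboveLimit, insideRange, outsideRange;
};

Checks Evaluate(long long i, const std::vector<int>& a) {
  return {
      i < a.size(),            // Line 1: i converted to unsigned
      i <= a.size() - 1,       // Line 2: also wraps when a is empty
      a.size() > 1,            // Line 3: constant 1 converts safely
      i >= 0 && i < a.size(),  // Line 4: && short-circuits for i < 0
      i < 0 || i >= a.size(),  // Line 5: || short-circuits for i < 0
  };
}
```

For i = -1 and a three-element vector, Lines 1 and 2 yield false even though -1 lies below the upper bound, Line 3 yields true, Line 4 yields false and Line 5 yields true. For an empty vector and i = 0, Line 2 yields true because a.size() - 1 wraps around.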

Switching an index’s type isn’t always straightforward. If you’re dealing with a math-focused codebase where negative and relative indexes play a significant role (e.g., in physics simulations, time series, etc.), altering the type could impact algorithmic calculations. This change might be more complex and time-consuming than initially anticipated.

For scenarios that must honour the original signed integral intent, it is recommended to use GetSize. In the following lines, i32 and i64 denote 32-bit and 64-bit signed integral indexes:

bool bBelowUpper3 = (i32 < GetSize(a));
bool bBelowUpper4 = (i64 < GetSize(a));

Line 6 works as expected because the 32-bit signed integral value on the LHS is converted to the 64-bit signed type of the RHS, preserving its value.

Line 7 works as expected because the LHS and RHS types match.
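Both lines can be exercised with a portable stand-in for Listing 1’s GetSize that returns std::int64_t instead of the MSVC-specific __int64. BelowUpper3 and BelowUpper4 are hypothetical wrappers named after the variables above:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for Listing 1's GetSize (std::int64_t instead of __int64).
template <typename T>
std::int64_t GetSize(const std::vector<T>& a) {
  return static_cast<std::int64_t>(a.size());
}

// Lines 6 and 7: both operands are signed, so no wraparound occurs.
bool BelowUpper3(std::int32_t i32, const std::vector<int>& a) {
  return i32 < GetSize(a);  // i32 is converted to std::int64_t
}
bool BelowUpper4(std::int64_t i64, const std::vector<int>& a) {
  return i64 < GetSize(a);  // same type on both sides
}
```

Unlike Line 1, a negative index now compares as below the upper bound.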

MFC use of CArray

MFC uses CArray in a surprisingly limited capacity. We successfully eliminated CArray from our math-focused codebase, which employs standard MFC controls for the UI. Searching the MFC include directory for CArray yields 128 hits, many of which occur in protected data areas and appear implementation-specific. Nevertheless, there are some public use cases in the following classes: CArchive, CBaseTabbedPane, CD2DGeometrySink, numerous CMFCRibbon* classes, and CTabbedPane. You might want to reconsider replacing arrays in these cases. Alternatively, conversion methods between CArray and vector are straightforward and can be reused in testing code.
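Such conversion helpers might look like the sketch below. ToVector and FromVector are hypothetical names; they assume only the CArray members GetSize(), SetSize(n) and operator[], and copy elements one at a time by assignment rather than memcpy. FakeIntArray is a minimal stand-in so the sketch builds without MFC, and dst is assumed to start empty:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a CArray-like container into a vector.
template <typename T, typename ArrayT>
std::vector<T> ToVector(const ArrayT& src) {
  std::vector<T> out;
  out.reserve(static_cast<std::size_t>(src.GetSize()));
  for (decltype(src.GetSize()) i = 0; i < src.GetSize(); ++i)
    out.push_back(src[i]);
  return out;
}

// Hypothetical helper: copies a vector into an (assumed empty) CArray-like
// container, sizing it first and then assigning element by element.
template <typename T, typename ArrayT>
void FromVector(const std::vector<T>& src, ArrayT& dst) {
  using Index = decltype(dst.GetSize());
  dst.SetSize(static_cast<Index>(src.size()));
  for (std::size_t i = 0; i < src.size(); ++i)
    dst[static_cast<Index>(i)] = src[i];
}

// Minimal stand-in for CArray<int> (signed 64-bit indexes, like INT_PTR).
struct FakeIntArray {
  std::vector<int> data;
  std::int64_t GetSize() const { return static_cast<std::int64_t>(data.size()); }
  void SetSize(std::int64_t n) { data.resize(static_cast<std::size_t>(n)); }
  int& operator[](std::int64_t i) { return data[static_cast<std::size_t>(i)]; }
  const int& operator[](std::int64_t i) const {
    return data[static_cast<std::size_t>(i)];
  }
};
```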

Further modernizations

Iterator difference types have potential for modernization. The difference type of an iterator [CPP-4] is a contemporary alternative to std::ptrdiff_t [CPP-5], allowing negative offsets. This concept applies to iterator types with defined equality. The std::incrementable_traits struct computes a difference type for a given type, if it exists [CPP-6].
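As a small illustration, the sketch below expresses a signed, possibly negative offset in the iterator’s own difference type rather than a hard-coded std::ptrdiff_t. It uses std::iterator_traits so it also compiles as C++17; in C++20, std::iter_difference_t<It>, which consults std::incrementable_traits, yields the same type for standard iterators. DiffT and RelativeOffset are hypothetical names:

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// The iterator's difference type: a signed type suitable for relative offsets.
template <typename It>
using DiffT = typename std::iterator_traits<It>::difference_type;

// Hypothetical helper: signed offset of target relative to origin. For
// random-access iterators, the result is negative if target precedes origin.
template <typename It>
DiffT<It> RelativeOffset(It origin, It target) {
  return std::distance(origin, target);
}

static_assert(std::is_signed_v<DiffT<std::vector<double>::iterator>>,
              "difference types are signed");
```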

MFC offers several ready-to-use array classes, such as CByteArray, CDWordArray, CObArray, CPtrArray, CUIntArray, CWordArray, and CStringArray [MS-4]. These classes have member functions similar to CArray and should also benefit from the proposed replacement methods.

Conclusions

The article explores modernizing legacy arrays, specifically proposing practical techniques to replace the MFC container class CArray with vector. It begins with class method conversion techniques suitable for direct substitution. When direct substitutions are not feasible, standalone replacement functions are provided. The article offers refactoring guidance for various array operations, including indexing, resizing, and length reporting. It also addresses handling situations involving the public inheritance of CArray and provides a description of MFC’s use of CArray. Additionally, the article discusses working with mixed integral types (signedness) and highlights potential pitfalls with examples. Finally, the article suggests further modernizations and draws conclusions. For those dealing with MFC structures, switching to standard C++ containers like vector can simplify the codebase, improve performance and scalability, and enhance the debugging experience.

Thanks

Thank you to the anonymous reviewers for their interest and invaluable comments, which greatly improved the quality of this article.

References

[Abrahams06] David Abrahams, Jeremy Siek and Thomas Witt (2003, updated 2006) boost::iterator_facade: https://www.boost.org/doc/libs/1_85_0/libs/iterator/doc/iterator_facade.html

[CPP-1] std::vector: https://en.cppreference.com/w/cpp/container/vector

[CPP-2] std::move_if_noexcept: https://en.cppreference.com/w/cpp/utility/move_if_noexcept

[CPP-3] Usual arithmetic conversions: https://en.cppreference.com/w/cpp/language/usual_arithmetic_conversions

[CPP-4] Iterator library: https://en.cppreference.com/w/cpp/iterator

[CPP-5] std::ptrdiff_t: https://en.cppreference.com/w/cpp/types/ptrdiff_t

[CPP-6] std::incrementable_traits: https://en.cppreference.com/w/cpp/iterator/incrementable_traits

[Meyers05] Scott Meyers (2005) Effective C++: 55 specific ways to improve your programs and designs, Third Edition, Addison-Wesley Professional.

[MS-1] CArray Class: https://learn.microsoft.com/en-us/cpp/mfc/reference/carray-class

[MS-2] Data Type Ranges: https://learn.microsoft.com/en-us/cpp/cpp/data-type-ranges

[MS-3] CObject Class: https://learn.microsoft.com/en-us/cpp/mfc/reference/cobject-class

[MS-4] Ready-to-Use Array Classes: https://learn.microsoft.com/en-us/cpp/mfc/ready-to-use-array-classes

[StackOverflow] ‘Thou shalt not inherit from std::vector’: https://stackoverflow.com/questions/4353203/thou-shalt-not-inherit-from-stdvector

[Wikipedia-1] Microsoft Foundation Class Library: https://en.wikipedia.org/wiki/Microsoft_Foundation_Class_Library

[Wikipedia-2] Integer overflow: https://en.wikipedia.org/wiki/Integer_overflow

[Wikipedia-3] Signedness: https://en.wikipedia.org/wiki/Signedness

Stuart Bergen Stuart is a software developer with a background in geophysics, finance, and communications systems. He has a PhD specializing in signal processing, and enjoys camping and skiing in the Canadian Rockies. Stuart lives in Calgary, Canada.