Many codebases still use mature libraries, such as Microsoft Foundation Classes. Stuart Bergen explains how and why he moved to using modern standard C++ tools instead.
Our scientific desktop application relies on the Microsoft Foundation Class (MFC) library for both user interface (UI) and computational tasks. MFC, introduced in 1992, encapsulates portions of the Windows API within object-oriented C++ classes and is currently maintained [Wikipedia-1]. Despite its age, our users enjoy the interactive, straightforward and visually appealing experience that MFC provides. As a result, there has been little incentive to replace complex UI-related code with alternatives.
Computational tasks using the MFC library show signs of aging compared to standard C++ containers. The primary MFC container class, CArray, closely resembles vector in functionality [MS-1], [CPP-1]. Both offer C-like arrays that dynamically grow and shrink, reserve contiguous space on the heap, and use zero-based indexing. However, a key difference is that CArray lacks iterator support. We previously used boost::iterator_facade [Abrahams06] to supplement basic iterator support, but this only goes so far. The codebase was updated to prepare for future development and embrace modern practices, including the adoption of C++20 ranges. This effort resulted in simpler, more familiar code, performance and scalability improvements, and an enhanced debugging experience.
In this article, methods to replace the MFC container class CArray with vector are proposed. These practical techniques emerged from a real-world refactoring effort on a commercial software project. First, we present class method conversion techniques suitable for direct substitution. In cases where direct substitutions are not feasible, standalone replacement functions are offered. Next, we consider the array index, public inheritance of the CArray class, array resize, array length, and MFC use of CArray. Finally, we suggest further modernizations and draw conclusions.
Class method conversions
The modernization effort mainly focused on replacing CArray class methods and operators with their vector equivalents. Table 1 lists vector replacements for the public CArray class interface.
Table 1: vector replacements for the public CArray class interface
The table served as a helpful reference during the refactoring effort. Template arguments are omitted for brevity. Notation: a and b represent CArray/vector objects, i represents an index, n and m represent array lengths or element counts, val represents an array value, and italicized text represents standalone replacement functions. Full method signatures can be found in the CArray and vector references [MS-1], [CPP-1]. The CArray class possesses only a default constructor; its copy constructor and copy assignment operator are deleted.
Standalone replacement functions in Listing 1 (C++17 compiler required) serve as direct replacements when no vector equivalent is available. Function ValidAt validates an index of any integral type (signed or unsigned). All other standalone functions employ unsigned integral types for indexing, array lengths, and element counts to avoid confusion. If necessary, these functions could be modified to support signed integral types similarly to ValidAt. The RemoveAt function does not validate or assert the array index or element count, so use it with caution. Conversely, functions InsertAt and SetAtGrow perform index validation, removing the need for assertion checks.
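    #include <cstddef>
    #include <type_traits>
    #include <vector>

    // Returns true if i is a valid index into a (any integral index type).
    template <typename T, typename I>
    bool ValidAt(const std::vector<T>& a, I i)
    {
        if constexpr (std::is_unsigned_v<I>)
            return i < a.size();
        else if constexpr (std::is_signed_v<I>)
            return i >= 0 && i < a.size();
        else
            return false;
    }

    // Appends b to a; returns an iterator to the first appended element.
    template <typename T>
    auto Append(std::vector<T>& a, const std::vector<T>& b)
    {
        return a.insert(a.end(), b.begin(), b.end());
    }

    // Inserts val at index i; if i is not a valid index, grows a so that
    // val is stored at index i.
    template <typename T>
    auto InsertAt(std::vector<T>& a, std::size_t i, const T& val)
    {
        if (ValidAt(a, i))
            return a.insert(a.begin() + i, val);
        a.resize(i + 1);
        a.at(i) = val;
        return a.begin() + i;
    }

    // Sets the element at index i to val, growing a first if necessary.
    template <typename T>
    auto SetAtGrow(std::vector<T>& a, std::size_t i, const T& val)
    {
        if (!ValidAt(a, i))
            a.resize(i + 1);
        a.at(i) = val;
        return a.begin() + i;
    }

    // Erases n elements starting at index i. No validation is performed.
    template <typename T>
    auto RemoveAt(std::vector<T>& a, std::size_t i, std::size_t n = 1)
    {
        return a.erase(a.begin() + i, a.begin() + i + n);
    }

    // Reserves capacity for m elements, then resizes a to n elements.
    template <typename T>
    void SetSize(std::vector<T>& a, std::size_t n, std::size_t m)
    {
        a.reserve(m);
        a.resize(n);
    }

    // Returns the array length as a 64-bit signed integral value.
    template <typename T>
    __int64 GetSize(const std::vector<T>& a)
    {
        return static_cast<__int64>(a.size());
    }

Listing 1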
Where applicable, the standalone functions return the iterators produced by their underlying vector methods. The corresponding CArray methods return void, except for CArray::Append, which returns an index of signed integral type to the first appended element. Function Append returns an iterator pointing to the first appended element.
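Callers that relied on the index returned by CArray::Append can recover it from the iterator with std::distance. The following is a minimal sketch rather than part of Listing 1: AppendIndex is a hypothetical helper name, and it assumes that a and b are distinct vectors.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not from Listing 1): appends b to a and returns the
// index of the first appended element, similar in spirit to CArray::Append.
// Assumes a and b are different objects, because std::vector::insert does
// not accept a range that refers to the vector being modified.
template <typename T>
std::size_t AppendIndex(std::vector<T>& a, const std::vector<T>& b)
{
    const auto first = a.insert(a.end(), b.begin(), b.end());
    return static_cast<std::size_t>(std::distance(a.begin(), first));
}
```

If b is empty, the returned index equals the unchanged length of a, so callers should still check it before dereferencing.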
The GetSize function casts the return value of vector::size to a 64-bit signed integral value. Its primary use case is for arithmetic operations involving signed integral types. The assumption is that the vector length will not exceed std::numeric_limits<__int64>::max(), which equals LLONG_MAX or 9,223,372,036,854,775,807. This is a safe assumption for our desktop application.
Array index
Both container classes assume that indexes are valid (within bounds). CArray utilizes a signed integral type for indexes, while vector uses an unsigned integral type. For our 64-bit builds, this equates to the types long long (or __int64 on Windows) and vector::size_type/std::size_t (or unsigned __int64 on Windows), respectively.
Index refactoring involves replacing the signed integral type __int64 with the unsigned integral type unsigned __int64. The value ranges for these types are [-9.223E18, 9.223E18] and [0, 18.446E18], respectively [MS-2]. The unsigned type can safely represent signed integral values that are non-negative. However, it’s essential to handle values less than 0 carefully to avoid the integer wraparound (or overflow) phenomenon [Wikipedia-2]. For example, when converting a signed integral value of -1 (represented by 32 bits) to an unsigned integral value, we get 2^32 - 1 = 4,294,967,295.
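The wraparound can be reproduced with fixed-width types, and a small guard can reject negative values before they become indexes. This is a sketch under stated assumptions: ToUnsigned32 and TryToIndex are hypothetical names introduced here for illustration only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts a signed 32-bit value to an unsigned 32-bit
// value. Negative inputs wrap around modulo 2^32.
inline std::uint32_t ToUnsigned32(std::int32_t value)
{
    return static_cast<std::uint32_t>(value);
}

// Hypothetical helper: converts a signed value to an index only when it is
// non-negative. Returns false (leaving index untouched) otherwise.
inline bool TryToIndex(long long value, std::size_t& index)
{
    if (value < 0)
        return false;
    index = static_cast<std::size_t>(value);
    return true;
}
```

A guard of this kind is most useful at the boundary where legacy signed values enter refactored code.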
Array element access without bounds checking is provided by operator[] for both classes. This method is commonly used, and no modifications are needed. For access with bounds checking, replace the CArray::GetAt method with vector::at. Out-of-bounds accesses with CArray::GetAt assert for Debug builds, while vector::at throws the std::out_of_range exception. Catch statements for std::out_of_range can be added in specific areas if needed, e.g., for error logging.
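A catch statement around vector::at might look like the sketch below. GetAtOr, its fallback parameter, and the string used as a log are assumptions made for illustration; a real application would route the message to its own logging facility.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns a.at(i), or fallback when i is out of range.
// The exception text is stored in log so that the caller can report it.
inline double GetAtOr(const std::vector<double>& a, std::size_t i,
                      double fallback, std::string& log)
{
    try {
        return a.at(i);  // throws std::out_of_range when i >= a.size()
    } catch (const std::out_of_range& e) {
        log = std::string("out_of_range: ") + e.what();
        return fallback;
    }
}
```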
The CArray::GetUpperBound method is problematic because it returns -1 for empty arrays. To avoid negative indexes, this functionality was removed from the codebase. Instead, vector::empty was used for empty array checks, and vector::size was used for upper bounds calculations. Statements of the form a[a.GetUpperBound()] were replaced with a.back(). Table 1 provides a replacement with a signed integral return value, to be used only if absolutely necessary.
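The two replacement patterns can be sketched as follows. Both function names are hypothetical, and the signed variant is an assumption about what such a replacement could look like rather than the exact entry from Table 1.

```cpp
#include <vector>

// Hypothetical helper: returns a pointer to the last element, or nullptr for
// an empty array. Calling a.back() on an empty vector is undefined behaviour,
// so the empty check comes first.
template <typename T>
const T* LastOrNull(const std::vector<T>& a)
{
    return a.empty() ? nullptr : &a.back();
}

// Hypothetical helper: signed upper bound that, like CArray::GetUpperBound,
// yields -1 for an empty array. Intended only for code that cannot yet be
// moved to unsigned indexes.
template <typename T>
long long UpperBoundSigned(const std::vector<T>& a)
{
    return static_cast<long long>(a.size()) - 1;
}
```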
When working with indexes, it’s advisable to convert them to an unsigned integral type whenever possible. Doing so ensures that values remain non-negative and this helps avoid potential errors that can arise from mixing different types (which we explore later).
Public inheritance of CArray
The CArray class publicly inherits from the principal MFC base class, CObject, as follows:
template <class T, class ARG = const T&> class CArray : public CObject
where T specifies the type of objects stored in the array, and ARG specifies the argument type used to access objects stored in the array. The base class CObject provides services such as serialization support, run-time class information, and dump diagnostics output [MS-3]. CArray and CObject have virtual destructors that permit “is-a” use cases of the form:
template <class T, class ARG = const T&> class CDerived : public CArray<T, ARG>
which were encountered in the codebase. Since vector has a non-virtual destructor, it’s essential to explore alternative approaches as recommended by Scott Meyers in Effective C++ [Meyers05].
Our codebase did not use any of the CObject services mentioned above, permitting a straightforward refactor using public composition of the form:
template <class T> struct CDerived { std::vector<T> v; };
Code is modified by adding a v or .v to provide array access depending on the context. Some use cases arguably provide improved readability, such as transforming:
s += (*this)[i].GetString();
into:
s += v[i].GetString();
The following refactor can be used when CObject services are employed:
template <class T> struct CDerived : public CObject { std::vector<T> v; };
I would like to mention the StackOverflow post ‘Thou shalt not inherit from std::vector’ [StackOverflow]. It is worth reading and considering the various perspectives. There are recommendations for both public and private inheritance with caveats (no new data members), along with nuanced discussions about undefined behaviour. Public composition is a safe choice for our simple use cases [Meyers05].
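To make the composition approach concrete, here is a minimal sketch that compiles without MFC. Label and Concatenate are hypothetical names; Label merely stands in for an element type that offers a GetString method.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type standing in for a class with GetString().
struct Label {
    std::string text;
    const std::string& GetString() const { return text; }
};

// Public composition: the vector is a plain data member named v.
template <class T>
struct CDerived {
    std::vector<T> v;

    // Member code now reads v[i] where it previously read (*this)[i].
    std::string Concatenate() const
    {
        std::string s;
        for (std::size_t i = 0; i < v.size(); ++i)
            s += v[i].GetString();
        return s;
    }
};
```

Code outside the class reaches the array through the member, for example d.v.size() or d.v[i].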
Array resize
Modernizing the array resizing code involves replacing CArray::SetSize with vector::resize. From the CArray reference [MS-1]:
Most methods that resize a CArray object or add elements to it use memcpy_s to move elements. This is a problem because memcpy_s is not compatible with any objects that require the constructor to be called. If the items in the CArray are not compatible with memcpy_s, you must create a new CArray of the appropriate size. You must then use CArray::Copy and CArray::SetAt to populate the new array because those methods use an assignment operator instead of memcpy_s.
Conversely, when vector reallocates, it first attempts to move objects by calling the object’s move constructor. If the move constructor is unavailable or may throw (as determined by the utility function std::move_if_noexcept), the copy constructor is invoked [CPP-2]. We encountered compilation errors when calling vector::resize of the form:
error C2280: 'BlockFile::BlockFile(const BlockFile &)': attempting to reference a deleted function
Here BlockFile’s copy constructor is intentionally deleted. Interestingly, switching to vector exposed a programming flaw in the original code. CArray::SetSize should not have been making copies of BlockFile via memcpy_s. We were able to precompute the number of BlockFile objects needed, enabling the straightforward fix:
std::vector<BlockFile> bFile(n);
which uses BlockFile’s default constructor and maintains the deleted copy constructor.
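The same situation can be reproduced with a small stand-in type. BlockFileLike and MakeBlocks are hypothetical names used only for this sketch; the real BlockFile class is not shown in the article. Because declaring the copy constructor as deleted also suppresses the implicit move constructor, the type can be neither copied nor moved, yet the sized vector constructor still works because it only default-constructs elements.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a class with an intentionally deleted copy
// constructor.
struct BlockFileLike {
    BlockFileLike() = default;
    BlockFileLike(const BlockFileLike&) = delete;
    BlockFileLike& operator=(const BlockFileLike&) = delete;
    int handle = -1;
};

static_assert(!std::is_copy_constructible<BlockFileLike>::value,
              "elements cannot be copied");

// Builds n default-constructed elements in place. No element is copied or
// moved; only the vector itself is moved out of the function.
inline std::vector<BlockFileLike> MakeBlocks(std::size_t n)
{
    return std::vector<BlockFileLike>(n);
}
```

Calling resize on such a vector would not compile, since resize may need to relocate existing elements.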
Array length: mixed signedness issues
Modernizing the array length reporting code involves replacing CArray::GetSize with vector::size or GetSize. It is recommended to use vector::size and GetSize for unsigned and signed integral types, respectively.
Many potential pitfalls arise from mixing different integral types in arithmetic and binary operations, which can result in unexpected behaviour [Wikipedia-3]. Specifically, the unmodified codebase expects indexes of signed integral type, while vector::size returns an unsigned integral type. According to usual arithmetic conversions, operations involving different integral types are performed using a common type [CPP-3]. When a signed integral type meets an unsigned integral type that is at least as wide, as is the case with the 64-bit std::size_t returned by vector::size, the unsigned integral type serves as the common type.
In most cases, indexes were never negative, such as in the common for loop:
for (int i = 0; i < a.size(); i++)
Here, the subexpression (i < a.size()) works as intended, converting i to an unsigned integral type with no wraparound.
Now let’s examine some modified statements representative of real-world conditionals found in if, while, and for statements, where i is a signed integral type that can assume negative values. The five statements are referred to below as Lines 1 to 5:
bool bBelowUpper1 = (i < a.size());
bool bBelowUpper2 = (i <= a.size() - 1);
bool bAboveLimit = (a.size() > 1);
bool bInsideRange = (i >= 0 && i < a.size());
bool bOutsideRange = (i < 0 || i >= a.size());
Line 1 fails because it exhibits wraparound when i is negative. The LHS of operator< is converted to an unsigned integral type to match the RHS.
Line 2 fails with two problems. First, the subexpression (a.size() - 1) exhibits wraparound when a.size() is 0. Second, the LHS of operator<= exhibits wraparound when i is negative for the same reason as Line 1.
Line 3 appears similar to Line 1, but it’s actually fine. This is mentioned for awareness purposes, as these cases tend to look similar after a few hundred instances.
Line 4 works as expected. When i is negative, the subexpression (i >= 0) evaluates false as intended because i is not converted to an unsigned integral type, and operator&& short-circuits; the subexpression (i < a.size()) would also evaluate false due to wraparound. For non-negative i, both subexpressions work as intended, functioning correctly for large arrays. In fact, this technique is employed in the signed integral branch of ValidAt.
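Line 5 is the logical complement of Line 4 and, despite appearances, also yields the intended result. When i is negative, the subexpression (i < 0) evaluates true without any conversion and operator|| short-circuits; the subexpression (i >= a.size()) would also evaluate true due to wraparound, which agrees with the intended outcome. This is easy to misjudge on a first reading. The main point of this discussion is to be extra careful when mixing types. It’s easy to become confused with seemingly simple statements.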
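The behaviour of Lines 1, 2 and 4 can be checked with a sketch that spells out the implicit conversions as explicit casts. The function names are hypothetical, and the casts are written explicitly only to make the conversions visible and to avoid sign-comparison warnings.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Mirrors Line 1: i is converted to the unsigned type before comparing, so a
// negative i becomes a very large value and the comparison yields false.
inline bool BelowUpperWrapped(long long i, const std::vector<int>& a)
{
    return static_cast<std::size_t>(i) < a.size();
}

// Mirrors the subexpression in Line 2: for an empty array, the subtraction
// wraps around to the largest std::size_t value.
inline std::size_t LastIndexWrapped(const std::vector<int>& a)
{
    return a.size() - 1;
}

// Mirrors Line 4: the sign is tested first, so the cast is applied only to
// non-negative values.
inline bool InsideRange(long long i, const std::vector<int>& a)
{
    return i >= 0 && static_cast<std::size_t>(i) < a.size();
}
```

With a three-element array, BelowUpperWrapped(-1, a) returns false even though -1 is mathematically below 3, while InsideRange(-1, a) correctly reports that -1 is not a valid index.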
Switching an index’s type isn’t always straightforward. If you’re dealing with a math-focused codebase where negative and relative indexes play a significant role (e.g., in physics simulations, time series, etc.), altering the type could impact algorithmic calculations. This change might be more complex and time-consuming than initially anticipated.
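For scenarios that must honour the original signed integral intent, it is recommended to use GetSize, as in the following two statements (Lines 6 and 7), where i32 and i64 are 32-bit and 64-bit signed integral indexes: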
bool bBelowUpper3 = (i32 < GetSize(a));
bool bBelowUpper4 = (i64 < GetSize(a));
Line 6 works as expected because the shorter 32-bit signed integral type on the LHS is upconverted to 64 bits to match the RHS.
Line 7 works as expected because the LHS and RHS types match.
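Since __int64 is specific to Microsoft compilers, the same idea can be sketched portably with std::int64_t. GetSizeSigned and BelowUpper are hypothetical names chosen to avoid confusion with GetSize from Listing 1.

```cpp
#include <cstdint>
#include <vector>

// Portable variant of the signed length idea, using std::int64_t.
template <typename T>
std::int64_t GetSizeSigned(const std::vector<T>& a)
{
    return static_cast<std::int64_t>(a.size());
}

// Both operands are signed here: the 32-bit index is widened to 64 bits and
// no conversion to an unsigned type takes place.
inline bool BelowUpper(std::int32_t i, const std::vector<int>& a)
{
    return i < GetSizeSigned(a);
}
```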
MFC use of CArray
MFC uses CArray in a surprisingly limited capacity. We successfully eliminated CArray from our math-focused codebase, which employs standard MFC controls for the UI. Searching the MFC include directory for CArray yields 128 hits, many of which occur in protected data areas and appear implementation-specific. Nevertheless, there are some public use cases in the following classes: CArchive, CBaseTabbedPane, CD2DGeometrySink, numerous CMFCRibbon* classes, and CTabbedPane. You might want to reconsider replacing arrays in these cases. Alternatively, conversion methods between CArray and vector are straightforward and can be reused in testing code.
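One possible shape for such conversion methods is sketched below. ToVector and FromVector are hypothetical names, and the functions are written against any type that offers GetSize, SetSize and operator[] in the way CArray does. FakeArray is a stand-in included only so that the sketch compiles without MFC; it is not an MFC class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in exposing CArray-style method names.
template <typename T>
class FakeArray {
public:
    long long GetSize() const { return static_cast<long long>(data_.size()); }
    void SetSize(long long n) { data_.resize(static_cast<std::size_t>(n)); }
    T& operator[](long long i) { return data_[static_cast<std::size_t>(i)]; }
    const T& operator[](long long i) const { return data_[static_cast<std::size_t>(i)]; }
private:
    std::vector<T> data_;
};

// Copies the elements of a CArray-style container into a vector.
template <typename T, typename ArrayLike>
std::vector<T> ToVector(const ArrayLike& src)
{
    std::vector<T> out;
    const auto n = src.GetSize();
    if (n > 0)
        out.reserve(static_cast<std::size_t>(n));
    for (decltype(src.GetSize()) i = 0; i < n; ++i)
        out.push_back(src[i]);
    return out;
}

// Copies a vector into a CArray-style container by element assignment.
template <typename T, typename ArrayLike>
void FromVector(const std::vector<T>& src, ArrayLike& dst)
{
    using Index = decltype(dst.GetSize());
    dst.SetSize(static_cast<Index>(src.size()));
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[static_cast<Index>(i)] = src[i];
}
```

Element-wise assignment is used deliberately so that the copy does not depend on how the container moves memory internally.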
Further modernizations
Iterator difference types have potential for modernization. The difference type of an iterator [CPP-4] is a contemporary alternative to std::ptrdiff_t [CPP-5], allowing negative offsets. This concept applies to iterator types with defined equality. The std::incrementable_traits struct computes a difference type for a given type, if it exists [CPP-6].
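As a rough illustration of a signed offset expressed through an iterator's own difference type, the sketch below uses std::iterator_traits, which predates C++20 and is used here as a substitute so that the example does not require a C++20 compiler. SignedOffset is a hypothetical name.

```cpp
#include <iterator>
#include <vector>

// Hypothetical helper: returns the signed offset from one iterator to
// another, typed as the iterator's own difference type instead of a
// hard-coded integral type. For random-access iterators the result is
// negative when 'to' precedes 'from'.
template <typename It>
typename std::iterator_traits<It>::difference_type SignedOffset(It from, It to)
{
    return std::distance(from, to);
}
```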
MFC offers several ready-to-use array classes, such as CByteArray, CDWordArray, CObArray, CPtrArray, CUIntArray, CWordArray, and CStringArray [MS-4]. These classes have member functions similar to CArray and should also benefit from the proposed replacement methods.
Conclusions
The article explores modernizing legacy arrays, specifically proposing practical techniques to replace the MFC container class CArray with vector. It begins with class method conversion techniques suitable for direct substitution. When direct substitutions are not feasible, standalone replacement functions are provided. The article offers refactoring guidance for various array operations, including indexing, resizing, and length reporting. It also addresses handling situations involving the public inheritance of CArray and provides a description of MFC’s use of CArray. Additionally, the article discusses working with mixed integral types (signedness) and highlights potential pitfalls with examples. Finally, the article suggests further modernizations and draws conclusions. For those dealing with MFC structures, switching to standard C++ containers like vector can simplify the codebase, improve performance and scalability, and enhance the debugging experience.
Thanks
Thank you to the anonymous reviewers for their interest and invaluable comments, which greatly improved the quality of this article.
References
[Abrahams06] David Abrahams, Jeremy Siek and Thomas Witt (2003, updated 2006) boost::iterator_facade: https://www.boost.org/doc/libs/1_85_0/libs/iterator/doc/iterator_facade.html
[CPP-1] std::vector: https://en.cppreference.com/w/cpp/container/vector
[CPP-2] std::move_if_noexcept: https://en.cppreference.com/w/cpp/utility/move_if_noexcept
[CPP-3] Usual arithmetic conversions: https://en.cppreference.com/w/cpp/language/usual_arithmetic_conversions
[CPP-4] Iterator library: https://en.cppreference.com/w/cpp/iterator
[CPP-5] std::ptrdiff_t: https://en.cppreference.com/w/cpp/types/ptrdiff_t
[CPP-6] std::incrementable_traits: https://en.cppreference.com/w/cpp/iterator/incrementable_traits
[Meyers05] Scott Meyers (2005) Effective C++: 55 specific ways to improve your programs and designs, Third Edition, Addison-Wesley Professional.
[MS-1] CArray Class: https://learn.microsoft.com/en-us/cpp/mfc/reference/carray-class
[MS-2] Data Type Ranges: https://learn.microsoft.com/en-us/cpp/cpp/data-type-ranges
[MS-3] CObject Class: https://learn.microsoft.com/en-us/cpp/mfc/reference/cobject-class
[MS-4] Ready-to-Use Array Classes: https://learn.microsoft.com/en-us/cpp/mfc/ready-to-use-array-classes
[StackOverflow] ‘Thou shalt not inherit from std::vector’: https://stackoverflow.com/questions/4353203/thou-shalt-not-inherit-from-stdvector
[Wikipedia-1] Microsoft Foundation Class Library: https://en.wikipedia.org/wiki/Microsoft_Foundation_Class_Library
[Wikipedia-2] Integer overflow: https://en.wikipedia.org/wiki/Integer_overflow
[Wikipedia-3] Signedness: https://en.wikipedia.org/wiki/Signedness
Stuart is a software developer with a background in geophysics, finance, and communications systems. He has a PhD specializing in signal processing, and enjoys camping and skiing in the Canadian Rockies. Stuart lives in Calgary, Canada.