pinA More Flexible Container

Overload Journal #58 - Dec 2003 + Programming Topics   Author: Rich Sposato

The Standard Set

Suppose I want to store pointers to some Employee records in a set, and then order that set by Employee name. That is easy; just make a functor that compares two Employee names, and apply it to a set. The snippet below shows how. The example stores bare pointers, but smart pointers are often a better choice for storing in containers.

class Employee {
public:
  const std::string& GetName() const;
  // ... other functions
};

struct CompareEmployees {
  inline bool operator()(const Employee* l,
                         const Employee* r) const {
    return (l->GetName() < r->GetName());
  }
};

typedef std::set<Employee*, CompareEmployees> EmployeeSet;
EmployeeSet employees;

Now, suppose I want to find a certain Employee record that matches any given name. Well, to do that, I have to pass in an Employee pointer with the properties I am looking for. This is because the set functions that search - find, count, lower_bound, upper_bound, equal_range, and erase - require a reference to the same type as the key of the set. These functions, except erase, are commonly called the "Special Set Operations". Using the example above, I have to pass in a pointer to a bogus Employee record. The bogus object only exists as a comparison value for finding the real record. Assuming an Employee object can be constructed using just a name, then the code looks something like this:

Employee* FindEmployee(const std::string& name) {
  Employee bogus(name);
  EmployeeSet::iterator it = employees.find(&bogus);
  return (employees.end() == it) ? 0 : *it;
}

But, what if I can't make an Employee object so easily? Perhaps no Employee constructor accepts just a name. Or maybe constructing any Employee object is so expensive that making a temporary on the stack is not worth the effort. Why is it necessary to make the bogus Employee record anyway? It would be so much simpler to just pass in the name itself to the set::find function and have it return the iterator to the target object.

Template Member Functions

The std::set's search functions typically require a reference to the same type as the set's key. They look like this:

template<class Key,class Compare,class Alloc>
class set {
  // ... other parts of set class
public:
  template<class Key>
    size_type erase(const Key& x);
};

Passing any type into these functions is possible only if the functions themselves are templates. These functions are not templated in the STL's associative containers - and would cause code bloat if they were, which I shall discuss later. The next code snippet shows the declarations of these functions as if they are template member functions. Notice that the Compare_Type used for each templated function is not among the templates for the class itself. As long as the Compare_Type is comparable to the Key type, it can be used by the functor. For completeness, listed below are both the const and non-const versions. I chose the unimaginative name of flex_set for the container class. Other than the name, and templated member functions, it behaves just like std::set.

template<class Key,class Compare,class Alloc>
class flex_set {
  // ... other parts of flex_set class
public:
  template<class Compare_Type>
    size_type erase(const Compare_Type& x);

  template<class Compare_Type>
    size_type count(const Compare_Type& x) const;

  template<class Compare_Type>
    iterator find(const Compare_Type& x);

  template<class Compare_Type>
    const_iterator find(
      const Compare_Type& x) const;

  template<class Compare_Type
    iterator lower_bound(
      const Compare_Type& x);

  template<class Compare_Type>
    const_iterator lower_bound(
      const Compare_Type& x) const;

  template<class Compare_Type>
    iterator upper_bound(
      const Compare_Type& x);

  template<class Compare_Type>
    const_iterator upper_bound(
      const Compare_Type& x) const;

  template<class Compare_Type>
    pair<iterator,iterator> equal_range(
      const Compare_Type& x);

  template<class Compare_Type>
    pair<const_iterator,const_iterator>
      equal_range(const Compare_Type& x) const;
};

This looks good so far, but how will these template member functions know how to compare some arbitrary type to the set's key type? The answer to that lies within the comparison functor, or comparator, used to order the set's elements. The comparator is overloaded to compare an Employee record to a const std::string. There are overloads so the name can be on the right or left side for symmetric comparisons. (Indeed, some member functions of flex_set require both.) The result is shown below along with a more efficient FindEmployee function.

std::binary_function<const Employee*,
                     const Employee*,
                     bool> {
  inline bool operator()(const Employee* l,
                         const Employee* r) const {
    return (l->GetName() < r->GetName());
  }

  inline bool operator()(const Employee* l,
                         const std::string& r) const {
    return (l->GetName() < r );
  }

  inline bool operator()(const std::string& l,
                         const Employee* r) const {
    return ( l < r->GetName() );
  }
};

Employee* FindEmployee(const std::string& name) {
  EmployeeSet::iterator
               it(employees.find(name));
    return (employees.end() == it) ? 0 : *it;
}

Okay, this is much better. No need to make a bogus Employee object just to find the real Employee. Just pass in the employee name, and the templated flex_set::find function returns the proper iterator.

For every possible type that can search through the container, simply add that type to the comparator and the compiler automatically makes the templated search functions. Using that idea, the example above can be further overloaded to compare a C-style string as shown here. I typically need only one additional type in the functor besides the key type, so here I would have chosen either std::string or const char *, but probably not both. (Admittedly, having a C-style string overload means I do not have to convert a const char * to std::string before doing the search, so one less object to construct.) Now that I can search using a type other than the key type of the set, I often use the key type only when inserting, and so the Employee-to-Employee comparison in the functor is only used for inserting.

struct CompareEmployees :
  std::binary_function<const Employee*,
                       const Employee*,
                       bool> {
  // rest of functor as shown above.

  inline bool operator()(const Employee* l,
                         const char* r) const {
    return (l->GetName() < r );
  }

  inline bool operator()(const char* l,
                         const Employee* r) const {
    return (l < r->GetName());
  }
};

An Alternate Solution

There is another method for searching through associative containers using types other than the key type. A bunch of wrappers, like that shown below, can be stored inside a std::set, and allow us to do searches by name.

struct EmployeeWrap {
  EmployeeWrap(Employee*);                // not explicit
  EmployeeWrap(const char*);              // not explicit
  EmployeeWrap(const std:string&);        // not explicit

  // Either of these is used, but not both.
  Employee* m_Employee;
  const std::string m_name;
  inline const std::string& GetName(void) const {
    return (0 == m_Employee) ? m_name : m_Employee->GetName();
  }

  bool operator<(const EmployeeWrap& l,
                 const EmployeeWrap& r);
};

  typedef std::set<EmployeeWrap> EmployeeWrapperSet;

  Employee* FindEmployee(const std::string& name) {
    EmployeeWrapperSet::iterator
                        it(employeeWrappers.find(name));
    return (employeeWrappers.end() == it)
           ? 0 : (*it).m_Employee;
}

This wrapper has several disadvantages. One is that it requires a special wrapper class to hold all the types needed for comparing. To compare Employee records with additional types, those types would have to be added to the wrapper, instead of just overloading the functor as needed for a flex_set. The example above has 2 elements, and the std::string has to be instantiated even if it is never used. The sizeof(std::string) varies from 4 bytes to 28 bytes depending upon the implementation. This increases the overall memory consumed by std::set even though all instances of EmployeeWrap within the set will not use the std::string. Another disadvantage is that the code above constructs an unnamed temporary of EmployeeWrap to pass into the std::set::find. This construction cost is small, and most of the cost can be optimized away. Lastly, EmployeeWrap::GetName imposes a small runtime cost to determine which data member has the name - a cost not required by the flex_set method. The argument against the wrapper method becomes: "If flex_set is available, then why pay for several costs that are not needed?"

Other Associative Containers

Another alternate solution is, "Why not just use map instead of set?" The Employee container will be:

std::map<std::string,Employee*> employees

There are two answers for that, one is complicated, and the other is more complicated. The complicated answer is that although that provides the desired ordering, it also requires storing a copy of the Employee name as a key for the map. (The EmployeeWrap example above also stores one std::string with an Employee pointer, but std::map actually uses each std::string.) An intuitive reason to avoid this practice is that the key, Employee name, is part of the Employee object, and it does not makes sense to store the key separately. A more practical reason is that storing the name separately is inefficient. Another practical reason is that someday, an Employee name will change, but the copy won't. That copy is the key within a map, and people will be reluctant to change the key in a map, but may not know that a property of the value is the key, and so change name of the Employee without updating the container.

The more complicated answer is that std::map cannot provide the same flexible search capabilities as those afforded by flex_set. To provide that flexibility for the other associative containers, flex_map, flex_multiset, and flex_multimap are needed, which are based upon same idea. Many STL implementations use the same underlying implementation for all four associative containers. By changing the implementation to make any associative container more flexible, the other three also become more flexible with little extra effort.

Using the templated flex_map::find member function allows for searches in a map of Employees to Assignments using just the name. This code shows a flex_map of Employees to their current Assignments.

typedef flex_map<Employee*,Assignment*,
                 CompareEmployees> EmployeeAssignments;
EmployeeAssignments employeeAssignments;

Assignment* GetEmployeeCurrentTask(
            const std::string& employeeName) {
  EmployeeAssignments::iterator it(
                       employeeAssignments.find(employeeName));
    return (employeeAssignments.end() == it)
           ? 0 : it->second;
}

If std::map were used, then something similar to this is required.

typedef std::map<Employee*,Assignment*,
                 CompareEmployees> EmployeeAssignments;
EmployeeAssignments employeeAssignments;

Assignment* GetEmployeeCurrentTask(
            const std::string& employeeName) {
  Employee bogus(employeeName);
  mployeeAssignments::iterator it(
                      employeeAssignments.find(&bogus));
    return (employeeAssignments.end() == it)
           ? 0 : it->second;
}

Considerations

CompareEmployees inherits from binary_function for use with adapters such as not2, bind1st, and bind2nd. The adapters will only work with the functor's function that has the types specified as templates for binary_function and ignore the overloaded functions. If an adaptable functor is needed that accepts some other type, the simple solution is to make such a functor. The code below shows a functor that derives from binary_function and can be used with std::string. Assuming that the hourlyEmployees container is sorted by name, the find_if call will locate the first Employee with a name less than the given name. One of the costs of the increased container flexibility is that more functors are needed. But, these additional functors are often needed for other purposes and searches on other containers anyway, such as vector.

struct CompareEmployeeToName :
std::binary_function<const Employee*,
                     const std::string&, bool> {
  inline bool operator()(const Employee* l,
                         const std::string& r) const {
    return (l->GetName() < r);
  }
};

typedef std::vector<Employee*> EmployeeVector;

EmployeeVector hourlyEmployees;
// Populate vector ...
EmployeeVector::iterator it(
  find_if(hourlyEmployees.begin(),
          hourlyEmployees.end(),
          bind2nd(CompareEmployeeToName(),
                  name)));

Another consideration is code bloat. Many C++ developers complain that templates cause code bloat, and making template functions out of otherwise normal functions allows for bloat. Instead of having just one flex_set::find function, there can now be several. (My own experience is that I rarely use more than one type of a function, so I am not paying for more than one version of flex_set::find anyway.) Fortunately, most search functions in the associative containers are small and inline, so code bloat will be minimal for them. Unfortunately, some functions in the underlying implementation are not as small, but still manageable. Still, the more flexible search abilities are worth a little extra bloat because temporary objects are no longer created. Since the temporary object is no longer needed, the cost of constructing and destroying it goes away, which may actually shrink the overall code size. Whether the flexible searching is worth the little extra bloat is a decision that must be made for each type and container.

The flex_set is not the ideal container for primitive types. A container of type flex_set<long> allows both flex_set::erase(const long& x) and flex_set::erase(const short& x). The compiler created both functions instead of promoting the short to a long and using only one function. This kind of code bloat can easily be avoided by using std::set<long> instead. A corollary to this is that member functions in std::set should not be templated. (Some simple experiments with changing std::set convinced me that making a separate flex_set class was a better solution.) The flex_set is useful for containers that store objects or store pointers to objects, but I prefer std::set for containers that store primitives.

Instead of making another container class, perhaps std::set itself can be extended by adding additional functions that use predicates. The algorithm functions std::find_if and std::count_if are similar to std::find and std::count except that they use a predicate instead of a value. Would a putative std::set::find_if member function that accepts a predicate work? Not really, because the predicate needs to compare an element to a value, and so std::set::find_if will need to receive both the value and the predicate. Which means the compare value passed into std::set::find_if would have to be constructed. This just reintroduces the original problem of constructing a temporary. Nor could adding predicates to the erase, lower_bound, upper_bound, equal_range, and binary_search member functions of std::set provide any greater efficiency than what flex_set already provides for these functions. Using a unary predicate which stores the compare value does not work either, since unary predicates cannot change the order for the compare and key values, and the predicate must be able to use the compare value on both the left and right side.

Could any other member functions of flex_set be templatized in the same way? Only one other function accepts a reference to a const key type as a parameter, and that function is insert. Inserting anything but a key value is meaningless, so the answer is negative. The special set operations, and one of the erase functions, are the only candidates for becoming templates.

Summary

More flexible variations of the four associative containers are possible by changing the search functions into template member functions. Each data type used as a parameter to these functions requires overloading the comparator to compare these data types to the key value of the container's elements. A caveat that goes with the overloading by type is that each type has to be comparable to the element type. The flexible containers have a beneficial side effect of resulting in more efficient code because named temporary variables are no longer necessary. Nothing is gained by changing std::set or the other STL associative containers. Changing member functions into templated functions causes bloat for containers of primitive types, and passing predicates as parameters into std::set functions does not work.

You may use the source code provided at: http://www.richsposato.com/software.html as a replacement for the associative containers provided with the GCC compiler.

Acknowledgements

Many thanks to Don Organ, Gerald Chan, Phil Bass, Alan Griffiths, Juan Carlos Aguilar, and Cherryl Smith for their feedback and insightful comments.

Overload Journal #58 - Dec 2003 + Programming Topics