Hi Francis,
I have a question that you might be interested in, or at least have an opinion on, concerning function objects. I've elaborated a little so that if might make more sense to readers of different levels if you put it in C Vu.
I am fairly certain that this level of C++ belongs in Overload. So I have added some thoughts of mine and hope that readers will feel able to comment (in my opinion, readers should be adding their contribution, without that there is very little to motivate people like Richard to take the time to share their speculations) Francis
It seems to me that the traditional method of using function objects is to define a templatised processing function that accepts any type of function object as a parameter. The function objects cannot be passed in directly by value, because that might cause them to be sliced, and also would cause them to be used as whatever the compile-time type of the parameter was declared as. The use of template parameters ensures that the actual object type makes it all the way in to the implementation of the processing function.
In other words, define a template processing function to use whatever type of function object is passed in:
// This function processes input based on op, // and returns the result. "data" is some // data type. template<typename Operation> data process(const data &input, Operation op);
Note that this is how the sort method is defined in my version of the library:
template<class RanIt, class Pred> void sort(RanIt first, RanIt last, Pred pr);
The predicate function object, pr , is passed by templatised value. The template function merely assumes that pr supports an appropriate operator()(...) call, and uses it. In the case of sort, I believe it assumes that pr supports an operator() method with the following signature:
// Compare the left and right objects and // return true if you want "left < right" bool operator()(const T &left, const T &right) const;
Now, in my current project I could potentially have an awful lot of different function objects. It is also possible that the processing function itself might not be completely trivial but may have some substantial body to it. The problem with using templates as shown above is that for every function object that is used, a new copy of the process() method may well be instantiated, which is able to cope with the particular function object, whatever its size, and which will call the correct operator()(...) method. I wonder whether you can think of any significant arguments against the following alternative scheme:
Define a base class for all the function objects with a (pure) virtual operator()(...) method. Then use pass by reference (or pointer) in the process() function to ensure that the appropriate function object method is called at runtime.
// Define base class with pure virtual // operator() method. I derive from the // standard library "unary_function" template // class here for consistency. class operation : public std::unary_function<data, data> { public: // This is my overloaded operator() method virtual data operator()(const data &input) const = 0; }; // I would now derive all my real operations // from the above class, and define the // virtual operator() method for each. // The process() method is no longer a // template, but uses polymorphism to ensure // operator() is called on the correct // function object. data process(const data &input, const operation &op);
I guess you get an extra pointer dereference to make the virtual call, but I am not worried about that level of detail at this stage of development. It would seem that this second method should remove any chance of code bloat from multiple instantiations of the processing function, because the function is now no longer templatised and so you'll just get the one copy. This may be a completely obvious thing to do, but I have only ever seen the previous template solution being used and so I am suffering from shock from the unfamiliar. It seems to work OK in my prototype, however!
I can see why the template method was chosen for the standard library functions. You can choose to go the second route if you wish to, because you have to instantiate even the first copy of sort<>() in your code. Avoidance of the overhead of virtual functions in the standard (template) library was for the best.
Cheers, Richard
Francis Glassborow Comments…
I am sure that Richard is an experienced user of function objects (functoids) but some of the above is likely to confuse less experienced programmers. So let me take a little time to try to clarify several issues.
First of all those STL functions that take a predicate simply require something that will be meaningful if a pair of parentheses is added (possible containing some arguments that depend on the other parameters. There are two kinds of C++ entity that can meet this need. The simpler one is a simply the address of an ordinary function. So, purely as an example:
class Rational { // suitable interface, including a // compare function > }; int compareRational (Rational const & lhs, Rational const & rhs) { return lhs.(rhs); } // I know there is an STL template to do this // but I do not want to complicate the issue void initRational(Rational &) { // initialises a default Rational somehow } void printRational(Rational const &); int main (){ Rational array[10]; for_each(array, array+10, initRational); sort(array, array+10, compareRational); for_each(array, array+10, printRational); };
In the above the two calls to for_each and the call of sort relies on passing the address of a global function as the third argument. In the context of the code that is perfectly sensible. However it is sometimes necessary to provide extra data at call time. In other words the process you want to run relies on data that would not be provided by the template parameters. For example you might want to select an output stream for printing. In such circumstances we need a way to wrap up that extra information. The way this is done is to have a functoid whose constructor can tuck the required data away as member data. Something along these lines:
class printRational { ostream & out_m; public: printRational(ostream & out = cout) : out_m(out){}; operator () (Rational const &); };
Now I can write:
for_each(array, array+10, printRational() );
Note that extra set of parentheses. They are there because a temporary (unnamed) printRational object has to be created - using the default constructor this time. When the temporary is bound to the pred parameter of for_each it works like a function call because writing pred(<specific Rational object>) makes perfect sense to the compiler. However functoids must be constructed. The idea of using functoids antedates templates by many years. The manipulators that you use with iostream objects are functoids (well mostly, though those that take no parameters are plain functions). The concept that functoids are specifically for template use is seriously faulty. Another place they can be used is in implementing the visitor pattern. Indeed anytime you need to pass a function like object but one that needs extra data bound to it at call time a functoid is going to be a prime candidate. They are a very powerful tool in the hands of the experienced C++ programmer and if you do not have a sound understanding of them you need to mark that as a priority area in which to develop your skills.
Now let me turn to the specific problem of functoids used in the context of templates. They are certainly an area in which code bloat is a potential problem. I say potential because in many cases this does not happen. A good compiler should be able to expand for_each inline to produce code as efficient as that produced by hand.
There is a general guideline with templates, always place code that is independent of the template parameters in its own non-template class. This class maybe used as a base for the template or an instance maybe a data member of the template. However, as with all optimisation, programmers must be very sure they understand what they are doing.
Richard's suggestion has several penalties. First it requires the predicates be functoids and not pure functions. Then there is the issue of bloat because of virtual inheritance. Every one of Richard's derived functoids will carry around a virtual function table. In addition there is the cost of writing and maintaining the template specialisations that will use his functoid hierarchies. There are also the overheads involved in dynamic binding to use his derived functoids.
I suspect, but do not know, that the combination of the above will produce similar code bloat. I think that the programming effort might be better placed ensuring that good template code is written with all independent code placed elsewhere.
I am sure we would all welcome further comments on this. I know I would.
Francis Glassborow