Can you form a new contain from two others? Vladimir Grigoriev reminds us how to write an iterator.
The article describes a simple and useful iterator adapter
iterator_pair
, provides its implementation and shows some use cases of the iterator. It argues that because standard iterator adapters hide the property value type of the underlying containers and objects, they make it difficult to write safe and generic code. The article points to a potention surprise with the standard algorithms
std::partial_sum
and
std::adjacent_difference
, and offers a way to remedy it.
Readers may also be interested to know that there is an iterator named
zip_iterator
in the boost libraries that is also based on the idea of combining several iterators as one iterator. It models Readable iterator as described in the documentation on the iterator [
Boost
]. Nevertheless
zip_iterator
and
iterator_pair
are different iterator adapters, with different implementations and their own usages.
Let’s help a student
In a forum for beginners I encountered the following assignment.
Let’s assume that there are two arrays of integers, A and B, with equal sizes. Form third array C with the same size elements of which will be set to the minimum of values of corresponding elements of the first two arrays.
It is obvious that the task of a beginner is to demonstrate his skill in managing loops.
Nevertheless the assignment can be easy done by means of the standard algorithm
std::transform
. Listing 1 is a possible solution of the assignment using
std::transform
.
#include <iostream> #include <algorithm> #include <iterator> #include <cstdlib> #include <ctime> int main() { const size_t N = 20; int a[N], b[N], c[N]; std::srand( ( unsigned int ) std::time( nullptr ) ); std::generate( std::begin( a ), std::end( a ), [] { return ( std::rand() % ( N / 2 ) ); } ); std::cout << "A: "; for ( int x : a ) std::cout << x << ' '; std::cout << std::endl; std::generate( std::begin( b ), std::end( b ), [] { return ( std::rand() % ( N / 2 ) ); } ); std::cout << "B: "; for ( int x : b ) std::cout << x << ' '; std::cout << std::endl; std::transform( std::begin( a ), std::end( a ), std::begin( b ), std::begin( c ), [] ( int x, int y ) { return ( std::min( x, y ) ); } ); std::cout << "C: "; for ( int x : c ) std::cout << x << ' '; std::cout << std::endl; } |
Listing 1 |
Note: If you are using MS Visual C++ to compile the code then specify
=
as the capture-default option in the lambda expression used in the calls of the algorithm
std::generate
. Otherwise you can get a compilation error.
The program might have the following output:
A: 2 0 3 7 2 4 0 4 2 2 6 5 3 6 7 0 6 9 0 2 B: 6 2 1 4 0 5 5 4 6 0 2 8 2 7 7 7 1 7 3 5 C: 2 0 1 4 0 4 0 4 2 0 2 5 2 6 7 0 1 7 0 2
This is nothing unusual or difficult. The only detail you should take into account is that you may not write simply
std::min<int>
instead of the lambda expression in the call of the algorithm because the compiler will report an error saying that there is an ambiguity for
std::min<int>
.
Indeed, when the C++ 2011 Standard was adopted a new special type appeared. It is
std::initializer_list
. Consequently several standard algorithms, including
std::min
, were overloaded to accept
std::initializer_list
as a parameter type. So
std::min<int>
can be either a specialization of the function with two parameters of type
int
(more precisely of type
const int &
) or a specialization of the function that has parameter of type
std::initializer_list<int>
. Thus you need a means of helping the compiler to select the right function. And using lambda expressions as wrappers around overloaded functions in similar situations is such a means. Of course you may omit the template argument of the function within the lambda expression because the compiler can deduce it itself in this case.
From the simple to the complex
Whether a student will use the standard algorithm or write an appropriate loop himself is not important for us now: as usual, there is just a step between trivial and non-trivial tasks.
Indeed, let’s make the assignment a bit more complicated. Assume that now we need to fill elements of one array, array C, with the minimum values of corresponding elements of the original arrays and to fill elements of another array, say, array D, with the maximum values of corresponding elements of arrays A and B.
What should we do in this case? We could use the previous call of
std::transform
twice, first to form array C with the minimum values and then, in the second call, to form array D with the maximum values.
However, it is obvious that such an approach is inefficient. Moreover, if we make the assignment even more complicated and require that instead of using two additional arrays, C and D, we have to overwrite the original arrays A and B with the minimum and maximum values of their elements respectively – that is, to do the task ‘in place’ – it is clear that this approach simply will not work.
Is it a dead end and will we be forced to use an ordinary loop as a student would do?
It is possible that these complicated assignments could be done with some other standard algorithms. I think that maybe
std::inner_product
will cope with the tasks. I am not sure, I did not try it. It is simply my supposition.
But what I am sure of is that if such solutions using other standard algorithms exist, they will look artificial and will not be easily readable.
It seems that there are no satisfactory solutions. There are indeed no satisfactory solutions if we must act within the frames of available standard constructions (algorithms, iterators and so on) provided by the C++ Standard (if you have such solutions please let me know).
However, let’s not abandon hope but return to the previous code example and consider the call of
std::transform
more closely.
std::transform( std::begin( a ), std::end( a ), std::begin( b ), std::begin( c ), [] ( int x, int y ) { return ( std::min( x, y ) ); } );
For the new, more complicated, assignments we need to get the minimum and maximum values of each pair of elements of arrays A and B simultaneously. There is a standard algorithm that can do this job. It is algorithm
std::minmax
. Not so bad! Let’s replace
std::min
in the lambda expression with
std::minmax
.
std::transform( std::begin( a ), std::end( a ), std::begin( b ), std::begin( c ), [] ( int x, int y ) { return ( std::minmax( x, y ) ); } );
So the lambda expression will now return an object of type
std::pair<const int &, const int &>
. The problem is that the iterator for array C cannot accept objects of that type and our purpose is to deal with two arrays simultaneously instead of one array.
Hmm...What will happen if we substitute the iterator of array C for a pair of iterators (the same way as we did with the substitution of
std::min
that returns a single object for
std::minmax
) that returns a pair of objects (actually one object of type
std::pair<const int &, const int &>
)?
It is an idea! Let’s write the renewed call of
std::transform
first and then discuss it.
std::transform( std::begin( a ), std::end( a ), std::begin( b ), make_iterator_pair( std::begin( c ), std::begin( d ) ), [] ( int x, int y ) { return ( std::minmax( x, y ) ); } );
So how does it look?
The functional object returns an object of type
std::pair<const int &, const int &>
and it is met by an iterator of type
std::pair<int *, int *>
that is by the pair of iterators. Each iterator will get its own value. Thus arrays C and D will be filled as required.
Of course there is no such a function as
make_iterator_pair
at present in the C++ Standard, in the same way as there is no iterator adapter
iterator_pair
itself. It is only my proposal. However, as you can see if there were such constructions our complicated assignments could be done very simply and elegantly.
Now all that we need to enjoy the luxury of using this iterator adapter to run programs for the assignments is to implement it.
Time to build the iterator adapter
The iterator adapter
iterator_pair
will have the iterator category
std::output_iterator_tag
. This allows us to combine any two iterators that satisfy the requirements of output iterators. Its value type will be a pair of value types of the underlying iterators. For convenience the definition of the iterator adapter is placed in a separate header file with name
"iterator_pair.h"
inside the name space
usr
.
Listing 2 is the iterator adapter definition, with boilerplate
include
guards removed for brevity.
#include <iterator> #include <utility> namespace usr { using namespace std; template <class Iterator1, class Iterator2> class iterator_pair : public iterator < output_iterator_tag, pair<typename iterator_traits<Iterator1>::value_type, typename iterator_traits<Iterator2>::value_type>, void, void, void > { public: typedef pair<Iterator1, Iterator2> iterator_type; iterator_pair( Iterator1, Iterator2 ); explicit iterator_pair( const pair<Iterator1, Iterator2> & ); explicit iterator_pair( pair<Iterator1, Iterator2> && ); iterator_type base() const; iterator_pair<Iterator1, Iterator2> & operator =( const pair< typename iterator_traits<Iterator1>::value_type, typename iterator_traits<Iterator2>::value_type > & ); iterator_pair<Iterator1, Iterator2> & operator =( pair< typename iterator_traits<Iterator1>::value_type, typename iterator_traits<Iterator2>::value_type > && ); iterator_pair<Iterator1, Iterator2> & operator *(); iterator_pair<Iterator1, Iterator2> & operator ++(); iterator_pair<Iterator1, Iterator2> operator ++( int ); protected: iterator_type it; }; template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> make_iterator_pair( Iterator1, Iterator2 ); template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> make_iterator_pair( const pair<Iterator1, Iterator2> & ); } namespace usr { template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2>::iterator_pair( Iterator1 it1, Iterator2 it2 ) : it( it1, it2 ) {} template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2>::iterator_pair( const pair<Iterator1, Iterator2> & it_pair ) : it( it_pair ) {} template <class Iterator1, class Iterator2> typename iterator_pair<Iterator1, Iterator2>::iterator_type iterator_pair<Iterator1, Iterator2>::base() const { return ( it ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> & iterator_pair<Iterator1, Iterator2>::operator =( const pair< typename iterator_traits<Iterator1>::value_type, typename iterator_traits<Iterator2>::value_type > & value ) { *( it.first ) = value.first; *( it.second ) = value.second; return ( *this ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> & iterator_pair<Iterator1, Iterator2>::operator =( pair< typename iterator_traits<Iterator1>::value_type, typename iterator_traits<Iterator2>::value_type > && value ) { *( it.first ) = value.first; *( it.second ) = value.second; return ( *this ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> & iterator_pair<Iterator1, Iterator2>::operator *() { return ( *this ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> & iterator_pair<Iterator1, Iterator2>::operator ++() { ++it.first; ++it.second; return ( *this ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> iterator_pair<Iterator1, Iterator2>::operator ++( int ) { iterator_pair<Iterator1, Iterator2> tmp( it ); it.first++; it.second++; return ( tmp ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> make_iterator_pair( pair<Iterator1, Iterator2> && it_pair ) { return ( iterator_pair<Iterator1, Iterator2>( it_pair ) ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> make_iterator_pair( Iterator1 it1, Iterator2 it2 ) { return ( iterator_pair<Iterator1, Iterator2>( it1, it2 ) ); } template <class Iterator1, class Iterator2> iterator_pair<Iterator1, Iterator2> make_iterator_pair( const pair<Iterator1, Iterator2> & it_pair ) { return ( iterator_pair<Iterator1, Iterator2>( it_pair ) ); } } |
Listing 2 |
All is ready. It is time to enjoy the fruits of our labor. Below a program is presented that performs both assignments. First, it fills the two arrays C and D with the minimum and maximum values of each pair of elements of arrays A and B, and then overwrites arrays A and B themselves with the same minimum and maximum values. See Listing 3.
#include "iterator_pair.h" int main() { const size_t N = 20; int a[N], b[N], c[N], d[N]; std::srand( ( unsigned int )std::time( nullptr ) ); std::generate( std::begin( a ), std::end( a ), [] { return ( std::rand() % ( N / 2 ) ); } ); std::cout << "A: "; for ( int x : a ) std::cout << x << ' '; std::cout << std::endl; std::generate( std::begin( b ), std::end( b ), [] { return ( std::rand() % ( N / 2 ) ); } ); std::cout << "B: "; for ( int x : b ) std::cout << x << ' '; std::cout << std::endl; std::transform( std::begin( a ), std::end( a ), std::begin( b ), usr::make_iterator_pair( std::begin( c ), std::begin( d ) ), [] ( int x, int y ) { return ( std::minmax( x, y ) ); } ); std::cout << "C: "; for ( int x : c ) std::cout << x << ' '; std::cout << std::endl; std::cout << "D: "; for ( int x : d ) std::cout << x << ' '; std::cout << std::endl; std::cout << std::endl; std::cout << "A: "; for ( int x : a ) std::cout << x << ' '; std::cout << std::endl; std::cout << "B: "; for ( int x : b ) std::cout << x << ' '; std::cout << std::endl; std::transform( std::begin( a ), std::end( a ), std::begin( b ), usr::make_iterator_pair( std::begin( a ), std::begin( b ) ), [] ( int x, int y ) { return ( std::minmax( x, y ) ); } ); std::cout << "A: "; for ( int x : a ) std::cout << x << ' '; std::cout << std::endl; std::cout << "B: "; for ( int x : b ) std::cout << x << ' '; std::cout << std::endl; } |
Listing 3 |
The program might have the following output:
A: 3 1 2 2 9 3 4 9 8 8 2 5 7 2 3 5 3 0 8 4 B: 6 8 7 2 5 7 5 2 1 2 4 7 3 7 1 2 2 5 3 2 C: 3 1 2 2 5 3 4 2 1 2 2 5 3 2 1 2 2 0 3 2 D: 6 8 7 2 9 7 5 9 8 8 4 7 7 7 3 5 3 5 8 4 A: 3 1 2 2 9 3 4 9 8 8 2 5 7 2 3 5 3 0 8 4 B: 6 8 7 2 5 7 5 2 1 2 4 7 3 7 1 2 2 5 3 2 A: 3 1 2 2 5 3 4 2 1 2 2 5 3 2 1 2 2 0 3 2 B: 6 8 7 2 9 7 5 9 8 8 4 7 7 7 3 5 3 5 8 4
You can see that the program has done all that is required in the assignments.
To gain a more complete insight about the possibilities of the iterator adapter, let’s consider one more use case that occurs in practice where the iterator adapter would come in handy.
Sometimes it is required to copy the key values and mapped values of some associative container having, for example, type
std::map
in two other separate sequential containers. So let’s assume that there is a container of type
std::map<int, std::string>
and your task, for example, is to copy the key values of the map in a container of type
std::vector<int>
and mapped values of the map in a container of type
std::forward_list<std::string>
.
Before you continue to read the article further, it would be useful for you to try to do the assignment yourself using some standard algorithms and then compare your solution with the solution based on applying iterator adapter
iterator_pair
.
Have you written your solution yet? How much time did it take to write it?
Now compare it with what is being suggested in Listing 4.
#include "iterator_pair.h" int main() { std::map<int, std::string> m; std::istringstream is( "Hello new iterator adapter iterator_pair!" ); int i = 0; std::transform( std::istream_iterator<std::string>( is ), std::istream_iterator<std::string>(), std::inserter( m, m.begin() ), [&i]( const std::string &s ) { return ( std::make_pair( i++, s ) ); } ); std::vector<int> v( m.size() ); std::forward_list<std::string> l( m.size() ); std::copy( m.begin(), m.end(), usr::make_iterator_pair( v.begin(), l.begin() ) ); for ( int x : v ) std::cout << x << ' '; std::cout << std::endl; for ( const std::string &s : l ) std::cout << s << ' '; std::cout << std::endl; } |
Listing 4 |
The program has the following output:
0 1 2 3 4
Hello new iterator adapter
iterator_pair
!
The central point of the program is the statement
std::copy( m.begin(), m.end(), usr::make_iterator_pair( v.begin(), l.begin() ) );
that does all the work. I think that you will agree with me that the statement looks very clear and does not require much time to understand what is being done here.
It seems that we could end the article here. We have gotten a remarkably simple and useful iterator adapter. However, it is the C++ Standard that does not allow us to do this.
In the previous listings, the given number of elements of containers
std::vector<int>
and
std::forward_list<std::string>
were created beforehand. So at first the elements were created and initialized with the default values and only then the actual values were assigned to them in the call of algorithm
std::copy
.
The two separate operations – the default construction of objects and assigning actual values to them – can be substituted for one operation of copy construction. Calls of the copy assignment operator can be eliminated. For simple types – such as fundamental types – it is not as important. However, in general for objects of complex types, two calls of two special functions instead of one call of one special function can be wasteful.
Moreover, if we want new elements to be added to container
std::forward_list<std::string>
in the reverse order relative to the order of the elements in container
std::map<int, std::string>
then it makes no sense to create the elements of
std::forward_list<std::string>
beforehand, because the class
std::forward_list
does not have a reverse iterator.
Therefore let’s make some minor changes to the program. We will not create elements of containers
std::vector<int>
and
std::forward_list<std::string>
beforehand. Instead only reserve enough memory for the container
std::vector<int>
’s future elements and then respectively
std::back_insert_iterator
and
std::front_insert_iterator
will be used in the call of
std::copy
.
Now the program will look like Listing 5.
#include "iterator_pair.h" int main() { std::map<int, std::string> m; std::istringstream is( "Hello new iterator adapter iterator_pair!" ); int i = 0; std::transform( std::istream_iterator<std::string>( is ), std::istream_iterator<std::string>(), std::inserter( m, m.begin() ), [&i]( const std::string &s ) { return ( std::make_pair( i++, s ) ); } ); std::vector<int> v; v.reserve( m.size() ); std::forward_list<std::string> l; std::copy( m.begin(), m.end(), usr::make_iterator_pair( std::back_inserter( v ), std::front_inserter( l ) ) ); for ( int x : v ) std::cout << x << ' '; std::cout << std::endl; for ( const std::string &s : l ) std::cout << s << ' '; std::cout << std::endl; } |
Listing 5 |
If you have typed the program correctly without any typos (or if I did this myself correctly because you are going simply to copy and paste the program) then you may bravely compile the program and ... you will get compilation errors!
There is no visible cause why the code might not be compiled. Therefore all questions should be addressed to the compiler or, to be more correct, to the C++ Standard: why it does not allow the compiler to compile the program.
What is your name?
If you are going to contact someone you should use either their real name or a common form of address. Otherwise the result can be unexpected: you might get ignored entirely or there might be even more unpleasant consequences. (Imagine what might happen if you call your spouse by the wrong name.)
The same is true for the world of classes and objects.
If you want your programs to be safe, flexible, and portable, you should use the following general convention:
- Your classes have to provide common forms of address for their properties.
- Code that uses your classes has to use these common forms of address when it tries to access properties of your classes (or use their real names, but that can be tricky).
- In any case, it is better to use a common form of address because the real name of a property can vary between implementations.
As you already know, ‘names’ in the given context means the type names of class properties.
Here are two simple examples that demonstrate what can occur if you do not comply with the convention.
The first example. Let’s assume that you have a project where there is a set of flags. You chose to use the standard class
std::vector<bool>
as the container for the flags. Throughout the project a few methods do some processing based on the number of a flag in the set and its value passed together to the methods as arguments. Of course you tried to make your code flexible and independent of the details of the underlying container. The code could look something like Listing 6.
using special_flags_t = std::vector<bool>; void method1( special_flags_t::size_type flag_number, bool flag_value ) { // some processing using the flag } //... void methodn( special_flags_t::size_type flag_number, bool flag_value ) { // some processing using the flag } //... special_flags_t flag_values; //... for ( special_flags_t::size_type flag_number = 0; flag_number < flag_values.size(); flag_number++ ) { method1( flag_number, flag_values[flag_number] ); } //... for ( special_flags_t::size_type flag_number = 0; flag_number < flag_values.size(); flag_number++ ) { methodn( flag_number, flag_values[flag_number] ); } |
Listing 6 |
After a time, you conclude that it would be better to replace
std::vector<bool>
with
std::bitset
because the set of flags has a small fixed size. You might think that it will be enough to substitute only the alias declaration (and it would be indeed great if it was enough) because, after all you tried to write the code in such a way that it would be independent of the details of the underlying container.
However, if you make the substitution and instead of
using special_flags_t = std::vector<bool>;
write
using special_flags_t = std::bitset<N>;
(where
N
is some predefined constant), the project will not compile and the compiler will issue numerous errors!
The problem is that the standard class
std::bitset
does not provide the common form of address
size_type
for its property size. Thus your good intentions were completely subverted.
Note: the author has submitted a proposal to add a typedef declaration
size_type
for standard class
std::bitset
.
The second example. A programmer wrote the following snippet of code being sure that nothing extraordinary can occur within it.
std::string s; std::string source; std::string dest; //... unsigned int pos = s.find( source ); if ( pos != std::string::npos ) { s.replace( pos, source.size(), dest ); }
He was very suprised when this code snippet generated an exception
std::out_of_range
!
The reason for the exception is that the size of the unsigned integral type used by the container to represent its own property size happened to be greater than the size of type
unsigned int
in the environment where the program was compiled and run. So when string source had not been found in string
s
in statement
unsigned int pos = s.find( source );
the value returned by the method find of
std::string
was reduced using the arithmetic modulo 2 operation to fit
pos
. And then in the statement
if ( pos != std::string::npos )
it was again enlarged to the size of the type of
std::string::npos
according to the rules of the usual arithmetic conversions by setting the most significant bits to zeroes. As a result the condition in the
if
statement was evaluated to
true
and the incorrect value of
pos
was used further in method
replace
.
The exception could be avoided and the program would be portable if the programmer were to use the common form of address
std::string::size_type
in the declaration of variable
pos
instead of the type
unsigned int
.
std::string::size_type pos = s.find( source );
Or it would be even better to write simply
auto pos = s.find( source );
When you deal with containers or sequences of data directly or through iterators, one of the most important and useful pieces of information is about the value type of elements of the container or sequence. Without having such information, it is difficult to write generic and safe code. You should provide the common form of address
value_type
so that code can access the elements. Otherwise you will be helpless and will be unable to write generic template code.
That’s exactly what happened for our
iterator_pair
. Both of the iterator adapters (
std::back_insert_iterator
and
std::front_insert_iterator
) hide the actual value of the common form of address
value_type
of the underlaying containers from the user, making the property value type itself inaccessible.
If you look at how the iterators are defined [
ISO/IEC
] you will see that the second template argument of the inherited base class
std::iterator
that corresponds to the common form of address
value_type
is set to
void
. (See Listing 7.)
template <class Container> class back_insert_iterator : public iterator<output_iterator_tag,void,void,void,void> { //... }; template <class Container> class front_insert_iterator : public iterator<output_iterator_tag,void,void,void,void> { //... }; |
Listing 7 |
Thus when the property value type of iterator adapter
iterator_pair
that in turn is defined like
pair < typename iterator_traits<Iterator1>::value_type, typename iterator_traits<Iterator2>::value_type >
was instantiated then the compiler issued an error because it cannot instantiate
std::pair
with data members of type
void
.
These two iterators look like black holes. If a container finds itself in a constructor of the iterators then it instantly loses without a trace its main property, the value type.
On the other hand, if you look at how assignment operators are defined [ ISO/IEC ] for these iterators, you will see that they use the property value type of underlaying containers. For example
front_insert_iterator<Container>& operator=(const typename Container::value_type& value);
But they use the property bypassing their own common form of address
value_type
.
The same problem exists with
std::ostream_iterator
. Ask any programmer, for example, what type of objects the iterator
std::ostream_iterator<std::string>
, can output and he will answer without delay: “
Objects of type
std::string
or at least those objects that can be implicitly converted to type
std::string
.
”
And he will be right. But if you look at how the iterator is defined [
ISO/IEC
] you will see that its property value type is defined the same way as this property is defined for iterators
back_insert_iterator
and
front_insert_iterator
; that is, it is set to
void
and thus the real value type is hidden and inaccessible for the user of the iterator.
template < class T, class charT = char,class traits = char_traits<charT> > class ostream_iterator : public iterator<output_iterator_tag, void,void, void, void> { //... };
The very notion of an iterator adapter implies that it does not modify the properties of the underlying containers or objects. Instead it gives them new opportunities based on their own functionality.
It will not be difficult to define these iterator adapters, appending the iterator
std::insert_iterator
to them in such a way that they do not hide the main property, the value type, of the underlaying containers or objects. Their definitions could look like Listing 8.
template <class Container> class back_insert_iterator : public iterator< output_iterator_tag, typename Container::value_type,void,void,void> { //... }; template <class Container> class front_insert_iterator : public iterator< output_iterator_tag, typename Container::value_type,void,void,void> { //... }; template <class Container> class insert_iterator : public iterator< output_iterator_tag, typename Container::value_type,void,void,void> { //... }; template <class T, class charT = char, class traits = char_traits<charT> > class ostream_iterator : public iterator< output_iterator_tag, T, void, void, void> { //... }; |
Listing 8 |
And it should be done because as you will soon see, the problem is not limited only to the definition of the
iterator_pair
.
May an unsafe algorithm be called an algorithm in programming?
At the very beginning of the article, we helped a student. Now let the student help us.
We will ask the student to write a function that will store partial sums of elements of an array of type
std::uint8_t[N]
filled with random values in some other integer array.
Because the student does not know standard algorithms yet, he has written the function in C-style.
Listing 9 is his function.
int * partial_sum( const std::uint8_t a[], size_t n, int b[] ) { if ( n ) { auto acc = int( *a++ ); *b++ = acc; while ( --n ) { acc = acc + *a++; *b++ = acc; } } return b; } |
Listing 9 |
To be sure that the student’s function is correct we need to test it. Because each of us is a qualified programmer (aren’t you?) and, in contrast to the student, we know that the standard algorithm
std::partial_sum
already exists and how to use it, we can conclude that it will be reasonable simply to compare the results of using the student’s function and standard algorithm
std::partial_sum
applied to the same array. Otherwise what is the use of the algorithm?
The test program can look like Listing 10.
int * partial_sum( const std::uint8_t a[], size_t n, int b[] ) { if ( n ) { auto acc = int( *a++ ); *b++ = acc; while ( --n ) { acc = acc + *a++; *b++ = acc; } } return b; } int main() { const size_t N = 10; std::uint8_t a[N]; int b[N]; std::srand( ( unsigned int ) std::time( nullptr ) ); std::generate( std::begin( a ), std::end( a ), []() { return std::rand() % std::numeric_limits<std::uint8_t>::max(); } ); for ( int x : a ) std::cout << std::setw( 4 ) << x << ' '; std::cout << std::endl << std::endl; ::partial_sum( a, N, b ); for ( int x : b ) std::cout << std::setw( 4 ) << x << ' '; std::cout << std::endl; std::partial_sum( std::begin( a ), std::end( a ), std::begin( b ) ); for ( int x : b ) std::cout << std::setw( 4 ) << x << ' '; std::cout << std::endl; } |
Listing 10 |
Well, let’s run the test program, shall we?
The program output might look like:
110 152 109 192 160 180 82 212 74 6 110 262 371 563 723 903 985 1197 1271 1277 110 6 115 51 211 135 217 173 247 253
Oops! What are we seeing? Even for the second partial sum the values do not match and it seems that it is not the student’s function that has not passed the test but the standard algorithm.
There is no need to go far to find the reason for the incorrect result yielded by the algorithm. The answer is staring you in the face.
It is evident that if you are going to use some accumulator for a sequence of data then the accumulator should be defined with a type that has a larger size than the size of the type of the source data so that it would be able to accomodate all accumulated values correctly without overflowing.
This is exactly how an accumulator is defined in the student’s function. It has the type of the elements of the output array that stores accumulated values.
On the other hand if you have a dip into the description of algorithm
std::partial_sum
in the C++ Standard [
ISO/IEC
] you will see that according to the C++ Standard the algorithm creates an accumulator
acc
whose type is the value type of the input iterator.
Thus as the output of the test program has shown, in general you can ensure the safe and correct work of the algorithm only for sequences that contain a single element. You cannot control the type of the accumulator.
The same problem exists for another standard algorithm
std::adjacent_difference
.
Ask yourself what is the use of such algorithms?
Fixing this defect of the algorithms will not require a lot of effort. It is enough within the algorithms to define an accumulator as having type of the value type of the output iterator. Below is an updated version of the algorithm
std::partial_sum
without the template parameter of
operation
.
template <class InputIterator, class OutputIterator> OutputIterator partial_sum( InputIterator first, InputIterator last, OutputIterator result ) { if ( first != last ) { typename std::iterator_traits<OutputIterator>::value_type acc = *first++; *result++ = acc; for ( ; first != last; ++first, ++result ) { acc = acc + *first; *result = acc; } } return result; }
In the same way, algorithm
std::partial_sum
could be defined with the template parameter of
operation
and the corresponding versions of the algorithm
std::adjacent_difference
.
Now if we substitute the call of standard algorithm
std::partial_sum
for a call of its new implementation as it is shown in Listing 11, then both outputs of partial sums will match each other.
int * partial_sum( const std::uint8_t a[], size_t n, int b[] ) { if ( n ) { auto acc = int( *a++ ); *b++ = acc; while ( --n ) { acc = acc + *a++; *b++ = acc; } } return b; } namespace usr { template <class InputIterator, class OutputIterator> OutputIterator partial_sum( InputIterator first, InputIterator last, OutputIterator result ) { if ( first != last ) { typename std::iterator_traits<OutputIterator>::value_type acc = *first++; *result++ = acc; for ( ; first != last; ++first, ++result ) { acc = acc + *first; *result = acc; } } return result; } } // end of namespace usr int main() { const size_t N = 10; std::uint8_t a[N]; int b[N]; std::srand( ( unsigned int )std::time( nullptr ) ); std::generate( std::begin( a ), std::end( a ), []() { return std::rand() % std::numeric_limits<std::uint8_t>::max(); } ); for ( int x : a ) std::cout << std::setw( 4 ) << x << ' '; std::cout << std::endl << std::endl; ::partial_sum( a, N, b ); for ( int x : b ) std::cout << std::setw( 4 ) << x << ' '; std::cout << std::endl; usr::partial_sum( std::begin( a ), std::end( a ), std::begin( b ) ); for ( int x : b ) std::cout << std::setw( 4 ) << x << ' '; std::cout << std::endl; } |
Listing 11 |
140 138 70 20 134 191 181 45 56 37 140 278 348 368 502 693 874 919 975 1012 140 278 348 368 502 693 874 919 975 1012
However, this is only half the story. To get the fully functional algorithms, the standard iterator adapters
std::back_insert_iterator
,
std::front_insert_iterator
,
std::insert_iterator
, and
std::ostream_iterator
should be modified in the way described in the previous section. Only then will separate parts of the mosaic develop into a coherent picture.
Let’s consider two algorithms
std::partial_sum
and
std::accumulate
that supplement each other. It is natural to expect that partial sums of elements of a container or data sequence produced by algorithm
std::partial_sum
would be the partial sums calculated inside algorithm
std::accumulate
for the same container or data sequence and that the last partial sum produced by the
std::partial_sum
would be equal to the final value returned by the
std::accumulate
.
In other words if, for example, there is an integer array
int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
you expect that these two programs
int main() { int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }; for ( auto last = std::begin( a ); last != std::end( a ); ) { std::cout << std::accumulate( std::begin ( a ), ++last, 0 ) << " "; } std::cout << std::endl; }
and
int main() { int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }; std::partial_sum( std::begin( a ), std::end( a ), std::ostream_iterator<int>( std::cout, " " ) ); std::cout << std::endl; }
will yield the same output
0 1 3 6 10 15 21 28 36 45
And what about for example an array of pointers to string literals instead of the array of integers? Let the array will be defined the following way
const char * s[] = { "Hello ", "new ", "iterator ", "adapter ", "iterator_pair!" };
In this case the first program can look like
int main() { const char * s[] = { "Hello ", "new ", "iterator ", "adapter ", "iterator_pair!" }; for ( auto last = std::begin( s ); last != std::end( s ); ) { std::cout << std::accumulate( std::begin( s ), ++last, std::string() ) << std::endl; } }
And its output will be
Hello Hello new Hello new iterator Hello new iterator adapter Hello new iterator adapter iterator_pair!
So the ‘partial sums’ of the pointers to string literals produced by algorithm
std::partial_sum
have to look the same.
All you need to do to get this result is:
-
Substitute the standard algorithm
std::partial_sum
for the updated version presented in this section of the article. - In the definition of std::ostream_iterator, change the second template argument of its base class std::iterator from void to T as shown earlier. You can do this temporarily in the implementation of the iterator in the corresponding standard header provided by the compiler.
Thus the second program will look like Listing 12.
namespace usr { template <class InputIterator, class OutputIterator> OutputIterator partial_sum( InputIterator first, InputIterator last, OutputIterator result ) { if ( first != last ) { typename std::iterator_traits<OutputIterator>::value_type acc = *first++; *result++ = acc; for ( ; first != last; ++first, ++result ) { acc = acc + *first; *result = acc; } } return result; } } // end of namespace usr int main() { const char * s[] = { "Hello ", "new ", "iterator ", "adapter ", "iterator_pair!" }; usr::partial_sum( std::begin( s ), std::end( s ), std::ostream_iterator<std::string>( std::cout, "\n" ) ); } |
Listing 12 |
And if you have changed
std::ostream_iterator
as it is described then you indeed will see the ‘partial sums’ of the pointers to string literals as it is shown above. On the other hand, if you will try the program using the original standard algorithm
std::partial_sum
then it will not compile!
Thus standard algorithms
std::partial_sum
and
std::adjacent_difference
are not only able to guarantee a correct and predictable result, but sometimes they might not even compile.
Acknowledgement
The author would like to thank Jonathan Leffler for his help.
References
[Boost] http://www.boost.org/doc/libs/1_57_0/libs/iterator/doc/zip_iterator.html
[ISO/IEC] ISO/IEC 14882:2011 Programming Language C++
The full code is available at: http://cpp.forum24.ru/?1-10-0-00000000-000-0-0-1428232189