    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Summing Arrays</title>
        <link>http://accu.org/index.php/journals/1144</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 13, #6 - Dec 2001 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="http://accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c77/">CVu</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c117/">136</a>
                    (8)
<br />

                                            <a href="http://accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c65/">Programming</a>
                    (488)
<br />

                                            <a href="http://accu.org/index.php/journals/c117-65/">Any of these categories</a>

                    -                        <a href="http://accu.org/index.php/journals/c117+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Summing Arrays</h1>
<p><strong>Author:</strong>&nbsp;Administrator</p>
<p>
<strong>Date:</strong> 03 December 2001 13:15:48 +00:00 or Mon, 03 December 2001 13:15:48 +00:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e20" id="d0e20"></a></h2>
</div>
<p><span class="emphasis"><em>&quot;What's one and one and one and one
and one and one and one?&quot;</em></span> Lewis Carroll had Alice ask.
In his real life as the Reverend Charles L Dodgson, an accomplished
mathematician and tutor at Christ Church College in the University
of Oxford, Dodgson was no doubt aware of the problems this question
would cause his students. But he could not be expected to have
imagined the difficulty that finding the sum of a list of numbers
would cause today's computer programmers.</p>
<p>Let me reword the opening question in a form which you could go
away and program. Write a function which takes a pointer,
'<tt class="literal">a</tt>', to an integer array of length
'<tt class="literal">len_a</tt>', and return the sum of the
elements as the function value.</p>
<p>And what I am going to talk about in this column is the pitfalls
you can get into trying to answer this question and suggest some
ways of getting the utmost out of the limited computer resources
you have available to solve the problem.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e36" id="d0e36"></a>The First
Attempt</h2>
</div>
<p>The solution initially seems easy enough. Use a local variable
<tt class="literal">sum_a</tt> to hold the running total of the
elements. Then, should not <tt class="literal">*a</tt> and
<tt class="literal">sum_a</tt> be of the same type, which is
<tt class="literal">int</tt>? And <tt class="literal">len_a</tt>
must be of type <tt class="literal">size_t</tt> because, being used
as an array index, it is conceptually of the same type as that
returned by the <tt class="literal">sizeof</tt> operator.</p>
<p>Look at:</p>
<pre class="programlisting">
#include  stddef.h
int sum_1 (int *a, size_t len_a)
{
  int     sum_a = 0;
  size_t  i;
  for(i = 0; i &lt; len_a; i++)
    sum_a += a[i];
  return (sum_a);
}
</pre>
<p>Certainly I have written code like that before now.</p>
<p>But of course, a variable of type integer, or any other type for
that matter, can only hold a certain range of numbers. If you want
to know what this range is, it is defined as macros in the standard
header file <tt class="literal">limits.h</tt> for the integral
types and <tt class="literal">float.h</tt> for the floating
types.</p>
<p>So sooner or later, depending on what is in the array <tt class=
"literal">a</tt>, the variable <tt class="literal">sum_a</tt> will
overflow. That is, the sum of the elements added up so far will be
too big to be held in <tt class="literal">sum_a</tt>. If you are
lucky, the computer will stop your program and tell you this. More
likely, it will merely store an incorrect answer and another famous
&quot;computer error&quot; is born.</p>
<p>This may happen far sooner than you think. Assuming a 16-bit
signed integer, and that the first two entries in the array are
both over 16383, <tt class="literal">sum_a</tt> will be incorrectly
set after only the first two entries of a have been summed. This is
not very much if the things being summed are entries, in pence, in
a bank account.</p>
<p>The first attempt to fix the problem with <tt class=
"literal">sum_1</tt> might be to change the type of <tt class=
"literal">sum_a</tt> from <tt class="literal">int</tt> to
<tt class="literal">long int</tt>. And because a <tt class=
"literal">long int</tt> can hold a far larger number than a simple
<tt class="literal">int</tt>, the sum is less likely to overflow.
There is a performance penalty to be paid for this in terms of a
longer run time (<span class="emphasis"><em>yes, probably if a
16-bit <tt class="literal">int</tt> is the native type for the
hardware, but not likely otherwise- Francis</em></span>), but that
is a small price for ensuring you get the right answer, and may
even be nothing for some of the bigger microprocessors now
around.</p>
<p>What this has done is to introduce some extra bits into the sum
which guard against the overflow. If an overflow of a 16-bit add
were to occur, there is somewhere for the computer hardware to
safely record this fact without having to potentially return an
incorrect answer.</p>
<p>Now a <tt class="literal">long int</tt> also has a limit to the
largest number it can correctly hold. Assuming 32-bit <tt class=
"literal">long int</tt>, that limit is 65536 times bigger than the
limit of a 16-bit <tt class="literal">int</tt>. This means that if
the values in <tt class="literal">a</tt> are all as big as they can
possibly be, at 32767, the length of <tt class="literal">a</tt>
must be 65539 or more before any overflow can occur.</p>
<p>Of course, if the values in <tt class="literal">a</tt> are
smaller, a proportionally longer array can be summed before there
is a problem. And if positive and negative values occur with equal
probabilities, you can expect that it will take about 65539 squared
times as long before an overflow occurs. That is over four thousand
million times longer, so is extremely unlikely to happen.</p>
<p>But now what happens if the type of the array <tt class=
"literal">a</tt> is itself a <tt class="literal">long int</tt>?
<tt class="literal">sum_a</tt> is already a <tt class=
"literal">long int</tt>, so you are back to the original situation:
after summing just two entries you might have an integer overflow
to contend with.</p>
<p>To circumvent the problem this time, you cannot make <tt class=
"literal">sum_a</tt> a long long int because the C Standard does
not provide such a type (it didn't, but as of 1999 it does, but
that only moves the problem up a level). Now there is no real
reason why an enterprising compiler author could not implement a
<tt class="literal">long long int</tt>. But if you were to use it
you might subsequently find that only his old, extremely expensive,
compiler would build your code, and then not for the new
microprocessor you wanted to use.</p>
<p>Using a floating type (i.e. <tt class="literal">float</tt>,
<tt class="literal">double</tt>, <tt class="literal">long
double</tt>) is not really the way to add up a long list of
integers, even if the latter two types will generally give a
greater accuracy than a <tt class="literal">long int</tt>.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e178" id="d0e178"></a>So what are
you to do?</h2>
</div>
<p>By thinking about the sort of numbers you have to sum you can
decide what is going to go wrong. Then you can chose between
several careful ploys (i.e., tricks) to postpone the problem as
long as possible, hopefully to the point where the sum has been
found before any overflow occurs. Knowing where the numbers came
from will help do this. This may sound a bit like needing to know
the answer before you can find the answer, but life is like
that.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e183" id="d0e183"></a>Try Biasing
the Array</h2>
</div>
<p>If the numbers are all roughly the same (and what &quot;roughly&quot;
means is all relative), you might try subtracting an estimate of
the mean (or average) from all the numbers before you sum them.
Then, of course, the total of the numbers is the sum of the numbers
after the mean has been subtracted plus the product of that mean
and <tt class="literal">len_a</tt>. Now if the original sum is too
big for an <tt class="literal">int</tt>, so too will be that final
calculation. So both the estimated mean and the sum of the biased
numbers must be returned to the caller.</p>
<p>You have to estimate the quantity to subtract from the numbers
without finding their sum, and a fairly safe way to do this is to
take the average of the first, middle and last elements in the
array. I hold this quantity in <tt class="literal">bias_a</tt> and
hope that its calculation does not overflow. The example then
becomes:</p>
<pre class="programlisting">
#include  stddef.h
int sum_2 (int *a,size_t len_a) {
  size_t  i;
  int bias_a = (a[0]+a[len_a/2]+a[len_a-1])/3;
  int sum_a = 0;
  for(i = 0; i &lt; len_a; i++)
    sum_a += a[i] - bias_a;
  return (sum_a + (bias_a * len_a));
}
</pre>
<p>I return the sum directly here, knowing it will overflow, rather
than returning <tt class="literal">bias_a</tt> and <tt class=
"literal">sum_a</tt> but I think you will see what I mean. And
although intended for summing large arrays, the example will work
for small values of <tt class="literal">len_a</tt> of 3 or less:
always check that your code works for arrays of length one and does
something tidy for lengths less than one.</p>
<p>If the sum of the numbers exceeds the range of an integer,
ultimately you have to accept that fact. But is there something one
can do if, despite the idea of <tt class="literal">sum_2</tt>, the
partial sum exceeds the legal range, but the final sum does
not?</p>
<p>Well, yes there is, and this is my second trick.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e219" id="d0e219"></a>Ask What is
Going On</h2>
</div>
<p>Usually when summing an array, a few positive numbers are added
to the partial sum, so advancing towards the positive limit. It is
then fortuitously found that several negative numbers are
encountered which return the partial sum back towards the negative
limit. That is, the partial sum alternately approaches the maximum
and minimum integer values. And disaster occurs when this random
variation causes the partial sum to cross one of those bounds.</p>
<p>The trick is to postpone this disaster as long as possible and
the surprising way to do this is to sort the numbers before
summing. Then add up all the numbers, starting from one end of the
sorted array. The partial sum will almost certainly overflow before
long, but you can check for that and stop just before it occurs.
Then start accumulating numbers from the other end of the array.
These should have the opposite sign to those just added in, so will
move the partial sum quickly away from the danger zone of one limit
until the other limit is reached.</p>
<p>Repeat this, changing the end being summed from each time an
overflow is about to occur. And so the algorithm proceeds with the
running total lurching from one extreme to the other, until all the
numbers have been summed.</p>
<p>Of course, if your luck runs out, and the remaining numbers at
both array ends have the same sign, this algorithm fails, but then
the sum exceeds that which can be held in an integer anyway.</p>
<p>Look at the following example to see how this is done.</p>
<pre class="programlisting">
#include  stdio.h
#include  stddef.h
#include  stdlib.h
#include  limits.h

int compar (void *ap,void *bp){
  int a = * (int *) ap; 
  int b = * (int *) bp;
  if(a == b) return 0;
  return ((a  b) ? -1 : +1);
}

int o_flow (int sum, int next){
  /* Determine if adding 'next' to 'sum' would
   * cause an overflow.*/
  return (next  0 ?
    sum  INT_MIN - next :
    sum  INT_MAX - next
  );
}

int sum_3 (int *a, size_t len_a) {
  int *al, *au, sum_a, working;
  /* Sort the data into ascending order. */
     qsort (a, len_a, sizeof(a[0]), compar);
  /* Set up the summation.  */
  sum_a = *a;
  al = a + 1;
  au = a + len_a;
  /* Loop from both ends inwards until all data 
   * summed.   */
  while (al  au) {
    working = 0;
  /* Accumulate from the bottom end up until a 
   * further accumulation would cause a negative
   * overflow.  */
  while (al  au) {
    if (o_flow (sum_a, *al))
      break;
    else {
      sum_a += *al++;
      working = 1;
    }
  }

  /* Loop from the top down until a further item 
   * would cause a positive overflow */
  while (al  au) {
    if (o_flow (sum_a, *(au - 1)))
      break;
    else {
      sum_a += * --au;
      working = 1;
    }
  }

  /* If neither of the inner loops did anything, 
   * the sum will not fit an 'int' and so we 
   * cannot find the sum. */
  if (! working) {
    fputs (&quot;'sum_3' failed.\\n&quot;, stdout);
    exit (EXIT_FAILURE);
  }

  /* Successful completion. */
  return (sum_a);
}

#define LEN_A_MAX       16000
int     a[LEN_A_MAX];
int main (int argc, char *argv){
  size_t  len_a, i;
  int     sum;
  unsigned int    seed;
  len_a = get_int(&quot;Array length to sum: &quot;, 1, 
                 LEN_A_MAX);
  seed = get_int(&quot; Random number seed: &quot;, 0,
            RAND_MAX);
  srand (seed);
  for(i = 0; i  len_a ; i++) {
    a[i] = rand ();
    if(rand()  RAND_MAX/2) a[i] = -a[i];
  }
  sum = sum_3 (a, len_a);
  printf(&quot;'sum' = %1d\\n&quot;, sum);
  return EXIT_SUCCESS;
}
</pre>
<p>The summation function is called <tt class="literal">sum_3</tt>.
It uses the Standard function qsort together with the local
function compar to sort the array.</p>
<p>Note that the body of 'compar' cannot be simply:</p>
<pre class="programlisting">
 return (* (int *) ap - * (int *) bp);
</pre>
<p>because if both <tt class="literal">*ap</tt> and <tt class=
"literal">*bp</tt> are large numbers of opposite sign, this too
will overflow. Instead, I compare the numbers and return a constant
indicating the sign of the difference. I then rely on the compiler
correctly comparing two numbers irrespective of their values. That
is what it is for.</p>
<p>As an aside on compar note that although I know that the two
arguments point to integers, they must be defined as pointers to
<tt class="literal">void</tt>, not pointers to <tt class=
"literal">int</tt>, because of the definition of the <tt class=
"literal">qsort</tt> function. <tt class="literal">compar</tt> must
then explicitly cast every use of the arguments to a pointer to
<tt class="literal">int</tt>. This generally means that a
well-written <tt class="literal">compar</tt> will either copy the
arguments to local variables which are pointers to the wanted type
or, as I have done here, immediately cast the arguments to the
correct pointer type and dereference them with the <tt class=
"literal">*</tt> operator to obtain the values pointed to.</p>
<p>I use <tt class="literal">al</tt> in <tt class=
"literal">sum_3</tt> to point to the next element to be summed at
the lower end of the array and <tt class="literal">au</tt> to point
to the last element already summed at the array upper end. I do
things this way round because the C Standard guarantees that the
address of the element just past the array end must exist (even
though its value does not) but does not require that the address of
the element just before the first must exist. This becomes
important when checking to see if all array elements have been
summed.</p>
<p>The summation is more complex, having an outer loop which runs
two inner loops in turn. The first uses <tt class="literal">al</tt>
to sum from the array's lower end upwards and the second uses
<tt class="literal">au</tt> to sum from the upper end downwards.
The array has been completely summed when <tt class=
"literal">au</tt> and <tt class="literal">al</tt> point to the same
object.</p>
<p>The function <tt class="literal">o_flow</tt> is used to see if
adding <tt class="literal">next</tt> to the partial <tt class=
"literal">sum</tt> would result in an overflow. This it has to do
without itself causing such a catastrophe. The solution is to see
if <tt class="literal">next</tt> is positive or negative, and so
whether the most positive, or most negative, limit is that
threatened. Decreasing the limit by <tt class="literal">next</tt>
results in a limit closer to zero, so is itself within bounds.
<tt class="literal">sum</tt> must not exceed this bound (in the
appropriate direction) if adding <tt class="literal">next</tt> to
it is not to cause an overflow.</p>
<p>The variable <tt class="literal">working</tt> is used to ensure
that both inner loops add in at least one number on each pass. This
traps the possibility that <tt class="literal">al</tt> and
<tt class="literal">au-1</tt> both point to numbers of the same
sign and an overflow would occur if either were added in. As I said
earlier, this occurs if the sum cannot fit within an integer. When
it does occur, I flag this failure to <tt class=
"literal">exit()</tt>.</p>
<p>The outer loop exits when all array elements have been summed,
and that sum is returned as the function value.</p>
<p>My <tt class="literal">main</tt> function is a test for
<tt class="literal">sum_3</tt>. It uses the <tt class=
"literal">get_int</tt> function I wrote about in an earlier article
to get from the user an array length and seed for the pseudo random
number generator, <tt class="literal">rand</tt>. An array of
numbers is generated, summed, and that sum printed.</p>
<p>Normally, one would test a function with small data sets, or
array lengths in this case. But because here I am relying on the
randomness of the test data to enable the function to keep going, I
found that array lengths of 100 or more might be successfully
summed by <tt class="literal">sum_3</tt>, whilst arrays only 25
long would often fail, even though the longer array started with
the same numbers as the shorter array. When summing a few random
numbers, there is too great a chance that their sum is well distant
from zero.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e357" id="d0e357"></a>When All Else
Fails</h2>
</div>
<p>If neither of the above ideas enable you to find the array total
without problems, about the only option left is to resort to using
an arbitrary precision arithmetic package. These allow you to do
arithmetic to whatever accuracy is necessary to solve the problem
to hand. But describing one is well outside the scope of this
article, and would probably justify a book in its own right.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e362" id="d0e362"></a>Outstanding
Points</h2>
</div>
<p>I have left in a nasty shock for the user of <tt class=
"literal">sum_3</tt>. Because I sort the array a in place, the
array is not the same after <tt class="literal">sum_3</tt> returns
as when it was called. This an is unreasonable side-effect, and I
leave it to you to fix for me.</p>
<p>A point that some may like to look at is that the way I set
<tt class="literal">a[i]</tt> in <tt class="literal">main</tt>
results in it being assigned (on my 16 bit system) only 16384
different values. What should be done to make <tt class=
"literal">a[i]</tt> potentially have any of the 65536 values
allowed of a 16 bit integer?</p>
<p>Despite my earlier claims, neither <tt class=
"literal">sum_2</tt> nor <tt class="literal">sum_3</tt> work if
handed an array length of zero. You might like to look at this too.
But there is no need to check for a negative array length because
of the definition of <tt class="literal">size_t</tt>.</p>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
