    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Fine Tuning for lexical_cast</title>
        <link>http://accu.org/index.php/journals/1375</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #74 - Aug 2006 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="http://accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c200/">74</a>
                    (7)
<br />

                                            <a href="http://accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="http://accu.org/index.php/journals/c65/">Programming</a>
                    (488)
<br />

                                            <a href="http://accu.org/index.php/journals/c200-65/">Any of these categories</a>

                    -                        <a href="http://accu.org/index.php/journals/c200+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Fine Tuning for lexical_cast</h1>
<p><strong>Author:</strong>&nbsp;webeditor</p>
<p>
<strong>Date:</strong> 01 August 2006 11:57:00 +01:00 or Tue, 01 August 2006 11:57:00 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Alexander Nasonov takes a look at Boost's lexical_cast and addresses a common user request: &quot;make it go faster&quot;.</p>
<p><strong>Body:</strong>&nbsp;<p>A few weeks ago I created a patch that reduces a size of executable when arrays of different sizes are passed to <tt class="code">lexical_cast</tt>. I sent the patch to Kevlin Henney and to a maintainer of <tt class="code">lexical_cast</tt> Terje Slettebo. To my surprise, nobody is actively maintaining <tt class="code">lexical_cast</tt> anymore. Terje kindly asked me to take a maintainership and I did.
  </p><p>
    To those of you who have never used <tt class="code">boost::lexical_cast</tt> before, this is a function that transforms a source value of one type into another type by converting the source value to a string and reading the result from that string. For example, if the first argument of a program is an integer, it can be obtained with <tt class="code">int value = lexical_cast&lt;int&gt;(argv[1])</tt> inside the <tt class="code">main</tt> function. For more information, refer to [<a href="#Boost">Boost</a>].
  </p><p>
    There were several requests from users and I, as the maintainer, should consider them. The first request that I recalled was to improve the poor performance of the <tt class="code">lexical_cast</tt>. Inspection of the <tt class="code">boost/lexical_cast.hpp</tt>  file showed that an argument of <tt class="code">lexical_cast</tt> is passed to an internal <tt class="code">std::stringstream</tt> object and a return value is constructed by reading from that <tt class="code">std::stringstream</tt> object. It is well known among C++ programmers that a construction of <tt class="code">std::stringstream</tt><i> </i>is slow. This affects performance significantly because the object is created to format one value every time the <tt class="code">lexical_cast</tt><i> </i>function is called.
  </p><p>
    In the general case, not much can be improved but specialized conversions can be optimized. One example is where this can be done is <tt class="code">lexical_cast&lt;char&gt;(boolean_value)</tt><i>, </i>which could be expanded to <tt class="code">'0' + boolean_value</tt><i> </i>if this combination of types was supported.
  </p><p>
    Another example is a conversion from an integral type to <tt class="code">std::string</tt>. It could be as fast as
  </p><pre class="programlisting">
std::string s = itoa(n); // n is int
</pre><p>
    I wish I could add a specialization that handled this case but <tt class="code">itoa</tt> is not a standard function. The <tt class="code">sprintf</tt> function might be an option but it formats differently compared to <tt class="code">std::ostringstream</tt> and it also might be painful to mix C and C++ locales.
  </p><p>
    Sounds like I should implement a function similar to <tt class="code">itoa</tt>. No problem. Though it's always good to study the subject before coding.
  </p><p>
    Some might argue that coding activity should be driven by tests. Well, that's a great approach but even having an excellent test suite doesn't guarantee that your code is bug-free. One goal of code analysis is to explore new test cases. You will easily recognize them during the course of the article.
  </p><p>
    Actually, the <tt class="code">lexical_cast</tt> is shipped with a test (see <tt class="code">libs/conversion/lexical_cast_test.cpp</tt><i> </i>in Boost tree). It will help not to make a silly mistake but is not enough for checking hand coded conversion from an integral type to <tt class="code">std::string</tt><i> </i>because the test relies on correctness of <tt class="code">std::stringstream</tt> code.
  </p><h2>
    Analysis of several itoa implementation
  </h2><p>
    There are several open-source UNIX distributions available ([<a href="#FreeBSD">FreeBSD</a>], [<a href="#NetBSD">NetBSD</a>] and [<a href="#OpenSolaris">OpenSolaris</a>]) that include a code of <tt class="code">itoa</tt>-like functions. I should make it clear that I'm not going to copy code between projects. Even if both projects are open-source, there are might be some incompatibilities between licences that don't let you redistribute mixed code under one licence. I only analyzed algorithms and pitfalls of implementations.
  </p><p>
    Some implementations of <tt class="code">itoa</tt> look similar to this code:
  </p><pre class="programlisting">
char* itoa(int n)
{
    static char buf[12]; // assume 32-bit int
    buf[11] = '\0';
    char *p = buf + 11;
    int negative = n &lt; 0;
    if(negative)
        n = -n;
    do {
        *--p = '0' + n % 10;
        n /= 10;
    } while (n);
    if(negative)
        *--p = '-';
    return p;
}
    </pre><p>
    This code is plainly wrong. More precisely, <tt class="code">-n</tt> has an implementation-defined behaviour. On 2s-complement systems, a negation of <tt class="code">INT_MIN </tt>overflows. On systems where integer overflow doesn't interrupt a program execution, a result of <tt class="code">INT_MIN</tt> negation is <tt class="code">INT_MIN</tt>.
  </p><p>
    This fact has an interesting consequence for the algorithm. The first argument of the <tt class="code">n % 10</tt> expression is negative, therefore, a result of this expression is implementation-defined as per section 5.6 bullet 4 of the standard [1]:
  </p>
		<p><span class="quote">
    If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined
  </span></p><p>
    On my FreeBSD-powered Pentium-M notebook, this function returns:<tt class="code">&quot;-./,),(-*,(&quot;</tt>. This is definitely not a number!
  </p><p>
    So, I threw this version away and went to the next implementation:
  </p><pre class="programlisting">
char* itoa(int n)
{
    static char buf[12]; // assume 32-bit int
    buf[11] = '\0';
    char *p = buf + 11;
    unsigned int un = n &lt; 0 ? -n : n;
    do {
        *--p = '0' + un % 10;
        un /= 10;
    } while (un);
    if(n &lt; 0)
        *--p = '-';
    return p;
}
    </pre><p>
    It still applies unary minus to <tt class="code">int</tt> type but the result is immediately transformed into <tt class="code">unsigned int</tt>. This makes a big difference compared to the previous code. According to [1] 4.7/2, a conversion of some value <tt class="code">v</tt> to <tt class="code">unsigned</tt> type is the least congruent to <tt class="code">v</tt> modulo <tt class="code">2^N</tt>, where <tt class="code">N</tt> is a number of bits used to represent values of the target type.
  </p><p>
    In other words, <tt class="code">v</tt> belongs to one equivalence class with <tt class="code">v + 2^N, v + 2^N + 2^N, v + 2^N + 2^N + 2^N</tt><i> </i>and so on.  By common convention, values within <tt class="code">[0, 2^N)</tt> are used to represent classes of equivalent values.
  </p><p>
    Negative numbers are outside of this range and therefore, <tt class="code">2^N</tt> should be added to convert them to <tt class="code">unsigned</tt> type<sup><a href="#1">1</a></sup>. For example, <tt class="code">n=INT_MIN</tt> in 2s-complement representation is <tt class="code">-2^(N-1)</tt>. This number is equivalent to <tt class="code">-2^(N-1) + 2^N = 2^(N-1)</tt>. Therefore, <tt class="code">un</tt> is equal to an absolute value of <tt class="code">n</tt>.
  </p><p>
    To get rid of implementation-defined behaviour of the <tt class="code">-n</tt> expression, the line:
  </p><pre class="programlisting">
unsigned int un = n &lt; 0 ? -n : n;
    </pre><p>
    can be replaced with:
  </p><pre class="programlisting">
unsigned int un = n;
if(n &lt; 0)
    un = -un;
    </pre><p>
    It has already been discussed that a conversion in the first line has well-defined behaviour. Negative number <tt class="code">n</tt> is converted to <tt class="code">unsigned</tt> value <tt class="code">n + 2^N</tt>, other values become <tt class="code">unsigned</tt> by trivial conversion.
  </p><p>
    The third line operates on <tt class="code">unsigned</tt> type and therefore, obeys modulo <tt class="code">2^N</tt><i> </i>arithmetic as per 3.9.1/4 of [1]. This means that a value of <tt class="code">un</tt> after assignment is equal to <tt class="code">-(n + 2^N) = -(n + 2^N) + 2^N = -n</tt>. Since <tt class="code">n</tt> is negative, <tt class="code">-n</tt> is positive and we can conclude that <tt class="code">un</tt> is equal to an absolute value of <tt class="code">n</tt>.
  </p><p>
    A reader might wonder why we don't just write <tt class="code">unsigned int un = abs(n)</tt>? The answer is the same. When an absolute value of <tt class="code">n</tt> cannot be represented by <tt class="code">int</tt>, a result of <tt class="code">abs</tt> function is implementation-defined.
  </p><p>
    Let's move on to the next problems. Illustrated functions define <tt class="code">static char buf[12]</tt>.
  </p><p>
    First problem is that it is assumed that <tt class="code">int</tt>'s representation is not bigger than 32 bits. A size of <tt class="code">buf</tt> should be correctly calculated for all possible representations:
  </p><pre class="programlisting">
static char buf[3 +  std::numeric_limits&lt;int&gt;::digits10];
    </pre><p>
    The number 3 is a sum of 1 byte for minus sign, 1 byte for trailing <tt class="code">'\0'</tt> and 1 byte for a digit not counted by <tt class="code">digits10</tt> constant.
  </p><p>
    A more generic formula for arbitrary integral type <tt class="code">T</tt> is:
  </p><pre class="programlisting">
2 + std::numeric_limits&lt;T&gt;::is_signed + std::numeric_limits&lt;T&gt;::digits10
    </pre><p>
    The second problem with <tt class="code">buf</tt> is that it is shared between invocations of this function. You cannot call <tt class="code">itoa</tt> from two threads simultaneously because both calls would try to write at the same location - to the <tt class="code">buf</tt><i> </i>array.
  </p><p>
    There are several solutions. One can pass a pointer to a buffer of sufficient length to <tt class="code">itoa</tt>:
  </p><pre class="programlisting">
void itoa(int n, char* buf);
    </pre><p>
    It introduces one inconvenience, though. Original <tt class="code">itoa</tt> is very handy for code like <tt class="code">s += itoa(n)</tt>. This modification would definitely break it because prior to making a call, a buffer variable should be defined. For the very same reason calling <tt class="code">lexical_cast</tt> in-place is better then defining <tt class="code">std::ostringstream</tt> variable and performing formatting.
  </p><p>
    The idea is to return an array from <tt class="code">itoa</tt>. Builtin C/C++ array cannot be copied, so it should be wrapped into <tt class="code">struct</tt>:
  </p><pre class="programlisting">
struct itoa_result
{
    char elems[12]; // assume 32-bit int
};
itoa_result itoa(int n);
    </pre><p>
    The technique of wrapping an array into a struct can be generalized in C++. We don't have to do this though because <tt class="code">boost::array</tt><sup><a href="#2">2</a></sup> is (quoting Boost documentation) <i>STL compliant container wrapper for arrays of constant size</i>:
  </p><pre class="programlisting">
boost::array&lt;char, 3 +
    std::numeric_limits&lt;int&gt;::digits10&gt; itoa(int n);
    </pre><p>
    A typical call of this function would look like this:
  </p><pre class="programlisting">
s += itoa(n).elems;
    </pre><p>
    or, if you'd like to save a result in a variable that has a wider scope than that of the temporary return value of <tt class="code">itoa</tt>, take this route:
  </p><pre class="programlisting">
char buf[sizeof itoa(n).elems];
std::strcpy(buf, itoa(n).elems);
    </pre><p>
    It's interesting to note how <tt class="code">buf</tt> is defined in the last example. There is no magic formula to calculate a capacity to hold the <tt class="code">int</tt> value, just the <tt class="code">sizeof</tt> applied to an array returned by <tt class="code">itoa</tt>. It is much easier to remember this trick then to recall the formula.
  </p><p>
    Unless you count every byte in <tt class="code">buf</tt>, an access to <tt class="code">elems</tt> in <tt class="code">sizeof</tt> expression can be omitted:
  </p><pre class="programlisting">
char buf[sizeof itoa(n)];
    </pre><p>
    This might increase the size of <tt class="code">buf</tt> slightly but this is not dangerous, while it saves typing 6 characters.
  </p><p>
    Other improvements that can be applied to <tt class="code">itoa</tt> are consistency with C++ streams and support for  other integral types.
  </p><p>
    The first improvement is printing digits in groups delimited by a thousands separator if the current locale has grouping. It is worth noting that this increases the size of the buffer. Since the thousands separator is a <tt class="code">char</tt>, this change doesn't add more then  <tt class="code">std::numeric_limits&lt;int&gt; ::digits10</tt><i> </i>characters to the buffer<i>.</i></p><p>
    Integral types other then <tt class="code">int</tt> are not different from an implementation point of view. Although it's tempting to turn <tt class="code">itoa</tt> into a function template, there is a good reason to leave it and add overloads for other integral types<i> </i>instead. The fact is that some types cannot be used in templates. Such things as <tt class="code">bit</tt> fields and values of unnamed or local enumeration types are incompatible with a template machinery.
  </p><p>
    That's it. Definitely too much for the patch but we created a good ground for another library.
  </p><h2>
    Tuning for lexical_cast
  </h2><p>
    As shown in a previous section, string representations of some source types have a limited length that can be calculated at compile-time. For every such type, we define a specialization of the <tt class="code">lexical_cast_src_length</tt><i> </i>metafunction that returns that limit. For example:
  </p><pre class="programlisting">
template&lt;&gt; struct lexical_cast_src_length&lt;bool&gt;    
: boost::mpl::size_t&lt;1&gt; {};
 template&lt;&gt; struct lexical_cast_src_length&lt;int&gt;
     : boost::mpl::size_t&lt;2 +
         std::numeric_limits&lt;int&gt;::digits10&gt; {};
    </pre><p>
    A source value can be placed in a local buffer <tt class="code">buf</tt> of appropriate length without the overhead of <tt class="code">std::stringstream</tt> class, e.g. a lexical conversion of <tt class="code">bool</tt><i> </i>value could be a simple <tt class="code">buf[0] = '0' + value</tt><i> </i>expression, likewise <tt class="code">itoa</tt> could format an <tt class="code">int</tt> value and so on.
  </p><p>
    Fine, the output phase is completed without calling to C++ iostream facilities. Next is the input phase. Unlike the output phase, where a set of source types is limited to those that have a specialization of <tt class="code">lexical_cast_src_length</tt>, a target type can be any <tt class="code">InputStreamable</tt><sup><a href="#3">3</a></sup> type.  This means that we have to use C++ streams. The implementation is straightforward. First of all, a <tt class="code">streambuf</tt> object is constructed, then its <tt class="code">get</tt> area (<tt class="code">setg</tt> member function) is set to point to a location of formatted source value as shown in Listing 1.
  </p>
<table class="sidebartable"><tr><td><pre class="programlisting">
struct lexical_streambuf : std::streambuf
{
    using std::basic_streambuf&lt;CharT&gt;::setg;
};

// Adapted version of lexical_cast
template&lt;class Target, class Source&gt;
Target lexical_cast(Source const&amp; arg)
{

  // Pretend that Source has a limited string
  // representation
  char buf[ lexical_cast_src_length&lt;Source&gt;::value ];
  char* start = buf;

  // Output phase ...
  // char* finish points to past-the-end of
  // formatted value

  // Input phase
  lexical_streambuf sb;
  std::istream in(&amp;sb);
  sb.setg(start, start, finish);
  Target result;
  if(!( in &gt;&gt; result) || in.get() !=
        std::char_traits&lt;char&gt;::eof())
    throw bad_lexical_cast(typeid(Source),
                           typeid(Target));
  return result;
}
</pre></td></tr><tr><td class="title">Listing 1
</td></tr></table>
  <p>
    To solve a performance degradation of the generic solution, specialized versions can be added. For example, if <tt class="code">Target</tt> is <tt class="code">std::string</tt>, the input phase can be implemented in a single line of code:
  </p><pre class="programlisting">
return std::string(start, finish);
    </pre><h2>
    Testing
  </h2><p>
    It's great that the library already has a test. We can safely assume that combinations of <tt class="code">Source</tt> and <tt class="code">Target</tt><i> </i>types that are dispatched to old implementations are tested thoroughly. Some combinations of types that trigger an execution of the new code are tested as well but we shouldn't rely on it.
  </p><p>
    The plan is to write down a list of <tt class="code">Source</tt> and <tt class="code">Target</tt><i> </i>types that have an optimized implementation. Then all combinations should be tested one by one. Implementation details should be taken into account so that there are no surprises like <tt class="code">INT_MIN</tt> being formatted incorrectly or weird behavior for some locales.
  </p><p>
    Completeness of the test can be verified by running the test under control of a coverage tool. If some line is not hit, it's a good hint for a new testcase. There is one small problem with specializations of  <tt class="code">lexical_cast_src_length</tt> metafunction, though. They don't have executable code at all. How do you check that some particular specialization is in use? That's easy, just define an empty  <tt class="code">check_coverage</tt> function in each specialization and call it from a point of use of the metafunction.
  </p><h2>
    Benchmark
  </h2><p>
    A conversion from <tt class="code">int</tt> digit to <tt class="code">char</tt> has been measured on FreeBSD 6.1. The test has been compiled with gcc 3.4.4 with -O2 optimization flag turned on.
  </p><pre class="programlisting">
#include &lt;boost/lexical_cast.hpp&gt;
int main()
{
  int volatile result = 0;
  for(int count = 1000000; count &gt; 0; --count)
    result += boost::lexical_cast&lt;char&gt;(count % 10);
  return result;
}
    </pre><p>
    The next table shows execution times of the test compiled with Boost 1.33.1 and with the patched <tt class="code">lexical_cast</tt>. The last column is the performance influence of not constructing <tt class="code">std::locale</tt><i> </i>and <tt class="code">std::numpunct&lt;char&gt; </tt>objects and not calling <tt class="code">std::numpunct&lt;char&gt;::grouping</tt> function. This change is not included into the patch because it is not consistent with the standard output operator for <tt class="code">int</tt>.

    <table class="journaltable" border="1" cellpadding="5" cellspacing="0">
				<tr><td>  Boost 1.33.1 </td><td>  Patch </td><td>  Ratio </td><td>  Patch that ignores locales </td>
				</tr>
				<tr><td>  2.012 sec </td><td>  0.322 sec </td><td>  6.24 </td><td>  0.083 sec </td>
				</tr>
			</table>
		</p><p>
    More common is a conversion to <tt class="code">std::string</tt><i>.</i> To measure performance of this conversion, the <tt class="code">for</tt> body in the test program has been replaced with
  </p><pre class="programlisting">
result +=
  boost::lexical_cast&lt;std::string&gt;(count).size();
    </pre><p>
    The results are:

    <table class="journaltable" border="1" cellpadding="5" cellspacing="0">
				<tr><td>  Boost 1.33.1 </td><td>  Patch </td><td>  Ratio </td><td>  Patch that ignores locales </td>
				</tr>
				<tr><td>  2.516 sec </td><td>  0.844 sec </td><td>  2.98 </td><td>  0.626 sec </td>
				</tr>
			</table>
		</p><h2>
    Conclusion
  </h2><p>
    After careful tuning, the <tt class="code">lexical_cast</tt> can run several times faster. I hope that future versions of Boost will have an improved version of . It will likely happen in version 1.35.</p><h2>
    References
  </h2><p class="bibliomixed">
    [<a name="Abrahams"></a>Abrahams] C++ Template Metaprogramming by David Abrahams and Aleksey Gurtovoy, ISBN 0-321-22725-5
  </p><p class="bibliomixed">
    [<a name="Boost"></a>Boost] The Boost Library, http://www.boost.org
  </p><p class="bibliomixed">
    [<a name="FreeBSD"></a>FreeBSD] The FreeBSD Project, http://www.freebsd.org
  </p><p class="bibliomixed">
    [<a name="NetBSD"></a>NetBSD] The NetBSD Project, http://www.netbsd.org
  </p><p class="bibliomixed">
    [<a name="OpenSolaris"></a>OpenSolaris] The OpenSolaris Project, http://www.opensolaris.org
  </p><p class="bibliomixed">
    [<a name="Standard"></a>Standard] ISO/IEC 14882, Programming Languages - C++, Second Edition 2003-10-15
  </p>


    <p class="footnote">
    <a name="1"></a>1. This statement is not correct if the source type has more bits then N. For such conversions, <tt class="code">2^N</tt> should be added repeatedly until the result is within <tt class="code">[0, 2^N)</tt> range.
  </p><p class="footnote">
    <a name="2"></a>2. The class is under discussion for inclusion into the next version of the standard; see N1836, Draft Technical Report on C++ Library Extensions.
  </p>
    <p class="footnote"><a name="3"></a>3. A more correct list of requirements is <tt class="code">InputStreamable</tt>, <tt class="code">CopyConstructible</tt> and <tt class="code">DefaultConstructible</tt></p>

</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
