    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: ACCU Mentored Developers XML Project</title>
        <link>http://accu.org/index.php/journals/238</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #62 - Aug 2004 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;ACCU Mentored Developers XML Project</h1>
<p><strong>Author:</strong>&nbsp;Administrator</p>
<p>
<strong>Date:</strong> 01 August 2004 22:52:11 +01:00 or Sun, 01 August 2004 22:52:11 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e27" id="d0e27"></a></h2>
</div>
<p>This article was originally written in December 2002 as part of
the ACCU Mentored Developers [<a href=
"#MDevelopers">MDevelopers</a>] XML [<a href="#XMLRec">XMLRec</a>]
project. It has now been revised, with considerable help from Jez
Higgins, for publication in Overload.</p>
<p>The first exercise set for the project students by the project
mentors was as follows:</p>
<p>Incorporate either the Xerces[<a href="#Xerces">Xerces</a>] or
Microsoft XML[<a href="#MSXML">MSXML</a>] parsers into a C++
project and use it to:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Parse XML strings and files.</p>
</li>
<li>
<p>Output the element structure as an indented tree.</p>
</li>
</ol>
</div>
<p>As most of my development experience has been on Windows I
followed the MSXML route.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e56" id="d0e56"></a>Downloading and
Installing MSXML</h2>
</div>
<p>The MSXML parser can be downloaded from the Microsoft website.
The latest version at the time of writing is version 4.0 and
requires the latest Windows installer, which is incorporated into
Windows XP and comes with Windows service pack 3. The installer can
also be downloaded as a single executable [<a href=
"#InstMsi">InstMsi</a>].</p>
<p>Assuming the latest Windows Installer is present on your system,
installing MSXML is simply a case of running the installer package.
As MSXML is Component Object Model (COM) based, this will register
the MSXML dynamic link library (<tt class=
"filename">msxml4.dll</tt>). The installer also creates a directory
with all necessary files needed to use the parser in a C++
project.</p>
<div class="sidebar">
<p class="title c3">An XML Mini-Glossary</p>
<div class="variablelist">
<dl>
<dt><span class="term">Attributes</span></dt>
<dd>
<p>XML elements can have attributes. An attribute is a name-value
pair attached to the element's start tag. Names are separated from
their values by an equals sign, and values are enclosed in single
or double quotes. Attribute order is not significant.</p>
<pre class="literallayout">
&lt;bigbrain invented=&quot;SGML&quot;&gt;Charles Goldfarb&lt;/bigbrain&gt;
</pre></dd>
<dt><span class="term">DOM</span></dt>
<dd>
<p>The Document Object Model is a W3C recommendation which specifies an
application programming interface for well-formed XML documents
[<a href="#DOMRec">DOMRec</a>], defining the logical structure of
documents and the way a document is accessed and manipulated. The
DOM is defined in programming-language neutral terms. This leads to
some slightly clumsy looking code, but that aside the DOM is widely
used (if not necessarily well-loved). Its in-memory representation
makes it well suited to document editing, navigation and data
retrieval applications.</p>
</dd>
<dt><span class="term">DTD</span></dt>
<dd>
<p>Document Type Definition, the original XML schema language
described in the XML recommendation. A Document Type Definition
defines the legal building blocks of an XML document. It defines
the document structure with a list of legal elements, each
element's allowed content and so on.</p>
</dd>
<dt><span class="term">Elements &amp; Tags</span></dt>
<dd>
<p>Here's a tiny XML document</p>
<pre class="literallayout">
&lt;bigbrain&gt;Charles Goldfarb&lt;/bigbrain&gt;
</pre>
<p>It consists of a single element named bigbrain and the element's
content, the text string <tt class="literal">Charles Goldfarb</tt>.
The element is delimited by the start tag <tt class=
"literal">&lt;bigbrain&gt;</tt> and the end tag <tt class=
"literal">&lt;/bigbrain&gt;</tt>.</p>
</dd>
<dt><span class="term">Valid</span></dt>
<dd>
<p>Documents which conform to a particular XML application are said
to be <span class="emphasis"><em>valid</em></span>. In the early
days of XML (all of five years ago) validity meant conforming to a
DTD. With the development and widespread adoption of other schema
languages, valid has come to mean <span class="emphasis"><em>valid
to whatever schema you happen to be using</em></span>.</p>
</dd>
<dt><span class="term">Well-formed</span></dt>
<dd>
<p>Many XML documents, quite probably most, are not valid, nor
do they need to be. However they are all <span class=
"emphasis"><em>well-formed</em></span>. An XML document is
well-formed if it satisfies the basic XML grammar - the elements
are properly delimited, start and end tags match and so on. A
document which is not well-formed is like a C++ program with a
missing semi-colon, no good for anything.</p>
</dd>
<dt><span class="term">XML Application</span></dt>
<dd>
<p>A set of XML elements and attributes for a particular purpose -
for instance DocBook, SVG, WSDL, Open Office file format - is
called an <span class="emphasis"><em>XML application</em></span>.
An XML application is often expressed in one of the many available
schema languages - DTD, XML Schema, RelaxNG, Schematron, etc. An
XML application is <span class="emphasis"><em>not</em></span> an
application which uses XML.</p>
</dd>
</dl>
</div>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e150" id="d0e150"></a>Testing
MSXML</h2>
</div>
<p>Although there are the usual Microsoft help files incorporated
with MSXML there aren't any examples, so I used Google to try and
find some and found the PerfectXML[<a href=
"#PerfectXML">PerfectXML</a>] website. The website includes a
number of MSXML C++ examples and one in particular, Using DOM
[<a href="#UsingDOM">UsingDOM</a>], that downloads an XML file from
an Internet location, parses it, modifies it and writes it to the
local hard disk. I used this example as a template for the
following simple MSXML console application test program:</p>
<pre class="programlisting">
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;windows.h&gt;
#include &lt;atlbase.h&gt;
#import &quot;msxml4.dll&quot;
int main() {
  std::cout &lt;&lt; &quot;MSXML DOM: Simple Test 1: Creating&quot;
     &lt;&lt; &quot; of COM object and parsing of XML.\n\n&quot;;
  ::CoInitialize(0);
  {
    MSXML2::IXMLDOMDocument2Ptr pXMLDoc = 0;
    // Create MSXML DOM object
    HRESULT hr = pXMLDoc.CreateInstance(
                   &quot;Msxml2.DOMDocument.4.0&quot;);
    if (SUCCEEDED(hr)) {
      // Load the document synchronously
      pXMLDoc-&gt;async = false;
      _variant_t varLoadResult((bool)false);
      const std::string xmlFile(&quot;poem.xml&quot;);
      // Load the XML document
      varLoadResult = pXMLDoc-&gt;load(xmlFile.c_str());
      if(varLoadResult) {
        std::cout &lt;&lt; &quot;Successfully loaded XML file: &quot;
                  &lt;&lt; xmlFile &lt;&lt; &quot;\n&quot;;
      }
      else {
        std::cout &lt;&lt; &quot;Failed to load XML file: &quot; 
                  &lt;&lt; xmlFile &lt;&lt; &quot;\n&quot;;
        // Get parseError interface
        MSXML2::IXMLDOMParseErrorPtr pError = 0;
        if(SUCCEEDED(pXMLDoc-&gt;get_parseError(
                                      &amp;pError))) {
          USES_CONVERSION;
          std::cout &lt;&lt; &quot;Error: &quot; 
                    &lt;&lt; W2A(pError-&gt;reason) &lt;&lt; &quot;\n&quot;;
        }
      }
    }
    else {
      std::cout &lt;&lt; &quot;Failed to create MS XML COM &quot;
                &lt;&lt; &quot;object.\n&quot;;
    }
  }
  ::CoUninitialize();
  return 0;
}
</pre>
<p>This program takes the following XML file and parses it:</p>
<pre class="literallayout">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;poem&gt;
  &lt;line&gt;Roses are red,&lt;/line&gt; 
  &lt;line&gt;Violets are blue.&lt;/line&gt; 
  &lt;line&gt;Sugar is sweet,&lt;/line&gt; 
  &lt;line&gt;and I love you&lt;/line&gt;
&lt;/poem&gt;
</pre>
<p>If the parse fails an error message is written to <tt class=
"classname">std::cout</tt> giving the reason. Although this code
snippet does the intended job, it is a bit rough and needs some
work in order to achieve the objective of this exercise. Among
other things it would benefit from wrapping of MSXML and some
proper exception handling.</p>
<p>It is worth noting <tt class="literal">#import</tt> is specific
to Microsoft Visual C++ and is not supported by other Win32
compilers.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e177" id="d0e177"></a>Engineering
the Exercise Solution: Part 1</h2>
</div>
<p>I'm going to look at the exercise solution in two parts. The
first part will reengineer the PerfectXML example into a more
general solution with a clean interface, proper runtime handling
and exception handling. The second part will look at writing the
element structure to a stream.</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e182" id="d0e182"></a>COM Runtime</h3>
</div>
<p>As MSXML is COM based, the COM runtime must be started before
any COM objects can be instantiated. The COM runtime is started by
the <tt class="function">CoInitializeEx</tt> API function and
stopped with <tt class="function">CoUninitialize</tt>. MSDN states
that every call to <tt class="function">CoInitializeEx</tt> must be
matched by a call to <tt class="function">CoUninitialize</tt>, even
if <tt class="function">CoInitializeEx</tt> fails.</p>
<p><tt class="function">CoUninitialize</tt> must not be called
until all COM objects have been released. For instance in the
example above there is an extra scope wrapping the MSXML code so
that the <tt class="classname">IXMLDOMDocument2Ptr</tt> smart
pointer destructor is called, destroying the DOM, before <tt class=
"function">CoUninitialize</tt> is called.</p>
<p>The easiest way to achieve this, even in the presence of
exceptions, is to take advantage of C++'s RAII (Resource
Acquisition Is Initialization): place <tt class=
"function">CoInitializeEx</tt> in the constructor of a class and
<tt class="function">CoUninitialize</tt> in the destructor, then
create an instance of the class on the stack at the beginning of
the program, before anything else. <tt class=
"classname">COMRuntimeInit</tt>, shown below, is just such a class.
The copy constructor and copy-assignment operator are both private
and undefined, to prevent copying. A <tt class=
"classname">COMRuntimeInit</tt> object has no state and therefore
it does not make sense to copy it. This method of preventing
copying and some more of the reasons behind it are discussed by
Scott Meyers in Effective C++[<a href="#ECpp">ECpp</a>].</p>
<pre class="programlisting">
#include &lt;stdexcept&gt;
#include &lt;string&gt;
#include &lt;windows.h&gt;
class COMRuntimeInit {
public:
  COMRuntimeInit() {
    HRESULT hr = ::CoInitializeEx(0,
                         COINIT_APARTMENTTHREADED);
    if(FAILED(hr)) {
      UnInitialize();
      std::string errorMsg = &quot;Failed to start COM &quot;
                             &quot;Runtime: &quot;;
      switch(hr) {
        case E_INVALIDARG:
          errorMsg += &quot;An invalid parameter was &quot;
                      &quot;passed to the returning &quot;
                      &quot;function.&quot;;
          break;
        case E_OUTOFMEMORY:
          errorMsg += &quot;Out of memory.&quot;;
          break;
        case E_UNEXPECTED:
          errorMsg += &quot;Unexpected error.&quot;;
          break;
        case RPC_E_CHANGED_MODE:
          errorMsg += &quot;A previous call specified a &quot;
                      &quot;different concurrency model &quot;
                      &quot;for this thread.&quot;;
          break;
        default:
          errorMsg += &quot;Unknown.&quot;;
          break;
      }
      throw std::runtime_error(errorMsg);
    }
  }
  ~COMRuntimeInit() {
    UnInitialize();
  }
private:
  void UnInitialize() const {
    ::CoUninitialize();
  }
  COMRuntimeInit(const COMRuntimeInit&amp;);
  COMRuntimeInit&amp; operator=(const COMRuntimeInit&amp;);
};
</pre>
<p>There are of course times when the initial call to <tt class=
"function">CoInitializeEx</tt> may fail. The cause of the failure
can be ascertained from its return value. The obvious way to
communicate the cause of the failure to the user is via an
exception. This has the drawback that the destructor will not be
called when the constructor throws and therefore <tt class=
"function">CoUninitialize</tt> must be called manually. For now
<tt class="exceptionname">std::runtime_error</tt> will be thrown
when <tt class="function">CoInitializeEx</tt> fails; later on we'll
look at a custom exception type.</p>
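<p>Both points, the order of destruction and the skipped destructor,
can be observed without COM. The following is a minimal portable
sketch, not the article's class: <tt class="classname">Guard</tt>,
<tt class="classname">Worker</tt>, <tt class="type">Log</tt> and
<tt class="function">Run</tt> are hypothetical names invented for
illustration, and the <tt class="literal">fail</tt> flag merely
simulates a start-up call that reports failure.</p>

```cpp
#include <stdexcept>
#include <string>
#include <vector>

typedef std::vector<std::string> Log;

// Hypothetical guard; 'fail' simulates a start-up call that reports failure.
class Guard {
public:
  Guard(Log& log, bool fail) : log_(log) {
    if (fail) {
      // Construction never completes, so ~Guard() will not run. Any cleanup
      // that is still required has to be done here, before the throw.
      log_.push_back("manual cleanup");
      throw std::runtime_error("start-up failed");
    }
    log_.push_back("guard constructed");
  }
  ~Guard() { log_.push_back("guard destroyed"); }
private:
  Log& log_;
};

// Hypothetical stand-in for an object that depends on the guard.
class Worker {
public:
  explicit Worker(Log& log) : log_(log) { log_.push_back("worker created"); }
  ~Worker() { log_.push_back("worker released"); }
private:
  Log& log_;
};

// Records what happens when the guard succeeds or fails.
Log Run(bool fail) {
  Log log;
  try {
    Guard guard(log, fail);   // declared first, so destroyed last
    Worker worker(log);
  }
  catch (const std::runtime_error& e) {
    log.push_back(std::string("caught: ") + e.what());
  }
  return log;
}
```

<p>When the guard is constructed successfully the worker is released
before the guard is destroyed. When the constructor throws, the only
cleanup recorded is the one performed by hand inside the
constructor.</p>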
<p>As stated above, the <tt class="classname">COMRuntimeInit</tt>
instance must be declared before any other object on the stack. The
instance cannot be put at file scope as it throws an exception if
it fails, so the obvious place is at the top of main's scope. A
<tt class="literal">try</tt>/<tt class="literal">catch</tt> block
is also needed to detect the failure.</p>
<pre class="programlisting">
#include &lt;iostream&gt;
#include &quot;comruntimeinit.h&quot;
int main() {
  try {
    COMRuntimeInit comRuntime;
  }
  catch( const std::runtime_error&amp; e) {
    std::cout &lt;&lt; e.what() &lt;&lt; &quot;\n&quot;;
  }
  return 0;
}
</pre></div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e258" id="d0e258"></a>Instantiating
the MSXML DOM</h3>
</div>
<p>Code that uses COM, as with most Microsoft API code, is just
plain ugly and really should be hidden behind an interface.
Exercise 1 of the XML project states that either the Xerces parser
or the MSXML parser can be used. Ideally they should be easily
interchangeable and their use completely hidden from the user.
Hiding the ugly code <span class="emphasis"><em>and</em></span>
making the parsers easily interchangeable can be achieved with the
Pimpl Idiom, as discussed by Herb Sutter in Exceptional C++
[<a href="#ExCpp">ExCpp</a>].</p>
<p>The first stage in the exercise is to create the MSXML DOM
parser. This is achieved with the DOM class:</p>
<pre class="programlisting">
// dom.h
// Forward declaration so that implementation 
// can be completely hidden.
class DOMImpl;
class DOM {
private:
  DOMImpl *impl_;
public:
  DOM();
  ~DOM();
private:
  DOM(const DOM&amp;);
  DOM&amp; operator=(const DOM&amp;);
};
</pre>
<p>The DOM class will form a basic wrapper for the <tt class=
"classname">DOMImpl</tt> class which will do all the work.
<tt class="classname">DOMImpl</tt> is forward declared, so that its
implementation can be completely hidden.</p>
<p>The DOM class implementation is shown below. It creates an
instance of the <tt class="classname">DOMImpl</tt> class on the heap
in the constructor and deletes it in the destructor.</p>
<pre class="programlisting">
// dom.cpp
#include &quot;dom.h&quot;
#include &quot;domimpl.h&quot;
DOM::DOM() : impl_(new DOMImpl) {}
DOM::~DOM() { delete impl_; }
</pre>
<p><tt class="classname">DOMImpl</tt> creates the MSXML DOM parser
in the same way as the PerfectXML example:</p>
<pre class="programlisting">
// domimpl.h
#import &quot;msxml4.dll&quot;
class DOMImpl {
private:
  MSXML2::IXMLDOMDocument2Ptr xmlDoc_;
public:
  DOMImpl() : xmlDoc_(0) {
    xmlDoc_.CreateInstance(
                    &quot;Msxml2.DOMDocument.4.0&quot;);
  }
private:
  DOMImpl(const DOMImpl&amp;);
  DOMImpl&amp; operator=(const DOMImpl&amp;);
};
</pre>
<p>Both DOM and <tt class="classname">DOMImpl</tt> have private
copy constructors and copy assignment operators, again to prevent
copying.</p>
<p>The above code does not include any error checking. It is
possible for the call to <tt class="methodname">CreateInstance</tt>
to fail. The <tt class="filename">msxml4.dll</tt> may not be
registered, for example. The success or failure of the <tt class=
"methodname">CreateInstance</tt> call can be detected by its return
value.</p>
<pre class="programlisting">
DOMImpl() : xmlDoc_(0) {
  HRESULT hr = xmlDoc_.CreateInstance(
                   &quot;Msxml2.DOMDocument.4.0&quot;);
  if(FAILED(hr)) {
    std::string errorMsg = &quot;Failed to create &quot;
                           &quot;MSXML DOM: &quot;;
    switch(hr) {
      case CO_E_NOTINITIALIZED:
        errorMsg += &quot;CoInitialize has not &quot;
                    &quot;been called.&quot;;
        break;
      case CO_E_CLASSSTRING:
        errorMsg += &quot;Invalid class string.&quot;;
        break;
      case REGDB_E_CLASSNOTREG:
        errorMsg += &quot;A specified class is &quot;
                    &quot;not registered.&quot;;
        break;
      case CLASS_E_NOAGGREGATION:
        errorMsg += &quot;This class cannot be &quot;
                    &quot;created as part of an &quot;
                    &quot;aggregate.&quot;;
        break;
      case E_NOINTERFACE:
        errorMsg += &quot;The specified class &quot;
                    &quot;does not implement the &quot;
                    &quot;requested interface&quot;;
        break;
      default:
        errorMsg += &quot;Unknown error.&quot;;
        break;
    }
    throw std::runtime_error(errorMsg );
  }
}
</pre></div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e312" id="d0e312"></a>NonCopyable</h3>
</div>
<p>We now have three classes which are &quot;copy prevented&quot;, with a
private copy constructor and copy assignment operator. There is a
clearer way to document the fact that a class is not intended to be
copied. When used by a number of different classes it also reduces
the amount of code.</p>
<p>The <tt class="classname">NonCopyable</tt> class, shown below,
has a private copy constructor and assignment operator to
prevent copying. When another class inherits from <tt class=
"classname">NonCopyable</tt>, the compiler cannot generate a copy
constructor or assignment operator for it, because the base class
versions are inaccessible. This both prevents the
subclass from being copied and documents the intention. The
relationship between <tt class="classname">NonCopyable</tt> and its
subclass is not IS-A and therefore the inheritance can be
private.</p>
<p>As <tt class="classname">NonCopyable</tt> is intended only to
provide behaviour to a derived class, rather than act as a class in
its own right, its default constructor is protected, preventing a
free <tt class="classname">NonCopyable</tt> object being created.
Its destructor too, is protected to prevent a subclass being
deleted via a pointer to <tt class="classname">NonCopyable</tt>. To
further document this intention, the destructor is not virtual.</p>
<pre class="programlisting">
class NonCopyable {
protected:
  NonCopyable() {}
  ~NonCopyable() {}
private:
  NonCopyable(const NonCopyable&amp;);
  NonCopyable&amp; operator=(const NonCopyable&amp;);
};
</pre>
<p>The <tt class="classname">NonCopyable</tt> class was written by
Dave Abrahams for the boost [<a href="#boost">boost</a>] library. I
have recreated it here so that a dependency on the boost library is
avoided.</p>
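<p>With a more recent compiler the effect can be checked at compile
time. The following is a sketch only, assuming C++11 or later (which
postdates this article); <tt class="classname">Session</tt> is a
hypothetical client class invented for the check.</p>

```cpp
#include <type_traits>

class NonCopyable {
protected:
  NonCopyable() {}
  ~NonCopyable() {}
private:
  NonCopyable(const NonCopyable&);
  NonCopyable& operator=(const NonCopyable&);
};

// Hypothetical client class used only for this check.
class Session : private NonCopyable {
public:
  Session() {}
};

// Both checks are evaluated at compile time: the compiler cannot generate
// copy operations for Session because the base class versions are private.
static_assert(!std::is_copy_constructible<Session>::value,
              "Session must not be copy constructible");
static_assert(!std::is_copy_assignable<Session>::value,
              "Session must not be copy assignable");
```

<p>If the private inheritance were removed from
<tt class="classname">Session</tt>, both assertions would fail and
the translation unit would no longer compile.</p>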
<p>Now that the <tt class="classname">NonCopyable</tt> class is in
place the copy constructors and assignment operators can be removed
from <tt class="classname">COMRuntimeInit</tt>, <tt class=
"classname">DOM</tt> and <tt class="classname">DOMImpl</tt>. They
can then be changed to privately inherit from <tt class=
"classname">NonCopyable</tt>.</p>
<pre class="programlisting">
class COMRuntimeInit : private NonCopyable {
  ...
};

class DOM : private NonCopyable {
  ...
};

class DOMImpl : private NonCopyable {
  ...
};
</pre></div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e368" id="d0e368"></a>Loading and
Validating the XML</h3>
</div>
<p>The MSXML DOM has a method that loads and parses an XML file.
While the file is parsed it is checked to make sure it is well-formed
and, if there is a DTD or Schema specified, it is also validated. If
the file cannot be opened, is not well-formed or cannot be
validated the call fails.</p>
<p>The method is called load and takes a single parameter which is
the full path to the XML file. To load and parse an XML file, a
similar method can be added to DOMImpl and a corresponding
forwarding function added to DOM.</p>
<pre class="programlisting">
class DOMImpl : private NonCopyable {
public:
  ...
  void Load(const std::string&amp; fullPath) {
    xmlDoc_-&gt;load(fullPath.c_str());
  }
};
</pre>
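<p>The forwarding function in <tt class="classname">DOM</tt> is not
listed in the article. A minimal sketch of what it might look like
follows. So that it compiles without MSXML,
<tt class="classname">DOMImpl</tt> is replaced here by a stand-in
that only remembers the path, and the
<tt class="methodname">LastPath</tt> accessors are hypothetical
helpers added purely to make the forwarding observable.</p>

```cpp
#include <string>

// Stand-in for the real DOMImpl: it only remembers the path, so the sketch
// compiles without MSXML.
class DOMImpl {
public:
  void Load(const std::string& fullPath) { lastPath_ = fullPath; }
  // Hypothetical helper, not part of the article's class.
  const std::string& LastPath() const { return lastPath_; }
private:
  std::string lastPath_;
};

class DOM {
public:
  DOM() : impl_(new DOMImpl) {}
  ~DOM() { delete impl_; }
  // Forwarding function: DOM adds no behaviour of its own.
  void Load(const std::string& fullPath) { impl_->Load(fullPath); }
  // Hypothetical accessor, present only so the forwarding can be observed.
  const std::string& LastPath() const { return impl_->LastPath(); }
private:
  DOMImpl* impl_;
  DOM(const DOM&);             // declared but not defined: not copyable
  DOM& operator=(const DOM&);
};
```

<p>In the real project only the declaration of
<tt class="methodname">Load</tt> would appear in
<tt class="filename">dom.h</tt>, with the one-line body in
<tt class="filename">dom.cpp</tt>, so that users of
<tt class="classname">DOM</tt> never see
<tt class="classname">DOMImpl</tt>.</p>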
<p><tt class="function">main</tt> can then be modified to call the
new function with the path to an XML file.</p>
<pre class="programlisting">
try {
  COMRuntimeInit comRuntime;
  DOM dom;
  dom.Load(&quot;poem.xml&quot;);
}
catch(const std::runtime_error&amp; e) {
  std::cout &lt;&lt; e.what() &lt;&lt; &quot;\n&quot;;
}
</pre>
<p>Once again there is no way of detecting failure and the return
value of the MSXML DOM <tt class="methodname">load</tt> method must
be tested to find out if it failed. If a failure has occurred an
exception should be thrown.</p>
<pre class="programlisting">
void Load(const std::string&amp; fullPath) {
  if(!xmlDoc_-&gt;load( fullPath.c_str())) {
    throw std::runtime_error(ErrorMessage());
  }
}
</pre>
<p>The method of extracting an error message from an MSXML DOM is a
little fiddly, so I have placed it in its own function, <tt class=
"function">ErrorMessage</tt>.</p>
<pre class="programlisting">
class DOMImpl : private NonCopyable {
public:
  ...
  std::string ErrorMessage() const {
    std::string result = &quot;Failed to extract &quot;
                         &quot;error.&quot;;
    MSXML2::IXMLDOMParseErrorPtr pError =
                         xmlDoc_-&gt;parseError;
    if(pError-&gt;reason.length()) {
      result = pError-&gt;reason;
    }
    return result;
  }
};
</pre>
<p>A parse error is extracted from an MSXML DOM as an <tt class=
"classname">IXMLDOMParseError</tt> object. The error description is
fetched from the reason property. If no description is available,
the <tt class="type">_bstr_t</tt> returned by reason has a length of
0. <tt class="type">_bstr_t</tt> is a wrapper class for COM's native
<tt class="type">unsigned short*</tt> string type. It provides a
conversion to <tt class="type">const char*</tt>, and thus can be
assigned to a <tt class="classname">std::string</tt>.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e417" id="d0e417"></a>Custom Exception
Types</h3>
</div>
<p>Our <tt class="function">main</tt> function's body is</p>
<pre class="programlisting">
try {
  COMRuntimeInit comRuntime;
  DOM dom;
  dom.Load(&quot;poem.xml&quot;);
}
catch(const std::runtime_error&amp; e) {
  std::cout &lt;&lt; e.what() &lt;&lt; &quot;\n&quot;;
}
</pre>
<p>Currently the example throws a <tt class=
"exceptionname">std::runtime_error</tt> if the COM runtime fails to
initialise or if there is an XML failure. In both cases the error
message is prefixed with a description of the type of error.
Exceptions thrown as a result of the COM runtime failing to
initialise are probably fatal and it may be appropriate for the
program to exit, while for exceptions thrown due to an XML parse
failure it might be more appropriate to log the error and move on to
the next file.</p>
<p>These different categories of error would be better communicated
by the exception's actual type and it is easy to add custom
exceptions. Throwing different types of exceptions helps to
maintain the context in which the exception was thrown and enables
the behaviour of a program to change based on the type of exception
that is thrown.</p>
<p>Deriving from <tt class="exceptionname">std::exception</tt> not
only means that custom exception types can be caught along with
other standard exception types in a single catch statement if
necessary, but also provides an implementation for the custom
exception object.</p>
<pre class="programlisting">
class BadCOMRuntime : public std::exception {
public:
  BadCOMRuntime(const std::string&amp; msg)
        : exception(msg.c_str()) {}
};
</pre>
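<p>The base class initialiser above relies on a
<tt class="exceptionname">std::exception</tt> constructor that
accepts a message, which Visual C++ provides but standard C++ does
not. For other compilers, one possible alternative is to store the
message in the class itself and override
<tt class="methodname">what</tt>. This is a sketch under that
assumption, written for C++11 or later;
<tt class="function">CatchAsStdException</tt> is a hypothetical
helper used only to show the result.</p>

```cpp
#include <exception>
#include <string>

class BadCOMRuntime : public std::exception {
public:
  explicit BadCOMRuntime(const std::string& msg) : msg_(msg) {}
  // what() returns a pointer into msg_, which lives as long as the exception.
  const char* what() const noexcept override { return msg_.c_str(); }
private:
  std::string msg_;
};

// Hypothetical helper: throws the exception and reports what a generic
// std::exception handler sees.
std::string CatchAsStdException(const std::string& msg) {
  try {
    throw BadCOMRuntime(msg);
  }
  catch (const std::exception& e) {
    return e.what();
  }
}
```

<p>Deriving from <tt class="exceptionname">std::runtime_error</tt>
instead would achieve the same with less code, at the cost of making
every such exception also catchable as a
<tt class="exceptionname">std::runtime_error</tt>.</p>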
<p>Visual C++'s <tt class="exceptionname">std::exception</tt> constructor
takes a <tt class="type">const char*</tt>, but I know that I will be
building exception messages with strings, so, following the model of
<tt class="exceptionname">std::runtime_error</tt>, <tt class=
"exceptionname">BadCOMRuntime</tt>'s constructor takes a <tt class=
"classname">std::string</tt>.</p>
<p><tt class="classname">COMRuntimeInit</tt>'s constructor must be
modified for the new exception:</p>
<pre class="programlisting">
COMRuntimeInit() {
  HRESULT hr = ::CoInitializeEx(0,
                       COINIT_APARTMENTTHREADED);
  if(FAILED(hr)) {
    UnInitialize();
    std::string errorMsg = &quot;Unknown.&quot;;
    switch(hr) {
      case E_INVALIDARG:
        errorMsg = &quot;An invalid parameter was &quot;
                   &quot;passed to the returning &quot;
                   &quot;function.&quot;;
        break;
      ...
      default:
        break;
    }
    throw BadCOMRuntime(errorMsg);
  }
}
</pre>
<p>and <tt class="function">main</tt> must be modified to catch the
new exception:</p>
<pre class="programlisting">
try {
  COMRuntimeInit comRuntime;
  DOM dom;
  dom.Load(&quot;poem.xml&quot;);
}
catch(const BadCOMRuntime&amp; e) {
  std::cout &lt;&lt; &quot;COM initialisation error: &quot;
            &lt;&lt; e.what()
            &lt;&lt; &quot;\n&quot;;
}
...
</pre>
<p>The exceptions thrown by <tt class="classname">DOMImpl</tt> are
a little more complicated. <tt class="classname">DOMImpl</tt>
throws exceptions when two different things happen and therefore
requires two different exception types, which should be in some way
related. One way to solve this is to have a common exception type
for <tt class="classname">DOMImpl</tt> from which two other
exception types derive.</p>
<p><tt class="classname">DOMImpl</tt> is the implementation of DOM
and any exception thrown by <tt class="classname">DOMImpl</tt> is
most likely to be caught outside <tt class="classname">DOM</tt>.
Therefore, to the user of <tt class="classname">DOM</tt>, who is
unaware of <tt class="classname">DOMImpl</tt>, it is more logical
for <tt class="classname">DOM</tt> to be throwing exceptions of
type <tt class="exceptionname">BadDOM</tt> rather than <tt class=
"exceptionname">BadDOMImpl</tt>.</p>
<pre class="programlisting">
#include &lt;stdexcept&gt;
#include &lt;string&gt;
class BadDOM : public std::exception {
public:
  BadDOM(const std::string&amp; msg)
        : exception(msg.c_str()) {}
};
class CreateFailed : public BadDOM {
public:
  CreateFailed(const std::string&amp; msg)
        : BadDOM(msg) {}
};
class BadParse : public BadDOM {
public:
  BadParse(const std::string&amp; msg)
        : BadDOM(msg) {}
};
</pre>
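<p>The benefit of the hierarchy is that a handler can be as specific
or as general as it needs to be. The sketch below illustrates the
"log the error and move on to the next file" policy described
earlier. It is not the article's code: the exception types are
derived from <tt class="exceptionname">std::runtime_error</tt> so
that the example compiles outside Visual C++, and
<tt class="function">ParseFile</tt> and
<tt class="function">ParseAll</tt> are hypothetical stand-ins for the
real parser.</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Portable variants of the exception types, derived from std::runtime_error
// (an assumption made so the sketch compiles outside Visual C++).
class BadDOM : public std::runtime_error {
public:
  explicit BadDOM(const std::string& msg) : std::runtime_error(msg) {}
};
class CreateFailed : public BadDOM {
public:
  explicit CreateFailed(const std::string& msg) : BadDOM(msg) {}
};
class BadParse : public BadDOM {
public:
  explicit BadParse(const std::string& msg) : BadDOM(msg) {}
};

// Hypothetical parser stand-in: an empty name simulates a parser that could
// not be created, a name starting with "bad" simulates a parse failure.
void ParseFile(const std::string& name) {
  if (name.empty()) throw CreateFailed("no parser available");
  if (name.compare(0, 3, "bad") == 0) throw BadParse("cannot parse " + name);
}

// Parses each file in turn. A BadParse is logged and skipped; any other
// BadDOM stops the run. Returns the number of files parsed successfully.
std::size_t ParseAll(const std::vector<std::string>& names,
                     std::vector<std::string>& errors) {
  std::size_t parsed = 0;
  for (std::size_t i = 0; i < names.size(); ++i) {
    try {
      ParseFile(names[i]);
      ++parsed;
    }
    catch (const BadParse& e) {   // most derived type first
      errors.push_back(e.what());
    }
    catch (const BadDOM& e) {     // CreateFailed and any other DOM error
      errors.push_back(std::string("fatal: ") + e.what());
      break;
    }
  }
  return parsed;
}
```

<p>The order of the handlers matters: if the
<tt class="exceptionname">BadDOM</tt> handler came first it would
also catch every <tt class="exceptionname">BadParse</tt>.</p>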
<p>The constructor and <tt class="methodname">Load</tt> function in
<tt class="classname">DOMImpl</tt> can now be modified to use the
new exception types and main modified to catch a <tt class=
"exceptionname">BadDOM</tt> exception. For completeness' sake, we
also need a third <tt class="literal">catch</tt> block. The COM
smart pointers generated by <tt class="literal">#import</tt> raise
a <span class="errortype">_com_error</span> if a function call
fails.</p>
<pre class="programlisting">
try {
  COMRuntimeInit comRuntime;
  DOM dom;
  dom.Load(&quot;poem.xml&quot;);
}
catch(const BadCOMRuntime&amp; e) {
  std::cout &lt;&lt; &quot;COM initialisation error: &quot;
            &lt;&lt; e.what() &lt;&lt; &quot;\n&quot;;
}
catch(const BadDOM&amp; e) {
  std::cout &lt;&lt; &quot;DOM error: &quot;
            &lt;&lt; e.what() &lt;&lt; &quot;\n&quot;;
}
catch(const _com_error&amp; e) {
  std::cout &lt;&lt; &quot;COM error: &quot;
            &lt;&lt; e.ErrorMessage() &lt;&lt; &quot;\n&quot;;
}
</pre></div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e530" id="d0e530"></a>Engineering
the Exercise Solution: Part 2</h2>
</div>
<p>Now that the DOM is loading and validating XML, the next part of
the exercise is to write the elements to an output stream as an
indented tree.</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e535" id="d0e535"></a>Writing the
Element Structure</h3>
</div>
<p>The first step in enabling the elements to be written to an
output stream is to pass one in. The obvious way to do this
is to add a function to <tt class="classname">DOMImpl</tt>, and a
forwarding function to <tt class="classname">DOM</tt>, which takes
a <tt class="classname">std::ostream</tt> reference.</p>
<pre class="programlisting">
#include &lt;ostream&gt;
class DOMImpl : private NonCopyable {
...
public:
  void WriteTree(std::ostream&amp; out) {}
...
};
</pre>
<p>Modifying <tt class="function">main</tt> to call the new
function means that results can be seen straight away as the
<tt class="methodname">WriteTree</tt> implementation is
developed.</p>
<pre class="programlisting">
try {
  COMRuntimeInit comRuntime;
  DOM dom;
  dom.Load(&quot;poem.xml&quot;);
  dom.WriteTree(std::cout);
}
...
</pre>
<p>In order to write the complete tree, every element must be
visited. Starting with the root element, the rest of the elements
can then be visited in a depth-first traversal. I wrote the
following function, based on some Delphi written by Adrian Fagg,
which gets a pointer to the root element and then calls the
function <tt class="methodname">WriteBranch</tt>, which recurses
through the rest of the tree.</p>
<pre class="programlisting">
void WriteTree(std::ostream&amp; out) {
  MSXML2::IXMLDOMElementPtr root =
                     xmlDoc_-&gt;documentElement;
  WriteBranch(root, 0, out);
}
</pre>
<p>The <tt class="methodname">WriteBranch</tt> function is also
based on Adrian Fagg's Delphi code. The code is self-explanatory,
but in outline it:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Gets the tag name of the element passed to it.</p>
</li>
<li>
<p>Writes the tag name to the supplied <tt class=
"classname">std::ostream</tt>, indented by two spaces for each
level of indentation.</p>
</li>
<li>
<p>Uses the supplied element to get a pointer to its first
child.</p>
</li>
<li>
<p>While the child pointer is not 0, uses it to get the node
type.</p>
</li>
<li>
<p>If the node is of type NODE_ELEMENT, calls the <tt class=
"methodname">WriteBranch</tt> method again (recursion) with the
indentation increased by one.</p>
</li>
<li>
<p>Uses the child pointer to get the next sibling.</p>
</li>
<li>
<p>Returns when there are no more siblings.</p>
</li>
</ol>
</div>
<pre class="programlisting">
void WriteBranch(
            MSXML2::IXMLDOMElementPtr element, 
            unsigned long indentation,
            std::ostream&amp; out) {
  bstr_t cbstr = element-&gt;tagName;
  out &lt;&lt; std::string(2 * indentation, ' ') 
      &lt;&lt; cbstr &lt;&lt; std::endl;
  MSXML2::IXMLDOMNodePtr child =
                          element-&gt;firstChild;
  while(child != 0) {
    if(child-&gt;nodeType ==
                      MSXML2::NODE_ELEMENT) {
      WriteBranch(child,
                  indentation + 1, out);
    }
    child = child-&gt;nextSibling;
  }
}
</pre>
<p>Running the program now writes the following to the
console:</p>
<pre class="screen">
poem
  line
  line
  line
  line
</pre>
<p>With that the exercise is complete.</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e609" id="d0e609"></a>Next
Step</h2>
</div>
<p>The logical next step would of course be exercise 2. However, as
well as completing the exercises which help the students learn
about XML, one of the aims of the ACCU Mentored Developers XML
Project is to write a standard interface behind which any parser,
such as MSXML or Xerces, can be used. Therefore, the next step is to
design a common interface to the DOM.</p>
<p><span class="emphasis"><em>Paul Grenyer and Jez
Higgins</em></span></p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e617" id="d0e617"></a>Thank
You</h2>
</div>
<p>Thanks to all the members of the ACCU Mentored Developers XML
Project, especially Adrian Fagg, Rob Hughes, Thaddaeus Frogley and
Alan Griffiths, for the proofreading and code suggestions.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e622" id="d0e622"></a>References</h2>
</div>
<div class="bibliomixed"><a name="boost" id="boost"></a>
<p class="bibliomixed">[boost] The boost library: <span class=
"bibliomisc"><a href="http://www.boost.org" target=
"_top">http://www.boost.org</a></span></p>
</div>
<div class="bibliomixed"><a name="DOMRec" id="DOMRec"></a>
<p class="bibliomixed">[DOMRec] W3C Document Object Model (DOM):
<span class="bibliomisc"><a href="http://www.w3.org/DOM/" target=
"_top">http://www.w3.org/DOM/</a></span></p>
</div>
<div class="bibliomixed"><a name="ECpp" id="ECpp"></a>
<p class="bibliomixed">[ECpp] Scott Meyers, <span class=
"citetitle"><i class="citetitle">Effective C++: 50 Specific Ways to
Improve Your Programs and Designs</i></span>. Addison Wesley: ISBN
0-201-92488-9</p>
</div>
<div class="bibliomixed"><a name="ExCpp" id="ExCpp"></a>
<p class="bibliomixed">[ExCpp] Herb Sutter, <span class=
"citetitle"><i class="citetitle">Exceptional C++</i></span>.
Addison Wesley: ISBN 0201615622</p>
</div>
<div class="bibliomixed"><a name="InstMsi" id="InstMsi"></a>
<p class="bibliomixed">[InstMsi] Windows Installer 2.0:
<span class="bibliomisc"><a href=
"http://www.microsoft.com/downloads/details.aspx?FamilyID=4b6140f9-2d36-4977-8fa1-6f8a0f5dca8f&amp;displaylang=en"
target=
"_top">http://www.microsoft.com/downloads/details.aspx?FamilyID=4b6140f9-2d36-4977-8fa1-6f8a0f5dca8f&amp;displaylang=en</a></span></p>
</div>
<div class="bibliomixed"><a name="MDevelopers" id=
"MDevelopers"></a>
<p class="bibliomixed">[MDevelopers] ACCU Mentored Developers:
<span class="bibliomisc"><a href="http://www.accu.org/mdevelopers/"
target="_top">http://www.accu.org/mdevelopers/</a></span></p>
</div>
<div class="bibliomixed"><a name="MSXML" id="MSXML"></a>
<p class="bibliomixed">[MSXML] Microsoft XML parser: <span class=
"bibliomisc"><a href=
"http://www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42&amp;displaylang=en"
target=
"_top">http://www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42&amp;displaylang=en</a></span></p>
</div>
<div class="bibliomixed"><a name="PerfectXML" id="PerfectXML"></a>
<p class="bibliomixed">[PerfectXML] PerfectXML: <span class=
"bibliomisc"><a href="http://www.perfectxml.com/msxml.asp" target=
"_top">http://www.perfectxml.com/msxml.asp</a></span></p>
</div>
<div class="bibliomixed"><a name="UsingDOM" id="UsingDOM"></a>
<p class="bibliomixed">[UsingDOM] Using DOM: <span class=
"bibliomisc"><a href=
"http://www.perfectxml.com/CPPMSXML/20020710.asp" target=
"_top">http://www.perfectxml.com/CPPMSXML/20020710.asp</a></span></p>
</div>
<div class="bibliomixed"><a name="Xerces" id="Xerces"></a>
<p class="bibliomixed">[Xerces] Xerces XML parser: <span class=
"bibliomisc"><a href="http://xml.apache.org/xerces-c" target=
"_top">http://xml.apache.org/xerces-c</a></span></p>
</div>
<div class="bibliomixed"><a name="XMLRec" id="XMLRec"></a>
<p class="bibliomixed">[XMLRec] Extensible Markup Language (XML):
<span class="bibliomisc"><a href="http://www.w3.org/XML/" target=
"_top">http://www.w3.org/XML/</a></span></p>
</div>
</p>
</div>
</channel>
</rss>
