Can we reduce the pain of computer problems?
Hello and welcome to Overload 83. I am guest editor for this issue, giving Alan Griffiths a break from his duties and me a chance to see what's involved in the editorial role. Alan will be back for the next issue.
Computer problems
It has been a trying month for the Orr household computers.
Firstly, we installed IE7 on Windows Server 2003 SP2: a number of reminders from various web sites, with comments about the benefits over IE6, made us decide that the time was right to update. However, this change has proved a rather mixed blessing; it seems to be so secure that many web sites won't work at all. Still, as it has persuaded my wife to try out (and switch to) Firefox, perhaps it wasn't altogether a bad thing.
While trying to resolve some of the problems with the sites that weren't working I decided to verify that we had installed all the latest Windows hot fixes by running Microsoft Update interactively. However the check for available updates failed with an error code.
Unfortunately a quick google for the error code didn't provide a great deal of help - a couple of suggestions were provided but neither resolved the problem. Further investigation in the event log revealed that Windows Update wasn't running properly, and in fact hadn't been working for over a month.
Fortunately faults in Microsoft Update can be reported for free, so I fired off a request for help to Microsoft. I was asked for some additional information including the contents of WindowsUpdate.log, a file that I hadn't previously known about. Sadly the log file, while it did contain a bit more information about the error ('WARNING: DownloadFileInternal failed for http://download.windowsupdate.com/v7/windowsupdate/redir/wuredir.cab'), didn't immediately help me to solve my problem.
I carried on with the original problem, trying to get IE7 to work. At this point I was running a packet sniffer (Packetyzer, free from Network Chemistry) and mixed up with the HTTP traffic I was expecting I noticed a single request to my ISP's web proxy. 'That's odd', I thought, 'I removed the proxy cache from my configuration back in October as my ISP announced they were going to decommission the facility. Why is it still being tried?'
After more googling on Web proxies I discovered a pointer to a tool, proxycfg, which displays and edits the WinHTTP proxy configuration. Using this tool I found that WinHTTP was still using the obsolete proxy cache and so I reconfigured it to use direct access. Having made this change Microsoft Update started working, so at least that problem was resolved.
Oh yes, I did get a reply from Microsoft to my request for help, which did give me a number of additional things to try to resolve the problem; however this list didn't include checking the proxy settings. Perhaps it would have been suggested had I continued to need assistance.
Then my daughter arrived home from University, complaining that her new laptop had suddenly stopped displaying DVDs. Sure enough, when you put one in and pressed 'play' the screen simply went black. No error was shown, and nor could I find a log file that seemed to shed any more light on the problem.
I checked the obvious things - looking for errors in the event log from around the time of failure, looking for newly installed software, running a spyware scan, but to no avail. Data DVDs were readable, it was only videos that seemed to fail. Once more Google came to the rescue, and I found a thread reporting the same behaviour, and explaining that the problem is caused by the CODECs expiring. There was a link to a patch on the manufacturer's site containing updated CODECs so I downloaded and installed this patch, but sadly it made no difference. It seems that the updated CODECs had also expired. A further search pointed me to a further set of (free) CODECs, which resolved the problem, and my daughter was soon happily catching up with her unwatched DVDs into the small hours.
So why have I told these stories? I'm not trying to get at any company or operating system in particular - the previous month I tried and failed (on two different machines) to get successful network connections under Linux - but there seem to me to be a couple of issues that these problems highlight.
Why are there so many problems?
I am an experienced programmer, although not a trained support engineer, and even so I can spend hours trying to fix such problems. I expect many of you have also spent more time than you'd like on fixing systems that don't work, or programs that won't communicate. For the majority of users, without in-depth technical experience, fixing these problems is even harder.
A large part of the problem is the 'execution environment' of the programs. No two PCs seem to be exactly alike, and the differences can include any or all of: the hardware configuration, the type (and version) of the operating system, the configuration settings, the amount of disk space, the network topology and the selection of other programs installed on the machine. I'm sure you, like me, can think of application or installation failures that each one of these factors can cause. Hence a log of what software and hardware was installed, when, and what options we selected can provide valuable help when experiencing software failures.
One of the adages of programming is 'separate out the things that change from those that stay the same'. I consider that two of the problems I had last month could have been resolved more easily if the software writers had borne this rule in mind with reference to the execution environment.
Network configurations are not static but are subject to change; this may happen frequently for a laptop but will also occur if you move house or change ISP. Given this, silently caching the Internet Explorer proxy settings elsewhere in the machine seems to me to be a poor design choice. The DVD player problem was also caused by something changing, in this case the date. The software checks the expiry date of the CODECs but insufficient thought seems to have been given to the action taken when the check fails.
It is easy to make mistakes over which things may change and which are constant when considering installation versus runtime checks. A program might, for example, check at installation time that a specific version of some other component is available. Later on, this component can be updated, causing hard-to-diagnose faults in the application. The sad thing is that the code to diagnose the incompatibility has already been written - but a decision was implicitly taken that it would only be executed during installation. Some programs sensibly have a separate 'self test' function that verifies some of the environmental dependencies, which can be executed during installation and also re-run automatically on startup or manually as part of fault diagnosis.
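As a rough illustration of such a 'self test' function, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a description of any real product: the particular checks, the names CheckResult, check_module and run_self_test, and the thresholds are all hypothetical. The point is only that the same list of checks can be called from the installer, at startup, or on demand, and that each result carries a specific detail message.

```python
import importlib.util
import shutil
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str    # short identifier for the check
    ok: bool     # True if the dependency is satisfied
    detail: str  # specific text suitable for a log file or a bug report


def check_python_version(minimum=(3, 8)):
    """Verify the interpreter is at least the (hypothetical) minimum version."""
    found = sys.version_info[:3]
    wanted = ".".join(map(str, minimum))
    got = ".".join(map(str, found))
    return CheckResult("python-version", found >= minimum,
                       f"need >= {wanted}, found {got}")


def check_module(name):
    """Verify that a top-level module the program depends on can be located."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return CheckResult(f"module:{name}", False,
                           f"module '{name}' is not installed")
    return CheckResult(f"module:{name}", True,
                       f"module '{name}' found at {spec.origin}")


def check_disk_space(path=".", minimum_bytes=10_000_000):
    """Verify there is at least minimum_bytes free on the volume holding path."""
    free = shutil.disk_usage(path).free
    return CheckResult("disk-space", free >= minimum_bytes,
                       f"need {minimum_bytes} bytes free at '{path}', found {free}")


def run_self_test(checks):
    """Run every check and return all results, not just the first failure."""
    return [check() for check in checks]


def format_report(results):
    """Render the results as one line per check."""
    return "\n".join(
        f"{'PASS' if r.ok else 'FAIL'} {r.name}: {r.detail}" for r in results
    )
```

An installer and the program's startup code could both call run_self_test([check_python_version, lambda: check_module("json"), check_disk_space]) and print format_report on the results, so a dependency that changes after installation is reported by name rather than surfacing later as an unexplained failure.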
Where do we get help?
The second reflection on the stories is how much the Web has changed the way I resolve support issues. I typically start by googling to find information from other people who have experienced and, hopefully, resolved the same problem. For this to work well three things are needed: specific error data, good questions and documented solutions.
One trouble with the Web is that it is so big and contains so much information that a good search query is vital to locate relevant information easily. The more specific the error data provided by the failing program, the better the hit rate of a search will be. Numeric error codes can be useful - in the Windows Update case above adding the hex error code into a simple Google query reduces 3,230,000 possible pages to 419. On the other hand searching for 'DVD player displays blank screen' gives 638,000 possible pages but, alas, the DVD player provided no other details to refine the search.
Some of us are writers of software that we do not support in person; how much thought do we give to ensuring that our users are not only given notification of errors but also that any such error messages are easily identifiable? How specific is the message text, and are there log files containing further details (and if so is the location of these files known)? More specific errors are useful anyway ('File not found' begs the question: 'Which file?'), but when searching the Web a generic error message can make the chance of success vanishingly small.
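To make these questions concrete, here is a small Python sketch of an error report that names the file involved, carries a searchable code, and tells the user where the log file lives. The error code 'APP-0042', the logger name, the log location and the function names are all invented for the illustration; a real application would choose its own.

```python
import logging
import os
import tempfile

# Hypothetical log location; a real program would document where this is.
LOG_PATH = os.path.join(tempfile.gettempdir(), "example_app.log")

logger = logging.getLogger("example_app")
# delay=True means the log file is only created when something is logged.
logger.addHandler(logging.FileHandler(LOG_PATH, delay=True))
logger.setLevel(logging.INFO)


def describe_failure(code, action, path, exc):
    """Build a message that answers 'which file?' and 'where is the log?'."""
    return (
        f"[{code}] {action} failed for '{os.path.abspath(path)}': "
        f"{exc.strerror} (errno {exc.errno}). "
        f"Further details are in {LOG_PATH}"
    )


def load_settings(path):
    """Read a settings file, reporting any failure in a specific, searchable form."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        message = describe_failure("APP-0042", "Reading settings file", path, exc)
        logger.error(message)  # the same text goes to the log file
        raise RuntimeError(message) from exc
```

A user who sees this message can search for the code together with the operating system's own error text, and anyone helping them knows exactly which file was being read and where to look for more detail.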
Often the initial search doesn't find a solution, but does find a news group or Web site that has answers to similar queries. The next challenge is how to make my request. Asking a good question significantly improves the chance that someone will be willing and able to assist. Eric S Raymond, probably best known as the author of The Cathedral and the Bazaar, maintains a Web page entitled 'How to Ask Questions the Smart Way' (http://catb.org/~esr/faqs/smart-questions.html) that contains a number of examples of ways to ask good questions.
Finally, when the problem is solved we can make sure the solution is published to help other people who have the same problem later. If we posted a question to a support site and got a couple of replies, we should post a final message saying thank you and stating which piece of advice fixed the problem; if none of the proffered advice helped, we should post the method that was finally successful. We may be providing support to users ourselves, and if so we may be able to make problem resolutions searchable via the Internet. Again, to help with searching for answers, be specific in any reply.
It is often remarked that computers are an increasing part of all aspects of modern life. Ten or fifteen years ago people may have played with a computer as a hobby; now it is likely to be a tool they rely on for a variety of tasks. Unfortunately these tools seem to lack reliability, and I think something is wrong when these types of failures are so common. However, such users are often very motivated to sort out their problem, and if sufficient information is provided in the public domain they may well be able to resolve their own troubles.
Preventing problems
As the proverb puts it: 'An ounce of prevention is worth a pound of cure'. I hope that, in line with our 'professionalism in programming' motto, the articles in this issue will help us write programs that are part of the solution to such computer woes.