
The Eternal Battle Against Redundancies, Part 2

Overload Journal #107 - February 2012 + Programming Topics   Author: Christoph Knabe
Repeated information leads to poor quality software. Christoph Knabe continues to look at how removing it has influenced language design.

Since the beginning of programming, redundancies in source code have hampered maintenance and reuse. By redundancy we mean that the same concept is expressed in several locations in the source code. Over the last 50 years the efforts to avoid redundancies [DRY] have inspired a large number of programming constructs. This relationship is often not obvious to programmers in their daily work. In part I [Part1] we talked about relative addressing, symbolic addressing, formula translation, parameterizable subroutines, control structures, middle-testing loops, symbolic constants, preprocessor features, and array initialization. In this part we will investigate higher concepts like object-oriented, aspect-oriented, and functional programming, as well as exception handling and even program generators and relational databases, and how these concepts contribute to redundancy avoidance. These concepts are discussed on the basis of prevalent programming languages. Whoever understands the common concepts is well equipped for the future.

Mentioned programming languages
Note: The concepts are all discussed on the basis of prevalent programming languages, but often they were first tried out in research languages such as Simula-67, CLU, MESA, or LISP dialects.
Name                  Year    Innovations
Freiburgian Code      > 1958  Programming of the Zuse 22
Freiburgian Code Z23  1961    Relative and symbolic addressing
FORTRAN               1957    Formula translation; FORTRAN II: subroutines, linker
ALGOL                 1958    Subroutines, block principle, BNF, control structures, recursion
LISP                  1958    Garbage collection, recursion, functional programming (FP)
COBOL                 1959    Record variables, long identifiers
Pascal                1970    Record types, pointer types, structured programming
Smalltalk             1972    Dissemination of object-oriented programming (OOP)
C                     1973    Preprocessor, sizeof, operating system API, information hiding, break
Modula-2              1978    Separation of interface/implementation, if...end
Ada                   1980    Genericity, automatic exception handling
C++                   1983    Static typesafe OOP, freezing variable values, late declaration
Java                  1995    Static typesafe OOP with garbage collection, stack trace API
AspectJ               2001    Centralized solution of cross-cutting concerns
Scala                 2003    Static typesafe synthesis of OOP and FP

Information hiding

The principle of information hiding was formulated by Parnas [Parnas]. It postulates that direct manipulation of a data structure by clients should not be allowed; such manipulations are to be done only through operations which are grouped in an interface. Information hiding was the prevalent design criterion in modular programming and it still plays an important role in object-oriented programming.

Enforcing the information hiding principle guarantees that the intended administration operations cannot be bypassed by a module’s users. This contributes to redundancy avoidance by the fact that the logic behind the administrative functions cannot migrate into the user’s code with the risk of duplication therein. This danger was always present in languages without support for information hiding.

Secure information hiding was enabled in C (1973) by the declaration of file-scope static variables. Such variables stayed alive beyond a function call, but were not accessible from outside the source file. Later languages which introduced special constructs for module interfaces and implementations were Modula-2 and Ada.

In C++ (1983) the information hiding principle was extended to user-defined data types (classes) by giving class members private visibility by default, which could be explicitly changed to public.
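As a minimal Java sketch of the principle (the class and its invariant are my own illustration, not taken from any particular library): the attribute is private, so a client cannot bypass the administration operations and, for example, make the counter negative.

```java
// Information hiding: the attribute 'count' is private, so clients can
// only manipulate it through the operations of the public interface.
class BoundedCounter {
    private int count = 0; // hidden state; invariant: count >= 0

    public void increment() { count++; }

    public void decrement() {
        if (count > 0) count--; // the invariant cannot be violated by clients
    }

    public int value() { return count; }
}

public class InfoHidingDemo {
    public static void main(String[] args) {
        BoundedCounter c = new BoundedCounter();
        c.decrement();                 // has no effect at zero
        c.increment();
        System.out.println(c.value()); // prints 1
    }
}
```

If `count` were public, the check in `decrement` could migrate, duplicated, into every client; hiding the field keeps that logic in exactly one place.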

Genericity

COBOL (1960) had composite variables, but only Pascal (1970) introduced user-defined, composite data types as RECORDs. C (1973) followed with structs. These constructs increased the robustness of programs, as confusing e.g. persons with windows, calendar dates, or jobs was detected by the compiler. But the new strictness led to problems in the creation of universal services. Although Pascal had elegant operations for dynamic data structures, it was impossible to program a linked list so that it would be usable for an arbitrary element data type. The link data and the type of the payload data had to be firmly combined in the type for a list node. E.g.:

   TYPE
     PersonList = ^PersonNode;
     PersonNode = RECORD
       info: Person;
       nextPtr: ^PersonNode;
     END;

If you wanted to use the same list management module in Pascal for different payload data types, you had to copy the source text and globally substitute the payload data type name.

The somewhat less strict C could bypass such problems by using an untyped pointer, the void*. So in C it was possible, although insecure, to implement list management for arbitrary payload data. The list node could be formulated as follows:

   struct List {
      void* infoPtr;
      struct List* nextPtr;
   };

Only Ada (1980) achieved a synthesis of user-defined, composite data types (records) with flexible type safety. This concept was named genericity and was adopted by all modern, statically typed languages such as C++ (templates), Java, C#, and Scala. Using genericity you can avoid redundancies if you have to define services with the same behaviour for different payload data types. The generic collection classes implemented by this technique are used quite frequently in all contemporary programming languages.
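To sketch the gain (the types and names here are my own illustration): the Java counterpart of the Pascal PersonNode is written once with a type parameter E and then reused, type-safely, for every payload type.

```java
// A generic list node: defined once, usable for any payload type E.
class Node<E> {
    final E info;       // the payload, of arbitrary type E
    final Node<E> next; // link to the next node, or null at the end
    Node(E info, Node<E> next) { this.info = info; this.next = next; }
}

public class GenericListDemo {
    // One generic function serves every payload type.
    static <E> int length(Node<E> head) {
        int n = 0;
        for (Node<E> p = head; p != null; p = p.next) n++;
        return n;
    }

    public static void main(String[] args) {
        Node<String> names =
            new Node<String>("Ada", new Node<String>("Grace", null));
        Node<Integer> years = new Node<Integer>(1980, null);
        System.out.println(length(names)); // prints 2
        System.out.println(length(years)); // prints 1
    }
}
```

Unlike the void* variant in C, a `Node<String>` cannot accidentally receive an Integer payload; the compiler rejects the confusion.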

Dynamically typed languages such as Smalltalk or Ruby circumvent the problem described here by postponing the type checks to run-time.

Exception handling

In older programming languages (Lisp, Fortran, Algol, Cobol, Pascal, C) there was no automatic handling of exceptions. After every subroutine call the caller had to check manually whether the subroutine terminated successfully or erroneously. To complicate matters further there was no universal convention for how a subroutine should communicate its failure to the caller. The Unix services written in C used a special value for the function result as well as error codes in the global variable errno. The latter way was more suitable for standardization, as it did not have to cope with different function result types, but it was not suitable for the upcoming multi-threading.

How was the errno convention applied? Before invoking fopen(filename, "r") in order to open a file you had to clear errno, and afterwards check whether it had a nonzero value. As there were neither destructors nor garbage collection mechanisms in C, errors could not easily be collected in memory, so they tended to be reported immediately. But this limited the universal usability of a subroutine, as the destination of the error reporting could then not easily be chosen by the caller.

So the correct handling of a function call in C on Unix, here of the function fopen, appeared as follows:

   errno = 0;  /* clear any left-over error code */
   FILE* pFile = fopen(filename, "r");
   if(errno != 0){
      perror(filename); /* prints filename and the errno message */
      fprintf(stderr, "at file %s in line %d\n",
         __FILE__, __LINE__);
      errno = FAILURE;  /* FAILURE: an error code defined by this module */
      return NULL;
   }

You can easily imagine that correct error handling was highly redundant and made program texts harder to read and understand, and so harder to maintain. Furthermore, you had to write so much to implement this handling that programmers rarely practised it. Fortunately C’s preprocessor macros offered a means to partially eliminate this redundancy. You could extract the portion of the example from if up to return NULL;} into a macro, which gets as arguments a context and the function result to be returned in case of failure.

   #define ERRCHECK(context, failResult) ...

The invocation of fopen could then be much shorter:

   FILE* pFile = fopen(filename, "r");
   ERRCHECK(filename, NULL)

This approach could not solve the problem of functions failing when they were combined in expressions, e.g. f(x)*g(x): ERRCHECK could only be applied between two statements, not inside an expression.

Such error handling, which was implemented manually here, is done automatically by contemporary languages when a function throws an exception. Standardized handling (usually a message with stack trace and program abortion) is guaranteed, although custom handling is possible. Automatic exception handling was popularized by Ada (1980). C++ adopted it around 1990, while Java contained it from the beginning, including API access to the stack trace of a caught exception.
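A minimal Java sketch of this automation (the function is my own illustration): no status code is checked after the call; a failure propagates automatically to the nearest handler, where the stack trace is available through the API.

```java
public class ExceptionDemo {
    // No manual status checks: the function simply throws on failure,
    // and the exception propagates automatically through all callers.
    static int parsePositive(String s) {
        final int n = Integer.parseInt(s); // may throw NumberFormatException
        if (n <= 0)
            throw new IllegalArgumentException("not positive: " + n);
        return n;
    }

    public static void main(String[] args) {
        try {
            parsePositive("abc");
        } catch (RuntimeException ex) {
            // Custom handling; the stack trace is accessible via the API.
            System.out.println("caught: " + ex.getClass().getSimpleName());
            System.out.println("trace depth: " + ex.getStackTrace().length);
        }
    }
}
```

The redundant if/report/return pattern after every call simply disappears; only the places that actually handle errors contain error-handling code.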

Object-oriented programming

The technique of object-orientation, introduced by Simula 67 and popularized by Smalltalk-80, adopts ‘information hiding’ for object attributes and contains as innovations ‘inheritance’, ‘reference polymorphism’, and ‘dynamic method dispatch’. Inheritance alone enables a minor avoidance of redundancy by extracting the common state and behaviour of several data types into a base class. Compared to composition this saves only a (relatively) small amount of writing when accessing an inherited attribute or method. Polymorphism of references enables a flexibility similar to the untyped pointers of C, but considerably more secure, as it constrains the referenced elements to subclasses of the base class. With dynamic dispatch for calls of virtual functions (C++, 1983) came the big, redundancy-avoiding progress, which is nowadays commonly known as the ‘Template Method Pattern’ [TemplMeth].

Template Method Pattern: As an example let us have a look at the problem of transaction management. In enterprise applications each operation of the business logic must be executed as a transaction. If the logic operation succeeds, the database modifications must be committed; otherwise errors must be reported and the database modifications must be rolled back. Instead of redundantly programming this behaviour in each logic method, you can extract it into an execute method on a base class Transaction, which will call an abstract action method that has to be overridden with the concrete logic operation. In Java, the solution looks like Listing 1.

abstract class Transaction {
  public void execute(){
    final Connection con =
       DatabaseUtil.getConnection();
    try{
      doAction();
      con.commit();
    }catch(Exception ex){
      report(ex);
      con.rollback();
    }
  }
  abstract void doAction() throws Exception;
}
Listing 1

The template method execute follows a fixed procedure in order to guarantee the commit or rollback. Only the business-logic part of the action is delegated by the template method to the abstract method doAction. The programmer of a subclass then has to implement this method. Usage would follow the pattern shown below and would appear in a real system hundreds of times, which leads to an enormous reduction of redundancy, although the amount of boilerplate code is still problematic.

   new Transaction(){
     public void doAction() throws Exception {
       //Here the actual logic operation is placed.
     }
   }.execute();

An alternative solution in Java would make use of reflection [Refl], as done by EJB 3.0 application servers internally. Each method of a class annotated as @Session is executed as a transaction.
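A rough sketch of that reflective technique (the @Session annotation, class, and method names here are my own stand-ins, not the real EJB API): a dispatcher accepts any object of an annotated class, looks the method up at run-time, and wraps its invocation in the transactional begin/commit behaviour, which is here only simulated by print statements.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Hypothetical marker annotation, standing in for the EJB annotations.
@Retention(RetentionPolicy.RUNTIME)
@interface Session {}

@Session
class PayrollFacade {
    public int raiseSalary(int salary) { return salary + 100; }
}

public class ReflectiveDispatcher {
    // Executes one method of an @Session-annotated object 'as a transaction';
    // begin and commit are merely printed here instead of touching a database.
    static Object callInTransaction(Object target, String methodName, int arg)
            throws Exception {
        if (!target.getClass().isAnnotationPresent(Session.class))
            throw new IllegalArgumentException("not a @Session class");
        System.out.println("begin transaction");
        final Method m = target.getClass().getMethod(methodName, int.class);
        final Object result = m.invoke(target, arg); // arg is autoboxed
        System.out.println("commit");
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            callInTransaction(new PayrollFacade(), "raiseSalary", 1000));
    }
}
```

The transactional behaviour lives in one dispatcher instead of in every façade method; the price is the loss of compile-time checking of the method name.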

Mixin Programming: In contrast to Java inheritance, Scala (2003) allows the mixing in of several traits (partially implemented interfaces), each of which can offer such template methods. The ‘diamond problem’ usually occurring with multiple inheritance is avoided by an explicitly definable resolution order. By this means you can freely combine different services in a class. In fact the Scala collections framework stands out due to an extremely high internal re-use of a few template methods. This is a big contribution to redundancy avoidance.

Aspect-oriented programming

Aspect-oriented programming enables you to handle concerns that cut across a software system centrally, in an aspect. The above-mentioned problem of transaction management is exactly such a cross-cutting concern. Let us consider the case where each method of a logic façade should be executed as a transaction. Although the above solution, implementing the method doAction in an anonymous subclass of Transaction, is technically free of redundancy, it needs a lot of code at each usage site. In contrast, in the solution with AspectJ (2001) in Listing 2, the aspect needs to be written only once for the whole system. The ‘pointcut’ executeAnyFacadeMethod captures each execution of a method of objects of the type LgFacade. The around advice surrounds the captured method executions at the location marked by proceed, thus providing the unified transaction management. This solution is not only technically, but also textually, free of redundancies. Usage of AspectJ in Java projects can deliver enormous redundancy savings straightaway.

aspect TransactionAspect {
  pointcut executeAnyFacadeMethod
     (LgFacade lgFacade):
     execution(public * *(..)) && this(lgFacade);

  Object around(LgFacade lgFacade):
     executeAnyFacadeMethod(lgFacade) {
    final Connection con =
       DatabaseUtil.getConnection();
    try{
      final Object result = proceed(lgFacade);
      con.commit();
      return result;
    }catch(Exception ex){
      report(ex);
      con.rollback();
      return null; //no result available in the failure case
    }
  }
}
Listing 2

Functional programming

Of the many powerful constructs of functional programming I want to demonstrate only one, which facilitates the extraction of control structures. We take the everyday example that a list of persons should be displayed in a special format, obtainable by the method getName of class Person. In Java 5 we would need the function in Listing 3 to transform a list of persons into such a format.

public List<String> personsToNames
   (final List<Person> persons){
  final List<String> names =
     new LinkedList<String>();
  for(final Person p: persons){
    names.add(p.getName());
  }
  return names;
}
Listing 3

A usage would look like:

   personsToNames(persons)

The corresponding transformation in Scala would be so compact that no one would write a special function for this purpose:

   persons.map(_.getName)

This is possible since the function map from the Scala collections library contains the above algorithm in a general form and calls the argument function for each element of the list. Using the underscore sign _ we define a mapping from an anonymous argument to the expression containing the underscore. The type of the argument is inferred from the element type of persons and thus need not be indicated explicitly.

In a similar way you could guarantee the above-mentioned transaction management in Scala. Whatever should be executed as a transaction would have to be packed into transaction{...}, with the method transaction suitably defined. This solution is technically free of redundancies, but it needs slightly more code at each usage site than with AspectJ. On the other hand, Scala needs only a minimum of keywords in comparison to AspectJ.

Program generators / domain specific languages

Sometimes an application needs highly redundant code patterns, but the programming language used does not offer a means to extract them. In such circumstances you could, as a last resort, use a brute-force means: code generation. You define a special language, tailored to the problem, in which you can express yourself without redundancies, and from that language you generate program code. Classical examples are decision table code generators like DETAB/65 or parser generators like yacc. As an example we give a rule of the contemporary parser generator ANTLR for multiplicative operations. This rule means: a product is a sequence of factors, which are separated by '*' or '/'.

   product
       :    factor
            ( '*' factor
            | '/' factor
            )*
       ;

From this ANTLR can generate a parser which recognizes expressions like a*b/c*d. You can expand this parser to an interpreter or translator by inserting actions at the end of each line.

Data storage

Redundancies also cause problems in data storage. An example of this is a table of employees with the columns Id, Name, Date of Birth and Department.

Id  Name                Date of Birth  Department
1   Seyfried, Janina    17.01.1974     Human Resources
2   Stahl, Georg        06.06.1985     Sales
3   Schmidt, Sebastian  26.09.1979     Development
4   Müller, Friederike  19.11.1987     Sales

If the department is indicated as a string for each employee, this constitutes a data redundancy which causes the following problems: if there is a typo in a department name, the affiliation of the employee to that department cannot be recognized automatically; and renaming a department necessitates the modification of many employee rows.

The redundancy-free solution comprises the management of an additional table for departments, whose rows are referred to by a departmentId in each employee row. Exactly this is achieved by normalization according to the concept of relational databases.
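Applied to the example above, normalization yields two tables (the DepartmentId values are arbitrary illustration keys). Renaming a department now touches exactly one row, and a misspelt department name can no longer arise in the employee table.

```
Employee table:

Id  Name                Date of Birth  DepartmentId
1   Seyfried, Janina    17.01.1974     1
2   Stahl, Georg        06.06.1985     2
3   Schmidt, Sebastian  26.09.1979     3
4   Müller, Friederike  19.11.1987     2

Department table:

Id  Name
1   Human Resources
2   Sales
3   Development
```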

Other concepts of programming languages

This section lists relevant milestones in the evolution of programming languages which are not aimed at redundancy avoidance, but are nevertheless worthy of mention.

  • Robustness of programs was boosted by the declaration principle (Algol 58), by the locality afforded by the block principle (Algol 58), and by late compilation in conjunction with a linker (FORTRAN, COBOL).
  • Coding convenience was boosted by dynamically typed languages (LISP), by the declaration of variables only at their first usage (C++, 1983), by the freezing of computed values (C++), by ‘Garbage Collection’ instead of explicit deallocation (LISP, 1958), and by the concept of static type inference (Scala, 2003).
  • Labour division in development was boosted by the technique of separate compilation.
  • Understandability was boosted by comments: from full-line comments in FORTRAN (marked by C), via block comments in Algol (comment up to ;), end-of-line comments in Ada (--), and documentation comments in Java (/** ... */), up to nested block comments in Scala. COBOL pioneered long identifiers, which significantly help understandability.

Summary

When you see how painfully the steps of progress in programming were achieved over the last 50 years, you really learn to appreciate the state of the art. Even more interesting is recognizing the driving force behind this progress: high redundancy in source code regularly demanded new programming constructs. In the majority of cases the new ability was to give a freely chosen name to the redundant code pattern, and to invoke it with parameters from several locations. This happened to addresses, constants, subroutines, classes and generic units. Sometimes the evolution did not go that far, and the redundant code patterns only received new keywords. This happened to formulas, loops, branches, and exception handling. When a programming language helped to eliminate redundancies better than a competing language, this was an advantage in the battle for dissemination. We can assume that this will still be true in the future.

References and further reading

[DRY] http://en.wikipedia.org/wiki/Don%27t_repeat_yourself

[Parnas] http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf

[Part1] Christoph Knabe: ‘The Eternal Battle against Redundancies, Part I’, Overload 106, December 2011, accu.org, pp. 6-10

[Refl] http://en.wikipedia.org/wiki/Reflection_%28computer_programming%29

[TemplMeth] http://en.wikipedia.org/wiki/Template_method_pattern
