Formal Verification

Proving code is correct is challenging. Aurelian Melinte explores verification, the most ignored area of software engineering.

When I see code for a state machine or a protocol I know it was not formally verified (FV); that is it was not proved to operate correctly via some formal methodology. And I know why: most engineers are not even aware that FV exists. Few universities offer FV classes in their software engineering curricula. It is hard to understand why as FV started in research and academia environments in the 60s and, by the 80s, FV toolsets usable by the average CS graduate were available. Since the 80s, hardware design engineers live and die by FV but software engineers are still generally ignorant of it.

Theoreticians might cringe at this definition: by-and-large FV means proving that a verification model – an abstraction of the design to be implemented – is mathematically correct. That is, it does what it is supposed to do and does not do things it is not supposed to do. What exactly this ‘does and does not’ set of properties means depends of what is being modeled. In practice, FV tools will check all possible execution paths of the model for logical flaws.

Typical designs to be verified are state machines, protocols and algorithms. But the methodologies are flexible enough to model an 8-queen chess puzzle or a Fibonacci series, for instance. Accordingly, the properties to be verified can be anything one decides to be applicable to the model.

Two notes. One, FV verifies the model of a proposed design, not the actual implementation. If the model is proven to be deadlock-free then the implementation can still deadlock. But, if the implementation does not work, at least you know the design behind the implementation is sound. Two, FV will verify only what it is told to, i.e. if you want to verify it is starvation-free then you have to subject the model to that property check lest you miss that important check.

Out of the of dozens FV methodologies [Wikipedia-1], I think two are the dominant ones: Promela/spin and TLA+/PlusCal. Below is a very high-level and hopefully still useful overview of each. The overview may not be very intelligible without further individual homework but the FV topic is at least book-sized. As for the specifics of Promela or TLA+, these are also book-size.

Both methodologies share the same theoretical roots, use the same terminology and even the same linear temporal logic (LTL) [Wikipedia-2] notation. LTL expresses the desired or undesired behavior the model should have over its lifetime and allows one to express properties such as ‘if X happens then Y is guaranteed to happen, and it remains set if Z does not occur’.

Properties of a model are, by and large, classified in two bins: safety (the model does not misbehave; does not do ‘bad things’) and liveness (the model does what is supposed to do).

Typical safety properties are: the model is deadlock-free, does not trip model-specific assertions (that you have to define) and it does not end in an invalid end state. Safety is expressed either as plain assertions, as system invariants, or as LTL properties.

Typical liveness checks are:

responsiveness: every request gets a response back.
non-progress cycles over model’s lifetime aka starvation: FV checks for conditions leading into infinite and useless runs. Think livelock as an example.
acceptance: the model is either stuck in an undesirable state or falls into that state infinitely often. The “acceptance” property naming makes sense only when thinking of the under-the-hood verification mechanisms[Spinroot-1].

Promela

Promela/spin [Spinroot-2] originated in the 80s at the Bell Labs. It has a strong practitioners’ base in academia and in the telecom companies. Since the telecoms have lost their preeminence in IT these days, that user base is probably shrinking but research in academia seems to be as healthy as ever.

By and large, the spin tool will turn the Promela model into a state-machine which will be verified for the properties you asked for in your model.

The spin tool will also warn about code that was not reached during verification: this alone is priceless as it could be a bug in the model.

With spin, some of the built-in checks are free: dead code, deadlocks, invalid end states; and some are almost-freebies: you may have to do some minimal work and ask for progress and acceptance.

Atomicity is fine-grained, at the statement level. This makes race conditions rather trivial to find. Promela is architecture agnostic, on purpose. Modeling specifically for verification with the C++ memory-model is feasible but requires specific modeling work on top of modeling for your design. In particular, the Dekker algorithm as detailed below was proven to be unsafe on a machine without atomic reads and writes [Buhr15].

Because of the fine-grained atomicity, Promela models tend to be large in a relative sense. With the models still being entirely hand-crafted, it is safe to say that the footprint is orders of magnitude lower than hardware FV models. The first line of defense is reworking the model (e.g. by reducing the number of global variables and other artifacts such as processes and channels). Then there are special model generation options to mitigate the footprint – this was clearly an issue with 90’s available machines. Finally, for really large models, there is a best-effort (read: partial) verification available. I have yet to see a model too large for modern machines – the largest I know of would complete verification within minutes.

There are also a number of tools to extract a Promela model from an existing codebase but I have not used these [Spinroot-3].

To learn Promela, one must read Prof. Holzmann’s book The Spin Model Checker [Holzmann03] and follow the short two-hour video class on the spinroot site. The other book one should read (this is actually another must) is Prof. Ben-Ari’s book Principles of Spin [Ben-Ari08]. This should suffice to start the journey.

As an example, Listing 1 is a model of the Dekker algorithm [Wikipedia-3] to enforce critical sections/mutual exclusion, sourced from the Promela manual [Spinroot-4].

#define true   1
#define false  0
#define Aturn  false
#define Bturn  true

bool x, y, t;
byte ghost;

proctype A()
{  x = true;
   t = Bturn;
   (y == false || t == Aturn);
   /* critical section */
   ghost++;
   assert(ghost == 1); 
cspa: ghost--;  
      x = false
}

proctype B()
{  y = true;
   t = Aturn;
   (x == false || t == Bturn);
   /* critical section */
   ghost++;
   assert(ghost == 1); 
cspb: ghost--;  
      y = false
}

init{	atomic { run A(); run B(); } }

ltl criticalSection {[]!(A@cspa && B@cspb)}

Listing 1

Easy to read even if you do not know some Promela statements can be blocking until becoming executable. Here we want to verify that the critical section is enforcing mutually exclusive access.

The last line of the model instructs the verifier what to look for. It is a system invariant expressed in the canonical LTL way which states plainly that both processes cannot be in the critical section at the same time during the lifetime of the model.

Assertions can also be used to verify various conditions in conjunction with the LTL or independently of it. As shown in the example, the assertions also verify the critical sections holds.

TLA+

TLA+ [Lamport] is pure mathematics describing your model. This is rarefied air that few can adapt to when models get complex. PlusCal is an added layer that describes the model in an imperative-like language that gets translated to TLA+. You will still need to be TLA+ literate but PlusCal puts you within reach of modelling properly.

TLA+ got adopted by Microsoft and other software companies followed suit, so I guess the practitioner base is currently expanding.

The curricula here includes Wayne’s book Practical TLA+ [Wayne18], the training videos on Prof. Lamport’s site and his book Specifying Systems.

The Dekker algorithm, according to a course at the Stuttgart University [Duerr15], follows. First the PlusCal (lines 29-99 in the original) that you have to write is in Listing 2.

--------- MODULE criticalsection5dekker ---------
EXTENDS Integers, Sequences

(*
--algorithm criticalsection5dekker
{
  \* Global variables
  variables turn = 1; wantp = FALSE; 
    wantq = FALSE;

  \* The non-critical section.
  \* For checking for freedom from starvation, it 
  \* is important that a process might stay in 
  \* the non-critical section forever (however,
  \* each process must leave the critical 
  \* section). 
  \* This procedure covers both cases: finite and 
  \* infinite execution of the non-critical 
  \* section. 
  procedure NCS()
    variable isEndless;
  {
  \* Non-deterministically choose whether the 
  \* procedure will be endless or finite.
    ncs0: with (x \in {0,1}) {
      isEndless := x;
    };
    ncs1: if (isEndless = 1) {
      ncs2: while (TRUE) {
        ncs3: skip;
      }
    } else {
      ncs4: return;
    }
  }

  \* First process (name P, pid 1)
  process(P = 1) {
    p0: while (TRUE) {
      p1: call NCS(); \* non-critical section
      p2: wantp := TRUE; 
      p3: while (wantq = TRUE) {
        p4: if (turn = 2) {
          p5: wantp := FALSE;
          p6: await turn = 1;
          p7: wantp := TRUE;
        };
      };
      p8: skip; \* critical section
      p9: turn := 2; 
      p10: wantp := FALSE;
    } 
  }

  \* Second process (name Q, pid 2)
  process(Q = 2) {
    q0: while (TRUE) {
      q1: call NCS(); \* non-critical section
      q2: wantq := TRUE; 
       q3: while (wantp = TRUE) {
        q4: if (turn = 1) {
          q5: wantq := FALSE;
          q6: await turn = 2;
          q7: wantq := TRUE;
        };
      };
      q8: skip; \* critical section
      q9: turn := 1; 
        q10: wantq := FALSE;
    } 
  }
}
*)

Listing 2

Please note every line is labeled. That is because TLA+ has coarse-grained atomicity: the whole span of statements between two labels is one atomic op. Since we check for race conditions here, we have to label every statement. TLA+ is also architecture agnostic.

Coarse atomicity generates much smaller models (compared to Promela). The only way I know to further reduce the size of the model is to rework it.

If you go to the full model file on github, you can see the TLA+ translation of the above PlusCal at lines 102-281 – the ‘real’ verification model you can write by hand if you feel brave.

Finally, the verification properties are shown in Listing 3, lines 283-306 of the original. There are no freebies with TLA+. You have to state what deadlock and starvation look like. Another (rather big) minus: no dead code warnings unless you code model-specific properties checks for it.

\*** Mutual exclusion
\* For mutual exclusion, process 1 and process 2
\* must never be in the critical section at the 
\* same time.
MutualExclusion == [] ~ (pc[1] = "p8" /\ pc[2] 
  = "q8")

\*** Deadlock free
\* If P and Q both want to enter the critical 
\* section, one of them will eventually enter the 
\* critical section. 
NoDeadlock == /\ pc[1] = "p2"
              /\ pc[2] = "q2"
              ~>
              \/ pc[1] = "p8"
              \/ pc[2] = "q8"

\*** Starvation free
\* If P wants to enter the critical section, P 
\* will eventually enter the critical section. 
\* The same must hold for Q.
NoStarvationP == (pc[1] = "p2") ~> (pc[1] = "p8") 
NoStarvationQ == (pc[2] = "q2") ~> (pc[2] = "q8") 
NoStarvation == /\ NoStarvationP
                /\ NoStarvationQ

\* Assume weakly fair scheduling of all commands
(* PlusCal options (wf) *)

Listing 3

Summary

This article has given a quick overview of two dominant FV methodologies, Promela/spin and TLA+/PlusCal. They both have pros and cons, but can proof a design for issues. Though many programmers are unfamiliar with FV, companies do use them. For example AWS have used TLA+ successfully [Newcombe15] and Promela was used to proof Plan9 concepts [Plan9] and mission-critical applications [Spinroot-5].

References

[Ben-Ari08] Mordechai Ben-Ari (2008) Principles of the Spin Model Checker, Springer, ISBN-13: 978-1846287695

[Buhr15] Peter Buhr, Dave Dice and Willem Hesselink (2015) ‘Dekker’s mutual exclusion algorithm made RW-safe’, published in Concurrency and Computation, available at https://onlinelibrary.wiley.com/doi/10.1002/cpe.3659

[Duerr15] Frank Duerr (2015) criticalsection5dekker.tla (at October 2024), available at: https://github.com/duerrfk/skp/blob/master/criticalsection5dekker/criticalsection5dekker.tla

[Holzmann03] Gerard J. Holzmann (2003) The Spin Model Checker, Primer and Reference Manual, Addison Wesley Professional, ISBN-13: 978-0321228628

[Lamport] Leslie Lamport, the TLA+ website, available at https://lamport.azurewebsites.net/tla/tla.html

[Newcombe15] Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc Brooker, Michael Deardeuff ‘How Amazon Web Services uses formal methods’, available at https://www.amazon.science/publications/how-amazon-web-services-uses-formal-methods

[Plan9] ‘Spin & Plan 9’, available at: https://swtch.com/spin/

[Spinroot-1] Spinroot manual, ‘accept’: https://spinroot.com/spin/Man/accept.html

[Spinroot-2] Spinroot website: https://spinroot.com

[Spinroot-3] Modex readme: https://spinroot.com/modex/

[Spinroot-4] Basic Spin Manual: https://spinroot.com/spin/Man/Manual.html

[Spinroot-5] ‘Inspiring Applications of Spin’, available at: https://spinroot.com/spin/success.html

[Wayne18] Hillel Wayne (2018) Practical TLA+: Planning Driven Development, Apress Berkeley, CA, ISBN-13: 978-1-4842-3828-8 (softback)/ 978-1-4842-3829-5 (ebook)

[Wikipedia-1] ‘Formal verification’: https://en.wikipedia.org/wiki/Formal_verification

[Wikipedia-2] ‘Linear temporal logic’: https://en.wikipedia.org/wiki/Linear_temporal_logic

[Wikipedia-3] ‘Dekker’s algorithm’: https://en.wikipedia.org/wiki/Dekker's_algorithm

Aurelian Melinte Aurelian acquired his programming addiction in late 90s as a freshly-minted hardware engineer and is not looking for a cure. He spends most of his spare time reading and exercising.

Promela

TLA+

Summary

References

Advertisement

Advertisement

Your Privacy