CPU Clocks and Clock Interrupts, and Their Effects on Schedulers

Instructions to sleep for a second almost never result in precisely one second’s sleep. Bob Schmidt walks us through the mechanics of why.

Suppose you are walking down the hallway of your office, and a Summer Intern (SI) intercepts you and asks, “ If I put a line of code in my program that simply reads sleep(10) , how long will my program sleep? ¹

You look at the harried SI and reply, “ It depends, ” and you continue on your way.

The SI rushes to catch up with you, and asks, “ It depends on what? ”

And you answer, “ That, too, depends, ” as you continue walking.

At this point our young SI is frantic (and in immediate danger of going bald). “ Stop talking in riddles, grey hair! I’m in real need of help here. ”

Your stroll has taken you to the entrance of the break room, so you grab your interlocutor, duck inside, grab two cups of your favourite caffeinated beverage, and sit down.

“ It depends, ” you say, “ on many things, so let’s start with first things first. ”

First things first

To understand what’s going on ‘under the hood’ when a sleep() is executed, it helps to know a little about how CPUs work, and that means knowing something about CPU clocks, interrupts, and schedulers. The former two are hardware concepts; the latter is a software concept.

But before we get tucked into those details, we should be clear on what sleep() we are talking about. There is quite a variety:

The Posix function sleep() (with a lower-case ess) takes a parameter that specifies the number of seconds to sleep;
The Posix function usleep() takes a parameter that specifies the number of microseconds to sleep;
The Posix function nanosleep() takes a parameter that specifies the number of nanoseconds to sleep;
Boost has the boost::this_thread::sleep() function, which accepts an object specialized from the date_time::subsecond_duration template;
C++ 11 has the std::this_thread::sleep_for() function that accepts an object specialized from the duration template;
The Windows Platform SDK function Sleep() (with an upper-case ess) takes a parameter that specifies the number of milliseconds to sleep;
Other, operating system variations too numerous to list.

For the purposes of this discussion I’m going to assume a generic version of sleep() that accepts a parameter representing time in milliseconds. The concepts scale up or down with the scale of the parameter.

Schedulers

In order to discuss schedulers, I will first define two terms: process and thread . A process is a unit of execution that, within the context of its operating system, contains all of the resources to execute as a stand-alone entity. A thread is usually a subset of a process, and is the smallest unit of executable code that can be scheduled on its own. (A process has at least one thread, but a thread is not necessarily a process.) Processes are sometimes considered ‘heavy-weight’ while threads are considered ‘light-weight’, referring to the amount of resources allocated to each type. Processes have unique address spaces; threads within a process share the address space of the process.

There are several common types of operating environments in the computer world. A single process, non-threaded (SPNT) OS runs one process at a time; Microsoft’s DOS is a good example of this type. A single process, multi-threaded (SPMT) OS, such as General Software’s Embedded DOS, runs only one process at a time, but supplies an interface that allows for multiple threads to execute in that process. A multi-process, non-threaded (MPNT) OS, such as those used in ModComp mini-computers, may have many processes with a single thread of execution. Linux and Windows are examples of multi-process, multi-threaded (MPMT) environments.

A SPNT OS has no real need of a scheduler. The process is started, it runs until completion, and while it is running no other processes can be executed. The other three types have schedulers of varying complexity. What they have in common is a requirement to determine what should be executing at any given time.

Again, depending on the OS, a scheduler may run at a fixed interval, when needed, or both. We’ll come back to this after discussing clocks and interrupts.

CPU clocks

Most CPUs have some sort of circuitry that generates a periodic waveform of alternating ones and zeros. (Clock-less CPUs exist – this discussion is not about them.) The easiest way to represent this signal is shown in Figure 1a. Horizontal lines represent logic levels, with the lower horizontal line typically representing a zero, and the higher horizontal line representing a one. (In reality these horizontal lines represent voltage levels, with the lower line typically at approximately 0 volts DC, and the higher horizontal line typically representing 3.3 VCD or 5.0 VDC, depending on the operating voltage of the logic.) Vertical lines represent the rise and fall of the clock.

Figure 1

In logic datasheets this type of waveform is often represented by Figure 1b. The waveform represented by Figure 1b takes into account the fact that the rise and fall of the clock is not instantaneous; it takes some very small but none-the-less non-zero amount of time. The slope of the rise and fall of the clock is usually exaggerated in these diagrams to make it easy to see; there often are other lines superimposed on the clock signal to indicate logic level hysteresis and timing. (An example can be found at [ TI393 ], Figure 1, Page 6.)

Both of these diagrams are idealized versions of the waveform. In reality, clock signals are much messier. For a good example of an actual waveform, see [ McArdell15 ]. The rising and falling edges overshoot their levels, then dampen out toward the nominal voltage level.

I’m going to use the idealized clock drawing in Figure 1a for the remainder of this discussion.

Interrupts

A CPU interrupt is a signal that causes some out-of-band processing to occur. Your CPU is chugging along merrily, and then BAM – an interrupt occurs. Whatever your CPU was doing is, well, interrupted, and the CPU branches off to do some other, ostensibly important, function. When that important function is complete, the interrupt is released (or cleared), and your CPU goes back to what it was doing, right where it left off.

(There’s a lot that goes on under the hood when processing interrupts, and different CPU architectures implement those details in differing ways. Those details are not important to this discussion.)

Interrupts originate from two places: internal and external to the CPU. Internal interrupts are generated by the CPU hardware itself, or commanded by software. External interrupts are those caused by peripheral devices. These types of interrupts are non-deterministic – they can occur at any time relative to anything else occurring in the CPU. (As an example, consider the input from a keyboard. The CPU cannot predict when a mere human might press a key. The act of pressing a key causes an interrupt in the CPU, so the encoded data can be read from the keyboard and stored for later retrieval by a process.)

Schedulers, part deux

The raw clock rate of today’s CPUs is blisteringly fast, even for low-power embedded cores. ² There are some things that a CPU needs to do that occur at a fraction of that speed, and one of those things is scheduling. (I told you we’d get back to this.)

If we consider a CPU with only one core, and no hyper-threading, it is only possible for one process, and one thread in that process, to be executing at any one time. In a multi-process and/or multi-threaded environment the scheduler is responsible for allocating this limited CPU resource, temporarily suspending the execution of one process and/or thread, and temporarily resuming the execution of a different process and/or thread. (This is called a context switch .)

A scheduler contains a list of all of the processes (and their threads) currently executing. The list contains the context in which each process is running. This context is saved when a process or thread is suspended, and restored just before the process or thread is resumed.

A process or thread can be in one of several states. If a process or thread is executing, it is the current context. A process or thread that is suspended is waiting for something to occur. A process or thread that is ready to execute (but is not executing) is one that can become the current context at any time.

Schedulers determine which process or thread to execute next based on their nature. Two common scheduler types are real-time and time-sharing. In a real-time scheduler, processes are assigned priorities, and the process with the highest priority that is ready to execute will become the current context (and execute) the next time the scheduler runs. A time-sharing scheduler allocates CPU time on a round-robin basis, where each process gets a certain fraction of the available CPU time. A real-time scheduler might allocate time to processes of equal priority on a round-robin basis, or a run-until-completion basis.

Clock interrupts

So how and when, you might ask, does the scheduler run? Mostly, the scheduler runs when an interrupt fires, but it may run at other times, too.

An operating system typically will have some sort of clock interrupt, which causes code to execute that takes care of tasks that need to be processed periodically. One of those tasks is the scheduler.

In the absence of any other (external) interrupt, the scheduler will run whenever the clock interrupt fires. At that time the scheduler will look at its list of processes, figure out which one should be executing, and (possibly) execute a context switch to that process.

Clock interrupts fire at a rate much slower than the underlying speed of the CPU. The rate at which the clock executes is a balancing act: the more often a scheduler runs, the easier it is to balance the load between processes (in a time-sharing scheduler), or the faster a higher priority process will actually execute when it is ready (in a real-time scheduler). On the other hand, the more the clock interrupt fires, the less time there is for actual work to get done, until at its most absurd the only code that is executing is the clock interrupt code.

Looking at Figure 2, the box labelled TCXO at the left is a clock oscillator. It generates a clock at the oscillator’s specified frequency. The box labelled 74LS393 is a 4-bit decade and binary counter [TI393]. The clock signal drives the CLK input of the 74LS393; the resulting four outputs show how the clock frequency is divided: QA is the clock divided by two; QB is the clock divided by four; QC is the clock divided by eight; and QD is the clock divided by 16. ³

Figure 2

A modern clock oscillator divided by 16 may not be much good, but QD could be directed to be the input to another binary counter, whose QD output would be the clock divided by 256. Run the signal through two more binary counters, and you have divided the clock by 65 536. If your CPU’s clock is running at 6 553 600 Hz (slow by today’s standards), you get a clock interrupt 100 times a second, which was a fairly common frequency for a scheduler (modern x86 architecture clock interrupts execute at a faster rate). ⁴ [ Linux2 ] (I am going to assume a 100 Hz clock interrupt for the remainder of this discussion. I am also going to assume a multi-process, non-threaded environment to simplify the examples.)

Earlier I said that a scheduler may run independent of the interrupt. Consider the case where a high-priority process is suspended waiting for I/O to complete. While it is suspended lower priority processes are getting a chance to execute; however, when the I/O completes we want the high-priority process to resume immediately. This typically is accomplished by having the I/O interrupt handler change the state of the process to ready-to-execute and invoke the scheduler.

Delays

“ We’ve got all of the pieces of the puzzle now, ” you say to the SI, who is getting restless. “ It’s just a matter of putting them all together to get the answer to your question. ”

When you call sleep() , its operating system-dependent implementation most likely calls an operating system function that suspends the process and causes a context switch – independent of the clock interrupt. In implementations I’ve seen, the parameter to sleep() – the number of milliseconds to sleep – is converted to an integer equal to the parameter divided by the clock interrupt period. (A 1000 millisecond sleep is equal to 100 clock interrupts.) It is easy for the scheduler to decrement a count every time it is executed as the result of the interrupt firing, and change the state of the process to ready-to-execute when the value reaches zero.

You should be starting to see that the accuracy of the sleep() call is only as good as the underlying clock interrupt frequency. If the scheduler only runs every 10 milliseconds, you cannot get sub-10 millisecond accuracy from your call to sleep() .

Let us consider a system where the clock interrupt fires at our assumed rate – 100 times a second, for a period of 10 milliseconds. If your process is such that it is possible to call sleep() at any time between clock interrupts, a 10 millisecond sleep() call will, on average, cause your process to sleep for 5 milliseconds (in the absence of any other process activity). A 20 millisecond sleep() call will, on average, cause your process to sleep for 15 milliseconds. Take a look at Figure 3 to see why.

Figure 3

In Figure 3 the horizontal line represents time, increasing from left to right, and the vertical lines represent the firing of the clock interrupt. If you can call sleep() at any time, then it is just as likely that sleep() will be called immediately after a clock interrupt (point A) as immediately before a clock interrupt (point B). When sleep() is called at point A the result is an almost 10 millisecond delay; at point B the result is almost no delay at all. Over time this averages out.

If you want to guarantee a minimum amount of delay, you must call sleep() with a value that is at least twice the clock interrupt period. For a minimum 10 millisecond delay that means calling sleep() with a parameter equal to 20, which results in a delay from 10 to 20 milliseconds, and an average delay of 15 milliseconds. (I’m assuming that the intermediate values 11 through 19 round down to 10 milliseconds, since that has been my experience.)

Exceptions to this rounding rule-of-thumb include the Posix routines mentioned above (there may be others). The functions sleep() , usleep() and nanosleep() all guarantee a delay that is at least as long as the value of their respective parameters, unless a SIGALARM signal is generated for the calling process. [ POSIX ] The delays are still subject to the limitations of the underlying clock interrupt period, so for these functions the delay will be, on average, one half-period longer than the clock interrupt period (15 milliseconds for a 10 millisecond delay, 25 milliseconds for a 20 millisecond delay, etc.)

Ironically, a single-process, non-threaded operating system may result in the most accurate delays. The reason is that the sleep() function may just have to waste CPU cycles by spinning in a loop (a busy wait ), since there are no other processes or threads to which to switch. ⁵ In a pseudo-assembly language that might look like Listing 1.

It is possible to guarantee a minimum delay by specifying the value (or next higher value) needed, but in a multi-process (multi-threaded) environment it is not possible to guarantee a maximum delay. The reason for this should be clear – there is no guarantee that your process will be executed on the next clock interrupt context switch. If your process is not the highest priority in a real-time system, or your process is in a round-robin list, one or more other processes may get to execute before the scheduler gets back to your process. Even if your process is the highest priority process, and should be expected to be scheduled on the next clock interrupt, another interrupt higher in priority than the scheduler's may occur, delaying the execution of the scheduler and by extension further delaying your process. This is the ‘other activity’ alluded to several paragraphs ago.

  sleep( 0 )

A sleep() call with a parameter equal to zero often devolves into a relinquish. Calling sleep( 0 ) indicates to the scheduler that your process wants to cause a context switch to allow a lower priority process (or other process in the round-robin list) to execute. Your process remains ready-to-execute, so that it can be executed on the next context switch, assuming no other higher priority process also is ready-to-execute.

Synching to the clock

There is a common pattern in process control software, represented by the following code fragment:

  while ( true )
  {
    // do something, over and over again
    sleep ( some_value );
  }

The first pass through this loop can occur at any time relative to the clock interrupt; however, the second and subsequent passes through the loop are somewhat synched to the clock interrupt. The higher the priority of the process, the more synched to the clock interrupt it will be.

How is this useful? Suppose you have a system that needs to generate regular output pulses to some bit of external hardware in response to some asynchronous (to the clock interrupt) input. (Assume the system’s clock interrupt frequency is adequate to space the pulses.) As in Figure 3, the input can occur any time relative to the clock interrupt. Looking at Figure 4, if the interrupt occurs at point A, and the first pulse is generated immediately, you end up with a short duration between the first and second pulses. If the interrupt occurs at point B, you may get a long duration between the first and second pulses; alternatively, you can get a long pulse. Which one you get is based on the priority of the process. You get the regular-width pulse followed by the long gap if your process is high enough priority to remain scheduled after the clock interrupt. You get the long-width pulse if your process is context switched away from executing.

Figure 4

If you synch to the clock, you have a much better chance of getting the result that you want. ⁶

  sleep ( 10 ) // This synchs to the clock
               // interrupt for each pulse
  {
    Set signal high
    waste clock cycles for the width of the pulse
      (busy wait)
    Set signal low
    sleep ( 10 )
  }

But again, getting these regularly spaced pulses is only possible for high priority processes, or processes running on SPNT systems.

Jitter

If we have a 10 Hz clock oscillator, we expect there to be 10 cycles per second, and we expect that each cycle will have a period of exactly one tenth of a second. Jitter is the deviation from the exact periodicity of that periodic signal. Temperature compensated clock oscillators do a good job of minimizing jitter; unfortunately, the same can’t be said about schedulers.

Figure 5 illustrates this point. Here we have the same waveform, synched to the clock, as Figure 4, just expanded in size a bit to make it easier to visualize. The black lines represent the ideal waveform; this is what our process hopes to achieve. The grey boxes represent one possible example of the areas in which the rising or falling edges might actually occur.

Figure 5

In general, for a real-time scheduler the lower the priority of the process the higher the jitter is likely to be. ⁷ In a round-robin scheduler jitter is the norm rather than the exception. (See Dealing with Jitter the Hard Way.)

Dealing with Jitter the Hard Way

Several years ago I developed a new interface to an old hardware subsystem. The old hardware had been designed to interface to a DEC PDP-11 through a DEC DRV11 parallel input/output board, using a pair of ribbon cables. For the new implementation I used an Advantech PCI-1739U 48-bit input/output board installed in a PC running Windows.

The DRV11 had input and output signals on the same ribbon cable. The bits on the PCI-1739U were configurable as input or output in groups of eight bits, so I designed a little circuit board that connected the signals at each end. The PCI-1739U had one group of outputs on one ribbon cable and a group of inputs on the other ribbon cable. The circuit board placed the signals on the appropriate conductors of the ribbon cables that connected to the old hardware.

Data was sent to the hardware by loading the data into bits configured as output bits on the I/O board, then toggling a dedicated signal to tell the hardware to load the bits. Most of the timing wasn’t critical, and a lot of the jitter didn’t matter, but the width of the signal to load the data had to be within certain limits. No matter what I tried, I couldn’t toggle that signal in software and reliably hit the time window.

I ended up putting a 74LS123 one-shot timer on the custom board [ TI123 ]. The rising edge of my software-generated signal triggered the 74LS123 to generate an output pulse of exactly the right width, using an R-C circuit. The falling edge of my software-generated signal had no effect on the output of the 74LS123. My software generated pulse could be wildly varying in width, while the output pulse of the 74LS123 was a constant width.

Afterword

“ So you see, young intern, the amount of time you sleep does depend on many things, ” you conclude. The intern gets up and wanders off, muttering something about asking the time and being told how to build a watch. “ That’s a clock, not a watch… oh, never mind, ” you say to the retreating SI. “ And my hair is not grey, it’s blond! ” ⁸

Acknowledgments

As always, thanks to Fran and the reviewers. They always make my stuff so much better. I am particularly indebted to the reviewer who pointed me to the Posix sleep routines. I was not familiar with these functions and their minimum delay guarantees. Special thanks to Szymon Gatner for his feedback (see Update on the Monostate NVI Proto-Pattern).

Update on the Monostate NI Proto-Pattern

My last article in Overload [ Schmidt15 ] generated a response from Szymon Gatner. Szymon suggested that all of the functions duplicated from the abstract base class to the Mono NVI class could be replaced with an overloaded pointer operator.

The set of functions represented by foo() from listing 3 (page 11), implemented like this:

  inline void foo ()
  {
    mp->foo (); // call to the virtual function
                // in the concrete class
  }

could be replaced with one pointer operator overload function

  T* operator-> ()
  {
    return  mp.get ();
  }

and used something like this instead of the method used in Listings 5 and 11 (pages 11 and 13):

  typedef mono_nvi_template< abstract_base >
    mono_nvi;
  void nested_function ( void )
  {
    mono_nvi  mnvi;  // assumes its non-default
                 // constructor has mnvi->foo ();
                 // been called
  }

My first impression focused on the unusual way that the pointer operator was being used with a non-pointer variable. Szymon pointed out that “for classes of pointer semantics I think it is something C++ are used to by now. We already have unique_ptr and shared_ptr (and auto_ptr for a long time), also iterators have pointer semantics.”

I encouraged Szymon to write up his idea for Overload . (Fran is always looking for good material; sometimes she has to settle for mine.) He said he would be “happy if you would like to side-note it in your next article”. Here it is, Szymon. Thanks, Bob

References

[Linux1] nanosleep() – http://linux.die.net/man/2/nanosleep

[Linux2] time(7) – http://linux.die.net/man/7/time

[McArdell15] ‘Raspberry Pi Linux User Mode GPIO in C++ [Part 2]’ in CVu , Volume 27, Issue 4, Pg. 17

[POSIX] sleep () – http://pubs.opengroup.org/onlinepubs/009695399/functions/sleep.html

usleep() – http://pubs.opengroup.org/onlinepubs/009695399/functions/usleep.html

Nanosleep() – http://pubs.opengroup.org/onlinepubs/009695399/functions/nanosleep.html

[Schmidt15] ‘Alternatives to Singletons and Global Variables’ in Overload 126, April 2015, Pgs. 9–13

[TI123] SN54122, SN54123, … SN74LS123: Retriggerable Monostable Multivibrators, http://www.ti.com/lit/ds/symlink/sn54123.pdf

[TI393] SN54930, SN54LS390, … SN74LS393: Decade, Divide-By-Twelve and Binary Counters, http://www.ti.com/lit/ds/symlink/sn54ls393.pdf

I shamelessly have stolen two different rhetorical devices here. My first semester calculus professor (whose name I have long since forgotten) used to ask questions by starting off “Suppose you are walking down the street and a stranger asks you…” The idea of using the Summer Intern as a convenient target for all manner of mayhem comes from Stephen Dewhurst, one of my favourite speakers at past Software Development and Embedded Systems conferences.
I use RabbitCore modules for some of my embedded designs, and the slowest clock speed they offered was around 20 MHz. Compare that to the clock on the ModComp mini-computers I maintained for 30 years, which ran at only 5 MHz.
There are inputs to the 74LS393 that are not shown.
This is meant only as an example. There are other, more efficient ways to derive a ‘slow’ clock interrupt, such as using a second, slower clock oscillator as part of the clock interrupt circuitry.
The Linux documentation for nanosleep() , under Old Behavior, indicates that delays of less than 2 milliseconds were handled with a busy wait in kernels prior to 2.5.39. [ Linux1 ]
This documentation also refers to the different clocks available in modern x86 hardware and their effects on timing. [ Linux2 ] A modern x86 CPU running at 3.0 GHz (3.0x109) has a 0.3333… nanosecond (3.333x10-10) period. That means that a call to nanosleep with a parameter equal to one nanosecond will take more time to make the call and return from it than the delay calls for.
In this example the pulse width is generated by busy waiting – wasting clock cycles by executing a tight loop. This is generally not a good idea in any multi-process or multi-threaded system – while your process is spinning nothing else is executing, either, but it is sometimes necessary.
The exception to this is a hard real-time system tuned such that every process always meets its deadline. There should be very little jitter in such a system. I personally have worked on only one hard real-time system – an airplane flight control system; all of the other systems have been soft real-time systems where jitter was an issue.
For a given value of blond.