How programmer-friendly are programming languages? Lucian Radu Teodorescu considers how principles from linguistics might allow us to read code with ease.
You walk into your favourite bookstore. Your attention is drawn to a book: the cover design is good, the title sounds intriguing, you flick through it and find the content interesting, you smell the book. You decide to buy it. This is precisely what happened to me at the beginning of this year when I found Language Unlimited by David Adger [Adger19] (actually, I abstained from smelling the book because of the Covid-19 pandemic).
For years, I have been interested in proper linguistics, but have never taken the time to dive into it. I still remember the times when my wife shared with me the great ideas from the General Linguistics courses she was attending while in college. One of the ideas that profoundly fascinated me was that language structures our thinking (i.e., the linguistic relativity principle [Wikipedia]), as first put forward by Wilhelm von Humboldt. Later, for a few years, I worked on a text-to-speech project and then on an automatic-speech-recognition project, and they gave me the opportunity to see language from a different perspective. It was during these years that I started to look with awe at the complexity of human language; I began to think that if we ever fully understood language, we would fully understand the human brain (making computers understand language equates to making them think); since then, I have relaxed my beliefs on the subject, but I still feel that this is largely true.
Therefore, buying this book was a perfect opportunity for me to brush up my knowledge of general linguistics. Besides offering a lot of linguistics information, the book also contains some gems that can be used in Software Engineering.
Main ideas from Language Unlimited
The book starts by explaining that virtually every statement (especially the more complex ones) is novel, i.e., we haven’t heard it before. Think about it: what is the probability that you had heard the previous sentence before? Close to zero. And yet, it is easy for us to understand it. This suggests that our mind has a structure that allows it to understand (and produce) language; language is not learned directly from experience. Humans are born with some ability that Noam Chomsky calls Universal Grammar, which allows us to easily process language.
The book revolves around three main ideas: First, that human language is organised in a special way and cannot transgress some boundaries (imposed by human biology). Second, that language is organised hierarchically. And finally, that macrostructures in the language echo the smaller structures that they are built from (i.e., language has some fractal properties).
Languages are built hierarchically. There is no language in which grammar rules are based on sequentiality (e.g., next noun, next verb, 3 words to the right, etc.); all languages are built out of hierarchical structures. Grammars for natural languages are context-free grammars, a term that we study in Computer Science¹.
The book describes numerous experiments that show how language is inherently hierarchical and deeply rooted in our human nature: from deaf people who create structure in their own invented sign languages, to MRI scans performed on newborn babies that reveal how we have innate structures for processing language. Although some other animal species have better abilities than humans at listening to sounds, and good abilities in statistical learning, they can’t structure language like we do. The human mind is hardwired for a particular kind of language acquisition; this is what Chomsky calls Universal Grammar; different languages appear as particularisations of this Universal Grammar.
Human languages seem to revolve around the distinction between verb and noun. Moreover, the concept of grammatical Subject, which seems to be an imperceptible property of languages, is central to the construction of a language. And the distinction never seems to be based on meaning. For some languages, classifiers seem to be important in the distinction between nouns and verbs.
The book also briefly covers the Merge process, introduced by Chomsky in the 90s [Chomsky95]. This process appears to be used in all human languages. If we have two (compatible) units of language, then we can group them together to form another language unit. This grouping can be done hierarchically until the entire phrase becomes a language unit. If this process is deeply embedded in our brain, that would explain why all languages are hierarchical. It also seems to provide an upper bound on the kinds of languages we can have in practice.
As an example of Merge, one can look at the classic Subject Verb Object sentences. In the sentence Alice sends a message, Alice is the Subject, sends is the Verb and a message is the Object. Here, a message is a noun phrase created by Merge from the determiner a and the noun message. Thereafter, the phrase sends a message can be formed by Merge from the verb and this noun phrase. Merging the Subject with the newly formed phrase, we obtain the complete sentence.
This Merge process makes the larger structures of the language similar to the smaller structures of the language (like a fractal). For programmers, this means that each sentence can be represented as a binary tree. The binary tree might be slightly complicated in some cases (i.e., with what we can call symbolic links between the nodes), but it nevertheless remains a relatively simple structure.
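To make the representation concrete, here is a minimal C++ sketch (the Unit type and the leaf/merge helpers are made up for this illustration, and it greatly simplifies the linguistics) of the Merge structure of Alice sends a message as a binary tree:

#include <memory>
#include <string>

// A language unit is either a single word or the Merge of two smaller units.
struct Unit {
    std::string word;                   // non-empty only for leaf units (words)
    std::unique_ptr<Unit> left, right;  // set only for merged units
};

std::unique_ptr<Unit> leaf(std::string w) {
    return std::make_unique<Unit>(Unit{std::move(w), nullptr, nullptr});
}
std::unique_ptr<Unit> merge(std::unique_ptr<Unit> a, std::unique_ptr<Unit> b) {
    return std::make_unique<Unit>(Unit{"", std::move(a), std::move(b)});
}

int main() {
    // [Alice [sends [a message]]]
    auto sentence =
        merge(leaf("Alice"),
              merge(leaf("sends"),
                    merge(leaf("a"), leaf("message"))));
    (void)sentence;
}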
Thinking fast language
The entire book talks about how the human brain is hardwired to process language. We are born with Universal Grammar, and, based on the language we hear (or see) around us in early childhood, we constrain this into an actual language. It’s similar to how we are born with the ability to move our body, and then we learn different types of movements (standing, walking, running, various athletic movements, etc.).
Similar to how we can walk without thinking about it, we can process language without paying attention to the structure of the language itself. This is an automatism in our brain. This leads us to the distinction found in Daniel Kahneman’s famous book Thinking, fast and slow [Kahneman11]: our brain can be thought of as operating in two modes: one fast, called System 1, and one slow, called System 2. In the first mode of operation, our brain has the ability to respond quickly to inputs, and it does this in an automatic manner, without requiring our attention. In the second mode of operation, our brain acts only when we pay attention to a particular situation, and it is typically associated with complex computations.
Language processing is fast: it is performed by System 1. Processing mathematical sentences, on the other hand, is typically handled by System 2: it is slow, and it requires our attention.
If we think about language as a means of communicating (not necessarily between humans), then the code that we have in Software Engineering is also language. It is a special type of language, but it is still language².
And, if there is one trait that we want our code to have in common with natural language, it would be the ability to process it easily and very fast. That is, to read code without focusing our attention on it, just like reading a novel. This might be a goal that we cannot achieve, but still, the closer we get to it, the better. After all, programming is a knowledge acquisition process (see [Henney19]); the better we are at reading the code, the more attention we can devote to understanding and reasoning about the code.
In this context, it makes sense to look at natural language for inspiration when we design programming languages. The more we can create the structures in programming languages to be compatible with our brain’s ability to process language, the better. Thus, we arrive at the following goal:
Programming languages should be designed in such a way that the code written can be easily processed by our innate language abilities – the programming language should be a specialisation of the Universal Grammar.
It then makes sense to look at how natural languages are constructed and draw some conclusions that might be applied to programming languages.
The different levels of language
Let us think for a bit about what the English language means. If we think about the spoken language, then there is a phonetic aspect of the English language that is exercised in speech. If we think about the written text, then there are lexical rules in the language. Both oral and written English have some syntactic rules that constrain how words can form sentences. On top of the syntactic rules, we have semantics, which tells us what different words mean. The semantics and the syntax of a language overlap, but, for simplicity, let’s consider them as being completely distinct.
We have the same distinction between lexical rules, syntactic rules and semantic rules in programming languages as well. We are going to briefly look at some aspects of these levels for natural languages, so that we might benefit when designing programming languages.
Let us consider the phrase: Winston Churchill was the best president that Unitd Kingdom had.
Most readers can easily understand all the words in this sentence, even though one word contains a typo. We usually don’t focus on the lexical part; we just read the symbols and fill in the gaps. We are using the fast part of our brain (System 1, to use Kahneman’s terms).
Similarly, most readers will be able to process the structure of the sentence with no problem. We unconsciously identify what the Subject of the sentence is, what the Verb is, and how different words connect with other words, forming the structure of the sentence. And, most probably, nobody consciously thinks about the Subject when reading this sentence. The parsing of the sentence is done automatically by our brain, without needing our attention. Again, syntactic processing for natural languages is done in the System 1 mode.
Things become more interesting when we look at the semantic level. Here, a lot of readers will probably treat the sentence with increased attention, i.e., using the slow and analytic System 2. Part of the reason for doing that is that the UK doesn’t have a president, and part of the reason is that claiming that somebody was the best Prime Minister of the UK is highly debatable. In general, for sentences containing some novelty, we tend to involve the analytic part of the brain.
Looking back at programming languages, it would be nice if we could design them in such a way that lexical and syntactic processing is always handled by our fast brain, while leaving the semantics to the analytical, slow brain. The understanding and the reasoning about the code are typically much more demanding than understanding regular English texts. Thus, it’s important that our entire attention goes to these processes, which means we should not require the programmer to divert attention to lexical and syntactical processing of the code.
So we have yet another argument that syntactic processing needs to be done by the fast brain, and thus that we should try to create programming languages that model the Universal Grammar.
The infamous for-loop
I’ve written before about how the classic 3-clause for loop is not programmer-friendly [Teodorescu21]. But, at the time, I argued that using this for loop is bad because of reasoning complexity. The idea is that, to fully understand what the for loop does, one needs to perform a significant amount of reasoning; significantly more than the reasoning one needs to perform for a ranged for loop (20 steps versus 8).
Here, I will move the argument to the next level. This structure is not user-friendly because it doesn’t follow the structure of a natural language. This means that, most probably, our brain needs to focus when reading the for structure. That is precisely what we said above we want to avoid.
For example, let’s look at the following classic for loop:
for (int i=0; i<100; i++) console.print(i);
It’s hard to fit this into our patterns of reading natural language. Let’s try it out: for int i equals 0 <pause> i is less than 100 <pause> i plus plus <pause> console print i. This is hard for humans to process. It is like reading a mathematical statement, so it most probably requires System 2 to process it. It is nothing like natural language.
On the other hand, let’s take a look at the ranged for loop (written in an imaginary programming language):
for (i: 1..100) console.print(i);
This can be processed more easily by humans; one can say it out loud: for i in range 1 to 100 <pause> console print i. This starts to sound like regular English (not quite there, but close). It means that it can be processed by the fast part of our brain, without requiring our attention.
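As a side note, mainstream languages have been moving in this direction. For instance, a minimal C++20 sketch (just an illustration using the standard ranges library, not an endorsement of any particular language) gets fairly close to the imaginary syntax above:

#include <iostream>
#include <ranges>

int main() {
    // Reads roughly as: "for i in the range 1 to 100, print i".
    for (int i : std::views::iota(1, 101))
        std::cout << i << '\n';
}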
A simple grammar test
OK, so we have decided that we shall try to create programming languages that can be processed by the fast part of our brain. Do we have a measure to see how close we are to this goal?
Approaching this from a linguistic perspective is the right way to go, but that is probably overkill for most of us in Software Engineering. However, there might be a trick to make things much easier.
Similar to how in linguistics we use various tests to check the constituency of a phrase (probably the best known is the substitution test), we can have a test that will easily tell us whether the structure of a programming language corresponds to the innate structures of the brain responsible for processing language. And that test is simply reading the code aloud.
One should be able to read aloud a piece of code and have it sound like natural language.
This is what we’ve done above for the classic for loop, to prove that it doesn’t quite sound like the languages humans are accustomed to; and we’ve also applied the same test to show that the ranged for loop can be made to sound like natural language.
On sequentiality
Language Unlimited reveals a fact about natural language organisation that, for us software engineers, might sound surprising.
Apparently, there isn’t a single language found on Earth in which sequentiality plays a structural role. Languages do not have rules of the kind ‘rule X applies when a word or type of word is followed by another word or type of word’. For example, one might imagine a rule in which grammatical Agreement works between a noun and the following verb; but this is never the case; one can always insert a structure containing nouns and verbs in between the two words to which we apply the Agreement. Similarly, there are no rules that rely on counting (three words to the right, etc.).
Language is always hierarchical.
Even when we have enumerations, language is hierarchical. Let’s take the following phrase as an example: “Alice, Bob and Carol talk loudly”. We have a hierarchical structure that looks like Figure 1.
Figure 1
Here, the first “and” is mute (it surfaces only as the comma), while the second one is audible. Even if we naively think that we have a sequence of words in the enumeration, our innate abilities process it as a (binary) tree structure.
The point is not that our brain cannot process enumerations fast, but that the elements of the enumeration are processed as being homogeneous. In the enumerations that we find in natural language, the first term doesn’t have a special meaning compared to the rest of the terms.
Moreover, because of Merge, our brain can assimilate the whole enumeration as one language unit. Thus, we can easily substitute the enumeration by one word or one phrase. For example, the following sentence is equivalent: “They talk loudly”; we’ve replaced the enumeration “Alice, Bob and Carol” with the pronoun “they”.
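We can sketch the same substitution in code. This is just an illustration with hypothetical names (the talkLoudly function is invented for this example); the point is that the enumeration is grouped into one unit, which we can then refer to as a whole:

#include <iostream>
#include <string>
#include <vector>

// Hypothetical function, for illustration only.
void talkLoudly(const std::vector<std::string>& people) {
    for (const auto& p : people)
        std::cout << p << " talks loudly\n";
}

int main() {
    // “Alice, Bob and Carol” grouped into a single unit...
    std::vector<std::string> they = {"Alice", "Bob", "Carol"};
    // ...which can then be substituted as a whole: “they talk loudly”.
    talkLoudly(they);
}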
Translating this to programming languages, we cannot naturally have lists of elements in which one element (i.e., the first) has a special meaning compared to the rest. This means that a Lisp command such as (write a b c d e) is not necessarily easily parsed by our brains. Similarly, Haskell’s syntax func arg1 arg2 arg3 ... doesn’t necessarily play nicely with the fast part of our brain (especially when there are more than 2 arguments).
The syntax of a sentence
Most phrases in English contain a Subject (i.e., the one doing some action) and a Verb (i.e., expressing the action done by the Subject). There is often another term, called the Object, which typically represents the thing acted upon. For example, in the sentence “Bob drinks wine”, “Bob” is the Subject, “drinks” is the Verb and “wine” is the Object.
The difference between the Subject and the Object is their position in the hierarchy; the Object is always merged with the Verb (see Figure 2).
Figure 2
English is a Subject Verb Object language, but not all languages are like this. For example, Japanese is a Subject Object Verb language. There are also some languages that have Verb Object Subject ordering, and recently linguists have also found some languages that have Object Verb Subject ordering. While there are some languages that appear to have Verb Subject Object ordering, these can be explained by a more complex set of rules that involve duplicating the verb and silencing one occurrence (something similar to what English does with auxiliaries).
When designing a programming language, we have to pick a convention. For example, let us pick the Subject Verb Object order, as in English. In the following paragraphs, we will be focusing only on the structure of expressions, ignoring other control structures (if clauses, loops, etc.).
In the most common case, if we ignore any punctuation, our programming language sentences should be of the following form:
subject operation argument
This looks extremely similar to the notation in Object-Oriented languages. The only difference is that we typically have a dot between subject and operation and, depending on how we look at it, parentheses around the argument. That is: subject.operation(argument). Now, if we have multiple arguments, they can be treated like an enumeration. That is, one can think of the whole notation as being subject.operation argument, where argument can be written as (arg1, arg2, ...).
I’m not actually advocating here for intense use of OO languages, but in terms of syntax, the common practice for expressing a basic statement seems to be close to statements in natural languages. On the other hand, functional languages seem to be further away from natural language statements; they are closer to mathematical formulations, which tend to engage the slower System 2 part of our brain.
An OO statement like plane.fly() makes sense syntactically, and it can be easily spoken too. Similarly, the following statement makes sense syntactically and can be spoken relatively easily: rectangle.draw(context).
However, I’m not necessarily satisfied with the OO solution either. While a plane can fly, a rectangle cannot draw. In particular, a rectangle doesn’t draw a context. A rectangle can only be drawn by somebody else, i.e., by the Subject. But what can this subject be? I think the only reasonable assumption is that the Subject is the actual system that executes the program. Thus, a more appropriate statement would be System.draw(rectangle, context), read as “System, draw the rectangle in the context”.
The more we think about this assumption, the more it makes sense for most of the actions in an OO program to be of the form System.action(argument). But, if we repeat System all over the place, it is maybe better to just remove it and leave it there implicitly, pronouncing it only when we speak. Thus, we transform our statements into action(argument). This is similar to how this is spelled in most functional programming languages: doSomething arg1 arg2 arg3; we would read this as “System, do something with arg1, arg2 and arg3”. This sounds a bit better.
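To make the comparison concrete, here is a minimal sketch of the implicit-Subject form in a C-family language (the Rectangle and Context types and the draw function are hypothetical, invented just for this illustration):

#include <iostream>
#include <string>

// Hypothetical types, just enough to make the example self-contained.
struct Context { std::string name; };
struct Rectangle { int width = 0, height = 0; };

// Implicit Subject: “(System,) draw the rectangle in the context”.
void draw(const Rectangle& r, const Context& c) {
    std::cout << "drawing " << r.width << "x" << r.height
              << " in " << c.name << '\n';
}

int main() {
    Rectangle rect{3, 4};
    Context ctx{"screen"};
    draw(rect, ctx);  // reads as “draw the rectangle in the context”
}

Compared with rectangle.draw(context), the Subject here is no longer the rectangle; it is the (implicit) system executing the call.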
Starting from this, we could develop an extensive analysis of the possible conjugations of verbs in natural languages and how they can translate to programming languages. We could also discuss grammatical agreement. But, unfortunately, that falls outside the scope of this article.
In natural languages, the Subject, and the distinction between Subject and Object, seem to be an important part of the syntax. In programming languages, the notion of Subject doesn’t seem to be well defined. I hope that we can make some progress in programming languages towards better isolating the Subject from possible Objects, and with that make programming languages more natural.
Coming back to the structure of a phrase, most phrases in natural languages are much more complicated than just the three terms Subject Verb Object. Each of these terms can have sub-structures. As David Adger explains, the high-level structure of a phrase is similar to the low-level structure of the phrase; it’s Merge all the way down.
Instead of a noun we can put structures that contain determiners (e.g., “the chair”) or adjectives (e.g., “red chair”), or we can put entire phrases (e.g., “The man who sold the world was caught by the authorities”). Similarly, we can have adverbs near verbs (“he spoke quickly and loudly”), and entire phrases instead of simple verbs (e.g., “She read a book in bed before going to sleep”). One interesting case of nesting is possessive chains, which can go on forever (at least in English); consider for example “My mother’s brother’s wife’s book was lost”.
We often have this nesting in programming languages too. Possessive chains are often used in OO languages (e.g., a.b.c.fun()), but they can be found in other types of languages too. Replacing objects with expressions is almost universal in programming languages. Replacing verbs (i.e., functions) with complex statements is also very frequent (e.g., using lambdas instead of named functions).
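A small sketch of both kinds of nesting in C++ (the A, B, C types are made up for this illustration; std::sort and the lambda are standard):

#include <algorithm>
#include <vector>

// Hypothetical nested types, to illustrate a “possessive chain”: a.b.c.fun()
struct C { int fun() const { return 42; } };
struct B { C c; };
struct A { B b; };

int main() {
    A a;
    int value = a.b.c.fun();  // read as “a's b's c's fun” (a possessive chain)

    // Replacing a “verb” (the comparison function) with a more complex phrase: a lambda.
    std::vector<int> v{3, 1, 2};
    std::sort(v.begin(), v.end(),
              [](int x, int y) { return x > y; });  // sort in descending order
    (void)value;
}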
Sentences in Sparrow
For years, I worked on a programming language called Sparrow [SparrowRepo] [Teodorescu15]. I wanted to create a language that integrates efficiency, flexibility and naturalness, and the central feature of the language was (paradoxically) static metaprogramming. For a long time, Sparrow was mainly an OO imperative language, but later on it started to migrate towards being more functional (the transition was never completed).
Borrowing from Scala, Sparrow has an interesting syntax for expressions. Coupled with the use of ranges, it creates some nice possibilities for expressing some algorithms. Expressions in Sparrow have two forms:
- subject operation – postfix notation for unary operations
- subject operation object – infix notation for binary operations
In both cases, no actual punctuation is needed. The operation can be an operator or a simple function name. Moreover, the operation has name lookup rules that will search near the given subject.
Let us take an example (actually taken from [Teodorescu15]). Let’s compute the sum of squares of all the odd numbers belonging to the first n Fibonacci numbers. This is achieved in Sparrow by the following one-liner (assuming functions fib, isOdd and sqr are already present):
1...n map \fib filter \isOdd map \sqr sum
No other punctuation is actually needed (previous versions of Sparrow required a semicolon at the end of the sentence, but the latest one doesn’t). The backslash is used to transform a function name into an object. This line contains the following operations: ..., map, filter and sum.
Reading this line from left to right, it sounds like: “the inclusive range from 1 to n, mapped through fib, filtered by isOdd, mapped through sqr, then summed”. Reading it from right to left, it sounds like: “the sum of the squares of all odd Fibonacci numbers generated from the range 1 to n (inclusive)”. In both cases, it sounds relatively natural in English.
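Incidentally, something quite similar can now be written in mainstream languages. Below is a rough C++20 sketch of the same pipeline; the stand-in definitions of fib, isOdd and sqr are only there to make the snippet self-contained and are not meant to be efficient:

#include <iostream>
#include <ranges>

// Trivial stand-ins, just to make the sketch self-contained.
long long fib(long long n) {
    long long a = 0, b = 1;
    for (long long i = 0; i < n; i++) { long long t = a + b; a = b; b = t; }
    return a;
}
bool isOdd(long long x) { return x % 2 != 0; }
long long sqr(long long x) { return x * x; }

int main() {
    const long long n = 10;
    auto pipeline = std::views::iota(1LL, n + 1)  // the inclusive range 1..n
                  | std::views::transform(fib)    // mapped through fib
                  | std::views::filter(isOdd)     // filtered by isOdd
                  | std::views::transform(sqr);   // mapped through sqr
    long long sum = 0;
    for (long long x : pipeline) sum += x;        // then summed
    std::cout << sum << '\n';
}

Note how the comments follow the same left-to-right reading as the Sparrow one-liner.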
Let us take another example of the same kind. Let’s compute the root-mean-square of the lengths of all the Collatz sequences up to the first one that has a length greater than or equal to 500. Given a natural number, a Collatz sequence is the sequence of numbers starting with the given number and obtained by repeatedly applying a simple transformation (halve the number if it is even, otherwise triple it and add 1) until we reach 1. Although for all known starting numbers the Collatz sequence is finite, the computer cannot know that, which makes the problem especially interesting. This problem can be solved in Sparrow by the following one-liner:
(1..) map \collatzSeq map \rangeSize takeWhile (fun s = s<500) rootMeanSquare
Here, the structure (fun s = s<500) is a lambda function. The line can be read from left to right in the following way: “the infinite range starting from 1, mapped through collatzSeq, mapped through rangeSize, taking elements while the number is less than 500, and apply rootMeanSquare to all these elements”. Again, this can be read relatively easily.
Looking at this problem in more depth, we start with an infinite range. For each element in that range, we generate a range that is potentially infinite. We reduce these to a finite sequence of numbers by mapping through rangeSize and by calling takeWhile with an appropriate predicate. This one-liner is pretty complex for its succinctness. Moreover, the tests I’ve done on this form showed that it can be as efficient as writing imperative code (actually, it was faster than traditional code, probably because it exposed some optimisations to the compiler). So, this one-liner is a highly efficient implementation of a non-trivial problem, expressed very succinctly (shorter than the actual problem description) and with a syntax that can easily be read by humans.
I was pretty happy with the results after finishing the design of the syntax in Sparrow. Now, after some time in which I haven’t worked on Sparrow, and with this new focus on linguistic structure, I find the results to be even better.
I am not arguing that the syntax of expressions in Sparrow is the best one and that all programming languages should use something similar; for that, we would need a more in-depth analysis. The point I’m trying to make is that we can find syntactic forms that are easier to read (and to process) by humans. There are ways of writing code so that it sounds natural, allowing the programmer to process the syntax with the fast part of the brain while focusing the analytic part on the code’s semantics.
What can be done next
A collaboration between programming language designers, experienced linguists and possibly neuroscientists would be highly beneficial in order to design programming languages that require only System 1 for processing the syntax of the code.
In his book, David Adger brings a lot of evidence that originates in experiments involving MRI scans of humans exposed to language. And, of course, that leaves us pondering whether we could do the same thing for programmers.
If we can have MRI experiments that would show how different sentences expressed in different programming languages are read by programmers, then we can compare different programming languages and different syntactic rules from a naturalness perspective. We could probably find a ranking between different types of syntactic structures. Having that, we can create programming languages that fully exploit the innate structures in our brain and allow us to read code with ease, similar to reading natural language.
Conclusions
Inspired by David Adger’s book, Language Unlimited, this article has tried to question how programming languages should be designed to be as close as possible to natural languages. The article doesn’t attempt to provide any definitive answer; it just explores different aspects of language, in the hope of making a first attempt at mapping out the problem space.
Besides this, the article tries to argue that all programming languages should have a (new?) goal: to make the syntax similar to human language, with the same structure, so that the human mind processes the syntax in the fast mode, leaving the programmer free to direct their attention to semantics.
If we achieve this goal, then maybe programmers might start to immerse themselves in the programming language, similar to how people have been immersed in language since the day they were born. We could then speak of a programming language as the totality of code that can be written and easily understood, as the sea of structures that shapes how we think about programming; similar to the way we use the term language to mean a fundamental part of human existence.
Only then can we fully unleash our creativity in programming. Only then can we have programming language unlimited.
References
[Adger19] David Adger, Language unlimited: The science behind our most creative power, Oxford University Press, 2019
[Chomsky95] Noam Chomsky, The Minimalist Program, MIT Press, 1995
[Henney19] Kevlin Henney, ‘What Do You Mean?’, ACCU 2019, https://www.youtube.com/watch?v=ndnvOElnyUg
[Kahneman11] Daniel Kahneman, Thinking, fast and slow, Macmillan, 2011
[SparrowRepo] Lucian Radu Teodorescu, ‘The Sparrow programming language’, https://github.com/Sparrow-lang/sparrow
[Teodorescu15] Lucian Radu Teodorescu, Improving Flexibility and Efficiency in Programming Languages: a natural approach, PhD Thesis, 2015, https://github.com/Sparrow-lang/sparrow-materials/raw/master/PhD/ThesisLucTeo.pdf
[Teodorescu21] Lucian Radu Teodorescu, ‘How We (Don’t) Reason About Code’, Overload 163, June 2021
[Wikipedia] Wikipedia, ‘Linguistic relativity’, https://en.wikipedia.org/wiki/Linguistic_relativity
Footnotes
1. Interestingly enough, the concept of context-free grammars was invented by Noam Chomsky, the prominent linguist behind most of the main theories of languages advocated in the book. Chomsky didn’t just make major contributions to linguistics, but to Computer Science as well.
2. Throughout the article I’ll use the term programming language to mean, generically, code, i.e., the information encoded according to some predefined rules. This is similar to how we use the term language to express a set of sentences that convey some information (e.g., a text written in English), without referring to a particular set of syntactic and semantic rules (for example, of English).
Lucian Radu Teodorescu has a PhD in programming languages and is a Software Architect at Garmin. He likes challenges; and understanding the essence of things (if there is one) constitutes the biggest challenge of all.