An opinion on what's a good general-purpose programming language

Sometimes on Twitter or LinkedIn I see people argue, "We do not need new programming languages, because the current ones already work". I beg to differ. I think the world needs more robust languages with better defaults, now more than ever.

During an interview with Microsoft, Dr Simon Peyton Jones says¹:

Programming languages are the fundamental material out of which we build programs. When a builder builds a building, they can build out of bricks or out of straw or out of bananas or out of steel girders… And it makes a difference what you build out of, how ambitious your building can be and how likely it is to fall down. So, when developers write programs, the material that they use, the fabric of their programs – the programming language is super important to the robustness and longevity and reliability of their programs. So, programming language researchers study programming languages with the aim of building more robust building materials for developers to use.

We need robust languages, and the current programming languages have flaws in different aspects. If we don't think about fixing them, we will be stuck with them for the rest of the century. However, fixing them isn't as easy as it seems. Introducing breaking changes in a language is not a peaceful task, and is sometimes even impossible. So what should we do? Get used to those flaws and get bitten by them for generations? Or think about creating an alternative?

It is not in the scope of this blog post to list every design decision I consider a mistake or flaw in programming language X; those have been brought up in programming language holy wars many times already. I want to talk about what I personally find to be the right design decisions for a good programming language. Expect this post to be somewhat opinionated, but I try to back up my point of view with objective reasons wherever necessary.

Design

Some aspects of a language are less related to the implementation and more to how users write code and what the defaults are, like immutability and the type system. In this section, I'll discuss the features of a good language from the design perspective.

Immutability

Immutability should be the default, meaning that by default I cannot mutate the value of a variable. The same applies to data structures. Some of the pros of immutability are:

More understandable code that is easier to trace, as things won't change in different places
Avoids defensive copying, as the binding between a variable and its value never changes and values can be shared safely
Makes multi-threading and parallelism easier
The compiler can make more assumptions for optimizations
The compiler can make more assumptions for type inference
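To make the "no copying" point a bit more concrete, here is a minimal TypeScript sketch. The `User` and `Address` types are made up for illustration, and note that `readonly` in TypeScript is only checked at compile time. When a value can never change, a "modified" version can safely reuse the untouched parts of the original instead of copying them:

```typescript
// Hypothetical types for illustration only; they are not from any real library.
interface Address {
  readonly city: string;
}

interface User {
  readonly userName: string;
  readonly address: Address;
}

const original: User = { userName: "Ada", address: { city: "London" } };

// "Changing" a field builds a new object. The untouched `address`
// is shared with `original` rather than copied.
const renamed: User = { ...original, userName: "Grace" };

console.log(original.userName);                    // Ada (the original is unchanged)
console.log(renamed.userName);                     // Grace
console.log(renamed.address === original.address); // true (same object, no copy)
```

Because nobody can mutate `original.address`, sharing it between the two values is safe, and anyone reading the code knows `original` still holds what it held when it was created.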

One might even go further and require the language to be pure (meaning mutation is not possible at all, as in Haskell). But even if that's not the case, mutable variables should be harder to write (for example, by requiring more characters). This prevents many unnecessary mutable variables from being declared². I find let for immutable variables and let mutable for mutable variables a good design:

fsharp

let x = [1; 2; 3] // x can not be mutated, and the list is immutable as well

let mutable y = 2 // y can be mutated, but you need to write that extra `mutable` keyword

Some programming languages also have two separate keywords, like var (or let) for mutable and const for immutable variables. In such cases, people may still tend to use the shorter keyword, unless using the immutable one by default becomes part of the community's culture.
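As a rough sketch of the two-keyword case, here is how it looks in TypeScript (the variable names are made up for illustration). One thing worth noticing is that `const` only fixes the binding; the data behind it stays mutable unless the type says otherwise:

```typescript
// `let` is a mutable binding, and it is the shorter keyword to type.
let counter = 0;
counter += 1;

// `const` fixes the binding only: the array contents can still change.
const scores = [1, 2, 3];
scores.push(4);

// A `readonly` array type makes the contents immutable at compile time.
const frozenScores: readonly number[] = [1, 2, 3];
// frozenScores.push(4);
// error: Property 'push' does not exist on type 'readonly number[]'.

// Non-mutating operations still work and return a new array.
const doubled = frozenScores.map((n) => n * 2);

console.log(counter);           // 1
console.log(scores.join(","));  // 1,2,3,4
console.log(doubled.join(",")); // 2,4,6
```

So even when the immutable keyword is the community default, getting immutable data takes an extra, opt-in step.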

Some other programming languages support immutable variables, but they are harder to write. In C# for example:

csharp

int x = 10; // mutable
readonly int y = 25; // immutable; note that `readonly` is only valid on fields, not on local variables
// (only the reference is fixed; for immutable data structures you should use the System.Collections.Immutable collections)

As we have seen, a programming language can be somewhere between these levels of immutability:

1. Pure: No mutability is allowed. e.g. Haskell
2. Immutable by default: Immutability is easier, and mutability is harder. e.g. F#, OCaml, and Rust
3. Neutral: Language is neutral about using either one. e.g. JavaScript and Zig
4. Mutable by default: Mutability is easier, and immutability is harder. e.g. C#
5. No immutability: You can't have immutable variables. e.g. Go³ or Python

I find it OK for a language to be at level 1 or 2 (or even 3, if the language's community embraces immutability and its guidelines promote it). However, I try to avoid levels 4 and 5 if there are alternatives.

Statically Typed

In a statically typed language, types are known at compile time. This lets you catch many bugs before running your application and hitting an error, without writing even a single test case. The compiler will alert you and refuse to compile the program if it has type errors.

player.jump();
-------^
Function `jump` is not declared for the type `Player`
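Here is a small TypeScript sketch of the same situation. The `Player` type and the `tryToJump` helper are hypothetical, and `@ts-expect-error` is only there so the snippet still compiles while showing where the compiler complains:

```typescript
// Hypothetical `Player` type, mirroring the error message above.
interface Player {
  readonly nickname: string;
  run(): string;
}

const player: Player = {
  nickname: "p1",
  run: () => "p1 is running",
};

// Never called; it only exists to show the compile-time error.
function tryToJump(p: Player): void {
  // @ts-expect-error Property 'jump' does not exist on type 'Player'.
  p.jump();
}

console.log(player.run()); // p1 is running
```

Without the `@ts-expect-error` line the compiler rejects the program, so the mistake is found before any test runs and before any user sees it.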

One might argue that writing types is time-consuming, and that bugs are not an issue when the programmer can cover those cases with automated tests. In response, one should consider that:

Test cases are not a replacement for types, and vice versa.
Types are not just for avoiding bugs, they are also documentation. You can scroll through the functions and the members of the current variable, and see what it's capable of.
The compiler already reports many bugs that your test cases might not cover.
In a language with good type inference and a lightweight syntax for types, you don't even need to write types. So it's not that time consuming.

We will cover the last item in the next section.

Type Inference

Type inference can bring a lot of people who are tired of writing types back to the realm of statically typed languages. I'm not just talking about auto variables in C++ or var declarations in C#, but a more powerful kind of inference that can infer the whole type of a function by analyzing its body. Imagine you write your function as you would in Python, and the compiler takes care of the rest for you.

For example, in this F# function:

fsharp

type Person = { name: string; age: int }

let sumAges people =
    people |> List.sumBy (fun x -> x.age)

// The F# compiler infers the type of `sumAges` to be:
val sumAges: (people: list<Person>) -> int
// Meaning that `sumAges` is a function that accepts a list of `Person` and returns an int

As you can see in the definition of the sumAges function, the developer provides no types in the code, but the compiler infers it as list<Person> -> int.
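For contrast, here is a rough TypeScript version of the same function (the `Person` shape and the sample data are made up for illustration). TypeScript's inference is more limited than F#'s here: it infers the return type and the types of local values, but the parameter still needs an annotation.

```typescript
// Hypothetical record type, mirroring the F# `Person` above.
interface Person {
  fullName: string;
  age: number;
}

// Only the parameter is annotated; the return type is inferred as `number`.
function sumAges(people: readonly Person[]) {
  return people.reduce((total, person) => total + person.age, 0);
}

// Inferred as { fullName: string; age: number }[] with no annotation.
const team = [
  { fullName: "Ada", age: 36 },
  { fullName: "Alan", age: 41 },
];

const totalAge = sumAges(team); // inferred as number
console.log(totalAge); // 77
```

The F# compiler goes one step further and works out the parameter type from how `x.age` is used in the body, which is the kind of inference I'm arguing for.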

Type inference can even be more complex than that. For example, if you mark a function as inline in F#, it can overcome some of the limitations of the .NET runtime by inlining the function and improve the inference:

fsharp

let inline myFunc p1 p2 p3 p4 p5 =
    $"{p1} {p2/(p3 * p4)} {1.0/p5}"

// F# infers this function as:
val inline myFunc:
  p1: 'a ->
  p2: ^b ->
  p3: ^d ->
  p4: ^e ->
  p5: float ->
  string
    when ( ^b or  ^c) : (static member (/) :  ^b *  ^c ->  ^f) and
         ( ^d or  ^e) : (static member ( * ) :  ^d *  ^e ->  ^c)

The compiler improves the inference by choosing the most generic type possible for the function, based on how the parameters are used inside the function's body.

Functional Programming

Even if the language isn't itself a functional programming language, writing programs in a functional style should not be painful. At the very least:

You can define lambda expressions (if it's an OO language, lambdas are as easy to define as methods)
Functions are first-class in the language: you can pass functions as values or assign them to variables
You can build larger functions by composing smaller ones
The language respects the immutable nature of functional programming
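The requirements above can be sketched in a few lines of TypeScript. The helper names (`compose2`, `applyTwice`, and so on) are made up for illustration and are not from any library:

```typescript
// Hypothetical helper: left-to-right composition of two functions.
function compose2<A, B, C>(f: (a: A) => B, g: (b: B) => C): (a: A) => C {
  return (a) => g(f(a));
}

// Lambdas assigned to variables: functions are ordinary values.
const double = (n: number): number => n * 2;
const label = (n: number): string => `value: ${n}`;

// A larger function built by composing two smaller ones.
const doubleThenLabel = compose2(double, label);

// A function that takes another function as an argument.
const applyTwice = (f: (n: number) => number, n: number): number => f(f(n));

// `map` returns a new array and leaves its input untouched.
const inputs: readonly number[] = [1, 2, 3];
const doubledInputs = inputs.map(double);

console.log(doubleThenLabel(21));     // value: 42
console.log(applyTwice(double, 5));   // 20
console.log(doubledInputs.join(",")); // 2,4,6
```

None of this needs special ceremony, which is the point: a language can be multi-paradigm and still make the functional style feel natural.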

General purpose

As the title of this blog post suggests, we are not talking about domain-specific languages here. There are good languages out there like Agda, Coq, or Futhark⁴, but they are not used as general-purpose languages.

Non-tricky Performance

Performance should not be tricky, which means:

One should be able to write high-performance code in the language in the first place.
High-performance code should be the idiomatic way to write code in that language. In other words, you shouldn't need tricky hacks to gain performance (like constantly reaching for the C FFI, or constantly mutating in an immutable-by-default language). Most of the time, the cleanest way to write code in language X should also give the best performance you can get out of X.

Independence

Design decisions of another language shouldn't f**k up or heavily influence the design of our language. The language should be as independent as possible. As a counterexample, the design decisions of the C# language keep affecting the design of F#. F# doesn't need null at all, but it has to support it to be able to interop with C#. I understand that this interoperability lets F# consume packages and libraries written in C#, but on the other hand it makes F#'s design very dependent on the decisions made by the C# or .NET runtime teams. The same applies to languages like Kotlin and Java.

Type-level programming

The language doesn't have to be a full-blown type-theoretic theorem prover, but it should at least be able to do type-level programming to some extent. Languages like Haskell, TypeScript, Zig, and F# are promising in this area.

typescript

// Using TypeScript's Turing-complete type system, you can
// define a type that checks whether a word is a palindrome

type IsPalindrome<T extends string | number, K = `${T}`> =
  K extends `${infer L}${infer R}` ?
  R extends '' ? true :
  K extends `${L}${infer S}${L}` ? IsPalindrome<S> : false
  : true


type IsMadamPalindrome = IsPalindrome<"madam"> // true
type IsBroPalindrome = IsPalindrome<"bro"> // false

Compile-time capabilities

The language should be able to do compile-time calculations to some extent. This can improve performance in many cases, and remove the need for a lot of code generation. Zig is promising in this area.

zig

const expect = @import("std").testing.expect;

fn fibonacci(index: u32) u32 {
    if (index < 2) return index;
    return fibonacci(index - 1) + fibonacci(index - 2);
}

test "fibonacci" {
    // test fibonacci at run-time
    try expect(fibonacci(7) == 13);

    // test fibonacci at compile-time
    comptime {
        try expect(fibonacci(7) == 13);
    }
}

Talk to C and native code

The language should be able to talk to C and operating system APIs, without being heavily dependent on them from the user's perspective.

Implementation

There are some other aspects of creating a new programming language which are mostly related to how the language is implemented. I'll discuss them in this section.

Compiler Tooling APIs

The compiler should have tooling APIs built in. This makes it much easier for tooling developers (those building LSP servers, editor extensions, linters, code analyzers, and formatters) to traverse, analyze, change, or even generate code. Bad tooling leads to a bad developer experience, no matter how good or smart the compiler is.

Cross-platform and Cross-compilation

The language should be cross-platform and work well on all major platforms, and the compiler should be able to compile code on one platform for another.

Self-compiling compiler

The compiler of the language should eventually be a self-compiling compiler⁵, which means it should eventually be rewritten in the same language it compiles. This has many advantages. When the compiler of language X is written in X:

Contributors only need to know language X to implement features or fix bugs
The compiler benefits from the language's performance optimizations, just like any other program written in X
It's easier to detect bugs in the compiler, as the compiler's own source acts as a large test case for it

Footnotes

¹ Functional Programming Languages and the Pursuit of Laziness with Dr. Simon Peyton Jones
² Because programmers are lazy and tend to write fewer characters.
³ Go has a const keyword for compile-time constants, but it is limited: it only accepts constant expressions (such as literals) as values, so you can't write something like const x = f()
⁴ Futhark is not intended to replace existing general-purpose languages. The intended use case is that Futhark is only used for relatively small but compute-intensive parts of an application.
⁵ See the concept of compiler bootstrapping.
