Exploring a future of programming

I think the way we currently deal with programming and computing in general is approaching its local maximum. I have ideas about different ways that I want to explore. In this post I’ll explain why I think this, mention ideas and things other people are doing, and end by talking about the approach I’m taking.

I’ll reference a bunch of talks that I think are really good, and served as inspiration and/or motivation for my work. I recommend you watch them (no particular order) before reading the post, but you can watch them later or as they get referenced.

Most of what I wrote in this post comes from my own experience using and working with software, and from opinions I formed over time while working in and observing a bunch of areas of software. I can find plenty of people who are way more knowledgeable than me on each of the things I talk about, but I don’t think I need to be at the state of the art to talk about them. This may come across as naive to you, but I hope I address that at the end of the post.

I. The problems at a high level

There are three big “trends” that I’ve observed to hold true for each problem I’ll list later in section II. They all influence each other; they’re not orthogonal.

Abstractions are leaky, and there is no perfect abstraction

The following paragraph is a TL;DR for this “trend”, and it seems kind of obvious. However, I still think it’s important to elaborate on it, because it seems that we keep forgetting this fact.

Each software implementation has properties that are unique to it. The point of an abstraction is to find a common ground that removes these unique properties and presents only the common behaviour. However, the systems we build are influenced by the unique properties of the specific implementations we choose to use, whether we like it or not, and when this happens, the abstraction we picked may end up working against us.

In certain applications that need code to go fast and process a lot of structured data, programmers usually arrange data as structures of arrays (SoA) instead of arrays of structures (AoS), even though the latter is the logical choice1 for almost any programming language, given its design. This is done because modern hardware can speed up many operations on sequential data (through SIMD instructions and cache-friendly access patterns), and the way to take advantage of this is to make sure the data is laid out sequentially in memory.

This is, in my opinion, one of the biggest examples of abstraction leak we have. We have programming languages because we want to abstract over hardware, but here we’re changing the natural/logical structure1 of our program to take advantage of implementation details of the hardware (a layer below our code).
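
To make the leak concrete, here is a rough sketch of the two layouts in Python terms. It’s only illustrative: the real payoff shows up in lower-level languages, where SoA keeps each field contiguous in memory.

    # Array of structures: the "logical" shape most languages nudge us towards.
    particles_aos = [
        {"x": 1.0, "y": 2.0, "mass": 3.0},
        {"x": 4.0, "y": 5.0, "mass": 6.0},
    ]

    # Structure of arrays: the same data reshaped to suit the hardware.
    particles_soa = {
        "x": [1.0, 4.0],
        "y": [2.0, 5.0],
        "mass": [3.0, 6.0],
    }

    # Summing all masses now walks one contiguous sequence instead of hopping
    # between whole records.
    total_mass = sum(particles_soa["mass"])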

You can argue that we just don’t have a programming language that lets us abstract over those details easily, so we just need to make one, and I half agree. Yes, a new language that brings SoA into the abstraction would help with this particular leak, but we still have abstraction leaks in many other places, and at many other levels of the abstraction stack. A few generic examples:

  • a database is designed to take advantage of certain filesystems in a specific way, usually informed by how those filesystems perform, rather than only using the filesystem abstraction as provided by the OS. When a new filesystem that has different implementation properties is used, the database struggles (usually on performance). In some cases, some of the implementations just won’t work well with how the database uses the filesystem, no matter how much you try.
  • a filesystem might have been designed for HDDs, and struggles when running on SSDs due to the differences in their implementations. In fact, those differences are so large that to make proper use of SSDs, we need additional support from the OS (e.g. by sending TRIM commands), or we also need to change other implementations, such as the IO scheduler used by the OS. Only the initial filesystem abstraction isn’t enough anymore.
  • the behaviour of code running in a container can’t be guaranteed to be the same for all container runtime implementations. For example, runc and crun differ in memory consumption overhead and the amount of processes/threads they create, which is affected by any cgroup limits set for a container. In extreme cases, this means that a container may run in one of the implementations, but crash in another.
  • different graphics cards show different behaviours for the same shader. In fact, the OpenGL specification doesn’t guarantee exact output on each device (read the first sentence of Appendix A), which is likely the source of a lot of frustration, since this is an unexpected non-guarantee of an abstraction layer.
  • the differences in implementations of the transport layer of the OSI model are so big that almost every application has to force a specific implementation (e.g. choosing between TCP and UDP), even though in theory the transport layer should be an abstraction for the upper layers.

There is no perfect abstraction, yet we seem to convince ourselves otherwise every day, and end up surprised when faced with reality. Abstractions are leaky, and those leaks influence the decisions we make in our code and in our systems.

It is somewhat possible to avoid the rough edges of the abstractions we use. To do that, we not only have to leave (sometimes a lot of) performance on the table2, but we also have to look the other way when “weird errors” pop up from time to time, because they’re the indication of the messy interaction between the implementations of the abstractions we’re using. In Preventing the Collapse of Civilization, Jonathan Blow says that software has been free-riding on all the advances in hardware for the past decades: as long as we’re looking the other way, faster hardware masks the other half of the problem. He then shows just how much we’ve been looking the other way all this time.

When we can’t afford that, we have to work around an abstraction or just not use it at all. Both options mean more code, more effort to develop and maintain it, and more opportunities for things to break.

We have too many layers of abstraction, and they’re often not enough

Let’s go through an exercise.

  1. Imagine I’m writing some code to manage to-do lists. I want the code to save lists, update items in a list, and retrieve a list. My first iteration is something like this (I’m using Python here as an example, but this is mostly irrelevant to the argument):

    lists = {}
    
    def get_list(list_id):
      return lists.get(list_id, [])
    
    def save_list(list_id, items):
      lists[list_id] = items
    
    def update_item(list_id, item_name, new_content):
      if list_id in lists:
        old = lists[list_id]
        updated = map(lambda i: i if i["name"] != item_name else new_content, old)
        lists[list_id] = list(updated)

    I start my code interpreter, load this code into it, and run some commands:

    >>> save_list("house chores", [{"name": "take out the garbage"}])
    >>> get_list("house chores")
    [{'name': 'take out the garbage'}]
    >>> update_item("house chores", "take out the garbage", {"name": "take out the garbage", "done": True})
    >>> get_list("house chores")
    [{'name': 'take out the garbage', 'done': True}]
    

    It gets the job done, even though I had to write some code that really doesn’t matter to the task at hand (managing to-do lists), such as converting updated to a list. Still, I have 10 lines of code that are understandable3, even though there is a very small overhead in reading them4.

    At this point, my problem is only solved while my code interpreter is running. Stop it, and all my lists are gone. Turn off my machine, and all my lists are gone.

  2. Luckily, my computer includes a device that provides persistent storage, my operating system provides a way to put data in this persistent storage, and the programming language I’m using gives me an interface to talk to the operating system, so I can make sure I don’t lose my lists:

    import pickle
    
    def get_list(list_id):
      try:
        with open(f"lists/{list_id}", "rb") as f:
          return pickle.load(f)
      except FileNotFoundError:
        return []
    
    def save_list(list_id, items):
      with open(f"lists/{list_id}", "wb") as f:
        pickle.dump(items, f)
    
    def update_item(list_id, item_name, new_content):
      try:
        with open(f"lists/{list_id}", "r+b") as f:
          old = pickle.load(f)
          updated = map(lambda i: i if i["name"] != item_name else new_content, old)
          f.seek(0)
          pickle.dump(list(updated), f)
      except FileNotFoundError:
        pass

    I start my code interpreter, load this code into it, and run the same commands as before:

    >>> save_list("house chores", [{"name": "take out the garbage"}])
    >>> get_list("house chores")
    [{'name': 'take out the garbage'}]
    >>> update_item("house chores", "take out the garbage", {"name": "take out the garbage", "done": True})
    >>> get_list("house chores")
    [{'name': 'take out the garbage', 'done': True}]
    

    Except that this time, if I stop my code interpreter and start it again, I’m still able to retrieve the list of my house chores.

    Note that I’m still just managing to-do lists, but the code grew to 1.9x its original size, and it gained fault conditions that didn’t exist before, which is somewhat fine, because I introduced an external device into the solution. However, it is now harder to understand my intention when reading the code. Every line of code that I added has nothing to do with the task of managing to-do lists:

    • Am I not handling other possible errors because there’s some guarantee that they won’t happen, because I didn’t have time to handle them, because I didn’t know any other errors could happen, or for some other reason?
    • Am I using pickle for some specific reason, and am I aware of the constraints it introduces to the solution?
    • Why is f.seek(0) in my code? It is not related at all to the problem of managing to-do lists, nor to persisting to-do lists to disk. If we were talking about managing bookmarks instead, this line would be exactly the same in that solution too.

    On top of that, I have to count some overhead for pickle and open() when considering the cost of understanding and maintaining the code5, because those introduce extra fault conditions that I need to know about, and behaviour unique to their implementations. For example, I can’t use pickle with untrusted input, because if I do, Bad Things™ can happen.

  3. I won’t bore you with the code, but once I decide that I want to have it running in a more “permanent” way without having to load my code interpreter first, I’ll need to introduce some way to communicate with it, which would add a hundred lines of extra code I have to write, understand, and maintain.

    More realistically, I’ll end up bringing in at least a couple extra libraries and writing dozens of lines. The code will be exposed to multiple extra fault conditions that I should either understand beforehand, or discover the hard way when my code fails at the most inconvenient times. In any case, it increases my mental overhead.

  4. When I decide to make this code available to other people through the Internet, we can increase the amount of code I’ll have to write, understand, and maintain by at least another hundred lines. And that’s only because I’ll be relying on several other libraries that are easily thousands (if not tens of thousands) of lines of code long.

    All those lines introduce yet more fault conditions, but this time the consequences are tougher on me. For example, if I don’t end up replacing my use of pickle with something else more secure, my machine will probably end up mining cryptocurrency for someone else in no time.

    This means that some code that was completely fine for my initial purpose will end up being my downfall once my scope grows a bit more, and all I can do to save myself from this fate is to:

    • read the documentation for every dependency I’m using and hope that every fault condition and/or edge case has been properly documented and explained; or
    • if I want to be extra safe, read the code for every dependency I’m using to understand exactly under what conditions I’ll run into issues when using each dependency.
  5. Let’s not even think about how much I’ll die inside if I ever decide I want to make my solution robust to:

    • Multiple people changing the same to-do list at the same time.
    • The machine my code is running on turning off and bringing my solution down with it.
    • The permanent storage device failing and losing all the to-do lists that were stored in it.

    At this point, my code is easily running into the thousand-line territory, with one or two orders of magnitude more in the dependencies I’m bringing in to help me deal with all of this.

    And I’m still solving the same problem: keeping track of to-do lists.

One could argue that as I add new use cases to my code and make it more robust, I’m also solving other problems, such as all the usual fault tolerance and distributed systems problems that need solving. However, those are problems that everyone else also has to solve. There is nothing specific to to-do list management in any of that, so shouldn’t we have figured out better abstractions for all of those problems already? Why do I need so much code to use all the extra dependencies and make them work with each other?

What I’ve observed is that in practice, almost every library that we bring into our code is a layer of abstraction with only one implementation behind it. Rare are the cases where we can have a single library with multiple implementations to choose from. And each library also comes with all the failure modes from its implementations, so it’s on us to either avoid those failure modes, or handle them correctly. Take a look at how many dependencies most of our modern software uses, and you’ll understand why everything barely works nowadays.

Here’s a post from someone who worked on FoundationDB, stating that once they figured out a decent way of testing their code, they started deleting all their dependencies, because the dependencies had bugs and they could write better stuff faster. We have way too many layers of abstraction, and almost none of them provide good abstractions that reduce the amount of overhead we have to work with.

In Stop Writing Dead Programs, Jack Rusher shows that many languages fail at removing enough overhead to give us a good enough abstraction to simply add 1 to every element in an array.
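
To make that point concrete, here is my own rough illustration (not an example from the talk): the same “add 1 to every element” task, once written close to the intent and once with batch-processing ceremony that has nothing to do with the task.

    xs = [1, 2, 3]

    # Close to the intent: say what we want, not how to loop.
    ys = [x + 1 for x in xs]

    # The same task with ceremony: indices, bounds, mutation. None of this is
    # part of the problem we're actually solving.
    zs = []
    i = 0
    while i < len(xs):
        zs.append(xs[i] + 1)
        i += 1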

In languages that do provide a good enough abstraction, we have a harder time solving other problems at lower layers, because abstractions are leaky and we end up solving things in the wrong layers.

In The Mess We’re In, Joe Armstrong says that to handle failures, we need to understand distributed programming, parallel programming, and concurrent programming.

This is an insane amount of effort, a consequence of not having good abstractions that deal with this for us. We need a team of highly qualified engineers to make 10 lines of code work in a fault-tolerant way. Both Jonathan Blow and Joe Armstrong make similar arguments: it’s impossible to “just look at it” when something doesn’t work.

In We Really Don’t Know How to Compute!, Gerald Sussman shows a quote from Huw Evans about how expensive maintaining computer systems is, and how the proportion of effort devoted to maintenance has been increasing. Sussman says that the real cost of computation is the cost of programmers, and that our code is not adequately evolvable and modifiable.

We’re losing the early generations, and software capabilities are going downhill

So, we have too many layers of abstraction, and each one of them is leaky and influences code in other layers. It’s no surprise, then, that there will always be some layer misbehaving in a way we don’t really want, and we have to spend a considerable amount of effort understanding what’s happening in order to either fix it (very rare, requires even more effort) or work around it (what usually ends up happening).

To work around the issues, we usually end up introducing yet another layer of abstraction. In simple cases, we introduce a library to solve a problem. In more complex cases, look at the evolution of software deployment as an example: we went from deploying directly to physical servers, to deploying inside VMs, to deploying inside containers, and nowadays there’s an added layer of orchestration on top of it all.

Each one of those steps added one or more additional layers of abstraction, making it much harder to comprehend everything that’s going on. As Jack Rusher said, “Docker shouldn’t exist, it exists only because everything else is so terribly complicated that they added another layer of complexity to make it work.”

We’re constantly adding new layers on top of the old ones, and as Bret Victor says, the worst case scenario would be if the next generation of programmers grows up only being shown the last layers of abstraction (because we “figured it out” for them already), and not really understanding how lower layers work, and that it is possible to change them too.

If we want to fix issues in lower layers, we need knowledge of how they work, why they were introduced, and, most of all, an understanding that they’re just ways to solve problems that we came up with some time ago. It’s easier to learn this from the people who were involved in creating those lower layers, but that is becoming increasingly hard.

The industry we’re in had its origins between the early 1950s and the late 1960s. Mapped into generations, we’re at a time when we’re losing the people who worked on the lower layers of abstraction, and we haven’t learned enough from them. This is an argument that Jonathan Blow makes in his talk, and I agree with it.

Our software capabilities are going downhill. Technology is degrading because knowledge is being lost6. With each new layer we add to all of this mess, we stop looking at the lower layers, and we stop teaching them to newer generations. At some point, we lose knowledge about the lower layers, and we get stuck in a loop of just maintaining what we have. We’re creating extra layers on top of what’s degrading so we can keep going, without realising that our foundations are the issue.

We’re solving real problems in the wrong layers

This is a very short section to clarify that I think many of the problems we’re trying to solve are real. I’m not arguing that those aren’t problems. Making code fault tolerant is hard. Distributed computing is hard. UI is hard.

Yes, there are some problems we’re solving that exist only because of intermediary layers of abstraction, and which would go away if we removed them. However, as a whole, there are some big, hairy, real problems that we’re solving.

The point is that maybe we’re solving these problems in the wrong levels of abstraction. Maybe we should spend some time thinking about how we’d fix problems at lower layers instead.

II. Some problems at a lower level

I want to highlight some specific problems whose solutions are (in my opinion) in the wrong layers of abstraction. I don’t want to spend too long talking about any of these particular issues, because I could easily fill entire blog posts about each of them. The purpose of this section is to elicit thought about how the high-level “trends” show up in different areas of computing.

Ephemeral processes

As Jack Rusher put it, we currently write software for batch processing. Our unit of computation is the process, which starts up with a pristine state, interacts with the outside world, manipulates data, and eventually ceases to exist.

This is an old model that was created because a long time ago, we really computed things in batches. Since then, we’ve found other useful things that we want to do that go against this outdated abstraction.

One useful thing to do is to persist information, which we have to do outside of the process, because the moment the process finishes, all the data it was manipulating is gone. We also want to connect multiple machines together or interact with users, and both of these require things that live outside a process but are linked to a specific one. For example, if we’re drawing things on the screen and the process dies, the things on the screen go away too. If we send a message to another machine and it replies, but our process dies in the meantime, we’ll never process that reply, even though our machine already received it.

Just this one model is responsible for so many extra abstractions that we had to come up with to work around the limitations of a process. It’s insane.

Even when we’re writing and testing code, we’re massively restrained by the process: we can’t even fix our code and confirm we solved a problem without stopping everything our code had been doing and starting from scratch. Languages that do provide this very useful ability all have to build additional layers of abstraction on top of the process to make it happen, and even then they’re still constrained by it.
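
As a small illustration of that constraint, here is what “fixing code without restarting” looks like in Python with importlib.reload. The todo module is hypothetical (imagine the to-do code from earlier saved as todo.py); the point is that the reload still happens inside the same process, and the module-level state is simply thrown away.

    import importlib

    import todo  # hypothetical module holding the first to-do list version

    todo.save_list("house chores", [{"name": "take out the garbage"}])

    # ...edit todo.py on disk to fix a bug, then reload it in place...
    importlib.reload(todo)  # re-executes todo.py inside the same process

    # The module-level `lists = {}` was re-run by the reload, so the running
    # state is gone: the process model is still dictating the rules.
    print(todo.get_list("house chores"))  # -> []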

Big monolithic kernel

Given that the OS encapsulates code in the process, it is no surprise that the kernel code itself was molded by the same model. As an extreme simplification, we can view the kernel as one big privileged process, with constraints similar to those of a regular process.

Cloud and serverless

The modern cloud provides abstractions on a level different from the OS, but those abstractions are in some ways similar to the ones from an OS. Used creatively, one could leverage the cloud as a de facto OS.

For example, we can leverage serverless products to run code in a more distributed manner and with stronger guarantees than the ones a machine-level OS provides. The code we run still closely follows the model and constraints of a normal OS process, but the cloud abstraction provides some interesting guarantees.

However, the modern cloud provides too many abstractions at too many different levels with different degrees of quality. There are so many gotchas with all the different APIs from cloud services that writing code for the cloud is no better than writing code for a machine-level OS. The messy state of the cloud’s abstractions is Conway’s Law in action, and the result of needing so many people to work on cloud products. I think the reason so many people are needed is that the cloud is solving a problem at the wrong level of abstraction.

Given the nature of essentially all modern cloud providers, their design evolved to extract as much value out of their users as possible, often in sneaky and surprising ways, to the point that the cloud is not economically viable for some use cases (for example, running all of your code on top of serverless products).

It is also not easily extensible. We can only “extend” by building something on top of the existing products, which means more abstractions and more opportunities to be surprised by their costs.

Distributed computing

I briefly touched on this when talking about the high-level trends. We don’t have enough good lower-level abstractions that help us deal with distributed computing.

Currently, our best abstractions are at the level of programming languages, but this limits us as a field, because we have to solve the same problems over and over in other languages. Libraries don’t help much since they’re still at least an extra layer of abstraction.

Moreover, the programming languages designed to help us with distributed systems also had to make other decisions in their design, which further limits their applicability. They’re still amazing languages, but we could be solving these problems at lower layers.

Persistence of data

I consider the abstraction of files and filesystems to be very powerful, but it’s an outdated concept, and modern persistence of data is implemented as additional layers of abstraction on top of files and filesystems.

When we want to query persisted data in powerful ways, we use databases. Those databases are a bunch of additional layers which our code has to interface with, often with the bonus of having to deal with distributed systems problems in our own code that talks to the database too. Out of the billions of man-hours of future employment we’ve created to clean up the mess we’re in, databases probably account for a good chunk.

Users of our software want to persist photos, books, documents, music, videos, moments of the past and plans for the future. At lower levels, we store those directly as files, but users will almost always work with those files through higher levels of abstraction: a photo viewer/manager (Google Photos or whatever equivalent you prefer), a music tagger and/or streaming software, movie and series streaming software, all sorts of document readers, and so on.

There are too many layers of abstraction on top of files for so many different scenarios, and each one is increasingly being designed to avoid external access.

Versioning behaviour

Even with current solutions, we still have issues versioning behaviour in our code and providing backwards compatibility. If we choose to take an extreme position and maintain backwards compatibility with every version of our code, we add a lot more overhead during development (as if things weren’t already difficult). On the other extreme, we make it harder for everyone else to use or work with our code.

We define versions for our software (a consequence of treating code as batch processing), but we haven’t yet figured out how to guarantee that new minor versions haven’t broken anything, or how to figure out the differences between the properties of the new and old versions and their consequences.

In the rare cases where we take the time to think ahead and provide a way to perform an automatic conversion between the old and the new, we only do it one way, because why would anyone want to go from the new to the old anyway?
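
Reusing the to-do example from earlier, here is a hedged sketch of what that one-way thinking usually looks like in practice (the function names are mine, purely for illustration):

    def upgrade_item_v1_to_v2(item):
        # v2 added a "done" flag; default old items to "not done".
        return {**item, "done": item.get("done", False)}

    def downgrade_item_v2_to_v1(item):
        # ...almost never written, because why would anyone want to go back?
        raise NotImplementedError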

When we try really hard to maintain backwards compatibility, we have to live with the decisions we made and the abstractions we created in a period where we had less information than we do now. And let’s not even talk about when we version code that deals with persistent data, which can be in all sorts of states.

Graphical user interfaces

The same code that we write to deal with the logic of data manipulation and data processing is coupled to the code we write to deal with GUI. This is the content vs. presentation vs. structure debate applied at the code level.

Our UI code also runs in the same old process model, so it’s no surprise that we have a hard time separating the code that deals with the content from the code that deals with everything else.

As a consequence, it’s hard to have different GUIs to deal with the same data - it would require active effort to ensure the separation of code that deals with data from other code.

Command line interaction, arguments, options, documentation

As programmers, we use the command line a lot. A few of the talks I referenced already talk about lots of problems with terminals and command line (go watch A Whole New World, Stop Writing Dead Programs, and The Future of Programming), so I’ll just reinforce some of the problems they mention and add a couple more.

I feel like we haven’t been improving at all on how we learn about the commands we want to use and the software we want to run. Each piece of software has its own way to provide help and usage information. We still use man pages. We still bundle so many options into a single piece of software. We have trouble discovering options that are available to us and learning more about them. We still have trouble writing and keeping documentation up to date (man pages, help and usage information are all documentation).

The concept of piping the output of one command into the input of another is commendable, but it becomes difficult when input and output formats are determined by each piece of software. We can’t easily extend functionality, so we have to create yet another layer of abstraction (command aliases, shell scripts) to make things work together, and hold them together with some glue (yet more commands that manipulate the inputs/outputs). We have a hard time going from command line to GUI. We have a hard time sharing our solutions to make things work together.

We are visual creatures, but we still have not developed good ways of visualising code

“We should be able to draw pictures when we’re coding as well”. This is a quote from Stop Writing Dead Programs. After thinking about the whole “why do we still code in plain text” problem, I think this quote summarises how I feel about it.

Our code is one-dimensional. Our editors are just yet more layers of abstraction on top of plain text. We have decorations, we have intellisense, we have plugins and language servers and a lot more. But at the end of the day, our code is text.

I have a lot to say about the similarities between writing code and writing other text, especially the work of writing encyclopaedias, but I would like to dedicate an entire (future) post to that. To spark some thought, though: did you know that encyclopaedia writers figured out the concept of CI/CD before any of us were even alive?

In 1933, the Britannica became the first encyclopaedia to adopt ‘continuous revision’, in which the encyclopaedia is continually reprinted, with every article updated on a schedule. (source)

How come we had to rediscover this?

Code is a way to organise thoughts because of its written medium, but it is also a way to provide instructions (we want a machine to execute them). We also capture and share knowledge in our code, because we expect other people to work with/on our code in the future as well. Code is also art. We write quines, we play code golf, we write code that makes art.

But we still can’t have a picture in our code, something that is even simpler than text in evolutionary terms.

III. Ideas and examples

Enough about problems; I also want to talk about a few ideas and show some examples that are interesting and may serve as inspiration for better abstractions in the future. They certainly serve as inspiration for me.

One important thing to take away from the examples is that these ideas can actually be done. They may not currently be applied at the layer of abstraction I’d like them to be, but they can be done. This is a huge deal, because the shape of the solution might be transferable to a different level of abstraction.

Workflows and durable computing

I have written before that I think most of us are writing workflows when programming without realising it. I believe that making this more explicit for programmers is helpful.

Once you start thinking of most of your code as a workflow that needs to run to completion, and look at many of the current tools for workflow orchestration, an idea might pop up: perhaps it’s useful to stop thinking of code as attached to a specific machine, and instead attach it to a system that guarantees the code will run to completion (or hit an unhandled failure), no matter which machine runs it.

This is pretty much what the idea of durable computing tries to accomplish. It provides a different runtime for code.
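
To make the idea more concrete, here is a small, made-up sketch of durable execution. None of this is a real product’s API; the point is only the shape, where the runtime journals each step’s result so a workflow can resume on another machine instead of starting over.

    import json

    class DurableContext:
        def __init__(self, journal_path):
            self.journal_path = journal_path
            try:
                with open(journal_path) as f:
                    self.journal = json.load(f)  # results of steps that already ran
            except FileNotFoundError:
                self.journal = {}

        def step(self, name, fn):
            if name in self.journal:       # this step completed before a crash:
                return self.journal[name]  # replay the recorded result
            result = fn()                  # first time through: actually run it
            self.journal[name] = result
            with open(self.journal_path, "w") as f:
                json.dump(self.journal, f)  # persist before moving on
            return result

    def todo_reminder_workflow(ctx):
        items = ctx.step("load", lambda: [{"name": "take out the garbage"}])
        pending = ctx.step("filter", lambda: [i for i in items if not i.get("done")])
        ctx.step("notify", lambda: print(f"{len(pending)} item(s) pending"))

    # If the process dies halfway through, re-running this on any machine with
    # access to the journal resumes from the last completed step.
    todo_reminder_workflow(DurableContext("journal.json"))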

Workflow orchestration engines are a step towards durable computing, but there are already some solutions that provide stricter guarantees on the “durable” part: Azure Durable Functions, Temporal.io, Restate.dev, Flawless.dev, Golem.cloud, and plenty of others.

Although the solutions differ a lot in the guarantees they offer, how broadly applicable they are, how they achieve their guarantees, and how many extra layers of abstraction they add (or more rarely, how many layers they remove), I think all of those have inspiring approaches.

Actor models, message passing, event-driven applications

Bret Victor mentions actor models in his talk, so go watch it if you haven’t already. As one of the designers of Erlang, Joe Armstrong obviously thinks this is also a good idea. And I’ll quote Jack Rusher in Stop Writing Dead Programs: “…now when I say this, I’m not telling you ‘you should use Erlang’, what I’m telling you is ‘whatever you use, should be at least as good as Erlang at doing this!’”

The actor model is a powerful abstraction that solves, by itself, a problem that other solutions would need a few extra layers of abstraction to solve. A few languages introduce the concept of message passing so that different pieces of code can communicate, instead of having all that code banging on the same parts of memory.
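
Here is a minimal actor sketch in Python, purely illustrative (a real actor runtime like Erlang’s adds supervision, distribution, and fault isolation on top): each actor owns its state and is only reachable through its mailbox.

    import queue
    import threading

    class CounterActor:
        def __init__(self):
            self.mailbox = queue.Queue()
            self.count = 0  # private state: nothing else touches it directly
            threading.Thread(target=self._run, daemon=True).start()

        def send(self, message):
            self.mailbox.put(message)

        def _run(self):
            while True:
                msg, reply_to = self.mailbox.get()
                if msg == "increment":
                    self.count += 1
                elif msg == "get" and reply_to is not None:
                    reply_to.put(self.count)

    # All interaction happens by passing messages, never by sharing memory.
    actor = CounterActor()
    actor.send(("increment", None))
    reply = queue.Queue()
    actor.send(("get", reply))
    print(reply.get())  # -> 1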

These concepts are a somewhat good fit for event-driven solutions. There’s also some overlap with workflows and durable computing here.

I have this idea that we can express any solution in an event-driven way (you just need to define what you consider to be “the outside”, where events come from). I haven’t done too much thinking about this yet, but if there are cases that disprove the “any solution” bit, I’d be willing to bet they are a small percentage of the whole.

Understanding flows in our code

Static analysis tools and compilers like to figure out the control-flow graph of our code and perform data flow analysis. We have interesting new tools like Flowistry that perform information flow analysis.

Some of this stuff is used to optimise code, but it’s also helpful for humans to understand what’s going on with existing code. I think when we think about different ways to program, the current solutions might not be directly applicable in the new scenario, but their ideas are powerful.

They might allow us to clarify the intent of our code, and avoid some of the issues that come with the higher layers of abstraction we introduce to make our intent clearer in the first place.

Interactive debugging and monitoring

This is all about dropping the batch processing model of the world and greatly reducing the time it takes to get feedback on our code. Spreadsheets are a famous application that showcases this: change the contents of a cell, and the new value automatically propagates to all other cells that use it, and you can see the new results almost instantly.
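
A tiny sketch of that propagation idea (illustrative only, not a real propagator implementation): cells remember who depends on them and push changes forward, so changing one value recomputes the results immediately instead of re-running a batch program.

    class Cell:
        def __init__(self, value=None):
            self.value = value
            self.dependents = []  # cells computed from this one

        def set(self, value):
            self.value = value
            for dependent in self.dependents:
                dependent.recompute()

    class Formula(Cell):
        def __init__(self, fn, *inputs):
            super().__init__()
            self.fn, self.inputs = fn, inputs
            for cell in inputs:
                cell.dependents.append(self)
            self.recompute()

        def recompute(self):
            self.value = self.fn(*(c.value for c in self.inputs))
            for dependent in self.dependents:
                dependent.recompute()

    # Change a1 and the total updates without restarting anything.
    a1, a2 = Cell(2), Cell(3)
    total = Formula(lambda x, y: x + y, a1, a2)
    print(total.value)  # -> 5
    a1.set(10)
    print(total.value)  # -> 13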

A lot of stuff in visual programming intersects here, and perhaps I should point to this list of visual programming implementations, which has a bunch of inspiring stuff. This also intersects a bit with workflows, and all the flow analysis stuff that I mentioned above. I also like natto.dev, and I find the propagator model inspiring.

Interactive debugging matters because one of the most direct and powerful ways to develop quickly is to reduce the time it takes to go through the feedback loop when doing work. When building stuff, we require at least a few iterations to come up with a good mental model for the solution we’re working towards. Initial versions may be lacking in certain aspects, but we improve the whole through iteration, and the first few iterations are the most important.

Extending and adapting systems on the fly

By extending, I mean adding new functionality to an already-existing system. By adapting, I mean using an already-existing system for a new purpose. The methods for extending and adapting may at times be the same, and certain changes blur the line between extending and adapting, so I just group both of these into the same thing.

This has a lot of intersection with the previous idea, but differs in purpose. Interactive debugging has purpose mainly during development, but this idea is concerned with systems that are live, in production, where the stakes are usually way higher.

The Linux kernel can do some live-reloading through modules (which limits its usefulness, because not everything is a module). Erlang has some native capabilities to load new and/or updated code, and there are code reloading projects for a bunch of languages, although those all work at a higher level of abstraction than the language runtime itself, and so are prone to the issues already discussed.

Almost every system that we currently build to run in production without interruptions lacks these capabilities. We work around it in distributed systems by trying to make changes that are compatible between adjacent versions (at least), and then slowly migrating each machine in the system to the new version.

To gain native extensibility powers, we need a runtime that supports this. Workflow orchestration engines offer it, but again, usually with trade-offs: when changing existing code or workflow definitions, either all currently-running workflow executions keep the old version, or they all start using the new one. Migration between versions is a difficult problem that can sometimes be solved, but not always.

Sharing and distributing code

Within the same language, we currently have official and/or unofficial tools to share and distribute code, usually in the form of “package” managers.

On a higher level, we have tools like Nix and Guix that change how we build code and manage dependencies, but as a consequence they end up also working as “package” managers, thus providing ways to share and distribute code. On a smaller scale (but still same level), Linux distributions have all sorts of package managers that deal with sharing and distributing code.

However, all of those things are steeped in the same batch processing thinking: either you get the whole “package” or you go look for something else. Not only that, but you either get a whole piece of software (that expects to be run in the batch process model), or a whole library that may contain a lot more than you really need.

These packages sometimes come pre-compiled, or in a state where it’s very difficult to pull only a small bit of functionality from them. Trying to mix and match functionality from different binaries is currently little more than a dream, unless each binary is built with the goal of making this possible.

In an ideal world, we’d be able to pick functions piecemeal from all over the place (maybe from a binary, or a library, in multiple languages, whatever) so we could focus on composing them together to solve the problem we have. Joe Armstrong mentions this in his talk when talking about the condenser of code.

Ironically, I remember a lot of criticism of npm (javascript/typescript) packages that tried to do that. Each package contained a single function, and the criticism got heavy when some of those packages turned malicious. The npm single-function package approach was an interesting idea, but one implemented at the wrong layer.

Languages like Unison and darklang provide this function-level sharing at the language level. Other projects like val.town aim to do this at a higher level of abstraction, and GraalVM aims to do it at the runtime level (as some kind of polyglot tool).

There is an intersection between this and the visual programming stuff, and likely with the interactive debugging idea on a more general level. Also with workflows (e.g. sharing a piece of a workflow, which really is just sharing a piece of code) and the previous idea of extending and adapting systems as well.

Separating UI from the rest of the code

This is the direction that some web applications evolve towards: a thin interface to some CRUD, where all the logic related to data lives on a server, and the web app is only concerned with presenting it nicely.

However, at some point people decide to add fancier features to the UI, and then we start bundling the rest of the code with the UI yet again, only this time some of the code still stays on the server side (can’t trust client-side computation, after all). This happens mainly because we still haven’t figured out a proper way of separating content from presentation.
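
As a trivial sketch of the separation this section argues for (reusing the to-do data from earlier; the presentation functions are mine, purely for illustration): the data logic stays on its own, and each “presentation” is just another function over the same data.

    def render_text(items):
        return "\n".join(
            f"[{'x' if item.get('done') else ' '}] {item['name']}" for item in items
        )

    def render_html(items):
        rows = "".join(f"<li>{item['name']}</li>" for item in items)
        return f"<ul>{rows}</ul>"

    # Both views read the same data; neither one owns it.
    items = [{"name": "take out the garbage", "done": True}]
    print(render_text(items))
    print(render_html(items))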

In a way, a lot of data visualisation tools kind of achieve this by connecting to external data sources and letting us define the presentation somewhat separately from the data. Obviously, they fail to do the same thing when we look at the data visualisation tool itself under the same lens.

I find it interesting that some products combine the data visualisation idea with more flexibility in how to show the data, because in a way they’re going backwards on the whole idea of separating content from presentation. There are way too many in this category, but these ones are interesting because they’ve been adding a few of the other ideas above and serve as inspiration for something better: Retool, Tooljet, Budibase, Windmill.

Changing the process model, working with massively parallel processors

Going back to the actor model, imagine if each actor was able to run on its own dedicated processor, with its own dedicated memory. I have never worked with one myself, but from reading about massively parallel processor arrays, I think that was the idea behind them. The Future of Programming mentions these as well.

Ever since I touched an FPGA and programmed a few things on it, I always wondered what programming would look like if we had lots of small cores that could be dedicated to a single (or very few) tasks, rather than the behemoth cores we have today that do it all, and switch between lots of tasks as scheduled by the OS.

I haven’t done any research yet into this, but I suspect that for certain mundane tasks (let’s pick e.g. string manipulation) you’ll rarely use specialised instructions from the behemoth cores, so a smaller core that only has commonly-used instructions may fit just as well. Now imagine if we had a whole mesh of smaller cores, each one specialised for certain tasks, and the OS’s job is to schedule actors over those cores efficiently.

In a way, architectures like Arm’s big.LITTLE and Intel’s Performance/Efficiency cores kind of go in that direction, but they focus on power efficiency instead. The smaller cores use less power, but lack only some of the features that the behemoth cores have. They can still run almost any code. And there are only tens of them at most, so it isn’t really massively parallel.

Ever since Intel and AMD shopped around for FPGA capabilities, I thought we’d start going into this territory, but I haven’t followed what new things they’ve done since then. I can say for sure that the mainstream programming community hasn’t seen much from that either.

IV. What am I doing about this?

Can we ever work on different abstractions to explore or build solutions for any of the problems I mentioned above? Given that our field and environment are so coupled to some abstractions, is it even possible to change a single one without changing them all? Changing all the layers kinda implies we’d have to redo at least some decades of work already done on the current layers, and anyone would laugh at the thought of actually doing it.

I have a feeling that over 90% of the problems we have with software today come from the fact that we follow the batch processing model of computing. I’ve always wanted to try looking for something different, but the thought of having to change literally everything else that is based on that foundation kept me from actually doing anything.

Around a year ago, I started playing with a project to experiment with an idea, and the more I thought about it, the more I realised that there are ways to change things without redoing everything. Take Erlang as an example: it provides some really good abstractions that are different from the current mainstream programming world’s, and it manages to do that while running on abstractions that really try to work against Erlang. Obviously we can observe the influence of those lower layers on some of its design, but Erlang’s abstractions are so good at their job that they still generate super decent value. This serves as an inspiration: we don’t need to be perfect from the beginning, but gradually move towards changing more abstractions.

I wondered if the programming language layer is where I should start changing these things (it seems to be a popular starting point). A conclusion I reached is that we don’t necessarily need to change the programming language - we can change the runtime, and this is mostly sufficient. Some additional changes to existing languages may be required to ease the programmer into the new runtime, but other projects have shown that this isn’t as big of an issue as it may seem.

There will be some janky behaviour introduced by mixing all these layers, but I think the overall value generated can still be decent. Using an existing language removes one obstacle from the path of adopting a new solution, but comes with the potential introduction of one or more additional layers to make the existing language work in the new runtime.

Changing abstractions isn’t the only path, though. What if we also reduced the number of abstractions we rely on? I think a combination of both strategies can get us far: change a few abstractions at lower layers, and combine with a reduced number of abstractions at upper layers. This would definitely offset the extra layers that come with a new runtime.

Based on this feeling that there are better abstractions to find, I decided to continue working on my project, first part-time and then full-time, and I recently reached a humble milestone: someone else was able to try it out! A single sneeze is enough to make things fall apart, but it was still a milestone.

This post is the beginning of the next development phase I decided to take: I want to bring this project to market. It obviously won’t come even close to solving any of the big problems I already talked about, but I hope it’ll be a step towards that. It will take maybe a decade or two, but I’m set on trying enough abstractions to explore solutions to all the problems I mentioned above.

I have a starting point, which is to enter an existing market with a slightly different approach. Different enough that I believe it provides more value than the current solutions. There’s some planning that I need to do and a lot of development work ahead, but I feel excited about doing this.

On being naive

I’m aware that my views seem incredibly naive to many folks. I wanted to write a few words about why I think this is necessary for any serious attempt at changing the future of programming.

There’s a relevant quote in The Future of Programming: “Do you know the reason why all these ideas and so many other good ideas came about in this particular time period (the sixties, early seventies)? Why did it all happen then? It’s because technology, it was late enough that technology had kinda gotten to the point where you could actually kinda do things with computers, but it was still early enough that nobody knew what programming was. Nobody knew what programming was supposed to be. And they knew they didn’t know, so they just like tried everything!”

The talk goes on to say more cool things, but this quote captures the mindset needed to begin looking for new abstractions and new ways of thinking about programming and computing. It is super hard, but I’m making an effort to adopt this mindset.

However, adopting this mindset in a “mature” field, where there are decades of programming and software engineering practices already “established”, makes me look naive to everyone else. And that’s why I think it is necessary to be naive from the perspective of (almost) the rest of the world. If this is not true, it means I’m probably not exploring enough.

It is ironic that I need to say “I have no idea what I’m doing” while writing a post that makes it look like I have some idea of what I’m doing. Both things are true at the same time.

In the same talk, Bret Victor also mentions it is hard for humans to change their way of thinking, and there is always some resistance to change and to improvements on how we write software. We know this from history. I also know that it’ll be difficult to introduce something in the market that approaches things differently from what most people expect, which is why I need to plan and (think that I) have some idea of what I’m doing. But I also need to have my mind free to explore new ideas all the time.

We Really Don’t Know How to Compute! has a point that I’ll keep in mind when doing this: whatever I create needs to let people minimize the cost of changing the decisions they make. I think this will greatly help me find good exploration targets and build a product that other people will see value in.

Some encouragement

I have another post that goes into some personal notes about all of this, including how challenging all of this is and how I feel about that. But I find a lot of encouragement in what other folks have said as well:

When talking about all the things we do nowadays to work with and write software, Jonathan Blow mentions that all of these things are so ridiculously complicated that simplifying them only requires will.

When talking about making code more evolvable in general, Gerald Sussman says that new ways to think may be worth investigating. They may be complete nonsense, but we have to throw away our current ways of thinking if we ever expect to solve these problems.

Bret Victor emphasises that “the real tragedy would be if people forgot that you could have new ideas about programming models in the first place”, and that the most dangerous thought you can have as a creative person is to think that you know what you’re doing.

In A Whole New World, Gary Bernhardt says that “when you’re a programmer, and you are the customer, and you’re writing the system, and you’ve been using these tools for 20 years, …, you need to go off and sit in a hammock for a couple years and think hard”.

Ever since I started to think about these ideas over a decade ago, I’d find other people talking about similar things. As my thoughts matured and I gained more experience, I started to grow a bit more confident that maybe I wasn’t as crazy as it seemed, because I kept finding people sharing similar thoughts. There are other people talking about these ideas, there are companies working on products related to these ideas, and I think I see a path to bridge all of these things together (obviously I have no idea what I’m doing), so I’ll go try it out.


  1. When I say logical choice or logical structure, what I mean is that most programming languages were designed to make it easier to reason about code by using arrays of structures. ↩︎ ↩︎

  2. When you think about it, trying to extract more performance from our code is one of the main ways we let implementation details of other layers influence our decisions, and is how we end up getting into a mess in the first place. ↩︎

  3. By understandable, I mean that my intentions are clear when reading the code. ↩︎

  4. I could’ve changed the structure of the data I’m working with (e.g. using a dict of to-do items instead of a list) to reduce that overhead, but this would come with other drawbacks that I’d have to handle in my code, introducing additional code that really wouldn’t matter to the task at hand either. ↩︎

  5. A similar argument can be made for map() and list() from my first solution too, but they act in a “pure” way because they don’t introduce fault conditions, so the overhead of using them is almost non-existent. ↩︎

  6. If you think we don’t lose knowledge anymore because we have the Internet, here’s a toot about a recent event that shows otherwise. If you were surprised by some of the technologies shown on The Future of Programming, I think that also means we’re losing knowledge. ↩︎