Mountains of software: a romantic and a classical perspective
The terms classic and romantic, as Phædrus used them, mean the following:
A classical understanding sees the world primarily as underlying form itself. A romantic understanding sees it primarily in terms of immediate appearance.
I’ve been reading Zen and the Art of Motorcycle Maintenance (ZAMM), and while talking to several people about topics related to computing, programming and software development, it became apparent to me that we can all benefit from some teachings from that book.
This post explains more logically some things that every mature software engineer already knows intuitively, because that knowledge was built from experience. It was by reading ZAMM that I discovered how to lay out that intuition in a way that can be passed to others without having to describe 1001 different situations and examples until the other person “gets it”.
Senses
Before we even begin to explore the classical/romantic perspectives, let’s first talk about one specific thing about humans, because this will aid further analysis. It might seem way too primitive at first, but I encourage you to keep reading. For those who find comfort in the term, we’re taking things from first principles.
Human beings process their environment through their senses. Information is gathered by multiple sensory systems, transformed into signals that are then interpreted by their brains. The degree of sensitivity in our sensory systems is different in every human: some prefer to use their vision, others prefer to use their hearing; some are unable to use a certain system, others are able to mix the information from multiple systems to augment their interpretation of the world. But every human uses their senses.
From a sensorial perspective, let’s try to examine what a computer is. We won’t do a good job at it, but the exercise is still interesting to go through:
- We have rectangular surfaces that emit light, and we see things on these surfaces. Some of these things are interpreted as text, others as still images, others as moving images. Depending on what surface we’re interacting with, we may get some feedback from touching it.
- Sometimes, we’ll have another rectangular object, but this time composed of smaller rectangular shapes with glyphs on them. Touching those smaller rectangular shapes will give us feedback, and will often make the things we see on the screen change.
- We might also have yet another object, this time probably oval and contoured in shape (but it also comes in flat rectangular shapes), that we can touch and move. Doing so will often make things we see on the screen change. If you touch it harder on the right spots (heh), it’ll give you some more feedback and cause more things on the screen to change.
- Very rarely, we’ll have other objects that will make sounds depending on what we do with the other objects above. These are not very relevant for the discussion, but they complete the quartet of computer peripherals (screen, keyboard, mouse, speakers).
What we just did was an exercise in looking at a computer from a romantic understanding. This is the understanding associated with the immediate interpretation of our environment. In other words, this is as close to the senses as a computer can get.
Underlying forms
If we decide to continue from where the romantic understanding ended, we can inspect each object more closely and we’ll naturally start to divide them into even more objects. We’ll also discover new ones. I’ll start using some names to keep the text more focused. I’ll discuss names again later.
- Looking at the screen, the keyboard, the mouse, the speakers, we notice one thing in common about them: they have cables (save the wireless stuff for later too). If we follow the cables, we’ll discover they’re all connected to yet another object that we weren’t aware of because it looked more like a piece of furniture, and not part of the computer.
- Looking further into this new object, we see that it has several other objects inside it. Lots of cables connecting things all around, some things that look like fans, a surface that is full of tinier objects attached to it, and bigger objects (that look like some flat sticks?) connected to it.
- Back to the keyboard, if we take each key apart we notice that the keys and the keyboard have several components inside. Some things that look like springs, some flat surfaces below the keys, and so on…
- We can do the same for the mouse, the screen, the speakers, but I’ll stop now because you get the idea.
That’s one underlying form of the computer. Let’s look at another:
- Looking at the screen, we see multiple rectangles that are grouping different things. On each rectangle, there’s a bunch of text. Some of that text is decorated in a way that seems similar to the keys on a keyboard, it almost makes me want to touch them…
- After touching some of those, we see yet more rectangles appear, and by reading all the text that shows up and trying to interact with them, we discover that they’re still “part” of the larger rectangle — we can’t really bring these smaller rectangles outside the larger one.
- There are, however, certain things we can do that cause other rectangles to open up with entirely different images. At some level, they seem disconnected from each other, but there are things that I can do in one rectangle that definitely change what I see in other ones!
This was intentionally harder to understand. I’ll get to the point shortly, after we look at one other form.
- When it powers up, some code on the computer scans the devices connected to it and detects that there is some more code that it can load from another device, which then loads some more code, and so on…
- Eventually this loads the operating system, which then loads even more code for all the drivers to talk to even more devices, and then loads more code for the software that the user wants to run.
- The software the user wants to run requires some more software to be loaded, and so the operating system loads those libraries to make them available for the software.
- The software then performs some initialisation, and might even end up asking the operating system to load even more software for it to operate properly…
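For those who prefer to see this form written down, here is a minimal sketch of that chain of loading. The stage names and the `LOADS` map are my own simplification for illustration (a real boot sequence has many more steps), and `boot` is a hypothetical helper, not something an operating system exposes.

```python
# Hypothetical, heavily simplified map of "what loads what".
LOADS = {
    "firmware": ["bootloader"],
    "bootloader": ["operating system"],
    "operating system": ["drivers", "application"],
    "application": ["libraries"],  # loaded by the OS on the application's behalf
}


def boot(stage, loads, depth=0, trace=None):
    """Walk the chain depth-first and record (depth, stage) in loading order."""
    if trace is None:
        trace = []
    trace.append((depth, stage))
    for next_stage in loads.get(stage, []):
        boot(next_stage, loads, depth + 1, trace)
    return trace


for depth, stage in boot("firmware", LOADS):
    print("  " * depth + stage)
```

Notice that even this sketch only makes sense if you already know what every name in it stands for.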
What we’ve been doing so far is seeing the computer with a classical understanding. This is a perspective that can start from the romantic understanding (like in the first exercise), but it quickly distances itself completely from that and goes into logical forms and logical structures and imaginary mountains of things that we can also call “the computer”. The computer is defined by each of these other forms as much as it’s defined by the romantic understanding.
The second exercise was still somewhat connected to the romantic perspective, but the third exercise was entirely removed from it: we can’t sense the software! We can’t see it, touch it, hear it… It is entirely a construct in our minds. The closest thing to a “physical form” of the software is probably the magnetic patterns or the charges in the transistors of some device in the computer, but we can’t sense any of those.
Representations are a tool used by classical thinking to reason about all of this nonsense (literally! You can’t sense any of it!). Another tool that does a lot of heavy lifting for us is a name.
Compare the descriptions I wrote in the second and third exercises. In the second one, I tried hard not to use that many names, but in the third exercise, I didn’t hold back: “code”, “device”, “operating system”, “driver”, “software”, “library”. The second exercise’s description is considerably harder to understand, primarily because of a lack of names. However, the third exercise is impossible to understand if you don’t know what the names actually mean.
The user and the software developer
Throughout software development, there is a lot of tension between software developers and users and every other intermediary involved in the process, from customer support to product managers to technical writers to UI/UX designers and even executives. This tension exists when the software is being developed, when it’s “released” but doesn’t meet the expectations of any of those parties, when it breaks, when it’s being removed, and even when it’s working properly too.
This tension often shows up as software developers being disappointed that the others just don’t understand the software at all.
To analyse this tension, we need to understand that the software developer is drowning in a sea of classical thinking, and the user is the closest human to the romantic understanding of a computer and of software. Every other intermediary is a point on a line between those two.
The other day on Reddit I was reading an old story about someone calling a computer shop because their computer wasn’t working right. The technician in the shop did the usual dance of “did you turn it off and on again”, but nothing seemed to fix the problem, so the technician asked the person to bring the computer to the shop so they could look at it. The person ends up hauling their super heavy computer (this story happened many years ago) through multiple bus trips to the shop and shows up only with one of these, and everyone but the person starts laughing hysterically.
It’s a funny story, but it illustrates what I mean when I say the user is the closest we can get to sensing the software. The user is generic here: it can even be a fellow software developer who has absolutely no clue what some piece of software is doing.
To explain that tension, “absolutely no clue” is a hint. When we don’t have a classical understanding of the software, all we’re left with are our senses.
The other hint comes from the 3 exercises we did to show 3 different forms of the computer: there are so many ways to “see” the software from a classical understanding that we can fill whole mountains with them, where each path through the mountains is a different slice of the software from the classical perspective.
When putting those hints together, it’s obvious that there will be tension between a piece of software and a user. Software has distanced itself so much from the romantic world that to understand even parts of it requires years of exploring the paths through those mountains.
Even software developers haven’t explored all those paths, and they’re usually the ones creating new paths themselves. Yet they look down on everyone else lower on the mountain, who have no idea where to go because every path is very steep. Developers think the path to get there is obvious, so they expect others to follow it, and that’s where the tension shows up.
But effort alone can’t get people to go higher. Chances are they’ll just end up taking a path that leads to another side of the mountain without any change in elevation at all.
An illustration
Imagine a team working on some search functionality. The basic form of that is a search box where a user can type what they’re searching for, and after typing it they’re presented with some results. When there are no results, they’re shown a message saying there were no results.
In its basic form, this is usually implemented as a single API endpoint, and it’s fine for the search to take some time (let’s say between 0.2 and 1 second).
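As a rough sketch of that basic form, assuming a tiny in-memory catalogue in place of a real API endpoint and database (`CATALOGUE`, `search` and `render` are names I made up for illustration):

```python
# Hypothetical in-memory catalogue standing in for whatever the endpoint queries.
CATALOGUE = [
    "motorcycle maintenance",
    "mountain trails",
    "mouse pad",
    "mechanical keyboard",
]


def search(query, catalogue=CATALOGUE):
    """Return every item that contains all of the query's terms."""
    terms = query.lower().split()
    if not terms:
        return []
    return [item for item in catalogue if all(term in item for term in terms)]


def render(query):
    """What the user ends up seeing: the results, or a fixed message."""
    results = search(query)
    if not results:
        return "There were no results."
    return "\n".join(results)


print(render("mountain"))
print(render("mountian"))  # a typo falls through to the fixed message
```

The "no results" message is a single branch at the very end, which is why it only takes a few minutes to add.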
The user experience isn’t that great, though, and the reasons are linked to the user being the closest to sensing the software:
- As the user types, they receive no feedback at all. The only feedback they get is at the end of the whole thing, and only after a wait that is too long compared to how quickly our brains expect feedback after we take an action.
- Given that the user is searching for something, it’s reasonable to assume they may not fully know what they’re searching for. If they make a typo in their search, they’ll be presented with a message saying there were no results, but what happened is that one (or more) of their search terms isn’t known by the system. This will cause confusion and frustration.
I probably don’t need to ask you to imagine the subsequent conversation between a software engineer and other parties on the team (a designer, product manager, or even a user) because the chances of you having lived through one of these are high.
At first, there’s denial that there is even a problem. To a software engineer, it’s obvious that getting no results on a search possibly means the search terms have a typo. They see the path in the mountains, because they created it. They know where it comes from and where it leads to. This is the tension from one end.
But there is more tension, this time coming from commentary and requests from the other parties:
“If there’s a typo in a term, let’s just show a different message asking the user ’did you mean <correct term>?’”.
“Let’s just show possible search terms or possible results on every character that the user types without having to wait for them to finish the whole query”.
I used “just” in those sentences for an increased effect, but even without it, the tension is still there. It comes from a lack of classical understanding:
“If it took only a few minutes for the engineer to put the message saying there were no results there, then why are they telling me it’ll take much longer to put a message asking about the correct term?”.
“Why are they freaking out that we now need to show the results on every character the user types? Can’t they make it a bit faster?”.
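To make the gap behind those two questions a bit more concrete, here is a self-contained sketch, under my own assumptions: a tiny hypothetical catalogue, Python's standard `difflib` as a stand-in for a real spelling corrector, and an arbitrary similarity cutoff of 0.75. It is not how any particular search system does it.

```python
import difflib

# Hypothetical catalogue; a real system would query an index or a database.
CATALOGUE = [
    "motorcycle maintenance",
    "mountain trails",
    "mouse pad",
    "mechanical keyboard",
]


def search(query, catalogue=CATALOGUE):
    """Basic search: items that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    return [item for item in catalogue if all(term in item for term in terms)]


def did_you_mean(query, catalogue=CATALOGUE):
    """Suggest a corrected query, or None when nothing close is known."""
    # New requirement 1: the system must know which terms exist at all.
    vocabulary = sorted({word for item in catalogue for word in item.split()})
    normalised = " ".join(query.lower().split())
    corrected = []
    for term in normalised.split():
        # New requirement 2: some notion of "close enough" (cutoff is arbitrary).
        close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.75)
        corrected.append(close[0] if close else term)
    suggestion = " ".join(corrected)
    return suggestion if suggestion != normalised else None


def queries_while_typing(text):
    """Every prefix the user passes through: one search per keystroke."""
    return [text[:end] for end in range(1, len(text) + 1)]


print(search("mountian"))
print(did_you_mean("mountian"))
print(len(queries_while_typing("mountain")), "searches instead of 1")
```

The fixed message needed none of this. The suggestion needs a vocabulary and a definition of similarity, and searching on every character multiplies the number of searches by the length of the query, each one taking the 0.2 to 1 second mentioned earlier. None of that is visible from lower on the mountain.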
Looking at the whole picture, we see some people higher up the mountain asking the others to meet them there, and the others lower on the mountain asking the ones above to figure out another path. This is a simple example to illustrate the tension, but it shows up in subtle ways in all other kinds of interaction between people.
Smoothing out
From a purely classical mindset, the best outcome would be to help people lower on the mountains to climb up so they can see things more clearly. It’s satisfying when things work out this way. People feel empowered and more confident with their newfound knowledge. But it’s impractical, so this approach often turns into a trap. Climbing up the mountain requires effort, willpower, and time, and most people will be lacking at least one of those.
We can settle for an outcome that’s still good, but has a much higher chance of success: meet them at the lowest level possible. Sometimes this requires going even a bit lower than where they are and building the required knowledge to speak in terms that they will understand, but that don’t require deeper understanding. After all, we all know what it’s like to use our senses, so in the worst possible case, we can start from them.
Once we’re at a common level, we help them look up and see the mountain. We don’t need to show them any paths, all we have to do is show them the mountain.
In practical terms, this means that it’s often the job of the engineer to communicate with the other parties in ways that they can understand.
Some engineers don’t like this because their thrill is climbing the mountain. They are yet to learn that the job encompasses a lot more than that.
Others find it an insult to have to do this after all the effort it took them to climb it. Those will have to learn to look more widely to see all the other mountains that aren’t the one they’re on right now. They may not even know yet that they’re on a mountain.
A few remarks
This whole post is a much longer version of phrases that people repeat often: have empathy for the user, communicate with other parties using common words, explain things in more detail to empower others to make better decisions, and so on. These are things that we pass as “wisdom” and use examples or real situations to get people to understand what they actually mean. There’s a lot of romantic thinking in that.
What I like about ZAMM is that it provides us the tools to understand what’s going on more deeply in all of those examples and situations. Once we learn to see the patterns, it becomes easier to identify them in other scenarios we hadn’t encountered before. This is learning that compounds and that we can rely on.
Now, imagine if the computer peripherals were all wireless back in that exercise. How do you even begin to trace a better picture of the system if you can’t use your senses to tell the components apart? I don’t think it’s possible without approaching it scientifically. To most people, the system would remain just like magic. It’s indistinguishable from it.
Because software has no sensory grounds, that is the same feeling that people have when they approach it without any prior knowledge. If they can’t sense their way to identify its components, how are they even expected to learn more about it? Software will remain in the magic kingdom for most of them.
And I don’t know what’s more worrisome — that apparently most people are comfortable with dealing with magic, or that we’re doubling down on making things even more magic.
We should regularly look at the mountains themselves and analyse how approachable they are, particularly the mountains of software and computing. The paths we’re creating on those mountains are extremely narrow, with very few branches on the lower levels. Those mountains are not friendly for beginners, and yet they’re mountains that are present in everyone’s lives. Looming, mysterious.
To enjoy these mountains, most people have to climb to extreme levels because we didn’t bother to build more trails, better trails, and we decided to build the overlooks too high up. We can do better than this. It might require abandoning some paths that we already created, but those are unstable and dangerous already. There’s much better ground if we look elsewhere.