Thoughts on recursive structures

I find recursive structures very powerful. The idea that you can abstract parts of a structure using the structure itself allows you to build interesting tools to work with it. You can expand this to projects, and then you have projects that can be self-hosted. If I could, I’d only work on projects with this property.

If you look at a tree, you can describe any of its nodes as another tree. Take the following tree as an instance:

  E
 / \
D   C
   / \
  A   B

I can describe C as yet another tree:

  C
 / \
A   B

And this is very powerful, because it allows me to build a set of tools that work on top of trees, and then I can apply this to any level of the tree that I want. This means I can create as many layers of abstraction as I want, but as long as I can represent all of those abstractions using trees, the set of tools I created can still work on the overall thing.

It’s not a surprise that a lot of computer things are built on top of structures like this. Also not a surprise that so many companies want you to know about these structures when interviewing you (although they usually do that through flawed processes).

This becomes very interesting when you start looking at more complex systems using this same idea. Compilers are usually known for this - at some point during the creation of a language, you might be able to write a compiler using the same language you’re creating, and then you have a compiler for language X written in language X!

One of the consequences of this is that you effectively create a dogfooding cycle, and it is my opinion (backed by what little experience I have) that this is the best way of ensuring your project will be as good as you want it to be.

Another recent example that came to mind of a self-hosting project is Sourcehut (the source of this blog post is hosted there). I am a relatively early user of it (when you consider that as of this post’s date they’re still in alpha), but I have no concerns about its (current and future) quality: I know that as long as work is being put into the project, Sourcehut will be at least good enough for Drew’s use case (he’s the creator of Sourcehut, and maintainer of lots of open source projects), and my use case has less requirements than his.

(thinking about it, self-hostable projects seem like a relatively safe investment to make, as long as you’re happy with the creators’ sense of quality)

I wrote all of this so I could say that we should pay more attention to the things we design. There are too many things that could be a lot simpler if designed correctly, but the design misses the opportunity of being recursive. The most common mistake that I see is to treat an individual thing and a group of individual things as two different concepts.

Consider an authorization scheme that defines users, groups, and resources. A user is linked to a person using the system, a group is a list of users, and resources are the things you want to manage with your authorization scheme. This is a very common concept that many systems have. Generally, the design allows resources to be created/owned/read/modified/deleted by users or groups.

However, the design usually goes on so that resources can have a single user as a owner, and then extra care must be taken when a user is deleted (which leads to more code!), you need to support users transferring ownership of resources… All while still maintaining code that does almost the same thing, but for groups. One part of the code applies to a single thing (user), while the other part of the code applies to a list of things (group). It gets even worse if the design treats groups and resources as different things, and allows groups to have a single user as an owner, and now you have the same authorization code all over, but this time it applies only to groups.

This may sound “dumb”, and you may believe a design like this would never exist, but I assure you there are systems like that out there - especially internal systems at companies. I particularly remember a system which defined an owner and a secondary owner of a group (but not of a resource). The only difference between the secondary owner and the owner was that the secondary owner could not change the owner in the group. ¯\(ツ)

Why not join all of that? In a recursive design, everything could just be a list of users. A resource “owned” by a single person will only have a single user in the list. If the resource needs to be shared, just add another user to the list.

If you require more granular read/write/other restrictions, you just introduce additional lists with different permissions. As you work on this, you realize that you want to reference a specific list of users more than once, and you want any updates to this list to apply everywhere the list was being used. Rather than create a group and start writing code for groups, resist the urge and treat a list of users as another resource instead. Then, your code to authorize lists of users to read/modify/delete/whatever a resource will automatically apply to lists of users themselves.

It’s a bit hard to wrap your head around this concept, but when you do, you’ll realize you need to write a lot less code (because your tools apply everywhere, doesn’t matter the level of abstraction), and thus you can focus on making existing code much better.

Another example of this mistake is in certain (most?) workflow systems that I’ve seen out there. They all define a workflow as a set of activities, or steps, but then fail to join both concepts into one. The systems will then allow you to define sets of activities inside a workflow, but workflows themselves will have different properties that don’t easily allow recursion.

To enable recursion in the design, you treat everything as a list of activities. What was previously a “single activity” becomes a list with a single activity in it. Thus, a “workflow” becomes an activity too. With this, you’re able to easily embed workflows within other workflows, and it becomes easier to break an activity that was too coarse into a workflow (because it’s just a list of activities).

This allows all your code to deal with a list of activities in your workflow system to be applied to new scenarios (such as a workflow embedded into a workflow). You also only need to write the code once, which allows you to focus on making it good.

Hopefully by now I was able to better explain this concept: recursive structures/designs are powerful, and they allow you to apply the same code at different levels of abstraction.

I think a powerful thing can be built by approaching distributed systems this way, specifically in a microservice architecture. I may have some ideas and possibly an interesting design, but I don’t really have anything concrete yet. Maybe you have an insight into this? I’d be curious to know.