One of the best projects I worked on had zero-overhead communication
There’s a project I worked on some time ago that provided me one of the most pleasant project management experiences I had.
The work itself wasn’t any complicated: we had to migrate a ton of machines (over 20k!) to a new system. Nothing cutting edge, no new features or products, just a migration to a better system that would allow us to control the machines more smoothly. The trick was that these machines were serving a big number of customers, and we couldn’t just stop them all, we’d have to do it in parts, carefully, to avoid overloading the rest of the machines. On top of that, the code running on the machines was in some cases over 10 years old (yikes!), and of course we didn’t have a single person with knowledge about the entire thing.
The technical side of the project was boring: perform tests with machines not serving production traffic, then slowly roll out the migration to machines serving customers, learn of a few scenarios you missed because the scale of the traffic you tested with wasn’t big enough (a good problem to have?), figure out how to deal with that, incorporate the learnings into the product code, and repeat until everything’s done.
I want to focus on my experience leading this migration and why I think it was a pleasant experience.
Given the nature of the problem, it was somewhat easy to convince management that there really wasn’t a timeline we could realistically follow. Machine migrations take time, you need to do it under heavy control to avoid cases where too many machines are brought offline and you’re unable to serve your traffic. Sometimes you’ll also need to replace new machines showing weird behaviour (even though they’re supposed to be new), because at this scale it becomes impractical to investigate every little issue that shows up on an individual machine (I’d have loved to, but the company’s management style disagreed with this approach). We also had 2 people dedicated to this, and we knew we’d hit some roadblocks along the way, but we didn’t really know which ones. Nobody knew 100% of the code running on the machines, as it was a big and old codebase.
Even though it was hard to have some kind of “hard” deadline, there was also a direct, visible way to measure progress: count the number of machines migrated, so as long as progress was being made on a reasonable pace, everyone was happy. We still aimed at moving things as fast as possible, because reasonable pace had a different meaning for each person involved on the project.
Throughout all this, I felt like the other person working on this project was completely in sync with me. They didn’t know everything involved in the migration (I handled a lot of human coordination transparently for them), but they knew enough about the code running on the machines to understand all the prep work we’d need, including all the internal systems we’d need to work with (which were many, and had complex knobs to tune). They knew about some edge cases we’d need to cover. When talking about the initial steps for this project, things went so well and we finished discussing so quickly that I got worried they hadn’t understood some of the things we’d need to do. However, I let them roll with it, and when reviewing their code patches I was really surprised: in the few minutes we talked about the prep stuff, we’d really synced about the approaches to take, and just started doing it. The patches were good.
Contrast this to either of the following two (way more common) scenarios I encountered in the past: someone needs a lot of help to understand parts of the code or the system, and needs some mentoring to figure things out; or someone holds some strong beliefs about the way things should be done, and an unproductive discussion ensues about how to do things. Both scenarios end up taking time and energy to converge.
When we found a roadblock in this project, syncing up was so easy and smooth: no strong opinions, we knew which approaches would solve the problem, and we favored gettings things done rather than finding some mythical perfect solution. Reporting on the progress of the migration was also pretty smooth. We kept a simple spreadsheet with things that were already done and things that still needed to be done, and both of us dedicated a few minutes every day to update it. If we were to track those using the company’s task management system, it would’ve taken much longer to update information the same way we did with the spreadsheet. There were days we didn’t even talk at all, because we knew what needed to be done, knew it would take a bit of time to work on those, and later we could just check on whether things got done by glancing at a couple tables. No standups required.
Overall, it felt like I was just working with an extension of myself that had a different thought pattern. We both came up with different approaches to solve the problems we faced, but we also knew well how to evaluate them, and knew when we had reached a good enough state and discussing more wouldn’t be as productive.
After the project ended, I discovered that I got addicted to the way we were working and managing ourselves. It was just so pleasant that I didn’t want it to end. To this day, I regularly think about this project, and wonder if I’ll ever be part of or help build a team full of people who can work like this. Is this how high performing teams work? Is this how working at productive companies feels like? If it is, I want more of it.