How I Balanced Technical Debt and Delivery Pressure in a High-stakes Launch
Making deliberate choices when speed trumps perfection
Intro
Our head of engineering set the date. Users were interested, and the opportunity seemed perfect. Yet, our codebase was not ready for it. The architecture was simply not ripe for the extension we wanted to build. But we had to go fast. We all knew we would incur debt.
We’ve all been at the crossroads of speed and technical debt. The pressure to ship features while maintaining code health (or, keeping tech debt at bay) is universal, especially during critical launches. It’s not a question of whether you’ll incur debt, but how you navigate it. Striking the right balance is always difficult, even if, like me, you’ve walked the tightrope between meeting aggressive deadlines and not tipping the system over many times.
This is not theory. It’s a breakdown of the practical framework and tactics we used in a specific, high-stakes situation to make conscious trade-offs and survive the launch (and its aftermath). No magic, just deliberate choices.
The Perfect Storm
The product I worked on was an enterprise loyalty points calculation API. Originally, the same functionality was included in another, broader loyalty program management API. But while going to market, we learned that our users didn’t really want a whole product (API) to manage loyalty programs programmatically. They all already had in-house or purchased software that did this for them. Yet, multiple potential users were interested in the points calculation API, as it was a feature that is difficult to build well.
Upon sunsetting the broader API, we were left with a choice: do we let the opportunity go to waste, or do we act on the signal from our users?
My product manager and I were adamant: this was our opportunity to show the world our smart points calculation algorithms. The signal from the market was solid; we needed to use the momentum. Despite initial skepticism, the wider team got behind the idea.
We kicked off a small development team: three engineers and the product manager, supported by the engineering manager. One was a senior engineer with decades of experience and a very strong understanding of the domain. The other two were a newly hired senior and a junior engineer, both technically excellent. The PM was the best in his group, hands down.
The deadline? It was January and we had three months to launch the product in private beta. We had users who had committed to adopting the new API by Q2: their business was seasonal and they wanted to use the new loyalty points calculation API when the new season started – a sort of clean cutoff. Doable, but aggressive.
Our existing monolithic codebase, while reasonably organized, wasn’t designed for this public API. So here’s what we were looking at: building a new API on foundations that had never supported a public API before, with an architecture that had to be pried open just enough for the new API to fit, while keeping all other calculations and functionality working without regressions, and while serving over 20,000 users every day. Fixing the debt or reworking the architecture was simply not an option on the proposed timeline – we barely had time to get the basics out by the deadline, and architecture work would have taken many more months.
We decided to bite the bullet rather than let the prospective users down.
Conscious Choices, Not Accidental Outcomes
Now that we knew we’d have to cut some corners, I wanted to put down some principles around doing so. Having gone through similar experiences in the past, I’ve found the following four principles key to hitting ambitious milestones when you consciously take on debt.
Principle 1: Radical Transparency & Shared Understanding
As a leader, it’s easy to convince yourself that “it won’t be too bad” if you incur some debt. Especially if you’re in a position of authority, your engineers might not even be willing to push back on you. And you’ll think to yourself, “if they don’t say anything, it’s probably fine” – while at the same time they might be fuming about a manager who seems oblivious to the consequences of incurring debt.
That’s why it’s key to be radically transparent and open to receiving feedback. During the initial phases we had discussions about the debt, the risks, the deadline, and the plan. No sugar-coating. We knew we had to take some shortcuts, and we had to make it abundantly clear to ourselves why they had to be taken.
Practically, I’ve found that as an engineering group it’s good to have alternatives. For example, we considered cracking open a particular class and shoving the API parameters into it. It was viable, but the debt we’d incur there was astronomical. We challenged ourselves to look elsewhere. Was there perhaps a higher layer in the call stack that could take the API parameters, massage them to look like just another internal struct, and then pass them down to the calculation code?
Looking outward, we were under a lot of pressure from our stakeholders to deliver. Large users were waiting to use the new API, which would open new revenue streams for our product. This made our business partners willing to move mountains for us, but it also kept the pressure climbing. In high-stakes situations, it’s on us as leaders to be brutally honest about the risks associated with the speed and the debt. I went to our stakeholders and explained the trade-off: if we chose not to pay the debt, we risked severe UX issues down the line (e.g. instability, incidents), effectively losing user trust.
Once senior leadership understood the risks and accepted that this debt must be paid, we were able to proceed with execution.
Principle 2: Ruthless Prioritization via Risk Assessment
Not all debt is created equal. Some of it is easy to take on; some, not so much. To understand the debt better, I like using a simple matrix:
On the X axis, we have the impact on launch success or stability: is the debt in question a direct threat to the launch functionality or stability? On the Y axis, we have the effort to quickly address the debt: can it be fixed or mitigated with reasonable speed?
The way we used this matrix is:
When the debt is high impact and low/medium effort – we have to fix it; it’s non-negotiable.
When the debt is high impact and high effort – we have to mitigate or contain it now. We asked ourselves: how can we isolate it? Would adding monitoring do the trick? Can we carefully feature-flag the related area? We’d delay fixing it fully until post-launch, but actively manage the risk.
When the debt is medium impact and low effort – we fix it if time allows. Basically, a potential stretch goal, or something to assign when someone has downtime or a new person is ramping up on the project.
When the debt is low impact, regardless of effort – we’d log it and defer it. We explicitly added such debt to a post-launch backlog that we didn’t touch until much later.
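To make the matrix concrete, here’s a minimal sketch of the decision logic in Python. It’s purely illustrative – we tracked this in tickets, not code – and the levels and wording are mine.

```python
# A purely illustrative sketch of the triage logic described above.
from enum import Enum

class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def triage(impact: Level, effort: Level) -> str:
    """Map a debt item's launch impact and fix effort to an action."""
    if impact is Level.HIGH and effort is not Level.HIGH:
        return "fix now - non-negotiable"
    if impact is Level.HIGH and effort is Level.HIGH:
        return "mitigate/contain now, fix post-launch"
    if impact is Level.MEDIUM and effort is Level.LOW:
        return "stretch goal - fix if time allows"
    return "log as IOU, defer to post-launch backlog"

# Example: a risky parsing hack sitting on the launch-critical path.
print(triage(Level.HIGH, Level.HIGH))  # mitigate/contain now, fix post-launch
```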
Principle 3: Define a Specific, User-Centric "Good Enough"
Perfection is the enemy of meeting an aggressive timeline. So we thought about what good enough means for us. The nature of the new API didn’t demand an extreme performance or reliability profile – it was a loyalty points calculation API, after all. But that doesn’t mean we didn’t care about reliability. Loyalty points are a currency of their own, so users feel strongly about having their points calculated correctly and appearing on their profiles quickly enough to use right away.
“Good enough” should not be arbitrary. It must be rooted in your users’ needs. In our case it meant:
Loyalty points calculation had to take less than 1 second at the p99. In other words, we wanted 99% of requests to be fulfilled within 1 second. We identified 1 second as a good enough sweet spot for private beta by talking with a single user. n=1 is certainly not representative of all users, but it was enough to get us started.
We were at peace with an availability of 99% – it meant that for private beta we would tolerate roughly 14 minutes of downtime daily. This was acceptable because we would have few beta users, all with a good relationship with us and understanding of any service degradation during the beta.
The core calculation path had to be functional; any auxiliary functionality or polish could be safely ignored for the private beta phase.
Test coverage had to be good. This was essential because it would allow us to keep expanding the API safely once we hit the private beta milestone. If we ignored tests, we would almost certainly introduce regressions when we expanded the API’s functionality.
The API documentation had to be sufficient for any new user to onboard themselves, but it would be accessible only to beta users. Because access to the docs was limited, we could ship them unpolished.
No internal tooling support – we simply emitted log lines to allow some debugging of beta usage.
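As a rough illustration of the latency criterion, here’s a hedged sketch of checking p99 against the 1-second budget from a batch of recorded request latencies. The helper names and the sample data are hypothetical, not our actual tooling.

```python
import math

def p99_ms(latencies_ms: list[float]) -> float:
    """99th percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = max(0, math.ceil(len(ordered) * 0.99) - 1)
    return ordered[rank]

def meets_latency_target(latencies_ms: list[float], budget_ms: float = 1000.0) -> bool:
    """True if 99% of requests finished within the budget (1s for our beta)."""
    return p99_ms(latencies_ms) <= budget_ms

# Hypothetical sample: 1,000 requests, mostly fast, with a few slow outliers.
sample = [120.0] * 985 + [900.0] * 10 + [1500.0] * 5
print(p99_ms(sample), meets_latency_target(sample))  # 900.0 True
```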
Principle 4: Isolate Problems, Don't Chase Perfection
During crunch phases, it’s important to focus on reaching the milestone. Drive-by refactoring or fixes must be punted for later. If a particular service or module is problematic, stopping to fix it is an anti-pattern; in such cases it’s best to isolate it. This applies to a module, a service, or any other dependency of the code you’re writing. In such situations, we focussed on putting a facade around the problematic module.
I remember a particular class that tried to do too much and was difficult to split into multiple classes. We decided to write a wrapper around it that would encapsulate the complexity of the difficult class, keeping the new API path reasonably clean. The idea was that once we cleaned up the complex class, we could drop the “mediator layer” between the API method and the complex class entirely.
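To show the shape of that wrapper, here’s a minimal, hypothetical sketch. `LegacyPointsEngine` stands in for the class that did too much; the point is that the new API path only ever talks to the thin facade, so all the awkward translation lives in one place.

```python
class LegacyPointsEngine:
    """Stand-in for the complex class we deliberately left untouched."""

    def compute(self, member_id, tier, purchases, promo_codes, region, flags):
        # Placeholder for the sprawling internal logic.
        return sum(p["amount"] for p in purchases) // 10

class PointsCalculationFacade:
    """The only entry point the new API path is allowed to call."""

    def __init__(self, engine: LegacyPointsEngine):
        self._engine = engine

    def calculate(self, api_request: dict) -> int:
        # Translate the public API request into whatever the legacy class
        # expects; the awkwardness stays inside this one method.
        return self._engine.compute(
            member_id=api_request["member_id"],
            tier=api_request.get("tier", "standard"),
            purchases=api_request["purchases"],
            promo_codes=api_request.get("promo_codes", []),
            region=api_request.get("region", "default"),
            flags={},
        )

facade = PointsCalculationFacade(LegacyPointsEngine())
print(facade.calculate({"member_id": "m-42", "purchases": [{"amount": 250}]}))  # 25
```

Once the legacy class is eventually cleaned up, only `calculate` has to change; the API handlers don’t.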
We had layers of problematic code, and we had to keep the urge to refactor them at bay. This was easier said than done, as all of us itched to clean things up a bit. To make it easier for the group, I proposed a rule: refactoring is strictly limited to work that directly unblocks items on the critical path or demonstrably reduces immediate launch risk. Everything else is slated as a follow-up.
Tactics on the Ground: Making it Happen
Now that we’ve covered the principles and the context, let’s see how we operationalized tech debt management in practice.
The IOU List
To manage the debt we incurred, we kept a virtual IOU list of tickets. The whole point was to have a list of the debt we had deferred until after we reached the private beta phase. The list has to be visible to everyone and agreed on by everyone on the team (that includes your product counterpart as well). As leaders, it’s on us to make it visible and keep it top of mind as we go. Adding a new item to the list should be publicly acknowledged, so all stakeholders are aware of the trade-offs we (the engineering team) are making.
Each ticket should include:
What it is
Why it was deferred (link to risk assessment)
Potential impact
The explicit commitment to address it post-launch (maybe even rough sizing/timeline)
This isn't just a backlog; it's a documented agreement.
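As a sketch, an IOU entry can be captured with something as lightweight as the record below. The field names are mine; we used plain tickets in our tracker, and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DebtIOU:
    """One IOU entry - a documented agreement, not just a backlog item."""
    title: str
    why_deferred: str                 # link to / summary of the risk assessment
    potential_impact: str             # what can go wrong if it's never paid back
    owner: str
    rough_size: str                   # e.g. "2-3 days"
    target_payoff: date               # the explicit post-launch commitment
    related_tickets: list[str] = field(default_factory=list)

# Hypothetical example entry.
iou = DebtIOU(
    title="Parameter morphing bypasses validation in the points engine",
    why_deferred="High effort, contained behind the facade (see risk matrix)",
    potential_impact="Silent miscalculation if new request fields are added",
    owner="points-api team",
    rough_size="~1 week",
    target_payoff=date(2025, 4, 15),
)
print(iou.title)
```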
In addition to that, we had dedicated "debt triage" meetings: short, frequent sessions (roughly 2-3 times a week initially, daily closer to launch) with tech leads, senior engineers, and, when relevant, the PM.
Surgical refactoring
In line with principle 4, we avoided unnecessary refactoring while in crunch. But on a few occasions we had to perform targeted refactoring – or, as I like to call them, "surgical strikes". One such example was refactoring the inputs to the loyalty points calculation class, where the main logic was encapsulated. Because of the layers of debt in there, the parameters coming from the API had to be transformed into the parameters accepted by the calculation method. When an engineer wrote out the logic, it was evident from the changeset that the parameter parsing and morphing was very involved. Not only did they have to patch in some messy logic, they also had to add hundreds of lines of tests, which told us very clearly (yelled at us?) that the class was doing too much.
Together, while looking at the changeset, we decided we had to break the fourth principle and do a very targeted refactoring. We tasked the engineer who wrote the changeset, accompanied by another engineer, with drafting a quick design of what the refactoring would look like. It wasn’t just about writing the code; it was also about rolling it out before the other changes went out. A refactoring like this could not be shipped together with the new functionality. Alongside the targeted refactoring we also wanted a controlled rollout – that way, if the refactoring introduced any regressions, they would be easy to catch and fix.
Eventually, the targeted refactoring worked out, and we ended up with a simpler abstraction that encapsulated parsing the API parameters, morphing them, and passing them down the call stack into the internals of the calculations.
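Roughly, the shape the extraction took (with hypothetical names): the parsing and morphing becomes a small, pure translation step that can be tested in isolation and shipped ahead of the rest of the changes. This is a sketch under those assumptions, not our actual code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CalculationInput:
    """Hypothetical internal input struct expected by the calculation method."""
    member_id: str
    purchase_total: float
    tier_multiplier: float

TIER_MULTIPLIERS = {"standard": 1.0, "gold": 1.5, "platinum": 2.0}

def to_calculation_input(api_payload: dict) -> CalculationInput:
    """Parse and morph the public API payload; raise ValueError on bad input."""
    try:
        member_id = str(api_payload["member_id"])
        purchase_total = float(api_payload["purchase_total"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid calculation payload: {exc}") from exc
    tier = str(api_payload.get("tier", "standard"))
    return CalculationInput(
        member_id=member_id,
        purchase_total=purchase_total,
        tier_multiplier=TIER_MULTIPLIERS.get(tier, 1.0),
    )

print(to_calculation_input({"member_id": "m-42", "purchase_total": "250", "tier": "gold"}))
```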
Operational Readiness
When leaving some debt in, one thing I’ve learned is: instrument everything. Things that can go right, but especially things that can go wrong. We had dashboards on top of that instrumentation that allowed us to observe the system as it worked. It’s crucial to have alerts for any anomalies, and pages if things really go south – you want to know the moment things start falling over completely. With these dashboards and alerts in place, you’ll have the confidence to defer some fixes.
But monitoring and alerting are not enough of a precaution. It’s essential to have tight controls on the rollout of changes. That’s why we used feature flags everywhere. They’re not just for new features; they can also be leveraged for new code paths, or new branches of execution you’re not feeling great about. Feature flags allowed us, on multiple occasions, to disable problematic sections or fall back to older (stable) logic without having to push new changes to production.
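Here’s a hedged sketch of that pattern. The flag client is a stand-in rather than a specific library, and the two calculation functions are placeholders.

```python
import logging

logger = logging.getLogger("points-api")

class FlagClient:
    """Stand-in for whatever feature-flag service you use."""

    def __init__(self, enabled: set[str]):
        self._enabled = enabled

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled

def legacy_calculation_path(request: dict) -> int:
    # Placeholder for the old, stable logic.
    return int(request.get("amount", 0) * 0.08)

def new_calculation_path(request: dict) -> int:
    # Placeholder for the new, not-yet-trusted code path.
    return int(request.get("amount", 0) * 0.10)

def calculate_points(request: dict, flags: FlagClient) -> int:
    if flags.is_enabled("points_api_new_calculation_path"):
        try:
            return new_calculation_path(request)
        except Exception:
            # Alert on this; the fallback keeps users unblocked while we investigate.
            logger.exception("new calculation path failed, falling back to legacy")
    return legacy_calculation_path(request)

flags = FlagClient(enabled={"points_api_new_calculation_path"})
print(calculate_points({"amount": 250}, flags))  # 25
```

Flipping the flag off (or hitting the exception path) routes traffic back to the stable logic without a deploy, which is exactly the safety net that let us defer some fixes.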
Testing
Before making changes to existing code, one thing we did was solidify the existing tests. In our case we had test coverage data for the particular modules we would be changing, so we knew which parts were lacking coverage. If you ever find yourself in a similar situation, where you need to be strategic about the tech debt you incur, use an analysis tool to surface test coverage. It’ll be eye-opening, trust me.
Once we knew which parts were under-tested, we started by beefing up the test coverage in those parts of the codebase. Once we were at a healthy level – no, we didn’t strive for an arbitrary 100% – we began our changes. These are the three areas where we were quite pedantic about having good tests:
New critical path functionality.
Areas where debt was deferred (especially high-impact).
Key integration points.
But even while being pedantic, we had to remain pragmatic. We agreed that accepting reduced coverage was OK as long as it was temporary. If we had to accept reduced coverage, we would spend a bit of extra time on manual/exploratory testing focused on those areas. Plus, we would flag the lack of tests by adding a ticket to the IOU list.
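One practical way to beef up coverage on code you’re about to change is characterization tests: pin the current behavior so any refactoring that changes it fails loudly. A minimal, hypothetical sketch of the shape (the calculation function is a stand-in):

```python
import unittest

def legacy_points_for(amount: float) -> int:
    """Stand-in for the existing calculation we were about to change."""
    return int(amount * 0.08)

class TestLegacyPointsCalculation(unittest.TestCase):
    # Expected values pinned from today's observed behavior, not from a spec.
    def test_standard_purchase_earns_known_points(self):
        self.assertEqual(legacy_points_for(250), 20)

    def test_zero_amount_earns_zero_points(self):
        self.assertEqual(legacy_points_for(0), 0)

if __name__ == "__main__":
    unittest.main()
```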
Changing code, carefully
When making changes in brittle modules, it’s essential to maintain rigor, especially in critical or risky hotspots. An abundance of caution is essential – in such cases we had a primary and a secondary reviewer for the changeset. The primary would do the review as normal, while the secondary would try to find any quirky spots or second-order effects that weren’t immediately obvious. Changesets authored via pair programming required only a single reviewer.
But not all changesets require the same scrutiny. We noticed that quite quickly. So we went with a nuanced approach: if a change has a lower risk profile, a quick review from one person is sufficient. The key is not to be dogmatic (“two reviewers must review!”) but to be pragmatic and leverage extra pairs of eyes selectively.
Lastly, in changesets where debt was ignored (or, worse, introduced), reviewers had to explicitly call out the debt, usually via a comment: “Peaceful with the debt here, logged IOU #123 as follow-up”. That way all debt was accounted for and consciously agreed upon between the author and the reviewer(s).
Navigating the Crunch: The Final Weeks/Days
In the last few weeks before a deadline, it’s typical for engineering teams to cut more corners than originally planned. The extra corner-cutting stems from a combination of time pressure and engineers being torn between other responsibilities (advising in reviews, coordinating, meeting XFN partners, etc.). In these phases, as an engineering leader, it’s paramount to keep the team focussed on the prioritized tasks. At any last-minute attempt at scope creep, or any meeting that popped up on their calendars, I jumped in to defend the engineering team’s focus.
By watching your team’s focus and how they spend their energy, you also support their morale. It’s important to acknowledge the pressure and take things off their plates, especially lower-priority work. I remember the engineers being pulled into interviews, which were a distraction from the crunch. In that case, I chatted with the recruitment manager, explained the situation, and agreed to take the engineering group off the interview rotation for three weeks.
For issues or blockers, it’s important to have a swift escalation playbook. For non-strategic blockers (e.g. blockers at the code level), I trusted the engineering group to self-unblock. For bigger problems, I gave the team permission to spin up a near-instantaneous meeting and pull in me (the EM) and the team’s PM at will. Luckily, we had no such instances.
Finally, as leaders we have to keep the negativity and the pressure at bay. To do that, I really like celebrating milestones, even when they’re not the big thing we’re pushing for. For example, I started communicating our progress outward every time we had a little win. The little wins are not the finish line by any means, but they boost people’s morale. People like seeing their work celebrated, regardless of its size. And in crunch time, keeping morale at a solid level is essential.
The Aftermath: Reckoning and Recovery
After hitting the private beta milestone, it was time to celebrate and take stock of the next steps. Our first user started using the API, and they had no notes. A resounding success!
Once we were done yelling from the rooftops about the successful private beta, it was time to start scheming about what’s next. First, a quick retrospective: what worked, what didn’t? We held a blameless post-mortem focused on how we managed the debt vs. delivery balance.
Then it was time to roll up our sleeves. The engineers had put a lot of trust in my promise that they would get the time and space to fix the debt after we hit the beta timeline. The chickens had come home to roost.
I allocated the next three weeks to start tackling the high-priority items from the IOU list. Three weeks wasn’t much, so I gave the team the option to extend the cleanup period if necessary. Collectively, we decided that three weeks was a good start. To me, demonstrating that the commitment to address the debt was real was the highest priority – it’s how you build trust with your people. I also publicized the team’s progress on debt reduction in the form of status updates.
Another thing we thought about was: how can we avoid accumulating this type of high-risk debt again? As I shared above, we learned a few lessons about continuous tech debt management that we carry forward to this day (see the “Tactics on the Ground” section for a refresher). And, of course, pacing. If you strive to be an engineering leader, you have to continuously look for the balance between development pace and quality. In my experience, technical debt grows roughly in proportion to the pace you push for. Be intentional about it.