Stratechgist

The SHIELD Framework: How to Handle User Escalations Without Breaking Your Team

Practical framework for your next user escalation

Ilija Eftimov
Dec 29, 2025

A while ago, I got pulled into one of those escalations that make your stomach drop.

A major user who’d been with us for years decided to finally integrate with our newly-shipped API. Up until this point they’d been using a competitor’s solution that worked fine, but we had this big partnership announcement planned. Marketing was excited. Their account executive was excited. Then they ran their load tests.

Compared to our competitors, our API was a snail. It returned 2-second response times. Their previous solution averaged 200 milliseconds.

Suddenly I found myself on a Zoom call with my manager, an account executive, and a solutions architect. They told me about this very unhappy customer who was questioning whether they should abandon the whole launch. All eyes on me: “When can you fix this?”

Here’s the thing about user escalations: your first instinct is usually wrong. You want to stop the world, gather your team, and start firefighting. Don’t. That creates chaos when you need clarity.

After going through a few such escalations, I’ve developed what I call the SHIELD framework. It’s how you protect both your user relationship and your team’s sanity.

Photo by Michał Parzuchowski on Unsplash

S - Seek Context First

Before you touch your team, understand what happened. I made this mistake early in my career - ran straight to my engineers with “EVERYTHING IS ON FIRE” energy. Created panic, killed productivity, and I still didn’t understand the real problem.

Talk to your internal stakeholders first. In my API case, I spent an hour with my manager understanding the problem from his perspective. Then I talked to the solutions architect involved. It turned out the customer was comparing our API to a competitor’s product that wasn’t nearly as feature-rich as ours. Still, they had every right to be unhappy.

Then I talked to their Account Manager: “How did we get here? What commitments were made?” She showed me emails from six months ago where we’d promised “enterprise-grade performance” without defining what that meant.

Next, talk to the user directly. Don’t delegate this; it comes across poorly. Get on a call with their technical team. In my case, I spoke with a senior engineer and his product manager. The engineer walked me through their load testing results, showed me their current integration, and explained their go-live timeline.

“Look,” he said, “we don’t need your API to be the fastest. We just need it to be predictable and under 500ms at p50. But right now it’s inconsistent - sometimes 200ms, sometimes 5 seconds. Your p50 is over 800ms.”

That context was gold. The problem wasn’t just speed - it was also consistency.
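
If you want to make the consistency point concrete, here is a minimal sketch - hypothetical numbers, standard-library Python only, nothing from our actual tooling - showing how the same batch of load-test latencies can produce a passable p50 and a terrible tail:

```python
import statistics

# Hypothetical load-test samples in milliseconds (made up for illustration).
latencies_ms = [210, 230, 240, 250, 260, 480, 520, 700, 1900, 2400]

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p50, p95 = cuts[49], cuts[94]

print(f"p50: {p50:.0f}ms, p95: {p95:.0f}ms")
# A decent-looking p50 can hide an awful tail - that gap is the
# "inconsistency" the customer was describing.
```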

Write everything down. I created a shared doc with:

  • Timeline of events leading to escalation

  • Technical requirements from user

  • Business commitments made by sales/marketing

  • Current performance metrics vs. expectations

  • Key contacts and their concerns

Escalations get messy fast and move at breakneck speed. You need this paper trail, especially when things get heated. And they will.

H - Hold Your Team Steady

Don’t pull engineers off current work until you understand the scope. Your team doesn’t need the stress of an emergency until you know it’s actually an emergency.

I learned this the hard way. Previous escalation, I immediately grabbed three senior engineers: “Drop everything, we need to fix the user dashboard.” Turns out the “critical” issue affected 12 users and had a simple workaround. I’d derailed at least a month’s worth of roadmap work for something that could wait.

Once you do need them, be surgical. For my API issue, I needed two people: our Staff Engineer who’d built the original system, and a Senior Engineer who specialized in performance optimization. No need to pull in anyone else when these two would suffice.

Create a communication blackout. I told them: “You two disappear. Turn off Slack notifications. I’ll handle all external pressure. Check in with me once daily at 4PM, otherwise you’re in a bunker.”

Why? Because the moment leadership knows who’s working on the problem, they’ll ping those engineers directly. “Quick question about timeline...” turns into a 30-minute interruption that kills deep work. The last thing you need is these people being interrupted; they should be living in the solution space, not a Confluence space.

Protect their focus ruthlessly. When our VP of Engineering asked to “just hop on a quick call with the engineers to understand the technical approach,” I said no. “I’ll get you the information you need. They’re heads-down fixing this.”

I - Identify Solutions at Three Horizons

Work with the user to define what success looks like across different timeframes. Don’t just ask “what do you want?” - they might say “make it faster” when what they really need is predictability. When talking to users, I try to remember the quote usually attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.” Users rarely know exactly what they want, but hearing them talk about what they want is a good signal. Use it.

Short-term (days to weeks): Immediate pain relief while you work on the real fix.

For my API problem, short-term was implementing request throttling on their account and setting up dedicated, sharded infrastructure for their load testing and production traffic. We deployed this in 2 days. Performance improved from 2+ seconds to 600ms - still above their 500ms target, but consistent. Not great, not terrible.
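
For the curious: conceptually, per-account throttling is just a token bucket keyed by account. This is not our production code - the real fix was specific to our stack - but a minimal sketch of the idea looks something like this (account IDs and limits are invented):

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per account keeps a single noisy integration from starving the rest.
buckets = {"acct_big_customer": TokenBucket(rate=50, capacity=100)}

def handle_request(account_id: str) -> str:
    bucket = buckets.setdefault(account_id, TokenBucket(rate=10, capacity=20))
    return "processed" if bucket.allow() else "429 Too Many Requests"
```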

Also, my PM and I negotiated with Finance to waive their transaction fees for the month. Sometimes the short-term fix is business, not technical.

Medium-term (1-3 months): Architectural improvements that address root causes.

We implemented proper caching, database query optimization, and connection pooling. Added monitoring so we could see performance patterns in real-time. This got us consistent 500ms response times.
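
Again, a sketch rather than our real code, but the two medium-term ideas boil down to “reuse connections instead of reconnecting” and “put a short-lived cache in front of hot queries.” Something like this, with SQLite standing in for the real database:

```python
import queue
import sqlite3
import time
from functools import wraps

# A tiny connection pool: create connections once, hand them out, put them back.
POOL_SIZE = 5
pool: "queue.Queue[sqlite3.Connection]" = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))

def ttl_cache(seconds: float):
    """Cache results per argument tuple for a short time-to-live."""
    def decorator(fn):
        store = {}
        @wraps(fn)
        def wrapper(*args):
            hit = store.get(args)
            if hit and time.monotonic() - hit[0] < seconds:
                return hit[1]  # serve from cache, skip the database entirely
            value = fn(*args)
            store[args] = (time.monotonic(), value)
            return value
        return wrapper
    return decorator

@ttl_cache(seconds=30)
def fetch_report(account_id: str):
    conn = pool.get()  # borrow a pooled connection instead of reconnecting
    try:
        return conn.execute("SELECT ?, datetime('now')", (account_id,)).fetchone()
    finally:
        pool.put(conn)
```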

Long-term (3-12 months): Complete rewrites or major platform shifts.

We rebuilt the API with a different architecture - a more modern tech stack, with concurrency primitives supported natively by the language. The new system could handle 10x the load with consistent latency at half the cost. But this took 8 months and wasn’t ready when the user needed it.

Be explicit about trade-offs. I told the customer: “We can get you stable performance in a week, good performance in two months, and great performance in eight. What matters most for your go-live?”

They chose stable performance with a plan for good performance. Made the decision easy.

E - Establish Communication Rhythms

The biggest mistake I see engineers make: irregular, ad-hoc updates that create anxiety instead of confidence. Then they get pinged for updates by different stakeholders, which interrupts their focus and flow.

Internally: Daily standups with the core team only. 10am sharp, 15 minutes max. Format:

  • What did you learn yesterday?

  • What are you tackling today?

  • What’s blocking you?

  • Any changes to the estimated timeline?

Then I packaged updates for broader stakeholders without overwhelming engineers. Every evening, I sent a Slack message to our leadership channel:

API Escalation - Day 3 Update:

  • ✅ Implemented throttling fix, deployed to staging

  • 🔄 Load testing scheduled for tomorrow morning

  • ⏰ Customer demo Friday 2pm

  • 🚫 No blockers currently

  • 🔜 Next update: Tomorrow 6pm

Externally: Match frequency to uncertainty level. No timeline yet? Daily written updates. Clear path to resolution? Weekly calls work.

For my API customer, I sent daily Slack updates for the first week (we shared a common channel):

“Hi [PM name],

Quick update on the API performance work:

Yesterday: Deployed throttling improvements to production

Today: Running load tests on your staging environment

Tomorrow: Planning to show you results and discuss next steps

Performance is already more consistent (see attached graphs), but we’re not at target numbers yet.

Any questions or concerns, ping me directly.”

Notice what I included: concrete actions, data, next steps, and my direct contact. No fluff, no false promises.

L - Lead from the Trenches

If your engineers are working late, you’re online with them. Handle the bureaucracy, project management, and organizational politics. Let them focus on building.

Take the admin work. While my engineers optimized database queries, I:

  • Scheduled all meetings with stakeholders

  • Updated Jira tickets and project tracking

  • Coordinated with infrastructure team for new servers

  • Handled communication with Finance about fee waivers

  • Prepared demo materials for customer presentation

Remove organizational friction. When my engineer needed help from the Reliability team, I didn’t say “reach out to them.” I pinged their manager, explained the situation, and got someone assigned within an hour.

When we needed to deploy on a Friday after midnight, I got approval from the on-call engineer myself, explaining the situation and aligning with them, instead of making my engineer fight that battle.

Be available, but not overbearing. I didn’t hover over their shoulders. But when they Slacked me at 9pm saying “this query optimization isn’t working,” I hopped on a call immediately.

Sometimes you can help technically. Sometimes you just need to be a sounding board. Either way, you need to show you’re in it with them.

D - Document and Defend

After resolution, run a retro focused on process, not blame. What broke down? How do we prevent this?

My post-mortem revealed three systemic issues:

  1. Sales promised “enterprise-grade performance” without engineering input - We now require technical review for any performance claims

  2. No load testing in our standard customer onboarding - We built a load testing checklist for large integrations

  3. Performance monitoring was reactive, not proactive - We implemented alerts that fire before customers notice problems (a minimal sketch of the idea follows this list)
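
To make “proactive” concrete, here’s a minimal sketch of the kind of check we put in place - a rolling p50 that pages with headroom below the 500ms commitment. The window size, threshold, and alert hook are all hypothetical:

```python
import statistics
from collections import deque

WINDOW = 500        # look at the last N requests
WARN_P50_MS = 400   # alert with headroom below the 500ms commitment

recent = deque(maxlen=WINDOW)

def record_latency(ms: float) -> None:
    recent.append(ms)
    if len(recent) >= 50:  # wait for enough samples to be meaningful
        p50 = statistics.median(recent)
        if p50 > WARN_P50_MS:
            alert(f"p50 latency {p50:.0f}ms is creeping toward the 500ms commitment")

def alert(message: str) -> None:
    # Stand-in for whatever pages you: PagerDuty, a Slack webhook, etc.
    print("ALERT:", message)
```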

Create new operating procedures. I wrote a “Customer Escalation Playbook” that became our standard. Shared it in our engineering all-hands, posted it in our team wiki, and sent it to other engineering managers.

Claim the victory. I presented our resolution and new processes to the executive team. Not to brag, but to show we’d learned from the experience and built systems to prevent future escalations.

The customer renewed their contract and became a reference account. More importantly, we haven’t had a similar escalation since.

The Meta-Lesson

Most escalations come down to communication and process failures. The technical stuff is the symptom. Your job is to handle everything else so your engineers can do their best work - the meetings, the status updates, the political cover, the late-night Slack from the VP.

Your impact as a leader comes from coordination and protection, not from solving the problem yourself.


Want more frameworks like this? Subscribe to my newsletter where I share battle-tested approaches to engineering leadership challenges every week.

Let’s connect on this topic: I post daily insights about managing up, scaling teams, and handling the messy reality of engineering leadership on LinkedIn. Would love to hear your escalation war stories.

🎁 BONUS: 4 Templates + 1 Reference Card

Paid Subscribers get:

  • SHIELD Quick Reference Card - One-page framework summary with all six phases (Seek, Hold, Identify, Establish, Lead, Document) in a scannable table format. Pin to desk or team wiki.

  • Escalation Context Template - Fill this out BEFORE touching your engineers. Covers timeline, technical requirements, business commitments, key contacts, and initial assessment. Forces you to understand scope before creating panic.

  • Communication Templates Pack - Four copy-paste templates: daily standup format, internal Slack update, external customer update, and engineer briefing script.

  • Three Horizons Solution Planner - Structured worksheet for planning short/medium/long-term fixes. Includes the customer trade-off discussion script from the article.

  • Post-Escalation Retrospective Template - Retro structure focused on systemic issues, preventive action items, and distribution plan. Process, not blame, as always.

Get them below:
