
Designing an Azure Web App: A Conversation That Went Longer Than Planned

A detailed conversation between a senior architect and a junior architect on designing a production-ready Azure web app — where database pressure, latency, security boundaries, and observability are worked through in real time.

Gaurav Sharma

08 Feb 2026 • 10 min read

The junior architect finishes the walkthrough without rushing.
He has done this many times before.

The architecture is familiar, almost reassuring.

Azure Front Door sits at the edge.
Traffic flows to an Application Gateway inside the virtual network.
The web application runs on Azure App Service.
Azure SQL stores transactional data.

Everything is managed. Everything is supported. Everything looks reasonable.

The design review starts like any other

Junior Architect

“So this is where we landed after a few iterations. We deliberately stayed away from anything exotic. No unnecessary services, no premature complexity. It’s mostly standard Azure building blocks. The idea was to keep the system predictable and operationally simple.”

He waits, expecting a nod.

The senior architect doesn’t nod immediately. He stares at the diagram longer than usual.

Senior Architect

“Let me ask you something before we talk about services or patterns. When you say ‘predictable’, what kind of situations are you imagining? Normal load? Gradual growth? Or moments when things don’t behave politely?”

Junior Architect

“Well… all of those, ideally. Traffic spikes, for example. The app tier auto-scales, so we should be fine. And since everything is managed, Azure takes care of a lot of the failure scenarios for us.”

Senior Architect

“Alright. Let’s take just one of those situations and slow it down. No abstractions. No diagrams. Just the sequence of events.
Imagine traffic doubles in ten minutes. Not because of a campaign — just organic usage picking up faster than expected. Walk me through what happens in the system.”

Junior Architect

“Okay. Front Door routes the traffic as usual. App Service detects increased load and starts scaling out. New instances spin up. Requests get distributed across instances, so response times stay within limits.”

Senior Architect

“And at the same time — without changing anything else — what happens to the database?”

The junior pauses. He already knows where this is going, but he answers honestly.

Junior Architect

“The database gets more concurrent requests. Each new app instance opens its own set of connections.”

Senior Architect

“Exactly. And here’s where most Azure designs quietly step into trouble.
Auto-scale reacts to symptoms at the web tier — CPU, memory, request count. But the database doesn’t see symptoms. It sees pressure. Immediate, multiplied pressure.”

He continues, not hurried.

Senior Architect

“One App Service instance might hold, say, fifty or sixty active database connections under load. When five new instances come online, you haven’t just added compute capacity. You’ve added five more connection pools, which is up to 250 or 300 more potential connections. And the database didn’t agree to that. It wasn’t consulted.”
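The arithmetic behind that point can be sketched in a few lines. The pool size comes from the "fifty or sixty" figure above; the starting instance count and the database session limit are hypothetical numbers chosen only for illustration, so check the real limit for your Azure SQL tier.

```python
def potential_connections(instances: int, pool_size: int) -> int:
    """Upper bound on connections the app tier can open against the database."""
    return instances * pool_size


POOL_SIZE = 60          # per-instance connections, from the figure quoted above
DB_SESSION_LIMIT = 300  # hypothetical limit, not a real Azure SQL number

before = potential_connections(2, POOL_SIZE)  # assumed starting point: 2 instances
after = potential_connections(7, POOL_SIZE)   # five new instances come online

print(before, after, after > DB_SESSION_LIMIT)  # 120 420 True
```

The app tier scaled by a few instances; the database's worst-case connection count went past its assumed limit without any change on the database side.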

Junior Architect

“But Azure SQL is designed to handle concurrency. We’re not running on a single on-prem box anymore.”

Senior Architect

“It is designed for concurrency — within defined limits. Those limits are not elastic in the same way your app tier is.
Connections consume memory. Queries hold locks. Transactions stretch under latency. When queries slow down, connections remain occupied longer. When connections remain occupied, new requests wait.”

He leans back slightly.

Senior Architect

“And then something very predictable happens. The application starts retrying. Timeouts trigger. Retries pile on top of slow responses. What started as a small delay turns into a feedback loop.”

Junior Architect

“So instead of failing fast, the system slowly suffocates.”

Senior Architect

“Yes. And from the outside, nothing looks broken. No crash. No outage. Just rising latency and confused users.”
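One common way to keep retries from feeding that loop is to bound the number of attempts and space them out with capped, jittered backoff. The sketch below is a minimal illustration, not the author's implementation; the function names and default values are assumptions.

```python
import random
import time


def backoff_delays(max_attempts, base=0.2, cap=5.0, rng=random.random):
    """Yield the wait before each retry: capped exponential backoff with jitter."""
    for attempt in range(max_attempts - 1):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling  # jitter spreads retries so they do not arrive together


def call_with_retries(operation, max_attempts=3, sleep=time.sleep, rng=random.random):
    """Run operation, retrying only on TimeoutError and only a bounded number of times."""
    delays = backoff_delays(max_attempts, rng=rng)
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up instead of adding more load to a slow dependency
            sleep(next(delays))
```

The important property is the hard ceiling on attempts: a slow database sees at most `max_attempts` calls per request, not an open-ended stream.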

The junior looks back at the diagram again.

Junior Architect

“So what would you do differently here? Not theoretically — practically. We can’t just turn off auto-scale.”

Senior Architect

“I wouldn’t turn it off. I would contain it.
The real problem isn’t scale. The problem is that every request believes it has an equal right to the database.”

He continues carefully, building the idea step by step.

Senior Architect

“In a real system, not all requests are equal. Some are essential. Some are optional. Some are noise.
The first thing I’d do is make that explicit in the application itself.”

Junior Architect

“How?”

Senior Architect

“By introducing intentional limits.
For example, I would cap the total number of concurrent database operations across the entire app tier. Not per instance — globally. That way, when the app scales out, it doesn’t automatically gain more permission to overwhelm the database.”

Junior Architect

“But that means some requests will have to wait.”

Senior Architect

“Yes. And that’s not a failure. That’s back-pressure.
Waiting keeps the system alive. Unlimited concurrency kills it.”
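A minimal sketch of such a cap, assuming a fixed global budget divided by the auto-scale ceiling so that the sum across instances never exceeds the budget. A truly global limit would need shared coordination between instances; this static split is a simpler approximation. `GLOBAL_DB_BUDGET`, `MAX_INSTANCES`, and `DbGate` are hypothetical names.

```python
import asyncio

GLOBAL_DB_BUDGET = 120  # hypothetical total concurrent DB operations allowed
MAX_INSTANCES = 10      # hypothetical auto-scale ceiling for the app tier


def per_instance_limit(global_budget: int, max_instances: int) -> int:
    """Each instance gets a fixed share of the global budget."""
    return max(1, global_budget // max_instances)


class DbGate:
    """Lets at most `limit` database operations run at once; the rest wait."""

    def __init__(self, limit: int):
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0

    async def run(self, operation):
        async with self._sem:  # back-pressure: callers queue here
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await operation()
            finally:
                self.in_flight -= 1


async def demo():
    gate = DbGate(per_instance_limit(GLOBAL_DB_BUDGET, MAX_INSTANCES))

    async def fake_query():
        await asyncio.sleep(0.01)  # stands in for a real database call
        return "row"

    results = await asyncio.gather(*(gate.run(fake_query) for _ in range(50)))
    return len(results), gate.peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

All fifty requests complete, but no more than twelve ever touch the database at the same moment on this instance.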

He continues.

Senior Architect

“Then I’d look at access patterns. Which endpoints genuinely need fresh data every time? Which ones are read-heavy but tolerant of slight staleness? Those get cached, not to make them faster, but to take that load off the database.”

Junior Architect

“So caching becomes a safety mechanism, not an optimization.”
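As a rough illustration of caching as load removal, the sketch below keeps a value for a short time-to-live and only calls the loader (standing in for a database query) when the entry is missing or expired. `TtlCache` is a hypothetical in-process helper; a shared cache such as Azure Cache for Redis would play this role across instances.

```python
import time


class TtlCache:
    """Tiny in-process cache: serve a stored value until its TTL expires."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh enough: the database is not touched
        value = loader()     # miss or expired: one real query
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a 30-second TTL, an endpoint that is hit hundreds of times a minute reaches the database about twice a minute per instance, regardless of traffic.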

Senior Architect

“Exactly. And write paths get special treatment. Fewer of them. Stricter limits. Shorter transactions. Clear timeouts.
Once you do this, auto-scale stops being dangerous. It increases throughput without multiplying chaos.”

The junior nods slowly.

Junior Architect

“And the database? Surely at some point we still need to scale it.”

Senior Architect

“Of course. But only after you’ve ensured it’s doing only necessary work.
Scaling Azure SQL before removing avoidable load is like widening a road while leaving traffic lights broken. You move the bottleneck, you don’t remove it.”

He adds nuance.

Senior Architect

“When I scale a database, I want confidence that queries are bounded, transactions are short, and access patterns are predictable. Then scaling becomes a response to real growth — not a cover for architectural shortcuts.”

At this point, the junior is fully engaged.
He stops seeing “services” and starts seeing “behaviour”.

1) Regions and latency: “We deployed in India” is not the same as “users feel it’s fast”

Junior Architect

“Okay, database pressure is clear. Now latency — we already placed everything in Central India. Most users are in India. So aren’t we good there?”

Senior Architect

“Let’s not answer that with a region name. Let’s answer it with a user action.
Pick one journey. A user opens the app and logs in. Walk me through every dependency call that happens in those first 10 seconds.”

Junior Architect

“The Front Door edge receives the request and routes it to the backend. App Service serves the UI. Then the user logs in and we hit Azure AD (now Microsoft Entra ID) for authentication.”

Senior Architect

“And Azure AD for your tenant is in which geography?”

Junior Architect

“Europe.”

Senior Architect

“So your user is physically in India, your app is in India, but the most emotionally sensitive step — login — depends on a Europe hop.”

The junior nods, but still looks unconvinced.

Junior Architect

“Sure, but it’s just one hop. That shouldn’t be huge.”

Senior Architect

“It usually isn’t huge on paper. It becomes huge in repeat behaviour.
Now tell me: after login, do you validate the token once and trust it for the session, or do you keep calling identity-related endpoints again and again?”

Junior Architect

“Well, we validate on each request, and that validation calls out to the identity provider. We also fetch user profile claims sometimes.”

Senior Architect

“That’s your latency leak. And it’s a common one.
You can’t move Azure AD, but you can redesign how often identity is on the critical path.”

He opens the laptop again and speaks like he’s whiteboarding.

Senior Architect

“Think of this as two layers:

Authentication: ‘Who are you?’ — expensive, involves identity providers, cross-region calls.
Authorization: ‘Are you allowed?’ — should be local, fast, deterministic.”

Junior Architect

“So what do you do, practically?”

Senior Architect

“Three simple choices that change everything:

First: once authentication is done, don’t keep doing identity round trips.
Use short-lived access tokens, validate them locally, and keep your authorization logic in your app/API boundary.

Second: cache user context that is safe to cache.
Your app doesn’t need to fetch user profile or roles on every request. Even caching it for a few minutes can remove a large percentage of calls.

Third: isolate third-party calls from your UI response path.
If something is in the US and it’s slow today, your entire app shouldn’t feel like it moved to the US.”

Junior Architect

“Wait — caching roles/claims… is that safe? What if permissions change?”

Senior Architect

“Good question. That’s where design becomes practical instead of idealistic.
If roles can change at any moment and must apply instantly, cache for a very short window or use a versioning strategy.

For example:

issue a claim like permissionsVersion
store current version in your system
if mismatch occurs, refresh context

But in most business systems, permissions don’t change every minute.
So caching is not just safe — it’s sane.”
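One possible reading of the versioning idea, sketched under assumptions: the token carries a `permissionsVersion` value, the cached user context remembers the version it was loaded under, and a mismatch triggers a reload. It also assumes tokens are reissued with the bumped version when permissions change. `UserContext`, `UserContextCache`, and the loader are hypothetical names, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    user_id: str
    roles: frozenset
    permissions_version: int


class UserContextCache:
    """Serve cached roles until the token's permissionsVersion stops matching."""

    def __init__(self, load_context):
        self._load = load_context  # hypothetical loader, e.g. a database lookup
        self._cache = {}

    def get(self, user_id: str, token_permissions_version: int) -> UserContext:
        cached = self._cache.get(user_id)
        if cached is None or cached.permissions_version != token_permissions_version:
            cached = self._load(user_id)  # mismatch: refresh the context
            self._cache[user_id] = cached
        return cached
```

Requests with an unchanged version never leave the process; only a permission change (and the new token that follows it) costs a reload.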

Junior Architect

“And for third-party API isolation, what does ‘isolate’ mean in Azure terms?”

Senior Architect

“It means: don’t let a slow dependency sit inside your main request thread for a long time.

Practically:

keep strict timeouts (not 60 seconds ‘hope mode’)
implement circuit breaker behaviour (if it’s failing, fail fast for a short period)
if possible, convert to async: queue the work and notify later
and for UI, degrade gracefully: show partial data instead of freezing everything”
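The circuit breaker item in that list can be sketched as follows. This is a simplified illustration with assumed thresholds, not a production library; the wrapped operation is expected to enforce its own strict timeout, and `CircuitBreaker` and `CircuitOpenError` are hypothetical names.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are rejected without touching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("dependency recently failing; failing fast")
            # Cool-down elapsed: allow one trial call; a single failure reopens.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # success closes the circuit fully
        return result
```

When the circuit is open, the caller can catch `CircuitOpenError` and return partial data, which is the graceful degradation described above.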

Junior Architect

“So the region decision isn’t just ‘where the app is’. It’s ‘which dependencies are allowed to stay in the user journey’.”

Senior Architect

“Exactly.
Most teams ‘choose a region’ once.
Senior teams design the journey.”

2) Security: the goal is not only “safe”, it’s “safe under load”

Junior Architect

“Security is the next topic. We already have WAF on Front Door, and the database is private. Isn’t that enough?”

Senior Architect

“It’s a good start. But let’s test it with a realistic failure mode.

Don’t imagine a hacker movie.
Just a sudden surge of junk traffic: bots, scrapers, repeated login attempts, oversized payloads, random endpoints. What happens to your system resources?”

Junior Architect

“WAF blocks a lot of it. Some will still reach the app.”

Senior Architect

“And what does it mean when junk reaches the app?”

The junior thinks.

Junior Architect

“It consumes app CPU, threads… maybe even hits the database if validation happens late.”

Senior Architect

“Exactly. Security mistakes often show up as performance incidents.

So the rule is:
Reject traffic before it becomes expensive.”

Junior Architect

“But how do you do that without blocking real users?”

Senior Architect

“You don’t block ‘users’. You block behaviour.

For example, at the edge you enforce things like:

request body size limits
rate limits per endpoint type (login is different from search)
geo or ASN restrictions if your business scope is limited
bot protection rules if your app is frequently scraped

And you do it with tuning, not aggression.”

Junior Architect

“Okay, but WAF rules sometimes feel generic. How do we tie them to our app endpoints?”

Senior Architect

“That’s exactly the step most teams skip.

They enable WAF in ‘prevention mode’ with default rules and stop there.

Instead, you create policies aligned to real endpoints:

/login gets strict rate limits and bot checks
/search gets throttling and caching
/upload gets size + content-type restrictions
/api/* gets consistent header and token enforcement

Then, inside the app, you still validate — but validation becomes a second gate, not the first.”
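To make the per-endpoint idea concrete, here is a small sketch of the same kind of policy expressed in application code, as the second gate. At the edge this would be WAF custom rules and rate-limit rules instead; the numbers, the fixed-window counter, and the names `Policy` and `EndpointLimiter` are all assumptions for illustration.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    requests_per_minute: int
    max_body_bytes: int


POLICIES = {  # hypothetical numbers; tune against real traffic
    "/login": Policy(requests_per_minute=10, max_body_bytes=4_096),
    "/search": Policy(requests_per_minute=120, max_body_bytes=8_192),
    "/upload": Policy(requests_per_minute=20, max_body_bytes=10_000_000),
}
DEFAULT_POLICY = Policy(requests_per_minute=60, max_body_bytes=65_536)


class EndpointLimiter:
    """Fixed-window rate limit per (client, path), plus a body size check."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._windows = {}  # (client_id, path) -> (window_start, count)

    def check(self, client_id: str, path: str, body_bytes: int) -> int:
        """Return an HTTP-style status: 200 allow, 413 too large, 429 too many."""
        policy = POLICIES.get(path, DEFAULT_POLICY)
        if body_bytes > policy.max_body_bytes:
            return 413  # cheapest rejection first, before any counting
        now = self._clock()
        start, count = self._windows.get((client_id, path), (now, 0))
        if now - start >= 60:
            start, count = now, 0  # a new one-minute window begins
        if count >= policy.requests_per_minute:
            return 429
        self._windows[(client_id, path)] = (start, count + 1)
        return 200
```

The same client can search freely while being held to ten login attempts a minute, which is blocking behaviour instead of blocking users.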

Junior Architect

“So the edge becomes the first firewall, and the app becomes the final judge.”

Senior Architect

“Exactly. And capacity stays protected.

The point is not ‘we blocked bad traffic’.
The point is ‘bad traffic didn’t steal resources from good traffic’.”

3) Observability: “we have logs” is not the same as “we can explain slowness”

Junior Architect

“Monitoring — we have Application Insights, logs, dashboards. But honestly, every system claims that. What’s the real difference between average monitoring and good monitoring?”

Senior Architect

“Here’s a clean test:

At 11:20 AM the system feels slow but it’s not down.
Users are complaining.
You open your dashboard.

Can you answer these three questions in five minutes?

Is slowness coming from your code, your database, or a dependency?
Is it affecting all endpoints or one flow like login/search?
Is pressure increasing or stable?”

Junior Architect

“In many systems, no. We see CPU and request count and error rates. Not much else.”

Senior Architect

“Exactly. CPU is not a root cause.
It’s often just the symptom of waiting.

Good observability is when you track pressure signals, not just outcomes.”

Junior Architect

“What are ‘pressure signals’ in this architecture?”

Senior Architect

“For this system, I’d track:

dependency latency trends (not just failures)
database connection pool utilization
queue depth if you use queues for async work
p95 and p99 latency, not only average
breakdown by endpoint (login vs search vs write)”
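A short worked example shows why the average alone hides pressure. The sketch uses a nearest-rank percentile over made-up latency samples; in practice these numbers would come from request telemetry, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests):
    """requests: iterable of (endpoint, latency_ms). Returns per-endpoint stats."""
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in requests:
        by_endpoint[endpoint].append(latency_ms)
    return {
        endpoint: {
            "avg": sum(values) / len(values),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
        }
        for endpoint, values in by_endpoint.items()
    }


# Made-up data: 94 fast logins and 6 very slow ones.
login = [("login", 100)] * 94 + [("login", 2000)] * 6
print(summarize(login)["login"])  # {'avg': 214.0, 'p95': 2000, 'p99': 2000}
```

An average of 214 ms looks healthy; the p95 says that at least one login in twenty takes two seconds.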

Junior Architect

“And how do you implement that without building a complex monitoring project?”

Senior Architect

“You don’t start with ‘everything’. You start with ‘explainability’.

Application Insights gives you:

request telemetry
dependency telemetry
traces
correlation IDs

But you need one discipline:

Every request must carry a correlation ID end-to-end.
So when a user says ‘checkout is slow’, you can trace:
Front Door → App → dependency → DB.

Then you add alerts not on ‘CPU high’, but on:

dependency p95 increasing
DB time per query increasing
connection pool near saturation

That’s what makes monitoring actionable.”
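The correlation discipline can be sketched with the standard library alone. Tracing SDKs typically propagate their own trace headers; the header name `X-Correlation-ID` and the helper functions below are assumptions used only to show the pattern of accepting an ID, stamping it on every log line, and forwarding it on outbound calls.

```python
import contextvars
import logging
import uuid

HEADER = "X-Correlation-ID"  # hypothetical header name

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def begin_request(headers: dict) -> str:
    """Reuse the caller's ID if present, otherwise start a new one."""
    cid = headers.get(HEADER) or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def outbound_headers() -> dict:
    """Headers to add to every dependency call made while handling this request."""
    return {HEADER: correlation_id.get()}


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
```

With this in place, searching logs for one ID returns the whole path of a single slow request across tiers.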

Junior Architect

“So the goal is: before users complain, the system should show where pressure is accumulating.”

Senior Architect

“Yes. And once you have that, scaling and caching decisions become science, not guessing.”

4) Selective scaling: protecting the important path while letting other paths degrade

Junior Architect

“You keep hinting at something — that not all requests deserve equal treatment. Can you explain how that becomes implementable in a web app?”

Senior Architect

“Sure. Let’s talk in product terms.

In most apps, there are flows that must remain healthy:

login
core reads
critical writes (like placing an order)

And there are flows that can degrade without destroying trust:

analytics dashboards
secondary metadata
optional enrichment calls
background sync”

Junior Architect

“So you prioritize. But how does that look in the system?”

Senior Architect

“It looks like intentional boundaries.

For example:

Give critical endpoints stricter dependency budgets (timeouts) and higher-priority resources.
Put non-critical endpoints behind cache and stricter throttles.
Move heavy work to queues so your request thread stays light.”

He continues.

Senior Architect

“And here’s the important part: you design what happens when the system is under stress.

If the database is slow:

Do you let login suffer?
Or do you degrade secondary endpoints first?

A senior design answers that explicitly.”
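One way to write that answer down is an explicit admission rule keyed on a pressure signal. The sketch below assumes database p95 latency as the signal and uses made-up routes, priorities, and thresholds; it illustrates the idea of shedding optional work first, not a recommended set of numbers.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0  # login, placing an order
    NORMAL = 1    # core reads
    OPTIONAL = 2  # analytics, enrichment, background sync


ROUTE_PRIORITY = {  # hypothetical routes
    "/login": Priority.CRITICAL,
    "/orders": Priority.CRITICAL,
    "/catalog": Priority.NORMAL,
    "/analytics": Priority.OPTIONAL,
    "/sync": Priority.OPTIONAL,
}


def admit(path: str, db_p95_ms: float, budget_ms: float = 200.0) -> bool:
    """Decide whether a request may proceed, given current database pressure."""
    priority = ROUTE_PRIORITY.get(path, Priority.NORMAL)
    if db_p95_ms <= budget_ms:
        return True                          # healthy: everything runs
    if db_p95_ms <= 2 * budget_ms:
        return priority <= Priority.NORMAL   # stressed: shed optional work
    return priority == Priority.CRITICAL     # severe: protect only the critical path
```

Rejected requests would get a fast, honest response (cached data or a "try again shortly" message) so that login and order placement keep their share of the database.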

Junior Architect

“So instead of ‘system is slow’, we shape it as: ‘only non-critical parts are slow’.”

Senior Architect

“Exactly. That’s what users call reliability.”

The review ends the way it should

The junior architect sits quietly for a moment.
The diagram still looks the same.
But it now feels incomplete unless the behavioural decisions are written down beside it.

Junior Architect

“So the architecture isn’t really the boxes. It’s the rules around the boxes.”

Senior Architect

“Yes. Azure gives you services.
Architecture is deciding:

how much pressure each tier is allowed to take
which requests get access to what
where failure must stop
and which experiences must remain stable even on bad days”

He closes his laptop.

Senior Architect

“And that’s why a design can be Azure-native and still fail in production.
Not because Azure was weak — but because the behaviour was never designed.”

Closing

This is what senior-level Azure web app design looks like when it’s implementable:

Not a shopping list of services.
Not “best practices” posters.
But clear decisions about limits, caching, isolation, rejection, and explainability.