Idempotency Is Not an API Thing: A Conversation Between Two Engineers

A conversation between a senior and a junior engineer on what idempotency really is, beyond REST APIs, across SQL jobs, console apps, Azure Functions, message queues, and any operation that can run more than once. Clear, end to end.

The junior engineer has been writing production code for three years.

He knows what idempotency means. Or at least he thinks he does.

He has used idempotency keys.
He has read the Stripe documentation.
He has nodded confidently in architecture reviews when someone said "make sure it's idempotent."

He is reasonably sure he understands it.

The senior engineer is about to ask one question that will change that.


The conversation starts with a definition that isn't quite right

Junior Engineer

"So idempotency. It's basically when you send an idempotency key with an API request, right? The server checks if it's seen that key before. If yes, it returns the cached response. If no, it processes it fresh. That way, retries don't cause duplicate operations."


Senior Engineer

"That's one way to implement idempotency in an API. But it's not what idempotency is.

Let me try something simpler. Think about an elevator button. When you're waiting for a lift and you press the button, it lights up. Now you press it again. And again. Does the lift come faster?"


Junior Engineer

"No. It's already been called. Pressing it again doesn't change anything."


Senior Engineer

"Exactly. That button is idempotent. You can press it once or a hundred times. The result is the same: the lift is coming. The extra presses don't create extra lifts. They don't undo the first press. They just do nothing on top of what's already been done.

That's what idempotency means as a concept. An operation is idempotent if you can run it multiple times and the result is exactly the same as running it once.

It's a property of an operation. Not a key. Not a header. Not something specific to APIs or HTTP. Just an answer to one question: if this runs again, does anything go wrong?"


Junior Engineer

"But in practice, the way you make something idempotent is with idempotency keys. Right? That's the pattern."


Senior Engineer

"In APIs, that's one tool. But let me show you why the concept is bigger than that.

You have a SQL job that runs every night at midnight. Its job is to take orders from a staging table and insert them into the main orders table. No API. No HTTP. No key anywhere.

If that job runs twice tonight, what happens?"

The junior pauses.


Junior Engineer

"If it just does a plain INSERT... every order gets inserted twice. Duplicate rows."


Senior Engineer

"Right. So the job is not idempotent. And nobody wrote an idempotency key into it. Because developers don't usually think about SQL jobs the way they think about APIs.

Which means most SQL jobs in most systems are quietly not idempotent. And nobody realises it until the job runs twice. Which it will, eventually."

He lets that sit for a moment.


Senior Engineer

"This is what I want us to talk through today. Not idempotency as an API feature. Idempotency as a discipline. A way of thinking about every operation you build, regardless of what kind it is. Because more operations can run more than once than most engineers think."


When does something run more than once?

Junior Engineer

"Okay. I get the concept. But when would something actually run more than once by accident? I'd expect that to be pretty rare."


Senior Engineer

"It's actually one of the most common things in production. Let me walk through some everyday situations.

Your app calls an external payment API. The network is slow. After 30 seconds, your app gets no response and assumes it failed. So it retries. But the first call actually did succeed. The payment went through. Now it goes through again.

Your scheduled Azure Function is set to run at 2 AM every night. But last night's run is still going because it hit a slow database query. At 2 AM tonight, a new run starts. Now two instances are running at the same time, both doing the same work.

A developer runs a data migration script on a Friday to fix a production issue. On Monday, a second developer, not knowing about Friday's run, runs the same script again to double-check. The script runs twice.

A message arrives in Azure Service Bus. Your consumer picks it up, starts processing it, and then the server crashes halfway through. Service Bus doesn't hear back from the consumer. After a few minutes, it assumes the message was never processed. So it puts the message back and delivers it again to a new consumer instance.

A Kubernetes pod is running a background job. The cluster decides to move the pod to a different node. The pod is killed mid-job. Kubernetes starts it fresh on a new node. The job runs again from the beginning."


Junior Engineer

"So these aren't edge cases. These are just... normal production situations."


Senior Engineer

"Completely normal. Every one of those things happens regularly. And in every one of those situations, if your operation isn't idempotent, you get problems.

Duplicate rows in the database. A customer charged twice. The same email sent twice. A counter that's off by one or ten. Or the worst kind: silent data corruption that nobody notices for three days, until a customer calls support."


Junior Engineer

"And the customer doesn't know any of this happened. They just see the wrong charge on their statement."


Senior Engineer

"Exactly. And your support team has to investigate manually, trace through logs, figure out what ran when, and apologise. All of that cost comes from one missing design decision made before the feature was built."


Scenario 1: The payment API, and what happens when the key isn't enough

Senior Engineer

"Let's start where you're most familiar. A REST API for placing an order and charging a card. You said you'd use an idempotency key. Walk me through how that works."


Junior Engineer

"The client, meaning the front end or the calling service, generates a unique ID before making the request. Usually a UUID, something like '7f3d2c1a-...'. It sends that in the request header. When the server receives the request, it checks a table: have I seen this key before? If yes, return the response I stored last time. If no, process the order and save the response against this key."


Senior Engineer

"Good. That's the right idea. Now let me walk through one specific failure scenario and I want you to tell me what happens.

Your server receives the request. It checks the key: not seen before, so it starts processing. It calls the payment gateway. The gateway charges the card successfully. But then, before your server can write the order record to the database and save the idempotency key, the server crashes. Power cut, out of memory, doesn't matter. It just dies.

The client gets a timeout. No response. What does the client do?"


Junior Engineer

"It retries. With the same idempotency key."


Senior Engineer

"Your server restarts. The new request arrives. It checks the key table. Does the key exist?"


Junior Engineer

"No. Because we never saved it. We crashed before that step."


Senior Engineer

"So what does the server do?"


Junior Engineer

"It treats the request as new. It calls the payment gateway again."

The junior goes quiet for a second.

"The customer gets charged twice."


Senior Engineer

"Yes. And no error happened anywhere. Every individual step worked correctly. The payment gateway did its job. The retry logic did its job. The idempotency check did its job. But the order of operations was wrong, and the whole thing still failed the customer.

This is the gap most developers miss. The key only works if you save it as part of the same operation as the work. Not before. Not after. Together. If your database write and your key storage are not in the same transaction, there's a window where a crash leaves you in the worst possible state: work done, but no record of it."
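The "together, in one transaction" point can be sketched in a few lines. This is a minimal illustration using SQLite and a hypothetical schema, not a production payment flow: the order row and the idempotency key are written inside a single transaction, so a crash leaves either both or neither.

```python
import sqlite3

# Hypothetical schema: an orders table and a key table for dedup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT);
""")

def record_order(conn, key, order_id, amount):
    # The sqlite3 connection context manager commits on success and
    # rolls back on error: both inserts happen together or not at all.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute("INSERT INTO idempotency_keys VALUES (?, ?)",
                     (key, f"order {order_id} created"))

record_order(conn, "key-123", "ord-1", 500)

# A retry with the same key hits a PRIMARY KEY constraint; the whole
# transaction rolls back, so no duplicate order row survives.
try:
    record_order(conn, "key-123", "ord-1", 500)
except sqlite3.IntegrityError:
    pass  # key already recorded: return the stored response instead

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # exactly one order despite two calls
```

The important property is that there is no window where the order exists but the key does not, or vice versa.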


Junior Engineer

"So the idempotency key is only as safe as the transaction around it."


Senior Engineer

"Yes. And there's a second problem that's just as common. The client sends the same key, but the server isn't crashed. It's just slow. The first request is still being processed. The client gets impatient and retries.

Now two requests with the same key are being processed at the same time. What does your server return to the second one?"


Junior Engineer

"I don't know. Maybe a 500 error? Or it just waits?"


Senior Engineer

"Most servers return something unhelpful there. The correct answer is a 409 Conflict. A response that says, in plain terms: 'I've already seen this key and I'm still working on it. Wait a moment and try again.'

Or if your operation is asynchronous, meaning it takes a long time and runs in the background, you return a 202 Accepted with a link the client can check to see the status.

The point is: an idempotency key isn't just a deduplication trick. It changes what your system has to communicate in every possible situation. Seen the key and done the work: return the stored result. Seen the key and still working: say so. Never seen the key: do the work. Each case needs a clear, intentional answer."
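The three cases the senior lists can be written out as one decision function. This is an illustrative sketch with made-up names, using an in-memory dict as the key store and ignoring the transactional concerns discussed above; a real service would use a shared, durable store.

```python
def handle_request(store, key, do_work):
    entry = store.get(key)
    if entry is None:
        # Never seen the key: do the work, then record the result.
        store[key] = {"status": "in_progress", "result": None}
        result = do_work()
        store[key] = {"status": "done", "result": result}
        return 200, result
    if entry["status"] == "in_progress":
        # Seen the key, still working: say so, don't start a second run.
        return 409, "still processing, retry shortly"
    # Seen the key and done the work: replay the stored result.
    return 200, entry["result"]

store = {}
charges = []

def charge_card():
    charges.append("GBP 10")   # the side effect we must not repeat
    return "charged GBP 10"

first = handle_request(store, "key-1", charge_card)
second = handle_request(store, "key-1", charge_card)
print(first, second, len(charges))  # same result both times, card charged once
```

Both calls return the same response, but the card is charged exactly once.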


Junior Engineer

"I've never thought about the 'still working' case. I just assumed the system would either have done it or not."


Senior Engineer

"Most developers haven't. And one more thing. How long do you keep the keys?"


Junior Engineer

"I... haven't really decided. Until the row gets cleaned up, I suppose."


Senior Engineer

"That's the answer most systems give. Which means some systems keep them forever until the database gets large, and some delete them too early, which means a retry that comes in four hours later looks like a brand new request.

Stripe, for example, expires idempotency keys after 24 hours, because that's long enough to cover any realistic retry window for a payment. Your internal systems may need a longer or shorter window. But they need a number. A deliberate decision. Not a default that nobody chose."


Scenario 2: The nightly SQL job that nobody worries about

Junior Engineer

"Let's go back to the SQL job example. You said most of them aren't idempotent. How do you actually fix that?"


Senior Engineer

"First, let's be very concrete about the problem. Imagine your company runs an Azure Data Factory pipeline every night at midnight. It reads from a staging table where raw transaction data lands throughout the day, and it inserts those transactions into a clean fact table that the reporting team uses.

On a normal night, it runs once. Everything is fine. But one night there's a network blip halfway through and the pipeline fails. The on-call engineer sees the alert and reruns it manually. Now the pipeline runs again from the beginning. What happens to the rows that already got inserted in the first partial run?"


Junior Engineer

"They get inserted again. Duplicate rows in the fact table."


Senior Engineer

"And the reporting team runs their reports the next morning not knowing any of this happened. The numbers are wrong. Maybe slightly wrong, maybe very wrong depending on how far the first run got. And tracing it back is painful.

The fix is to change the question the job asks. Instead of 'insert this data', it should ask 'make this data exist.' There's a big difference.

A plain INSERT says: add this row, I don't care if it's already there. An upsert, or a MERGE in SQL terms, says: if this row already exists, update it to match. If it doesn't exist, create it. Either way, when you're done, the data looks exactly like it should.

Run that job once: correct state. Run it ten times: same correct state. The job is now idempotent."
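The difference between "insert this data" and "make this data exist" fits in one statement. Here is a minimal sketch using SQLite's `INSERT ... ON CONFLICT` upsert (the equivalent of `MERGE` in other dialects); the table and columns are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id TEXT PRIMARY KEY, amount INTEGER)")

staging = [("ord-1", 100), ("ord-2", 250)]

def load(conn, rows):
    # Upsert: create the row if missing, update it to match if present.
    conn.executemany("""
        INSERT INTO fact_orders (order_id, amount) VALUES (?, ?)
        ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount
    """, rows)

load(conn, staging)   # first run
load(conn, staging)   # rerun after a partial failure: same final state
rows = conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(rows)  # [('ord-1', 100), ('ord-2', 250)], no duplicates
```

Rerunning the load after a partial failure is now harmless: the rows that already landed are updated in place, the missing ones are created.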


Junior Engineer

"But to do a MERGE, you need some way to recognise whether a row already exists. Like a unique ID to match on."


Senior Engineer

"Exactly. And this is where the design conversation starts. Idempotency requires identity. To know whether you've already done something, you need a reliable way to recognise that thing when you see it again.

For an order, that's probably an order ID from the source system. For a transaction, maybe a combination of the transaction reference and the date. For an event log, maybe a hash of the key fields.

The point is: if your data model has no natural key, idempotency becomes much harder. This is a design decision you make early. And if you don't make it deliberately, production will eventually make it for you, in the worst possible way."


Junior Engineer

"So idempotency isn't just about the job. It starts with the data model."


Senior Engineer

"Yes. And there's a trap I see often that looks safe but isn't. A developer writes something like: check if this row already exists, and if not, insert it. Sounds fine. But what if two instances of the job run at the same time? Both do the check. Both find no existing row. Both try to insert. You get duplicate rows anyway.

The check-then-insert pattern only works if exactly one thing is running at a time, which you often can't guarantee. The database's own uniqueness constraint, combined with an upsert operation, is the only way to get a guarantee that holds under any conditions."
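The race is easy to demonstrate. In this sketch, both "instances" run the existence check before either inserts, which is exactly what happens when two jobs overlap; the unique constraint is what catches the duplicate that the check could not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY)")

# Both instances run the check before either one inserts.
exists_a = conn.execute("SELECT 1 FROM orders WHERE order_id='ord-1'").fetchone()
exists_b = conn.execute("SELECT 1 FROM orders WHERE order_id='ord-1'").fetchone()
assert exists_a is None and exists_b is None  # both checks say "not there"

conn.execute("INSERT INTO orders VALUES ('ord-1')")      # instance A inserts
try:
    conn.execute("INSERT INTO orders VALUES ('ord-1')")  # instance B inserts too
except sqlite3.IntegrityError:
    pass  # the constraint gives the guarantee the check never could

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1
```

The application-level check passed twice; only the database's own constraint held.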


Scenario 3: The console app that runs every 15 minutes

Junior Engineer

"What about a background worker? Like a console app or a WebJob that runs on a schedule?"


Senior Engineer

"Good example. Let's say you have an Azure WebJob that runs every 15 minutes. Its job is to pick up new customer records, call an external enrichment API to add extra details, and write the enriched records to Azure Blob Storage.

Two problems can happen here, and neither of them feels like a bug at first.

Problem one: the job takes 16 minutes. A slow response from the enrichment API. By the time it finishes, the next scheduled run has already started. Now two instances are running at the same time, processing the same batch of records. Neither knows the other exists."


Junior Engineer

"They'd both write to the same blobs. One would overwrite the other."


Senior Engineer

"Maybe. Or they'd both call the enrichment API for the same customer, getting billed twice for that API call. Or one finishes first and marks the record as done, but then the second finishes and marks it done again with slightly different data because the API returned something different the second time around.

Problem two: the job processes a record, writes the blob, then crashes before marking that record as done. Next run, it picks up the same record again and runs the whole thing over."


Junior Engineer

"For the second problem, if the blob gets overwritten with the same data, isn't that fine?"


Senior Engineer

"Only if the enrichment API always returns identical data for the same input. If it returns a price, a stock level, a timestamp, anything that can change between calls, then the second write might have different data. Now your system has processed the same record twice and stored two different results, one of which got silently overwritten.

You might never notice. Until someone asks why customer records from a specific period look inconsistent."


Junior Engineer

"So how do you stop the overlap problem? The two instances running at the same time?"


Senior Engineer

"A distributed lease. Before the job starts its work, it tries to claim a lock on a shared resource. In Azure, you can use a Blob Storage lease for this. Think of it like a physical key to a room. Only one person can hold the key at a time. The job picks up the key before it starts. If another instance tries to start and finds the key already taken, it simply exits. It doesn't fight. It doesn't wait. It just walks away.

When the first job finishes, it releases the key. The next scheduled run picks it up normally.

One run at a time. Clean."
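The lease pattern can be sketched locally with atomic file creation as a stand-in for a real distributed lease such as an Azure Blob Storage lease; the path and names here are illustrative. `O_CREAT | O_EXCL` means exactly one process can create the lock file, which gives the "only one person holds the key" behaviour on a single machine. A real lease also carries a timeout so a crashed holder doesn't block the job forever.

```python
import contextlib
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "nightly-job.lock")
if os.path.exists(LOCK_PATH):
    os.remove(LOCK_PATH)  # start clean for the demo

@contextlib.contextmanager
def try_lease(path):
    try:
        # Atomic create-if-absent: only one process can succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False          # lease already held: don't fight, don't wait
        return
    try:
        yield True           # lease acquired: safe to run the job
    finally:
        os.close(fd)
        os.remove(path)      # release so the next scheduled run can acquire it

with try_lease(LOCK_PATH) as acquired:
    if acquired:
        print("lease acquired, running the job")
    else:
        print("another instance holds the lease, exiting")
```

The instance that fails to acquire the lease simply exits, exactly as the senior describes: it doesn't fight, it doesn't wait.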


Junior Engineer

"So in this case, the solution isn't making the operation idempotent. It's preventing the duplication from happening at all."


Senior Engineer

"Right. And that's an important distinction. Idempotency means tolerating duplication. Prevention means eliminating it. Both are valid. Often you want both: prevent where you can, tolerate where you can't. The blob lease prevents concurrent runs. The upsert write tolerates the occasional restart where the same record gets processed twice."


Scenario 4: Message queues and the guarantee that surprises people

Junior Engineer

"Message queues. I know at-least-once delivery means a message might arrive more than once. So idempotency matters there."


Senior Engineer

"Yes. But I want to make sure the 'at-least-once' part is clear, because a lot of developers hear it and think 'that's an edge case, it rarely happens.'

It's not an edge case. Azure Service Bus guarantees that a message will be delivered. It does not guarantee it will only be delivered once. The reason is: to know that a message was truly processed, Service Bus needs the consumer to send back an acknowledgement. If the consumer processes the message and then crashes before sending that acknowledgement, Service Bus has no idea the work was done. So it re-delivers the message. It has to. The alternative is losing the message entirely, which is worse.

So duplicates aren't a bug. They're the price of reliability. And your consumer has to be built with that in mind."


Junior Engineer

"Doesn't Service Bus have duplicate detection built in though? I've seen a setting for it."


Senior Engineer

"It does. And this is a common source of false confidence. Service Bus duplicate detection works on the sending side: if a producer submits two messages with the same message ID within a configured time window, the broker discards the second. That covers producer-side duplicates, situations where a retrying sender enqueues the same message twice.

But it doesn't cover the scenario I just described. If your consumer crashes after processing but before acknowledging, Service Bus delivers the message again, but from its perspective, that's a legitimate re-delivery of a message that was never confirmed, not a duplicate. The duplicate detection won't catch it.

Your consumer needs to handle it."


Junior Engineer

"So how do you make a consumer handle it correctly?"


Senior Engineer

"The cleanest approach depends on what your consumer does.

If your consumer is writing to a database and there's a natural business key on the record, like an order ID, just use an upsert. Write the record if it doesn't exist, update it if it does. Processing the same message ten times leaves you with exactly one record in the correct state. No extra infrastructure needed.

If your consumer has side effects beyond a database write, like sending an email or calling a payment gateway, you need to track what you've already done. Before processing a message, check whether that message has already been processed successfully. Azure Cache for Redis works well here: store the message's business ID with a short expiry. If it's already there, skip the processing and just acknowledge the message. Simple check before every action.

The key choice is which ID to track. Service Bus gives every message its own message ID, which is an infrastructure concept. But your message also contains a business concept, an order number, a customer ID, a transaction reference. Use that as your deduplication key. It's the thing that actually means something to your system, and it survives across retries and redeliveries in a way that infrastructure IDs sometimes don't."
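The "check before every action" pattern looks like this. This is a sketch with a plain dict and expiry timestamps standing in for Azure Cache for Redis (where the same check would be a `SET key NX EX ttl`); all names are illustrative.

```python
import time

class DedupStore:
    """In-memory stand-in for a shared cache with per-key expiry."""
    def __init__(self):
        self._seen = {}

    def mark_if_new(self, business_id, ttl_seconds=3600):
        now = time.monotonic()
        expiry = self._seen.get(business_id)
        if expiry is not None and expiry > now:
            return False                     # already processed: skip
        self._seen[business_id] = now + ttl_seconds
        return True                          # first time: safe to process

sent_emails = []

def handle_message(store, message):
    # Deduplicate on the business ID (the order number),
    # not the broker's infrastructure message ID.
    if not store.mark_if_new(message["order_id"]):
        return  # duplicate delivery: acknowledge without re-sending
    sent_emails.append(message["order_id"])  # the side effect we must not repeat

store = DedupStore()
handle_message(store, {"order_id": "ord-42"})
handle_message(store, {"order_id": "ord-42"})  # redelivery after a crash
print(sent_emails)  # ['ord-42']: the email went out exactly once
```

One caveat worth noting: an in-process dict only works for a single consumer instance. The real version has to live in shared storage, for the same reason discussed in the Functions scenario below.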


Junior Engineer

"How do you decide which approach to use?"


Senior Engineer

"Ask one question: what is the cost if this runs twice?

If the cost is nothing, the operation is naturally safe, just use an upsert and move on.

If the cost is money, like a payment, or trust, like a notification, or anything the customer will notice, then you need the explicit check.

The answer to that question tells you exactly how much effort to invest."


Scenario 5: Azure Functions, serverless, and why state can't live in memory

Junior Engineer

"What about Azure Functions? The HTTP-triggered kind."


Senior Engineer

"Azure Functions are a great example of why understanding idempotency as a concept matters more than knowing any specific implementation.

Here's what makes Functions different. A regular web app might run as one or two instances. You might even be able to pretend it's a single server in some situations. A Function can scale to dozens or hundreds of instances in seconds. If 500 users all click the same button at the same time, 500 separate Function instances could all be handling those requests simultaneously.

Each instance is completely isolated. It has no memory of what other instances are doing. It doesn't know if another instance is already processing the exact same request that came in twice due to a network retry."


Junior Engineer

"So you can't store 'have I seen this request before' in memory. Because memory is per-instance."


Senior Engineer

"Exactly. The only place where truth can live is somewhere that all instances can read from and write to. A database. Azure Table Storage. Redis. Something external and shared.

The logic is the same as what we discussed before: when a request arrives, check a shared store for that idempotency key. If it exists, return the stored result without doing the work again. If it doesn't exist, do the work, save the result and the key to the shared store, and return.

The extra question with Functions is concurrency. What if two instances receive the same request at almost the same moment, both check the store, both find no key, and both start processing?

You let the database handle that. If you're using Azure Table Storage, entity keys are unique by design: when two instances try to insert an entity with the same key at the same time, only one write succeeds. The second one gets a conflict error. At that point your Function catches the conflict, re-reads the stored result from the first instance, and returns it. Clean."
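A minimal sketch of that conflict-handling pattern, with SQLite's primary key standing in for the shared store (in a real Function this would be Azure Table Storage or similar; the schema and names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (key TEXT PRIMARY KEY, result TEXT)")

work_done = []

def process(conn, key):
    # 1. Check the shared store first.
    row = conn.execute("SELECT result FROM results WHERE key=?", (key,)).fetchone()
    if row:
        return row[0]                        # already done: replay stored result
    # 2. Do the work (two instances may both reach this point).
    result = f"processed {key}"
    work_done.append(key)
    # 3. Try to record it; the primary key makes only one insert win.
    try:
        conn.execute("INSERT INTO results VALUES (?, ?)", (key, result))
    except sqlite3.IntegrityError:
        # Another instance won the race: return its result, discard ours.
        row = conn.execute("SELECT result FROM results WHERE key=?", (key,)).fetchone()
        return row[0]
    return result

print(process(conn, "req-1"))
print(process(conn, "req-1"))  # replays the stored result, does no new work
```

The Function itself stays simple; the uniqueness guarantee in the data layer is what resolves the race.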


Junior Engineer

"So the Function itself doesn't need to be complicated. The data layer does the heavy lifting."


Senior Engineer

"Yes. And that's the general principle in serverless: compute is cheap and disposable. Data is where guarantees live. Your idempotency design has to be in the data layer, not the compute layer. Functions just execute whatever logic you give them. They don't remember anything between invocations unless you build that memory into storage."


Why this all matters beyond just preventing duplicates

Junior Engineer

"So we've gone through APIs, SQL jobs, console apps, queues, and Functions. I understand the problem better now. But what's the bigger payoff? Why invest in this properly?"


Senior Engineer

"Let me ask you something. When you're testing a feature, what makes testing hard?"


Junior Engineer

"Writing tests for all the different things that can go wrong. Error cases, edge cases, unexpected sequences of events."


Senior Engineer

"Right. Now think about retry scenarios specifically. If you have an operation that isn't idempotent, you need test cases for: what if the client retried once? What if it retried three times? What if two retries overlapped? What if the first attempt half-succeeded and then the retry came in?

Each of those is a separate test scenario. Each one requires setup, assertions, and maintenance.

If the operation is idempotent, all of those scenarios collapse into one: does the operation produce the correct result? You don't care how many times it ran. The result is always the same.

That's a real, measurable reduction in test surface. Fewer tests to write. Fewer tests to maintain. Fewer bugs that come from test gaps."


Junior Engineer

"And when something goes wrong in production, what changes?"


Senior Engineer

"This is where it matters most for your day-to-day life as an engineer.

When something isn't idempotent and it fails, your recovery process is: investigate what ran, figure out what got into the database and what didn't, write a script to fix the inconsistency, test the script, run the fix, verify the result, and update the customer. That process takes hours. Sometimes days if the failure was subtle.

When something is idempotent and it fails, your recovery process is: run it again. That's it. It takes minutes. And you can do it with confidence because you know that running it again will leave the system in the correct state, not make things worse."


Junior Engineer

"So idempotency changes your 3 AM incident from 'I need to figure out what happened and carefully fix the data' to 'I just rerun the job and go back to sleep.'"


Senior Engineer

"Yes. And it changes things beyond incidents too.

Imagine you need to replay six months of transactions through a new processing pipeline you just built. If your pipeline is idempotent, you just run the data through and whatever already exists gets updated to match, whatever's missing gets created. No risk.

If your pipeline isn't idempotent, replaying data means duplicates everywhere. You have to build cleanup logic before you can even start. What should be a straightforward migration becomes a careful, scary operation.

Idempotency turns reruns from something you fear into something you can do without thinking."


Junior Engineer

"You mentioned earlier that idempotency is also a contract with your callers. Can you say more about that?"


Senior Engineer

"Every system you build is used by other systems. Other services call your API. Other jobs consume your queue. Other pipelines read your output.

When those callers hit an error or a timeout, they have to decide: do I retry? If your operation is idempotent, the answer is always yes. Retry as many times as you need. You won't cause any harm.

If your operation is not idempotent, the answer is: maybe. It depends on what stage the previous request reached. The caller now has to write complicated state-tracking logic to figure out whether it's safe to retry. Their code gets more complex because your design didn't make a guarantee.

Most teams never write down which operations are idempotent and which aren't. Callers guess. When they guess right, nothing bad happens. When they guess wrong, you get an incident that looks mysterious until someone traces through logs for two hours and realises a retry caused a double charge.

One sentence in your documentation, 'this endpoint is idempotent, it is safe to retry with the same key', prevents that entire category of problem. Good engineering is also good communication. They're the same thing."


The one question to ask before you build anything

Junior Engineer

"If I take one thing from this conversation and apply it to every new piece of work I do, what should it be?"


Senior Engineer

"Before you build any operation, ask: can this run more than once?

Not 'will it.' Because the answer to 'will it' is often 'probably not.' The answer to 'can it' is almost always 'yes.'

Networks are unreliable. Servers crash. Deployments overlap. Developers rerun scripts. Queues redeliver messages. Schedulers fire twice. In the real world, any operation that can run once will, at some point, run more than once.

So once you've accepted that, the question becomes: if it runs twice, what happens?

If the answer is 'nothing bad,' you're fine.

If the answer is 'duplicates,' 'double charges,' 'inconsistent state,' or 'I'm not sure,' then idempotency is not an optional nice-to-have. It's a requirement. And the time to think about it is before you build, not after production finds the problem for you."


Junior Engineer

"So it's not a pattern you add on top. It's a question you answer during design."


Senior Engineer

"Yes. And one more thing worth remembering.

Most bugs in distributed systems aren't caused by failures. They're caused by successful operations that ran more than once."


He closes his laptop.


Senior Engineer

"A failure is visible. You get an error. An alert fires. A log entry appears. You know something went wrong and you go fix it.

An operation that succeeds twice is invisible. No error. No alert. No log that says anything is wrong. Just a customer who checks their statement and finds two charges. Or a report that shows slightly wrong numbers. Or an email that went to 50,000 people twice.

The damage is quiet. And you find it when someone complains, not when your monitoring catches it. Because your monitoring is watching for failures. And this wasn't a failure. It was a success. Twice."


Closing

Idempotency is not something you add to an API when Stripe tells you to.

It's not a checkbox in a design review.

It's not something senior engineers think about and junior engineers don't.

It's a question. One question, asked before every operation you design:

If this runs again, what happens?

Answer that question clearly, in SQL jobs, in background workers, in message consumers, in serverless Functions, in every place code runs that can run more than once, and a whole category of production incident quietly stops happening.

Not because anything is perfect. But because you designed for the world as it actually is.


I write at https://www.thetruecode.com to help developers grow beyond just code — through system thinking, clarity, and communication.


💬 Let’s Connect
Enjoyed this article or have a perspective to share? Let’s connect on LinkedIn.