Retries Are Easy; Idempotency Is the Whole Game
Distributed workers don't fail because they retry too much. They fail because their retries aren't safe to repeat.
Key Takeaways
- Every task that touches external state will run twice. Make the second run harmless.
- Idempotency keys are the default. Conditional writes are the fallback. Fencing tokens are the last resort.
- Don't paper over a duplicate-execution bug with more retries; fix the safety boundary.
Imagine your worker charges a customer, then crashes 200 ms before it records the payment as committed. The supervisor restarts it, and the new attempt charges the customer again. Same task, same code, two charges: that isn't a flaky network, that's a missing idempotency contract.
The fix is not "retry smarter." The fix is making the side effect repeatable.
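To make "repeatable" concrete, here is a minimal sketch of an idempotency key guarding a charge. Everything in it is an assumption for illustration: `PaymentLedger`, `run_payment_task`, the key format, and the in-memory dictionary stand in for whatever durable store or payment provider you actually use. The point is only that the key comes from the task's identity, not from the attempt, so a retry finds the earlier result instead of charging again.

```python
import threading


class PaymentLedger:
    """Hypothetical in-memory stand-in for a payment backend that honors idempotency keys."""

    def __init__(self):
        self._results = {}             # idempotency key -> recorded receipt
        self._lock = threading.Lock()  # makes check-then-charge atomic in this process
        self.charges_made = 0          # how many real charges happened

    def charge(self, idempotency_key, customer_id, amount_cents):
        with self._lock:
            # Already done? Return the recorded result instead of charging again.
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            # First time this key is seen: perform the side effect and record it.
            self.charges_made += 1
            receipt = f"receipt-{self.charges_made}-{customer_id}-{amount_cents}"
            self._results[idempotency_key] = receipt
            return receipt


def run_payment_task(ledger, task_id, customer_id, amount_cents):
    # The key is derived from the task, not the attempt, so every retry reuses it.
    return ledger.charge(f"payment:{task_id}", customer_id, amount_cents)


ledger = PaymentLedger()
first = run_payment_task(ledger, "task-42", "cust-7", 1999)
second = run_payment_task(ledger, "task-42", "cust-7", 1999)  # simulated retry after a crash
```

In this sketch both calls return the same receipt and only one charge is recorded. A real system would need the check and the write to be atomic in durable storage shared by all workers, which an in-process lock does not provide.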
flowchart LR
A[Task arrives] --> B{Already done?}
B -- Yes --> C[Return