Durable execution frameworks are becoming increasingly popular because they make it easier for developers to ensure code executes reliably in the face of infrastructure disruptions. In this post, I’ll explain how the DBOS Transact open source durable execution library (Python and TypeScript) takes durable execution even further by ensuring reliable code execution with transactional correctness guarantees.
Stuff happens
Imagine you’re working on a cloud-based retail application that reserves inventory for an order, then charges the customer money, ships the order, and sends a confirmation email. The pseudocode might look like this:
readyToShip = reserveInventory()
if readyToShip:
    gotMoney = collectPayment()
    if gotMoney:
        shipOrder()
        sendEmail()
    else:
        releaseInventoryReservation()
The business logic is simple, and it should be straightforward to code, test, and maintain. It should be…but if you’ve implemented processes similar to this, then you know that the production code never looks this simple. That’s because problems in the cloud infrastructure it runs on could interrupt the application at any time. In anticipation of that, you add a lot of extra code to try to make sure the user’s purchase succeeds no matter what.
If the program collected payment but stopped before shipping the order, the customer would be charged for an order that never ships. If it stopped while the inventory was reserved, before payment succeeded or the reservation was released, that inventory would be stuck in limbo. It’s hard to anticipate all of these scenarios and make sure the purchase always works as it should.
If the program stops running for whatever reason (a maintenance window, upgrade, crash, power loss, network failure, etc.), wouldn’t it be great if, after the disruption was resolved, execution automatically resumed where it left off? If we can get the program’s runtime environment to do something like that for us, then we have durable execution.
Durable execution: reliable code
Durable execution means guaranteeing that each operation in your program executes at least once. This is something that a library can provide. It’s not trivial, but if information about how much code has already been executed is persisted, the program can resume approximately where it left off.
Durable execution certainly helps our scenario. It closes the biggest holes, but far from all of them. What if the reserveInventory() operation has completed, but the durable execution library didn’t get time to write down that this step was completed? On restart, the library would run reserveInventory() again and could reserve the inventory twice. To guard against that, you’d have to add a lot of checks to your code to see whether steps have already been completed.
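To make that hole concrete, here is a minimal Python sketch of plain durable execution. It assumes a single SQLite table, `step_log`, standing in for the library’s execution records; `run_step`, `step_log`, and `SimulatedCrash` are hypothetical names chosen for illustration, not DBOS Transact APIs, and `reserve_inventory` stands in for `reserveInventory()`. Because the step’s side effect lives outside the database, a crash between doing the work and writing the record makes the step run twice.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Imitates the process dying at an inconvenient moment."""


def run_step(db, workflow_id, step, fn, crash_before_record=False):
    # Hypothetical helper: skip the step if a completion record exists.
    row = db.execute(
        "SELECT 1 FROM step_log WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return  # already recorded: resume past this step
    fn()  # the step's side effect happens here
    if crash_before_record:
        raise SimulatedCrash()  # effect happened, record never written
    db.execute("INSERT INTO step_log VALUES (?, ?)", (workflow_id, step))
    db.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE step_log (workflow_id TEXT, step INTEGER)")

reservations = []  # stands in for the external inventory system


def reserve_inventory():
    reservations.append("order-42")


try:
    run_step(db, "wf-1", 0, reserve_inventory, crash_before_record=True)
except SimulatedCrash:
    pass  # the process "restarts" and runs the workflow again

run_step(db, "wf-1", 0, reserve_inventory)  # no record yet, so it runs again
run_step(db, "wf-1", 0, reserve_inventory)  # record found, so it is skipped
print(reservations)  # ['order-42', 'order-42']: reserved twice
```

Durable execution guaranteed the step ran at least once; it could not guarantee it ran only once.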
Durable execution guarantees that your code runs to completion. But what if you want to ensure that each step of your code runs exactly once (or has the same effect as running exactly once)?
That requires transactional execution.
Transactional execution: reliable code with transactional correctness
Transactional execution means guaranteeing that each operation in your program executes exactly once. Assuming that reserveInventory() is already implemented with a database transaction, we can perform transactional execution by inserting the durable execution record as part of that same transaction. In this scenario, the effects of the reserveInventory() call and the record indicating that the code already ran reserveInventory() are committed together. Once they are committed, transactional execution won’t run that part of the code again. But if a particular run stopped without getting a chance to commit, the transaction is aborted (its changes to the database are reverted), and transactional execution runs that step again when the workflow restarts.
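The following Python sketch shows the idea with both writes in one SQLite transaction. It assumes an `inventory` table and a `step_log` table in the same database; the table layout and the `reserve_inventory_step` function are hypothetical and only illustrate the technique, not how DBOS Transact actually stores its records. A simulated crash before commit rolls back both the reservation and the record, so the retry performs the reservation exactly once.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Imitates the process stopping before the transaction commits."""


def reserve_inventory_step(db, workflow_id, step, item, crash_before_commit=False):
    # A committed record means this step already ran: return its recorded output.
    row = db.execute(
        "SELECT output FROM step_log WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return row[0]
    try:
        # The business update and the execution record share one transaction
        # (with default settings, Python's sqlite3 opens it before the UPDATE).
        db.execute(
            "UPDATE inventory SET reserved = reserved + 1 WHERE item = ?", (item,)
        )
        db.execute(
            "INSERT INTO step_log (workflow_id, step, output) VALUES (?, ?, ?)",
            (workflow_id, step, "reserved"),
        )
        if crash_before_commit:
            raise SimulatedCrash()
        db.commit()  # both writes become durable together, or neither does
    except Exception:
        db.rollback()  # undo the reservation and the record together
        raise
    return "reserved"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, reserved INTEGER)")
db.execute(
    "CREATE TABLE step_log (workflow_id TEXT, step INTEGER, output TEXT,"
    " PRIMARY KEY (workflow_id, step))"
)
db.execute("INSERT INTO inventory VALUES ('widget', 0)")
db.commit()

try:
    reserve_inventory_step(db, "wf-1", 0, "widget", crash_before_commit=True)
except SimulatedCrash:
    pass  # the workflow restarts
reserve_inventory_step(db, "wf-1", 0, "widget")  # runs the step and commits
reserve_inventory_step(db, "wf-1", 0, "widget")  # finds the record, skips the step
print(db.execute("SELECT reserved FROM inventory WHERE item = 'widget'").fetchone()[0])  # 1
```

In a real crash the process never reaches `rollback()`; the database discards the uncommitted transaction on its own, which has the same effect.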
DBOS Transact is an open source durable execution library (Python and TypeScript) that enhances applications with transactional execution.
Distributed systems theory (and intuition) tells me this is impossible!
For distributed systems, such a simple scheme for transactional execution is not possible: an external system such as an email server or a payment processor doesn’t take part in our database transaction, so there is no transaction into which we can tuck the durable execution record. However, there’s a simple engineering solution, which is to have each component deduplicate requests using an “idempotency key” or a similar concept.
Let’s take the sending of email as an example. Deep down, there are some network packets sent to the email server, and if it likes the request to send email, it’ll send some packets back to confirm. If some of those response packets get lost or don’t get handled, the email will get sent and the transactional execution framework won’t have a durable record of it. If the code restarts, a second email might get sent.
Many email systems have a mechanism for deduplicating messages. Setting an email’s “Message-ID” header to a unique value lets email servers and clients discard messages whose “Message-ID” has been seen before. In DBOS, each instance of executing code is assigned a UUID. Making the UUID part of the “Message-ID” of the outgoing mail prevents duplicate email sends.
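As a sketch of the email case, the Python standard library’s `email.message.EmailMessage` can carry a Message-ID built from the workflow’s UUID and the step number. The addresses, the `example.com` domain, the step number 3, and the `build_confirmation_email` function are placeholders chosen for illustration, and actually sending the message (for example with `smtplib`) is left out. Whether a duplicate is discarded still depends on the receiving servers and clients honoring the Message-ID.

```python
import uuid
from email.message import EmailMessage


def build_confirmation_email(workflow_id: str, step: int, to_addr: str) -> EmailMessage:
    """Hypothetical helper: build the confirmation email for one workflow step."""
    msg = EmailMessage()
    msg["From"] = "orders@example.com"
    msg["To"] = to_addr
    msg["Subject"] = "Your order has shipped"
    # Deterministic Message-ID: the same workflow and step always produce the
    # same header, so a resend after a restart is recognizable as a duplicate.
    msg["Message-ID"] = f"<{workflow_id}.{step}@example.com>"
    msg.set_content("Thanks for your order!")
    return msg


workflow_id = str(uuid.uuid4())  # stands in for the workflow's UUID
first = build_confirmation_email(workflow_id, 3, "customer@example.com")
retry = build_confirmation_email(workflow_id, 3, "customer@example.com")
print(first["Message-ID"] == retry["Message-ID"])  # True
```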
Likewise, if payment collection is an external service, it will generally have some mechanism for providing a key for deduplicating requests. For example, in Stripe, an idempotency key is used to ensure that a request is only processed once. Each DBOS execution has a UUID which can be coupled with the current execution step to form a key suitable for such purposes.
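The payment case follows the same pattern. The sketch below uses a hypothetical in-memory `FakePaymentService` rather than Stripe’s client library (Stripe itself accepts the key through an `Idempotency-Key` HTTP header), and the key format `<workflow UUID>-<step>` is likewise an assumption. The point is that the key is unique per step but identical across retries of that step, so a retried charge returns the original result instead of charging again.

```python
import uuid


class FakePaymentService:
    """Hypothetical external payment API that deduplicates by idempotency key.
    This is NOT Stripe's client library."""

    def __init__(self):
        self.charges = []     # every charge actually made
        self._responses = {}  # idempotency key -> response to the first request

    def charge(self, amount_cents: int, idempotency_key: str) -> str:
        # A repeated key returns the original response instead of charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, amount_cents))
        self._responses[idempotency_key] = charge_id
        return charge_id


def idempotency_key(workflow_id: str, step: int) -> str:
    # Workflow UUID + step number: unique per step, stable across restarts.
    return f"{workflow_id}-{step}"


service = FakePaymentService()
workflow_id = str(uuid.uuid4())
key = idempotency_key(workflow_id, 1)
first = service.charge(4999, key)
retry = service.charge(4999, key)  # e.g., the reply was lost and the step re-ran
print(first == retry, len(service.charges))  # True 1
```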
How DBOS Transact performs transactional execution of workflows
Here’s a quick overview of how DBOS Transact performs transactional execution. For more information, check out our tutorial or explanations page.
- Application code is written as a function in TypeScript and decorated with “@Workflow”. When this code is invoked, DBOS Transact durably records in the system database that the workflow has started. Invoking a workflow also checks that no workflow with the same idempotency key has already been started, providing once-and-only-once guarantees to the caller.
- Each part of the workflow logic that involves a database transaction is written as a separate function with the “@Transaction” decorator. When these transaction functions are invoked from the workflow, the runtime starts and ends a transaction around the call to the function’s code and, when the function returns, includes the execution record in that same transaction. There is no need for the programmer to make transactions idempotent; exactly-once execution is built in.
- Each part of the workflow logic that calls out to an external system is broken out as a separate function with the “@Communicator” decorator. DBOS Transact writes a durable record of each communicator invocation to the system database after the call is known to have completed. This ensures that communicators run at least once; additional mechanisms, such as the idempotency keys described above, are required to ensure that multiple invocations have the same effect as a single invocation.
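The caller-facing guarantee from the first bullet can be sketched with a primary-key constraint: recording the workflow’s start fails if a row with the same idempotency key already exists. The `workflow_status` table and the `start_workflow` function are hypothetical, not the layout of the DBOS Transact system database, and a real implementation would more likely hand the caller the existing workflow than simply return `False`.

```python
import sqlite3


def start_workflow(db: sqlite3.Connection, workflow_id: str) -> bool:
    """Record that a workflow has started; return False if this key was seen before."""
    try:
        with db:  # commits on success, rolls back if the INSERT fails
            db.execute(
                "INSERT INTO workflow_status (workflow_id, status) VALUES (?, 'PENDING')",
                (workflow_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY rejects a second row with the same idempotency key.
        return False


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE workflow_status (workflow_id TEXT PRIMARY KEY, status TEXT)")
print(start_workflow(db, "order-42"))  # True: first invocation starts the workflow
print(start_workflow(db, "order-42"))  # False: the duplicate request is not started again
```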
If a workflow is restarted, DBOS Transact starts the workflow function from the beginning. When the workflow encounters a transaction or communicator invocation, it consults a database to establish whether this part of the code was already executed. If a record is found, the result from the database is used instead of re-executing the transaction or communicator code.
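The replay behavior can be sketched in a few dozen lines of Python. `Replayer`, its `call` method, and the `step_log` table are hypothetical names, not DBOS Transact APIs; each step’s output is stored as JSON and looked up by (workflow ID, step number) when the workflow function runs again from the beginning. Like a communicator, this sketch writes the record after the step completes, so each step runs at least once; a transaction step would instead commit its record in the same transaction as its own writes.

```python
import json
import sqlite3


class Replayer:
    """Hypothetical replay engine: records each step's output, replays it on restart."""

    def __init__(self, db: sqlite3.Connection, workflow_id: str):
        self.db = db
        self.workflow_id = workflow_id
        self.step = 0  # restarts at 0 every time the workflow function is re-run

    def call(self, fn, *args):
        step = self.step
        self.step += 1
        row = self.db.execute(
            "SELECT output FROM step_log WHERE workflow_id = ? AND step = ?",
            (self.workflow_id, step),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # already ran: use the recorded result
        result = fn(*args)  # not recorded yet: run the step's code
        with self.db:  # record the output only after the step has completed
            self.db.execute(
                "INSERT INTO step_log VALUES (?, ?, ?)",
                (self.workflow_id, step, json.dumps(result)),
            )
        return result


calls = []              # which step bodies actually executed
crash_pending = [True]  # makes ship_order fail once, imitating a crash


def reserve_inventory():
    calls.append("reserve")
    return True


def collect_payment():
    calls.append("pay")
    return True


def ship_order():
    if crash_pending[0]:
        crash_pending[0] = False
        raise RuntimeError("simulated crash")
    calls.append("ship")
    return "tracking-123"


def order_workflow(db, workflow_id):
    ctx = Replayer(db, workflow_id)
    if ctx.call(reserve_inventory):
        if ctx.call(collect_payment):
            return ctx.call(ship_order)
    return None


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE step_log (workflow_id TEXT, step INTEGER, output TEXT,"
    " PRIMARY KEY (workflow_id, step))"
)
try:
    order_workflow(db, "wf-1")
except RuntimeError:
    pass  # recovery restarts the workflow function from the beginning
result = order_workflow(db, "wf-1")
print(calls, result)  # ['reserve', 'pay', 'ship'] tracking-123
```

Because recorded results are looked up by step number, the sketch assumes the workflow function is deterministic: on a restart it must call the same steps in the same order.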
Transactional execution as a service on DBOS Cloud
DBOS Transact makes transactional execution easy, and DBOS Cloud makes running DBOS Transact applications even easier, with:
- Secure, serverless, scalable code hosting and execution
- Automatic failure recovery and workflow restart
- HTTPS endpoints for your code, with access control
- Observability dashboard for traces, logs, and metrics
- Time Travel Debugging, which steps you through transactional execution of workflows exactly as they ran in the past. This greatly simplifies auditing and troubleshooting.
Use DBOS Transact and DBOS Cloud for free
To get started with DBOS, just download the open source (MIT license) DBOS Transact durable execution library (Python and TypeScript) and start running code locally. Check out the quickstart or download an example application for help.
Once your application is written and running, you can deploy it to DBOS Cloud and run it for free.
- Create a free DBOS Cloud account.
- Join the DBOS user community on Discord.