Are Your Lambdas Costing You More to Wait Than to Compute?

[Image: AWS hidden wait costs]
Qian Li and Peter Kraft
October 11, 2024

Serverless computing should make it easy to deploy AI applications to the cloud: just push a button and go, without worrying about provisioning or maintaining infrastructure. However, if you use a popular platform like AWS Lambda for AI workloads, you'll probably receive a surprisingly large bill. Why? Most serverless platforms charge for how long your function runs, not for how much CPU time it actually uses. This means AI applications are charged for the idle time they spend waiting for LLMs to respond, and since LLMs take a long time to respond, that cost adds up fast.

We believe a serverless platform should charge you only for the resources you actually use, not for idle time. Therefore, the DBOS serverless compute platform is designed to charge only for the CPU time your app actually consumes, not for the total wall-clock time of each request.

In this post, we'll first show how charging only for CPU time made DBOS 53x more cost-efficient in a benchmark we ran (your mileage may vary, of course). We'll then describe the efficiencies in the DBOS architecture that make this possible.

Cost Analysis: 10 Million OpenAI gpt-4o-mini Requests

Let's analyze the cost of an AI workload. Every request in this workload queries OpenAI's gpt-4o-mini model, using an average of 17 input tokens and 350 output tokens. These requests spend most of their time idle, waiting for a response from OpenAI. The average end-to-end (wall-clock) latency of each request is 4.8 seconds, but each request consumes only 5 milliseconds of CPU time on average.
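For concreteness, here's a minimal sketch of what one such request might look like, using the OpenAI Python client. The prompt and timing harness are illustrative stand-ins; only the model name and output-token budget come from the measurements above.

```python
import time
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def one_request() -> float:
    """Send one gpt-4o-mini request and return its wall-clock latency."""
    start = time.monotonic()
    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a short story."}],
        max_tokens=350,  # roughly matches the ~350 output tokens measured above
    )
    # Nearly all of this elapsed time (~4.8 s on average) is spent idle,
    # waiting on the network, not computing.
    return time.monotonic() - start
```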

We analyze the cost of 10 million of these requests on DBOS Cloud and on AWS Lambda, assuming 512 MB executors for both.

DBOS Cloud Cost

DBOS bills only for the 5 ms of compute time that each request takes:

 Item                   Cost
 50,000 CPU-seconds     $2.50
 10 million requests    $5.00
 Total                  $7.50

AWS Lambda Cost

AWS Lambda bills for the full 4.8-second wall-clock duration of each request:

 Item                                      Cost
 48 million wall-clock seconds (512 MB)    $400.00
 10 million requests                       $2.00
 Total                                     $402.00
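The arithmetic behind both tables is easy to check. The per-unit rates in this short script are implied by the tables themselves (for Lambda, they match AWS's published $0.0000166667 per GB-second and $0.20 per million requests):

```python
requests = 10_000_000
cpu_seconds_per_request = 0.005   # 5 ms of actual compute
wall_seconds_per_request = 4.8    # end-to-end latency
memory_gb = 0.5                   # 512 MB executors

# DBOS Cloud: billed on CPU time plus a per-request charge.
dbos_compute = requests * cpu_seconds_per_request * 0.00005  # $/CPU-second
dbos_requests = requests / 1_000_000 * 0.50                  # $/million requests
dbos_total = dbos_compute + dbos_requests                    # $7.50

# AWS Lambda: billed on wall-clock GB-seconds plus a per-request charge.
lambda_compute = requests * wall_seconds_per_request * memory_gb * 0.0000166667
lambda_requests = requests / 1_000_000 * 0.20
lambda_total = lambda_compute + lambda_requests              # $402.00

print(f"DBOS: ${dbos_total:.2f}, Lambda: ${lambda_total:.2f}, "
      f"ratio: {lambda_total / dbos_total:.1f}x")
# prints: DBOS: $7.50, Lambda: $402.00, ratio: 53.6x
```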


Because each LLM call takes only 5 ms of CPU time but 4.8 s of wall-clock time, DBOS is 53x more cost-efficient than AWS Lambda in this benchmark. These savings are huge for AI apps, but they aren't exclusive to AI: any application that spends significant time waiting for I/O benefits the same way, including applications with long-running database queries, calls to external APIs, or interactive waits for user input.

Why It Works

Why can DBOS afford to charge only for CPU time and not for idle time? It's possible because DBOS shares execution environments across requests to the same application, improving resource utilization. Consider a Retrieval-Augmented Generation (RAG) chatbot application. When you send it a message, it queries a vector database for relevant context, calls an OpenAI API, waits for the response, then adds the response to your chat history. Here's what that might look like:

[Diagram: RAG workflow]
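As code, the handler might look roughly like this. The retrieval function and in-memory history are hypothetical stand-ins (a real app would use a vector store such as pgvector and a persistent database); only the gpt-4o-mini call mirrors the workload described above.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()
chat_history: dict[str, list[tuple[str, str]]] = {}  # in-memory stand-in

def retrieve_context(message: str) -> str:
    # Stand-in for a vector-database similarity search; a real app would
    # embed `message` and query a store like pgvector.
    return "(retrieved context)"

def answer(user_id: str, message: str) -> str:
    context = retrieve_context(message)        # 1. query the vector database
    completion = client.chat.completions.create(
        model="gpt-4o-mini",                   # 2. call the LLM; the request
        messages=[                             #    sits idle for seconds here
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": message},
        ],
    )
    reply = completion.choices[0].message.content
    chat_history.setdefault(user_id, []).append((message, reply))  # 3. save
    return reply
```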

Current serverless platforms execute such applications inefficiently because they launch every request into a separate execution environment. Each execution environment sends a request to the database or an LLM API endpoint, then blocks, doing nothing useful, until the response arrives. Here's what it looks like when two users chat concurrently on AWS Lambda:

[Diagram: why RAG workflows are expensive on AWS Lambda]

For a RAG application, or for any other I/O-intensive application, current serverless platforms spend most of their time doing nothing, then charge you for the idle time.

By contrast, DBOS multiplexes concurrent requests onto the same execution environment, using the time one request spends waiting for a result to serve another request. To make sure execution environments are never overloaded, DBOS continuously monitors their utilization and autoscales when needed, using Firecracker to rapidly instantiate new execution environments. Here's what it looks like when multiple people chat concurrently on DBOS:

[Diagram: how DBOS runs RAG workflows more cost-efficiently than AWS Lambda]
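This toy asyncio script isn't DBOS's scheduler, but it illustrates the principle: when a request yields the CPU while awaiting an LLM response, the same process can serve other requests during the wait.

```python
import asyncio, time

async def handle_request(i: int) -> None:
    # Simulate the 4.8 s a request spends waiting on an LLM response.
    # `await` yields the CPU, so other requests run in the meantime.
    await asyncio.sleep(4.8)
    print(f"request {i} done at t={time.monotonic() - start:.1f}s")

async def main() -> None:
    # Ten concurrent requests share one process.
    await asyncio.gather(*(handle_request(i) for i in range(10)))

start = time.monotonic()
asyncio.run(main())
```

Ten simulated 4.8-second requests finish in about 4.8 seconds of total wall-clock time, not 48, because the waits overlap. On a platform that bills per environment-second, that overlap is exactly the idle time you stop paying for.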


Create a Generative AI Application in 9 Lines of Code on DBOS

Want to try deploying your own AI applications to DBOS? Check out this generative AI quickstart, which walks you through serverlessly deploying an interactive AI app to the cloud in just 9 lines of code. For a more complex example, check out this RAG-powered Slackbot, which answers questions about your previous Slack conversations.
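To give a feel for the shape of such an app before you click through, here's an illustrative sketch in the same spirit (not the quickstart's exact code): a single FastAPI endpoint that asks gpt-4o-mini for a greeting.

```python
from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()

@app.get("/greeting/{name}")
def greeting(name: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Greet {name} warmly."}],
    )
    return completion.choices[0].message.content
# run locally with: uvicorn main:app
```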

© DBOS, Inc. 2024