
Soria Analytics provides a unified data-intelligence platform for the healthcare-services industry. The company aggregates and normalizes hundreds of public and commercial data sources — from PDFs and spreadsheets to APIs and regulatory releases — into continuously updated, analytics-ready datasets. With real-time alerts, natural-language search, and full data lineage, Soria enables analysts, investors, and operators to access durable insights without manual data collection or cleanup.
Soria continuously monitors 300+ data sources, representing decades of changing file formats, shifting schemas, and thousands of healthcare-company subsidiaries. These workflows must run reliably and in parallel. When a government dataset updates, Soria wants to detect the change, pull all new files, clean them, map evolving schemas, and load the results into BigQuery — without manual intervention or brittle one-off scripts.
Soria’s ingestion pipelines scrape hundreds of government sources, parse inconsistent file formats, reconcile decades of schema drift, and fan out into hundreds of parallel mapping and cleaning tasks. Initially, the team tried Celery to enqueue this work, but it lacked the workflow primitives needed to model multi-step, long-running pipelines. As the platform grew, chaining tasks together became fragile and difficult to reason about.
“Trying to build our ingestion engine on Celery got ugly fast once we needed multi-step, highly parallelized workflows.” — Cameron Spiller, CTO, Soria Analytics
Healthcare data changes unpredictably — columns are renamed, formats shift, new regulatory fields appear. When something failed, Celery only showed the broken task, not the workflow it belonged to. The team couldn’t easily trace lineage or understand where failures originated, making debugging slow and operationally expensive.
While Soria built most of its product with lightweight, modern infrastructure, Celery required its own ecosystem: a persistent Redis or RabbitMQ cluster, dedicated workers, autoscaling logic, and separate monitoring. Maintaining two parallel infrastructures introduced unnecessary friction for a small team focused on speed and reliability.
Soria enhanced its ingestion pipeline code with the open source DBOS Transact library, gaining instant, end-to-end workflow durability and visibility. DBOS let the team orchestrate scraping, cleaning, mapping, and ingestion as natural Python workflows.
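Here is a minimal sketch of what such a pipeline can look like with DBOS Transact. The step and workflow names (`scrape_source`, `load_to_bigquery`, etc.) and the configuration values are illustrative stand-ins, not Soria’s actual code:

```python
from dbos import DBOS, DBOSConfig

# Hypothetical config: DBOS stores workflow state in an existing Postgres database.
config: DBOSConfig = {
    "name": "ingestion-pipeline",
    "database_url": "postgresql://localhost:5432/ingestion",
}
DBOS(config=config)

@DBOS.step(retries_allowed=True, max_attempts=3)
def scrape_source(source_url: str) -> list[str]:
    # Download any newly published files; retried automatically on failure.
    return [f"{source_url}/latest.csv"]  # placeholder for real scraping logic

@DBOS.step()
def clean_files(files: list[str]) -> list[str]:
    return files  # placeholder for parsing and cleanup

@DBOS.step()
def map_schema(files: list[str]) -> list[str]:
    return files  # placeholder for schema mapping

@DBOS.step()
def load_to_bigquery(files: list[str]) -> None:
    pass  # placeholder for the BigQuery load

@DBOS.workflow()
def ingest(source_url: str) -> None:
    # If the process crashes mid-run, DBOS resumes from the last
    # completed step rather than restarting the whole workflow.
    files = scrape_source(source_url)
    cleaned = clean_files(files)
    mapped = map_schema(cleaned)
    load_to_bigquery(mapped)

if __name__ == "__main__":
    DBOS.launch()
    ingest("https://example.gov/dataset")
```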
Switching to the DBOS durable workflow orchestration library eliminated the need for Celery’s Redis, worker clusters, and dedicated orchestrator. The team simply deploys Python code; DBOS handles concurrency, retries, and durability automatically, storing workflow state in their existing Postgres database. CI/CD became faster, rollbacks became trivial, and no one on the team has had to think about orchestration plumbing in months.
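The fan-out pattern described earlier maps naturally onto DBOS queues. A short sketch, assuming a hypothetical queue name and concurrency limit rather than Soria’s actual settings:

```python
from dbos import DBOS, Queue

# Hypothetical queue; `concurrency` caps how many mapping tasks run at once.
mapping_queue = Queue("file_mapping", concurrency=50)

@DBOS.step()
def map_one_file(path: str) -> str:
    return path  # placeholder for per-file mapping and cleaning

@DBOS.workflow()
def fan_out_mapping(paths: list[str]) -> list[str]:
    # Enqueue every file, then durably await all results.
    # If the process restarts, already-completed tasks are not re-run.
    handles = [mapping_queue.enqueue(map_one_file, p) for p in paths]
    return [h.get_result() for h in handles]
```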
Using DBOS Conductor, Soria gained true workflow-level observability, something the team struggled to get from Celery. Engineers can now monitor workflow state, inspect stuck jobs, and understand failures in context, dramatically reducing debugging time and increasing reliability.
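Conductor surfaces this state in a hosted console; the same information can also be inspected programmatically. A small sketch, assuming a hypothetical workflow ID:

```python
from dbos import DBOS

# Hypothetical workflow ID; retrieve_workflow returns a handle to a
# running or completed workflow whose status can be inspected.
handle = DBOS.retrieve_workflow("example-workflow-id")
status = handle.get_status()
print(status.status)  # e.g. PENDING, SUCCESS, or ERROR
```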
“We’re a tiny team, and DBOS let us move fast without running more infrastructure. It gave us durable orchestration and real visibility, with almost no overhead.” — Cameron Spiller, CTO, Soria Analytics
Discover why teams are turning to DBOS for reliable, observable applications.
Add a few annotations to your program to make it resilient to any failure.