Adaptive Phage Therapeutics (APT) is a clinical-stage biotechnology company developing therapies to treat multidrug-resistant infections. Prior antimicrobial approaches have been ‘fixed’ while pathogens continue to evolve resistance; as a result, all of them have become, or are becoming, obsolete. APT’s phage bank approach draws on an ever-expanding library of phages that collectively provide evergreen, broad-spectrum, polymicrobial coverage. Phages from the bank are matched to each infection using a proprietary phage susceptibility assay, which APT has partnered with Mayo Clinic Laboratories to commercialize globally.
PhageBank is positioned to be the first antimicrobial whose spectrum of coverage increases over time, and it does not require market-suppressing antibiotic stewardship. Advanced development of APT’s therapeutics is funded in part by the U.S. Department of Defense.
APT had an existing pipeline built with Snakemake, a bioinformatics workflow engine that provides a readable, Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow.
The APT team are experts in the science behind antimicrobial resistance and had built an effective cloud environment to run their pipelines, but they engaged PTP to standardize on a high-performance computing (HPC) platform that would shorten pipeline runtime and deliver refined data back to the scientists sooner.
Pipeline jobs originate in the wet lab: a scientist completes the experiment on research instruments and then submits the raw data to the pipeline. Depending on the research, the data, and the availability of compute resources, a pipeline run can take anywhere from minutes to days, and the sooner it finishes, the sooner APT can analyze the results.
The CloudOps Engineering team at PTP established a design and a plan for APT to move its pipeline discovery process to a standardized, modern application architecture. PTP built an AWS Batch environment, migrated pipeline coding and development from Snakemake to Nextflow, and packaged the pipeline in a Docker image.
The workflow begins when a job is submitted to AWS Batch, which launches an EC2 instance running Nextflow. The Batch job starts in a Docker container with the most recent version of the pipeline, then launches the pipeline along with all of its jobs and sub-jobs.
Nextflow continues to submit jobs for every part of the pipeline; new instances are created as needed, and existing instances are reused where possible to improve efficiency and reduce cost. When the jobs are complete, the output is saved to S3 and the Docker containers are shut down; Nextflow then writes its logs to S3 and shuts down.
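The pattern above, in which an AWS Batch job runs the Nextflow head process and Nextflow in turn submits the pipeline’s tasks to AWS Batch, can be sketched as follows. This is a minimal illustration, not APT’s or PTP’s implementation: the queue name, job definition, pipeline path, the `awsbatch` profile, and the `--input`/`--outdir` pipeline parameters are hypothetical placeholders. The request dictionary assumes the keyword arguments accepted by boto3’s `Batch.Client.submit_job` (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`).

```python
import re
from typing import Any, Callable, Dict

# Hypothetical placeholders: substitute the real AWS Batch queue, job
# definition, and pipeline location used in the deployment.
HEAD_JOB_QUEUE = "nextflow-head-queue"
HEAD_JOB_DEFINITION = "nextflow-head"   # image with Nextflow + the latest pipeline
PIPELINE = "/opt/pipeline/main.nf"      # assumed path of the pipeline inside the image


def batch_job_name(prefix: str, run_id: str) -> str:
    """Build an AWS Batch job name (letters, digits, '-' and '_', max 128 chars)."""
    return re.sub(r"[^A-Za-z0-9_-]", "-", f"{prefix}-{run_id}")[:128]


def build_head_job_request(run_id: str, input_uri: str, work_dir: str,
                           outdir: str) -> Dict[str, Any]:
    """Return submit_job keyword arguments that start the Nextflow head job."""
    command = [
        "nextflow", "run", PIPELINE,
        "-profile", "awsbatch",   # assumed profile that sets the awsbatch executor
        "-work-dir", work_dir,    # S3 work directory shared by all sub-jobs
        "--input", input_uri,     # hypothetical pipeline parameters
        "--outdir", outdir,
    ]
    return {
        "jobName": batch_job_name("nextflow-head", run_id),
        "jobQueue": HEAD_JOB_QUEUE,
        "jobDefinition": HEAD_JOB_DEFINITION,
        "containerOverrides": {"command": command},
    }


def submit_head_job(request: Dict[str, Any],
                    submit: Callable[..., Dict[str, Any]]) -> str:
    """Submit the head job and return its AWS Batch job ID.

    `submit` is injected so the sketch runs without AWS credentials; in a real
    deployment it would be boto3.client("batch").submit_job.
    """
    response = submit(**request)
    return response["jobId"]
```

Injecting `submit` keeps the sketch testable offline. Once the head job is running, Nextflow submits each pipeline process as its own AWS Batch job, and the Batch compute environment scales EC2 capacity to match.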
AWS Infrastructure Design:
The design goals were to deploy a manageable, modern architecture that shortens the time to refined data for analysis while controlling costs. In the previous architecture, pipeline jobs ran serially, one after the other. By building the environment on AWS Batch and AWS Lambda, PTP made it possible to run the jobs in parallel, as follows:
- APT loads the wet-lab data into a designated location in S3. A Lambda function listens for new data and triggers the first pipeline automatically.
- Output from the first pipeline is delivered back to S3 and organized to accept data for the second pipeline.
- APT reviews the output, determines which data needs polishing, and moves the polished data into a designated S3 bucket.
- A second Lambda function picks up the polished data, merges it back into the original run, and submits another AWS Batch job to run the finishing pipeline.
- Data from both runs is organized in S3 and presented to the customer.
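Both Lambda-triggered steps in the list above follow the same pattern: an S3 event notification arrives, the function works out which pipeline the new data belongs to, and it submits an AWS Batch job. The sketch below shows that routing under assumptions that are not part of the original design: the `raw/` and `polished/` prefixes, the `_READY` marker object, the queue and job-definition names, and the `RUN_ID`/`INPUT_URI` environment variables are all hypothetical, and merging the polished data back into the original run is left to the finishing pipeline. Event parsing follows the S3 event notification format, in which object keys arrive URL-encoded.

```python
import re
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import unquote_plus

# Hypothetical S3 layout: key prefix -> AWS Batch job definition to run.
ROUTES = {
    "raw/": "first-pipeline",           # wet-lab data (first Lambda's trigger)
    "polished/": "finishing-pipeline",  # data APT has polished (second Lambda's trigger)
}
JOB_QUEUE = "pipeline-queue"  # hypothetical AWS Batch job queue
READY_MARKER = "_READY"       # hypothetical marker written once an upload is complete


def route(bucket: str, key: str) -> Optional[Dict[str, Any]]:
    """Map one S3 object to submit_job keyword arguments, or None to ignore it."""
    if not key.endswith("/" + READY_MARKER):
        return None  # individual data files do not start a run; only the marker does
    for prefix, job_definition in ROUTES.items():
        if key.startswith(prefix):
            run_dir = key[: -len(READY_MARKER)]       # e.g. "raw/run42/"
            run_id = run_dir[len(prefix):].strip("/")  # e.g. "run42"
            if not run_id:
                return None
            return {
                # Batch job names allow letters, digits, '-' and '_' only.
                "jobName": re.sub(r"[^A-Za-z0-9_-]", "-", f"{job_definition}-{run_id}")[:128],
                "jobQueue": JOB_QUEUE,
                "jobDefinition": job_definition,
                "containerOverrides": {"environment": [
                    {"name": "RUN_ID", "value": run_id},
                    {"name": "INPUT_URI", "value": f"s3://{bucket}/{run_dir}"},
                ]},
            }
    return None


def handle_s3_event(event: Dict[str, Any],
                    submit: Callable[..., Dict[str, Any]]) -> List[str]:
    """Lambda-style handler body: submit one Batch job per ready run in the event."""
    job_ids: List[str] = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        request = route(bucket, key)
        if request is not None:
            job_ids.append(submit(**request)["jobId"])
    return job_ids
```

Waiting for a marker object rather than reacting to every upload keeps a multi-file run from launching one Batch job per file. In a deployed function, a `lambda_handler(event, context)` entry point would call `handle_s3_event(event, boto3.client("batch").submit_job)`; `submit` is passed in here only so the sketch runs without AWS credentials.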
AWS services implemented as part of the solution:
CodeCommit, CloudFormation, CloudWatch, CloudTrail, Lambda, AWS Config, Identity & Access Management (IAM), Virtual Private Cloud (VPC), Simple Storage Service (S3), Elastic Compute Cloud (EC2), Elastic Container Registry (ECR), and AWS Batch.
When PTP was engaged and reviewed the existing pipeline process, the elapsed time was 5 to 6 hours. With the AWS Batch Squared workflow, in which an AWS Batch job runs Nextflow and Nextflow in turn submits the pipeline’s jobs to AWS Batch, the runtime was shortened to 1.15 hours using approximately 650 CPUs. On AWS, APT can run these jobs in parallel: the CPU time and cost stay the same, but running the jobs concurrently delivers actionable data back to bioinformaticians faster.
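Taking the reported figures at face value, the elapsed-time speedup is the old runtime divided by the new one. The text does not say whether ‘1.15 hours’ means 1.15 decimal hours or 1 hour 15 minutes, so this small calculation shows both readings; it uses only the numbers stated above.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Elapsed-time speedup: old wall-clock time divided by new wall-clock time."""
    return before_hours / after_hours


BEFORE_HOURS = (5.0, 6.0)  # reported elapsed time of the original pipeline
# '1.15 hours' read two ways: decimal hours, or 1 h 15 min.
AFTER_READINGS = {"1.15 h": 1.15, "1 h 15 min": 1.25}

for label, after in AFTER_READINGS.items():
    low, high = (speedup(b, after) for b in BEFORE_HOURS)
    print(f"{label}: {low:.1f}x to {high:.1f}x faster")
```

Under either reading, the elapsed time drops by roughly a factor of 4 to 5.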
PTP’s CloudOps Engineering services let APT use PTP as an extension of its data science team, with PTP responding quickly to ongoing CloudOps requests for design, architecture, and cloud management.