In the high-risk, high-reward domain of biotechnology, validating research and landing the next wave of funding are crucial checkpoints in the first years of a startup. In 2023, the Life Sciences arena has seen escalating scrutiny, disturbing reports of manipulated outcomes, and pressure to deliver results faster than ever before. Because of these pressures, building efficient, transparent, and reliable data pipelines, in this case with Terraform, has become a top priority for leadership teams across the industry.
To address this data authenticity priority, a life sciences client selected PTP as its strategic consulting partner. From the beginning, the client’s primary goal in its engagement with PTP’s CloudOps and DevOps teams has been to fortify the validity of its research by streamlining and automating a mission-critical data pipeline. Three cornerstone technologies were chosen to increase the likelihood of success: AWS, the leading cloud infrastructure platform for Life Sciences; Terraform; and Nextflow, a recognized leader in scientific workflow systems.
After completing an AWS Well-Architected Review, PTP had a solid working knowledge of how two of the client’s data pipelines had to work. PTP’s lead Solution Architect began using Nextflow to programmatically author a sequence of dependent compute steps, tying multiple software applications (Cell Ranger, Seurat, Picard, and the STAR aligner) to auto-configured AWS resources including EC2, ELB, Auto Scaling, Lambda, and Fargate. The integration of software tools and AWS infrastructure was completed, tested, and proven to work. From there, EC2 Image Builder and Service Catalog were set up to produce compute images in a controlled and repeatable manner. This allows the client’s research teams to launch pipelines independently on AWS compute infrastructure from a pre-built Service Catalog. With this solution in place, users can do their research securely online in a few easy-to-follow clicks. The work these users perform is governed by AWS security policies, and each user can launch only the relevant, predetermined pipelines through the Service Catalog. In a matter of weeks, the research process became automated, repeatable, adjustable, and fully documented. This joint PTP and AWS solution ensures research validation while accelerating science. The image below demonstrates the overall environment.
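The idea of a "sequence of dependent compute steps" can be illustrated with a small Python sketch. Nextflow itself derives the run order from the channels that connect its processes, so this is not how the client’s pipeline is written. It only shows the underlying principle that a step runs after everything it depends on. The step names and the dependencies between them are hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical step names and dependencies, for illustration only.
# Each key may run only after every step in its set has finished.
DEPENDENCIES = {
    "star_align": set(),
    "picard_metrics": {"star_align"},
    "cellranger_count": set(),
    "seurat_analysis": {"cellranger_count", "picard_metrics"},
}


def execution_order(dependencies):
    """Return one valid run order in which every step follows its prerequisites."""
    # static_order() raises graphlib.CycleError if the steps depend on each other in a loop.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(DEPENDENCIES), start=1):
        print(position, step)
```

Declaring the dependencies as data, rather than hard-coding the order, is what makes a pipeline of this kind repeatable and easy to adjust: adding a step means adding one entry.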
All PTP builds have been captured in Terraform templates to maintain known image files and component lists. Version control is handled by AWS CodeCommit. When a component changes in Terraform, for example a software update to “version 4.2”, Terraform detects the change and deploys a new version of the component, which in turn creates a new version of the recipe in Image Builder. For AWS cost optimization, which is extremely important when building infrastructure in this manner, the Service Catalog services are tied to CloudWatch events. When resources go idle, an event is triggered, and an SQS queue and a Lambda function are then used to terminate them. This automates cost control. This systematic approach was arrived at through thoughtful engineering and has created an operationally efficient environment that can easily scale. The image below represents the Terraform and Service Catalog environment.
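The idle-resource cleanup described above can be sketched as a Lambda handler in Python. This is a minimal illustration under stated assumptions, not PTP’s implementation: it assumes the idle resources are EC2 instances, that each SQS message body is JSON, and that the instance ID sits in a field named `instance_id` (a hypothetical name). The `ec2` parameter is also an addition for this sketch so the logic can be exercised without AWS credentials.

```python
import json


def instance_ids_from_sqs_event(event):
    """Collect unique EC2 instance IDs from the SQS records delivered to Lambda."""
    ids = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        # "instance_id" is a hypothetical field name chosen for this sketch.
        instance_id = body.get("instance_id")
        if instance_id and instance_id not in ids:
            ids.append(instance_id)
    return ids


def handler(event, context, ec2=None):
    """Lambda entry point: terminate every instance named in the SQS batch."""
    ids = instance_ids_from_sqs_event(event)
    if not ids:
        return {"terminated": []}
    if ec2 is None:
        # Imported lazily so the parsing logic can be tested offline.
        import boto3
        ec2 = boto3.client("ec2")
    ec2.terminate_instances(InstanceIds=ids)
    return {"terminated": ids}
```

Placing a queue between the idle event and the Lambda function means a burst of idle notifications is buffered and processed in batches rather than lost, which is one plausible reason for the SQS step in the design.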
Recently, PTP has been working with the client to incorporate Amazon WorkSpaces for SAS access, along with AWS Managed Microsoft AD. In the months ahead, PTP’s experience and consultative approach will be invaluable in determining the best options to isolate data and create additional levels of control and security.
PTP and the client’s IT and Informatics teams have implemented solutions that, at first, seemed complex and foreign. Embracing change is never easy, but by doing so the client has taken huge steps to avoid the common data-related pitfalls that face all Life Sciences companies, and has thus improved its likelihood of finding life-changing relief for patients suffering from debilitating diseases.
Learn More about PTP’s CloudOps Service for Accelerating Science on AWS