Integrating Machine Learning with Generative AI for Protein Research in Life Sciences

A biotech company partnered with PTP to integrate machine learning and Generative AI on AWS, creating a secure, scalable pipeline that cut research cycle times, improved collaboration, and accelerated therapeutic protein discovery.

Illustration of Goat working on servers leading data to the cloud and to a proved treatment

Overview

A clinical-stage biotechnology company, focused on engineering next-generation proteins to accelerate therapeutic innovation, was searching for AI-enabled advancements to their research. At the heart of their pipeline were machine learning (ML) models that predicted protein folding and interaction patterns, helping researchers identify promising therapeutic candidates. While these ML models delivered powerful predictive capabilities, the company’s scientists faced a persistent bottleneck: turning raw predictions into actionable insights.

Protein research is inherently interdisciplinary, requiring collaboration among computational biologists, molecular modelers, chemists, and wet-lab researchers. While ML systems such as AlphaFold could produce detailed folding predictions, these outputs often needed extensive interpretation and translation into experimental briefs. This process consumed valuable time and slowed experimental cycles, hindering the company’s ability to quickly iterate and validate new therapeutic hypotheses.

To address this challenge, the company partnered with PTP to integrate its existing ML pipeline with Generative AI (GenAI) capabilities on AWS Bedrock. The result was a transformative workflow that combined the predictive power of ML with the contextualization strengths of GenAI. Predictions became clear, plain-language, experiment-ready briefs that allowed interdisciplinary teams to collaborate more effectively, shorten research cycles, and accelerate the development of new protein-based therapeutics.


The Challenge

The company’s research bottlenecks were shaped by three interrelated challenges:

Interpretation Gap

The company’s ML models could generate folding predictions and structural interactions, but these outputs were dense, technical, and difficult for non-specialists to interpret quickly. Cross-functional teams had to spend significant time translating computational predictions into insights usable for experimental design.

Time-Consuming Summarization

Reports summarizing ML outputs were drafted manually by data scientists and computational biologists. Each cycle required days of analysis and writing, extending experimental planning cycles and delaying downstream work.

Scaling Research Output

As the company expanded its protein engineering pipeline, the number of candidate proteins under investigation grew dramatically. Scaling human effort to match ML output was not feasible, creating a widening gap between computational predictions and actionable experimentation.

The company set a clear goal: Join ML to GenAI in a seamless pipeline that could automatically generate structured, comprehensible, and actionable reports—without sacrificing scientific rigor or compliance.

The Solution

PTP designed and implemented an integrated ML + GenAI pipeline on AWS that addressed the company’s bottlenecks and established a repeatable research framework.

Key Solution Components

Data Ingestion & Normalization

Raw protein data—including sequences, structural metadata, and prior experimental results—was ingested into Amazon S3 as the central data repository. AWS Glue pipelines performed data cleaning and normalization, ensuring consistent formats across protein datasets. This allowed downstream ML and GenAI systems to interact with structured, reliable inputs.

Protein Folding with AlphaFold

The company’s existing ML capabilities, centered on AlphaFold, were deployed on Amazon SageMaker to predict protein folding and interaction structures. Outputs included 3D models of folded proteins and associated confidence metrics, stored securely in S3 for accessibility. These predictions formed the foundation of the GenAI-driven contextualization step.

Generative AI Summarization with AWS Bedrock

PTP integrated AWS Bedrock into the pipeline, enabling seamless orchestration of large language models (LLMs) specialized for life sciences data. Using ProtGPT2 and ProtBERT as foundational models, the system was fine-tuned on the company’s proprietary dataset of protein predictions and experimental results. Bedrock agents automatically generated plain-language summaries contextualizing folding predictions, highlighting unique structural features, and identifying potential therapeutic implications.

OpenWebUI Research Interface

Instead of relying on pre-packaged SaaS solutions, PTP deployed a custom OpenWebUI front end. Researchers interacted with the pipeline through a simple, intuitive interface:

  • Submit queries about specific protein candidates.
  • Retrieve folding predictions and GenAI-generated summaries.
  • Access structured experiment briefs ready for validation.

Human-in-the-Loop Validation

While GenAI produced clear, structured outputs, the company insisted on maintaining rigorous scientific oversight. Every GenAI-generated report was reviewed by scientists, who could validate, refine, or discard suggestions. Selected protein candidates underwent a secondary lethality re-check, leveraging AlphaFold and additional ML models to ensure safety before moving to wet-lab validation.

Extensible Framework for Future Growth

PTP built the pipeline with modularity in mind. The orchestration layer—anchored on AWS Lambda and Amazon API Gateway—ensured that new GenAI agents or ML models could be added with minimal reconfiguration. Documentation and training were provided so the company’s team could extend the framework independently.

Why AWS

The company selected AWS as the backbone for this project because of three critical advantages:

Security and Compliance

With sensitive research data at the core of operations, AWS provided a secure, compliance-ready environment. S3, SageMaker, and Bedrock operated within the company’s isolated VPC, ensuring data never left the secure boundary.

Breadth of Model Choice

AWS Bedrock offered access to multiple foundation models through a unified API, allowing experimentation with ProtGPT2, ProtBERT, and other specialized models without costly redevelopment.

Scalability

AWS’s elastic infrastructure meant the company could scale computationally intensive protein folding workloads up or down as research demands shifted. This flexibility allowed acceleration without overinvesting in static infrastructure.

Why PTP

The company chose PTP as its partner because of its deep expertise in both AWS consulting and life sciences R&D.

Life Sciences Competency

As an AWS Life Sciences Competency partner, PTP brought domain-specific knowledge of biotech workflows, regulatory constraints, and scientific data handling.

Proven AWS Delivery

With years of AWS consulting experience, PTP designed and delivered a pipeline that adhered to AWS best practices while meeting the company’s unique research needs.

Innovation and Enablement

Beyond building the system, PTP enabled the company’s team with training, documentation, and extensibility—ensuring they could independently grow the framework to support future research initiatives.

The Results

The integrated ML + GenAI pipeline delivered measurable impact across The Company’s protein research workflows:

Time Efficiency

Experiment planning cycles shortened by 35%.

Reports that once required days of manual drafting were now generated automatically in minutes.

Research Productivity

Cross-disciplinary teams gained immediate clarity from GenAI-generated summaries, enabling biologists, chemists, and clinicians to collaborate more effectively.

Faster turnaround times allowed the company to expand the number of protein candidates in active development without adding headcount.

Quality and Consistency

Reports generated in plain language improved communication across the organization.

Consistent formatting and structure ensured that every experimental brief was regulator-ready and scientifically coherent.

Scalable Innovation

The modular framework positioned the company to add new GenAI agents for tasks such as literature review, knowledge graph exploration, or biomarker discovery.

The company’s scientists could now focus on higher-value tasks—hypothesis generation, experimental design, and strategic decision-making.


Conclusion

The Company Bio’s integration of ML and GenAI represents a breakthrough in how biotech organizations can accelerate protein research. By pairing AlphaFold-driven predictions with Bedrock-powered contextualization, the Company transformed dense, technical outputs into experiment-ready briefs that fuel collaboration and speed.

The results speak for themselves: shorter research cycles, more scalable experimentation, and higher-quality outputs—all achieved within a secure, AWS-native framework designed for life sciences. With PTP’s expertise, the Company now has a repeatable pipeline that will evolve alongside their research portfolio.

Most importantly, this project underscores how cloud-native AI integration can fundamentally reshape biotech R&D. For the Company, the fusion of ML and GenAI isn’t just an IT upgrade—it’s a strategic capability that empowers scientists to discover, validate, and deliver new protein therapeutics faster than ever before.

Isometric graph icon representing secure AWS Transfer Family architecture for life sciences

Accelerate Your Research with AI + Cloud

Ready to transform complex data into actionable insights? Partner with PTP, an AWS Life Sciences Competency Partner, to harness machine learning and generative AI for faster, more scalable research.

Schedule your free consultation today.

Tell us a bit about your project to get started with PTP. Fill out the form below and our team will be in touch shortly.

Homepage Contact Us