Project PISCES

Process Integration & Synthesis using Chemical Engineering Standards

Harmonizing bioprocess data from diverse sources into a unified, machine-readable format to enable AI-powered knowledge extraction, analysis, and process flowsheet creation.

The Problem:
Dispersed Bioprocess Data

Decades of bioprocess engineering knowledge are trapped in unstructured formats like PDFs and images, making it nearly impossible to systematically analyze or leverage data for modern AI applications.

Researchers spend many hours manually extracting and reformatting information that should already be accessible and machine-readable, which slows innovation and discovery.

Illustration of dispersed bioprocess data

Our Solution:
Flowsheet Standardization

Project PISCES introduces the Standard Flowsheet Format (SFF), a flexible, extensible, and non-proprietary JSON-based standard that represents bioprocess flowsheets as directed graphs, with unit operations as nodes and process streams as edges.

This structure allows rich data to be attached to both units and streams, capturing a process’s essential elements (units, streams, chemicals, and utilities) in a machine-readable format. This unlocks the potential for automated analysis and AI-driven insights across the bioprocess knowledge space.

{
  "metadata": {
    "sff_version": "1.0",
    "source_document": "doi:10.1021/...",
    "extraction_date": "2024-07-29"
  },
  "units": [
    {
      "id": "unit_1",
      "type": "ChromatographyColumn",
      "label": "Protein A Column"
    },
    {
      "id": "unit_2",
      "type": "Tank",
      "label": "Elution Buffer Tank"
    }
  ],
  "streams": [
    {
      "id": "stream_1",
      "source": "unit_2",
      "destination": "unit_1",
      "label": "Elution Buffer"
    }
  ],
  "chemicals": [
    {
      "id": "chem_1",
      "name": "Tris-HCl",
      "role": "buffer"
    }
  ],
  "utilities": [
    {
      "id": "util_1",
      "type": "WFI",
      "description": "Water for Injection"
    }
  ]
}
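
The directed-graph reading of the example above can be sketched in a few lines of Python. This is a minimal illustration, not an official SFF tool: it assumes only the `units` and `streams` fields shown in the example, and the names `SFF_TEXT` and `build_graph` are hypothetical.

```python
import json
from collections import defaultdict

# Trimmed copy of the example document; only the fields used below are kept.
SFF_TEXT = """
{
  "units": [
    {"id": "unit_1", "type": "ChromatographyColumn", "label": "Protein A Column"},
    {"id": "unit_2", "type": "Tank", "label": "Elution Buffer Tank"}
  ],
  "streams": [
    {"id": "stream_1", "source": "unit_2", "destination": "unit_1", "label": "Elution Buffer"}
  ]
}
"""


def build_graph(sff):
    """Return (nodes, adjacency) for an SFF-like dict.

    nodes maps unit id -> unit record; adjacency maps a source unit id
    to a list of (destination unit id, stream record) pairs.
    """
    nodes = {unit["id"]: unit for unit in sff["units"]}
    adjacency = defaultdict(list)
    for stream in sff["streams"]:
        adjacency[stream["source"]].append((stream["destination"], stream))
    return nodes, dict(adjacency)


nodes, adjacency = build_graph(json.loads(SFF_TEXT))
for source, edges in adjacency.items():
    for destination, stream in edges:
        print(f'{nodes[source]["label"]} --[{stream["label"]}]--> {nodes[destination]["label"]}')
```

Keeping the full stream record on each edge means any extra data attached to a stream stays available when walking the graph.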

Project Methodology & Vision

AI-Powered Flowsheet Mining

We leverage state-of-the-art large language models to automatically extract flowsheet information from scientific literature, converting unstructured PDFs into structured SFF JSON files.
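
Model output is not guaranteed to be well-formed, so a candidate file would plausibly be checked before it is accepted. The sketch below shows one such check under stated assumptions: the required keys and the `id`, `source`, and `destination` fields are taken from the example SFF document, while the function name `check_sff` and the specific rules are illustrative and not part of any published SFF specification.

```python
import json

REQUIRED_KEYS = ("metadata", "units", "streams")  # assumed minimum, based on the example


def check_sff(text):
    """Return a list of problems found in a candidate SFF JSON string."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level is not a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    # Every stream endpoint must refer to a unit that was actually declared.
    unit_ids = {unit.get("id") for unit in doc.get("units", [])}
    for stream in doc.get("streams", []):
        for end in ("source", "destination"):
            if stream.get(end) not in unit_ids:
                problems.append(f'{stream.get("id")}: {end} {stream.get(end)!r} is not a known unit')
    return problems


# A candidate whose stream points at a unit that was never declared.
candidate = (
    '{"metadata": {}, "units": [{"id": "unit_1"}], '
    '"streams": [{"id": "stream_1", "source": "unit_2", "destination": "unit_1"}]}'
)
print(check_sff(candidate))
```

An empty list means the candidate passed these structural checks; it says nothing about whether the extracted content matches the source paper.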

Knowledge Augmentation

Beyond basic extraction, we augment SFF data with detailed information from the full text, including operating conditions, performance metrics, and experimental context.
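
One way to picture augmentation is as extra keys attached to an existing unit or stream record. The sketch below assumes hypothetical field names (`operating_conditions`, `performance`) and placeholder values; neither is taken from the SFF specification or from any real process.

```python
import copy

unit = {"id": "unit_1", "type": "ChromatographyColumn", "label": "Protein A Column"}


def augment(record, **extra):
    """Return a copy of an SFF record with extra keys added.

    Existing keys are never overwritten; a clash raises KeyError.
    """
    clash = set(record) & set(extra)
    if clash:
        raise KeyError(f"refusing to overwrite: {sorted(clash)}")
    augmented = copy.deepcopy(record)
    augmented.update(copy.deepcopy(extra))
    return augmented


# Placeholder values for illustration only.
augmented_unit = augment(
    unit,
    operating_conditions={"flow_rate": {"value": 1.0, "unit": "mL/min"}},
    performance={"yield_fraction": 0.9},
)
print(sorted(augmented_unit))
```

Returning a copy leaves the originally extracted record untouched, so the base flowsheet and its augmented version can be compared or stored separately.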

A FAIR Knowledge Base

Our goal is to create a Findable, Accessible, Interoperable, and Reusable (FAIR) repository of validated bioprocess flowsheets, freely available to the research community.

Enabling Generative AI

With a comprehensive, standardized knowledge base, we can train generative AI models to design novel bioprocesses, optimize existing workflows, and accelerate innovation in the field.
