Project PISCES
Process Integration & Synthesis using Chemical Engineering Standards
Harmonizing bioprocess data from diverse sources into a unified, machine-readable format to enable AI-powered knowledge extraction, analysis, and process flowsheet creation.
The Problem:
Dispersed Bioprocess Data
Decades of bioprocess engineering knowledge are locked in unstructured formats such as PDFs and images, which makes the data hard to analyze systematically or to use in modern AI applications.
Researchers spend hours manually extracting and reformatting information that should already be accessible and machine-readable, which slows innovation and discovery.

Our Solution:
Flowsheet Standardization
Project PISCES introduces the PISCES Flowsheet Format (PFF), a flexible, extensible, and non-proprietary JSON-based standard. PFF represents bioprocess flowsheets as directed graphs in which unit operations are nodes and process streams are edges.
This structure allows rich data to be attached to both nodes and edges, capturing a process’s essential elements (units, streams, chemicals, and utilities) in a machine-readable format. This enables automated analysis and AI-driven insights across the bioprocess knowledge space.
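The directed-graph view can be sketched in a few lines of Python. This is a minimal illustration, not part of the PFF specification: it assumes only the `units` and `streams` fields and the `id`, `source`, and `destination` keys shown in the PFF Preview, and the helper name `build_graph` is hypothetical.

```python
from collections import defaultdict

# A trimmed PFF document that uses only fields shown in the PFF Preview.
pff = {
    "units": [
        {"id": "unit_1", "type": "ChromatographyColumn", "label": "Protein A Column"},
        {"id": "unit_2", "type": "Tank", "label": "Elution Buffer Tank"},
    ],
    "streams": [
        {"id": "stream_1", "source": "unit_2", "destination": "unit_1",
         "label": "Elution Buffer"},
    ],
}


def build_graph(doc):
    """Hypothetical helper: index units by id and list outgoing streams per unit.

    Returns (nodes, adjacency), where adjacency maps a source unit id to a
    list of (stream id, destination unit id) pairs.
    """
    nodes = {unit["id"]: unit for unit in doc.get("units", [])}
    adjacency = defaultdict(list)
    for stream in doc.get("streams", []):
        adjacency[stream["source"]].append((stream["id"], stream["destination"]))
    return nodes, dict(adjacency)


nodes, adjacency = build_graph(pff)

# Print each edge using the human-readable unit labels.
for source, edges in adjacency.items():
    for stream_id, destination in edges:
        print(f"{nodes[source]['label']} --[{stream_id}]--> {nodes[destination]['label']}")
```

Keeping units in a dictionary keyed by `id` means any extra data attached to a node stays reachable from the graph without copying it into the adjacency list.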
PFF Preview:
{
  "metadata": {
    "pff_version": "1.0",
    "source_document": "doi:10.1021/...",
    "extraction_date": "2024-07-29"
  },
  "units": [
    {
      "id": "unit_1",
      "type": "ChromatographyColumn",
      "label": "Protein A Column"
    },
    {
      "id": "unit_2",
      "type": "Tank",
      "label": "Elution Buffer Tank"
    }
  ],
  "streams": [
    {
      "id": "stream_1",
      "source": "unit_2",
      "destination": "unit_1",
      "label": "Elution Buffer"
    }
  ],
  "chemicals": [
    {
      "id": "chem_1",
      "name": "Tris-HCl",
      "role": "buffer"
    }
  ],
  "utilities": [
    {
      "id": "util_1",
      "type": "WFI",
      "description": "Water for Injection"
    }
  ]
}

Project Methodology & Vision
AI-Powered Flowsheet Mining
We leverage state-of-the-art large language models to automatically extract flowsheet information from scientific literature, converting unstructured PDFs into structured PFF JSON files.
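Model-generated JSON is usually worth checking before it enters a knowledge base. The sketch below shows one possible structural check on a candidate PFF string. It is not the project's actual pipeline. It rests on two assumptions: that all five top-level sections from the PFF Preview are required, and that every stream endpoint must reference a declared unit. The source states neither rule, and the function name `check_pff` is hypothetical.

```python
import json

# Assumption: the five top-level sections shown in the PFF Preview are all required.
REQUIRED_SECTIONS = ("metadata", "units", "streams", "chemicals", "utilities")


def check_pff(text):
    """Hypothetical check: return a list of problems in a candidate PFF JSON string."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing section: {name}"
                for name in REQUIRED_SECTIONS if name not in doc]

    # Assumption: each stream's source and destination must name a declared unit.
    unit_ids = {u.get("id") for u in doc.get("units", []) if isinstance(u, dict)}
    for stream in doc.get("streams", []):
        if not isinstance(stream, dict):
            problems.append("stream entry is not a JSON object")
            continue
        for end in ("source", "destination"):
            if stream.get(end) not in unit_ids:
                problems.append(
                    f"{stream.get('id')}: {end} {stream.get(end)!r} is not a known unit"
                )
    return problems


# A candidate whose only stream points at a unit that was never declared.
candidate = json.dumps({
    "metadata": {},
    "units": [{"id": "unit_1"}],
    "streams": [{"id": "stream_1", "source": "unit_2", "destination": "unit_1"}],
    "chemicals": [],
    "utilities": [],
})
problems = check_pff(candidate)
print(problems)
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in an extracted flowsheet at once.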
Knowledge Augmentation
Beyond basic extraction, we augment PFF data with detailed information from the full text, including operating conditions, performance metrics, and experimental context.
A FAIR Knowledge Base
Our goal is to create a Findable, Accessible, Interoperable, and Reusable (FAIR) repository of validated bioprocess flowsheets, freely available to the research community.
Enabling Generative AI
With a comprehensive, standardized knowledge base, we can train generative AI models to design novel bioprocesses, optimize existing workflows, and accelerate innovation in the field.