Multivariate Statistical Process Optimization in the Industrial Production of Enzymes

Research output: Book/Report › Ph.D. thesis

Standard

Multivariate Statistical Process Optimization in the Industrial Production of Enzymes. / Klimkiewicz, Anna.

Department of Food Science, Faculty of Science, University of Copenhagen, 2016. 175 p.

Harvard

Klimkiewicz, A 2016, Multivariate Statistical Process Optimization in the Industrial Production of Enzymes. Department of Food Science, Faculty of Science, University of Copenhagen. <https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122273754105763>

APA

Klimkiewicz, A. (2016). Multivariate Statistical Process Optimization in the Industrial Production of Enzymes. Department of Food Science, Faculty of Science, University of Copenhagen. https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122273754105763

Vancouver

Klimkiewicz A. Multivariate Statistical Process Optimization in the Industrial Production of Enzymes. Department of Food Science, Faculty of Science, University of Copenhagen, 2016. 175 p.

Author

Klimkiewicz, Anna. / Multivariate Statistical Process Optimization in the Industrial Production of Enzymes. Department of Food Science, Faculty of Science, University of Copenhagen, 2016. 175 p.

Bibtex

@phdthesis{3327b6ae8b88442db290aae76e6bc72e,
title = "Multivariate Statistical Process Optimization in the Industrial Production of Enzymes",
abstract = "In modern biotech production, a massive number of diverse measurements, with a broad diversity in information content and quality, is stored in data historians. The potential of this enormous amount of data is currently under-employed in process optimization efforts. This is a result of the demanding steps required in thoughtful data retrieval from the historian and the subsequent data pre-processing steps. Furthermore, efficient methods capable of handling the data in the natural structure in which it was generated are needed.
This dissertation work is meant to address some of the challenges and difficulties related to {\textquoteleft}recycling{\textquoteright} of historical data from full-scale manufacturing of industrial enzymes. First, the crucial and tedious step of retrieving the data from the systems is presented. The prerequisites that need to be comprehended are discussed, such as sensor accuracy and reliability, aspects related to the actual measuring frequency, and non-equidistant retention strategies in data storage. Different regimes of data extraction can be employed, and some might introduce undesirable artifacts in the final analysis results (POSTER II). Several signal processing techniques are also briefly discussed and examples of applications presented, e.g. how to compensate for sensors with a low signal-to-noise ratio or how to handle artifacts in the data. A second important step is alignment and synchronization of process data. This is particularly significant when looking at the relation between sequences of unit operations separated in time, and even more so when the time series data are generated by (semi-)continuous processes. For this application, the potential of auto- and cross-correlation analysis and the effect of the prerequisite signal de-trending are explored in the context of the continuous granulation-drying process (POSTER I). Posters and papers marked by all-capitals can be found at the end of this thesis.
The research presented in this thesis is primarily centered on the ultrafiltration step, during which enzymes are purified and up-concentrated. The throughput of a continuous ultrafiltration operation is limited by membrane fouling phenomena, in which the production capacity (monitored as flow through the membrane, or flux) decreases over time. The flux varies considerably from run to run within the same product and likewise between different products. This variability clearly affects the production scheduling and leads to additional costs due to more frequent membrane cleaning. The dataset examined in this investigation was compiled from records of conventional, univariate process sensors collected over several years of production of one type of intermediate enzyme product. Different strategies for organizing these datasets, with varying numbers of timestamps, into data structures fit for latent variable (LV) modeling have been compared. The ultimate aim of the data mining steps is the construction of statistical {\textquoteleft}soft models{\textquoteright} which capture the principal or latent behavior of the system under investigation. If this leads to new knowledge, it could be used for optimization of future production runs. Data reduced to the mean value per run, combined with some other relevant features, were used together with PLS2 regression in the primary investigation. This allowed us to identify the major differences between the processing variants of the investigated enzyme. Data arrangement into three-way cubes was achieved by limiting the datasets to the median length. Studies with LV techniques after batch-wise unfolding did not lead to any special findings. Hence, it has been concluded that the process can be modeled sufficiently well when the datasets are concatenated variable-wise. The later studies used this type of data arrangement and focused only on the products with a higher degree of concentration, as in those cases the flux decline problem has been the most pronounced. Blocking in the row or time direction was used in PAPER II. The dataset has a natural multilevel structure, with level one being the process timestamps, which are nested within the ultrafiltration runs, referred to as level two. Multilevel Simultaneous Component Analysis with invariant Pattern (MSCA-P) is applied to explore this historical dataset in the context of flux decline. We build on the two-level idea and expand the model to a third level: {\textquoteleft}processing recipe{\textquoteright}. In PAPER III, blocking in the column or process tags direction has been used. A multiblock PLS breaks the process variables into smaller groups, clustering variables of similar importance and characteristics, to facilitate the diagnostic procedure. Both methods lead to decomposition of the data structures into intuitively interpretable solutions by keeping the natural structure of the analyzed data.
Additionally, the ultrafiltration system has been investigated in terms of product yield. The potential of NIR technology to monitor the activity of the enzyme has been the subject of a feasibility study presented in PAPER I. It included (a) an evaluation of which of the two real-time NIR flow cell configurations is the preferred arrangement for monitoring the retentate stream downstream of the UF, and (b) whether the system can be used for statistical process monitoring and early warning/fault detection. It was possible to develop satisfactorily robust calibration models for four types of enzyme products, where specific enzyme activities have been standardized into one global QC parameter. Finally, the study revealed that the less demanding in-line flow cell setup outperformed the on-line arrangement. The former was satisfactorily robust towards different products (amylases and proteases) and associated processing parameters such as temperature and processing speed.
This dissertation work shows that chemometric methods specially designed for two-way and multiset problems have great potential as PAT tools, as they fulfill the primary goal of PAT, namely to obtain a better process understanding in a faster and more intuitive way, especially when preserving the original data structure and dimensionality.",
author = "Anna Klimkiewicz",
year = "2016",
language = "English",
publisher = "Department of Food Science, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Multivariate Statistical Process Optimization in the Industrial Production of Enzymes

AU - Klimkiewicz, Anna

PY - 2016

Y1 - 2016

N2 - In modern biotech production, a massive number of diverse measurements, with a broad diversity in information content and quality, is stored in data historians. The potential of this enormous amount of data is currently under-employed in process optimization efforts. This is a result of the demanding steps required in thoughtful data retrieval from the historian and the subsequent data pre-processing steps. Furthermore, efficient methods capable of handling the data in the natural structure in which it was generated are needed. This dissertation work is meant to address some of the challenges and difficulties related to ‘recycling’ of historical data from full-scale manufacturing of industrial enzymes. First, the crucial and tedious step of retrieving the data from the systems is presented. The prerequisites that need to be comprehended are discussed, such as sensor accuracy and reliability, aspects related to the actual measuring frequency, and non-equidistant retention strategies in data storage. Different regimes of data extraction can be employed, and some might introduce undesirable artifacts in the final analysis results (POSTER II). Several signal processing techniques are also briefly discussed and examples of applications presented, e.g. how to compensate for sensors with a low signal-to-noise ratio or how to handle artifacts in the data. A second important step is alignment and synchronization of process data. This is particularly significant when looking at the relation between sequences of unit operations separated in time, and even more so when the time series data are generated by (semi-)continuous processes. For this application, the potential of auto- and cross-correlation analysis and the effect of the prerequisite signal de-trending are explored in the context of the continuous granulation-drying process (POSTER I). Posters and papers marked by all-capitals can be found at the end of this thesis. The research presented in this thesis is primarily centered on the ultrafiltration step, during which enzymes are purified and up-concentrated. The throughput of a continuous ultrafiltration operation is limited by membrane fouling phenomena, in which the production capacity (monitored as flow through the membrane, or flux) decreases over time. The flux varies considerably from run to run within the same product and likewise between different products. This variability clearly affects the production scheduling and leads to additional costs due to more frequent membrane cleaning. The dataset examined in this investigation was compiled from records of conventional, univariate process sensors collected over several years of production of one type of intermediate enzyme product. Different strategies for organizing these datasets, with varying numbers of timestamps, into data structures fit for latent variable (LV) modeling have been compared. The ultimate aim of the data mining steps is the construction of statistical ‘soft models’ which capture the principal or latent behavior of the system under investigation. If this leads to new knowledge, it could be used for optimization of future production runs. Data reduced to the mean value per run, combined with some other relevant features, were used together with PLS2 regression in the primary investigation. This allowed us to identify the major differences between the processing variants of the investigated enzyme. Data arrangement into three-way cubes was achieved by limiting the datasets to the median length. Studies with LV techniques after batch-wise unfolding did not lead to any special findings. Hence, it has been concluded that the process can be modeled sufficiently well when the datasets are concatenated variable-wise. The later studies used this type of data arrangement and focused only on the products with a higher degree of concentration, as in those cases the flux decline problem has been the most pronounced. Blocking in the row or time direction was used in PAPER II. The dataset has a natural multilevel structure, with level one being the process timestamps, which are nested within the ultrafiltration runs, referred to as level two. Multilevel Simultaneous Component Analysis with invariant Pattern (MSCA-P) is applied to explore this historical dataset in the context of flux decline. We build on the two-level idea and expand the model to a third level: ‘processing recipe’. In PAPER III, blocking in the column or process tags direction has been used. A multiblock PLS breaks the process variables into smaller groups, clustering variables of similar importance and characteristics, to facilitate the diagnostic procedure. Both methods lead to decomposition of the data structures into intuitively interpretable solutions by keeping the natural structure of the analyzed data. Additionally, the ultrafiltration system has been investigated in terms of product yield. The potential of NIR technology to monitor the activity of the enzyme has been the subject of a feasibility study presented in PAPER I. It included (a) an evaluation of which of the two real-time NIR flow cell configurations is the preferred arrangement for monitoring the retentate stream downstream of the UF, and (b) whether the system can be used for statistical process monitoring and early warning/fault detection. It was possible to develop satisfactorily robust calibration models for four types of enzyme products, where specific enzyme activities have been standardized into one global QC parameter. Finally, the study revealed that the less demanding in-line flow cell setup outperformed the on-line arrangement. The former was satisfactorily robust towards different products (amylases and proteases) and associated processing parameters such as temperature and processing speed. This dissertation work shows that chemometric methods specially designed for two-way and multiset problems have great potential as PAT tools, as they fulfill the primary goal of PAT, namely to obtain a better process understanding in a faster and more intuitive way, especially when preserving the original data structure and dimensionality.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122273754105763

M3 - Ph.D. thesis

BT - Multivariate Statistical Process Optimization in the Industrial Production of Enzymes

PB - Department of Food Science, Faculty of Science, University of Copenhagen

ER -

ID: 164116330