Multivariate Statistical Process Optimization in the Industrial Production of Enzymes

Research output: Book/Report, Ph.D. thesis, Research

  • Anna Klimkiewicz
In modern biotech production, a massive number of diverse measurements, with a
broad diversity in information content and quality, is stored in data historians. The
potential of this enormous amount of data is currently under-employed in process
optimization efforts. This is a result of the demanding steps required in thoughtful
data retrieval from the historian and the subsequent data pre-processing steps.
Furthermore, efficient methods are needed that can handle the data in the
natural structure in which it was generated.
This dissertation work is meant to address some of the challenges and difficulties
related to ‘recycling’ of historical data from full-scale manufacturing of industrial
enzymes. First, the crucial and tedious step of retrieving the data from the systems is
presented. The prerequisites that need to be understood are discussed, such as
sensor accuracy and reliability, aspects related to the actual measuring frequency,
and non-equidistant retention strategies in data storage. Different regimes of data
extraction can be employed, and some might introduce undesirable artifacts in the
final analysis results (POSTER II). Several signal processing techniques are also
briefly discussed and examples of applications presented, e.g. how to compensate for
sensors with a low signal-to-noise ratio or how to handle artifacts in the data. A
second important step is alignment and synchronization of process data. This is
particularly significant when looking at the relation between sequences of unit
operations separated in time, and even more so when the time series data are
generated by (semi-)continuous processes. For this application, the
potential of auto- and cross-correlation analysis and the effect of the prerequisite
signal de-trending are explored in the context of the continuous granulation-drying
process (POSTER I).
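As an illustration of the alignment idea, the following minimal Python sketch estimates the delay between two process signals from the peak of their cross-correlation after linear de-trending. It is not the procedure used in the thesis: the function names (`detrend_linear`, `cross_correlation`, `estimate_delay`, `simulate`), the straight-line trend model, and the synthetic signals are all assumptions made for this example.

```python
import math
import random


def detrend_linear(y):
    """Subtract the least-squares straight line from a sequence."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]


def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation between x[t] and y[t + k] for k = 0..max_lag."""
    n = len(x)
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return [
        sum(x[t] * y[t + k] for t in range(n - k)) / norm
        for k in range(max_lag + 1)
    ]


def estimate_delay(upstream, downstream, max_lag):
    """Lag (in samples) at which the de-trended signals correlate most strongly."""
    x = detrend_linear(upstream)
    y = detrend_linear(downstream)
    r = cross_correlation(x, y, max_lag)
    return max(range(len(r)), key=lambda k: r[k])


def simulate(n=500, delay=7, seed=0):
    """Hypothetical data: the downstream signal repeats the upstream
    fluctuations `delay` samples later; both carry a different slow drift."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + delay)]
    upstream = [0.01 * t + e[t + delay] for t in range(n)]
    downstream = [0.02 * t + e[t] + 0.1 * rng.gauss(0.0, 1.0) for t in range(n)]
    return upstream, downstream


if __name__ == "__main__":
    up, down = simulate()
    print("estimated delay in samples:", estimate_delay(up, down, max_lag=30))
```

De-trending comes first because a shared slow drift tends to inflate the correlation at every lag and can mask the peak caused by the actual transport delay. Real historian data would additionally need resampling to an equidistant time grid before a lag in samples is meaningful.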
Posters and papers marked in all capitals can be found at the end of this thesis.
The research presented in this thesis is primarily centered on the ultrafiltration step
during which enzymes are purified and up-concentrated. The throughput of a
continuous ultrafiltration operation is limited by the membrane fouling phenomenon,
in which the production capacity (monitored as flow through the membrane, or flux)
decreases over time. The flux varies considerably from run to run within the same
product and likewise between different products. This variability clearly affects
production scheduling and leads to additional costs due to more frequent
membrane cleaning. The dataset examined in this investigation was compiled from
records of conventional, univariate process sensors collected over several years of
production of one type of intermediate enzyme product. Different strategies for
organizing these datasets, with varying numbers of timestamps, into data
structures fit for latent variable (LV) modeling have been compared. The ultimate
aim of the data mining steps is the construction of statistical ‘soft models’ which
capture the principal or latent behavior of the system under investigation. If this
leads to new knowledge, it could be used for optimization of future production runs.
Data reduced to the mean value per run, combined with some other relevant features,
were used together with PLS2 regression in the primary investigation. This allowed
us to identify the major differences between the processing variants of the
investigated enzyme. Data arrangement into three-way cubes was achieved by
limiting the datasets to the median length. Studies with LV techniques after
batch-wise unfolding did not lead to any special findings. Hence, it was
concluded that the process can be modeled sufficiently well when the datasets are
concatenated variable-wise. The later studies used this type of data arrangement and
focused only on the products with a higher degree of concentration, as in those cases
the flux decline problem was the most pronounced. Blocking in the row or time
direction was used in PAPER II. The dataset has a natural multilevel structure, with
level one being the process timestamps, which are nested within the ultrafiltration
runs, referred to as level two. Multilevel Simultaneous Component Analysis with
invariant Pattern (MSCA-P) is applied to explore this historical dataset in the context
of flux decline. We build on the two-level idea and expand the model to a third level:
‘processing recipe’. In PAPER III, blocking in the column or process tags direction has
been used. A multiblock PLS breaks the process variables into smaller groups,
clustering variables of similar importance and characteristics, to facilitate the
diagnostic procedure. Both methods lead to decomposition of the data structures into
intuitively interpretable solutions by keeping the natural structure of the analyzed
data.
Additionally, the ultrafiltration system has also been investigated in terms of product
yield. The potential of NIR technology to monitor the activity of the enzyme was
the subject of a feasibility study presented in PAPER I. It included (a) an evaluation of
which of the two real-time NIR flow cell configurations is the preferred arrangement
for monitoring the retentate stream downstream of the UF, and (b) whether the system
can be used for statistical process monitoring and early warning/fault detection. It
was possible to develop satisfactorily robust calibration models for four types of enzyme
products where specific enzyme activities have been standardized into one global
QC parameter. Finally, the study revealed that the less demanding in-line flow cell
setup outperformed the on-line arrangement. The former proved satisfactorily robust
towards different products (amylases and proteases) and associated processing
parameters such as temperature and processing speed.
This dissertation work shows that chemometric methods specially designed for two-way
and multiset problems have great potential as PAT tools as they fulfill the
primary goal of PAT, namely to obtain a better process understanding in a faster and
more intuitive way, especially when preserving the original data structure and
dimensionality.
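The batch-wise and variable-wise data arrangements contrasted in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, each run is represented as a list of timestamps holding one value per process variable, and dropping runs shorter than the median length is a choice made for this sketch, not something the abstract specifies.

```python
from statistics import median_low


def to_cube(runs):
    """Build a three-way array (runs x timestamps x variables).

    Runs are cut to the median run length; runs shorter than the median
    are dropped (an assumption of this sketch).
    """
    length = median_low(len(run) for run in runs)
    return [run[:length] for run in runs if len(run) >= length]


def unfold_batch_wise(cube):
    """One row per run: I x (K*J), with all timestamps laid out side by side."""
    return [[value for timestamp in run for value in timestamp] for run in cube]


def unfold_variable_wise(cube):
    """One row per timestamp: (I*K) x J, with the runs stacked on top of each other."""
    return [list(timestamp) for run in cube for timestamp in run]


if __name__ == "__main__":
    # Three hypothetical runs of 3, 4 and 5 timestamps with 2 process variables each.
    runs = [
        [[float(i), float(10 * i)] for i in range(length)]
        for length in (3, 4, 5)
    ]
    cube = to_cube(runs)
    print(len(cube), len(cube[0]), len(cube[0][0]))
    print(len(unfold_batch_wise(cube)), len(unfold_batch_wise(cube)[0]))
    print(len(unfold_variable_wise(cube)), len(unfold_variable_wise(cube)[0]))
```

Stacking rows variable-wise only requires that every run records the same variables, so in principle it also works for runs of unequal length; the cube is used here only so that both arrangements start from the same data.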
Original language: English
Publisher: Department of Food Science, Faculty of Science, University of Copenhagen
Number of pages: 175
Publication status: Published - 2016

ID: 164116330