Case Study
Nitrogen Source Optimization for Cellulase Production by Penicillium funiculosum, using a Sequential Experimental Design Methodology and the Desirability Function
The article chosen for this case study proposes a sequential experimental design methodology to improve the efficiency of cellulase production. Lignocellulosic biomass, one of the most abundant materials globally, holds immense potential for biofuel production. However, its widespread application is constrained by the considerable cost of the enzymes required to convert it into second-generation sugars. Consequently, the optimization of enzyme production emerges as a pivotal focus of the energy industry's interests.
The Chosen Article
Second-generation ethanol production. Source: https://doi.org/10.1016/j.cej.2022.138690
Experimental Methods
Cellulase production by Penicillium funiculosum was assessed through the measurement of cellulase activity, quantified in activity units (U), where one unit corresponds to the amount of enzymatic extract required to liberate 1 μmol of sugars per minute.
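This unit definition translates directly into a rate calculation; a minimal illustration (not part of the article's protocol) of converting a measured sugar release into activity units:

```python
# Illustrative only: one activity unit (U) liberates 1 umol of reducing
# sugar per minute, so activity is sugar released divided by assay time.
def activity_units(sugar_umol, time_min):
    """Enzyme activity in U: umol of sugar liberated per minute."""
    return sugar_umol / time_min

# e.g. an extract releasing 150 umol of sugar in a 30-minute assay has 5.0 U
print(activity_units(150, 30))
```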
Three distinct methods were employed to quantify cellulase activity, serving as the variables under consideration for optimization:
Filter Paper Assay (FPase)
Carboxymethylcellulose Assay (CMCase)
Cellobiose Assay (β-Glucosidase)
The independent variables in this study are the concentrations of the various nitrogen sources present in the samples, namely:
Urea
Ammonium Sulfate
Peptone
Yeast Extract
The experimental samples consist of conical flasks, each containing 200 mL of a production medium with varying concentrations of urea, ammonium sulfate, peptone, and yeast extract acting as nitrogen sources. The carbon source remains constant throughout the experiments, at a concentration of 15 g/L of partially delignified cellulignin. Additionally, the spore suspension of Penicillium funiculosum ATCC 11797 is kept constant at \(10^6\) conidia/mL.
Sequential Experimental Design
The sequential experimental design involves a series of meticulously crafted iterations, wherein each step entails the elimination of an insignificant variable.
This systematic approach begins with a \(2^4\) factorial design, from which the least influential variable is identified and excluded based on Pareto plot analysis of the effects observed. The ensuing stage employs the remaining three variables within a \(2^3\) experimental design, following a similar process of identifying and omitting the least impactful variable. Ultimately, the final stage entails the execution of a central composite design, employing the last two variables that have emerged from the iterative refinement process.
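As a rough budget for this three-stage procedure, the run counts can be tallied directly (a sketch, assuming 3 replicated center points per stage, matching the \(2^4\) design used below):

```python
# Run counts for each stage of the sequential design (sketch; assumes
# 3 center points per stage, as in the 2^4 design below).
def two_level_runs(n_vars, center_points=3):
    # full two-level factorial plus replicated center points
    return 2 ** n_vars + center_points

def ccd_runs(n_vars, center_points=3):
    # central composite design: factorial points + 2*n axial points + centers
    return 2 ** n_vars + 2 * n_vars + center_points

print(two_level_runs(4))  # stage 1: 2^4 factorial, 19 runs
print(two_level_runs(3))  # stage 2: 2^3 factorial, 11 runs
print(ccd_runs(2))        # stage 3: central composite, 11 runs
```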
Factorial \(2^4\)
Levels of the independent variables for the \(2^4\) experimental design.
Experimental results of each run of the \(2^4\) design.
explann: hands-on with the \(2^4\) design
A Python package, `explann <https://github.com/properallan/explann/>`__, was developed to assist the design and statistical analysis of experiments.
All the following source code is hosted on GitHub at https://github.com/properallan/explann/. Both the table of levels and the experimental results can be easily imported into the explann package; there are functions to import data in string format or from xlsx files. The explann package can also assist in the creation of the experimental design.
TwoLevelFactorial implements a generic \(2^n\) factorial design, with the \(n\) variables defined as a dictionary mapping each variable name to its range.
[1]:
from explann.doe import TwoLevelFactorial

f2b4 = TwoLevelFactorial(
    variables={
        'U': (0.15, 0.45),
        'A': (0.70, 2.10),
        'P': (0.40, 1.10),
        'Y': (0.13, 0.38)
    },
    central_points=3
)
Once instantiated, this class builds the DOE table, stored in the object's doe attribute as a pandas DataFrame.
[2]:
f2b4.doe
[2]:
Index | U | A | P | Y
---|---|---|---|---
1 | -1.0 | -1.0 | -1.0 | -1.0 |
2 | 1.0 | -1.0 | -1.0 | -1.0 |
3 | -1.0 | 1.0 | -1.0 | -1.0 |
4 | 1.0 | 1.0 | -1.0 | -1.0 |
5 | -1.0 | -1.0 | 1.0 | -1.0 |
6 | 1.0 | -1.0 | 1.0 | -1.0 |
7 | -1.0 | 1.0 | 1.0 | -1.0 |
8 | 1.0 | 1.0 | 1.0 | -1.0 |
9 | -1.0 | -1.0 | -1.0 | 1.0 |
10 | 1.0 | -1.0 | -1.0 | 1.0 |
11 | -1.0 | 1.0 | -1.0 | 1.0 |
12 | 1.0 | 1.0 | -1.0 | 1.0 |
13 | -1.0 | -1.0 | 1.0 | 1.0 |
14 | 1.0 | -1.0 | 1.0 | 1.0 |
15 | -1.0 | 1.0 | 1.0 | 1.0 |
16 | 1.0 | 1.0 | 1.0 | 1.0 |
17 | 0.0 | 0.0 | 0.0 | 0.0 |
18 | 0.0 | 0.0 | 0.0 | 0.0 |
19 | 0.0 | 0.0 | 0.0 | 0.0 |
A table of the levels is also automatically created.
[3]:
f2b4.levels
[3]:
Levels | U | A | P | Y
---|---|---|---|---
-1.0 | 0.15 | 0.7 | 0.40 | 0.130 |
0.0 | 0.30 | 1.4 | 0.75 | 0.255 |
1.0 | 0.45 | 2.1 | 1.10 | 0.380 |
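The mapping between coded levels and real concentrations follows the usual DOE convention, real = center + coded × half-range. A minimal helper (illustrative, not part of explann) makes this explicit for the urea (U) range above:

```python
# Decoding a coded level into a real concentration (g/L) using the
# standard DOE convention: real = center + coded * half_range.
def decode(coded, low, high):
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range

print(decode(-1.0, 0.15, 0.45))  # low urea level, 0.15 g/L
print(decode(0.0, 0.15, 0.45))   # center point, 0.30 g/L
print(decode(1.0, 0.15, 0.45))   # high urea level, 0.45 g/L
```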
The doe table can be complemented with the response variables. For this task, either ImportString or ImportXLSX can be used to load the data.
[4]:
from explann.dataio import ImportString
f2b4_results = ImportString(data="""
F,CM,B
39,1.328,170
87,1.699,122
48,1.332,473
71,1.979,511
43,1.458,156
84,2.189,204
45,1.343,385
112,1.707,288
19,1.257,114
146,2.148,116
50,1.592,244
92,1.726,126
107,1.203,72
172,2.261,210
62,1.434,234
82,1.848,154
75,1.726,223
70,1.782,219
89,1.753,226
""",
delimiter=',')
f2b4_results.data
[4]:
Index | F | CM | B
---|---|---|---
1 | 39 | 1.328 | 170 |
2 | 87 | 1.699 | 122 |
3 | 48 | 1.332 | 473 |
4 | 71 | 1.979 | 511 |
5 | 43 | 1.458 | 156 |
6 | 84 | 2.189 | 204 |
7 | 45 | 1.343 | 385 |
8 | 112 | 1.707 | 288 |
9 | 19 | 1.257 | 114 |
10 | 146 | 2.148 | 116 |
11 | 50 | 1.592 | 244 |
12 | 92 | 1.726 | 126 |
13 | 107 | 1.203 | 72 |
14 | 172 | 2.261 | 210 |
15 | 62 | 1.434 | 234 |
16 | 82 | 1.848 | 154 |
17 | 75 | 1.726 | 223 |
18 | 70 | 1.782 | 219 |
19 | 89 | 1.753 | 226 |
The data is then merged with the doe table using the append_results method.
[5]:
f2b4.append_results(results=f2b4_results.data)
f2b4.doe
[5]:
Index | U | A | P | Y | F | CM | B
---|---|---|---|---|---|---|---
1 | -1.0 | -1.0 | -1.0 | -1.0 | 39 | 1.328 | 170 |
2 | 1.0 | -1.0 | -1.0 | -1.0 | 87 | 1.699 | 122 |
3 | -1.0 | 1.0 | -1.0 | -1.0 | 48 | 1.332 | 473 |
4 | 1.0 | 1.0 | -1.0 | -1.0 | 71 | 1.979 | 511 |
5 | -1.0 | -1.0 | 1.0 | -1.0 | 43 | 1.458 | 156 |
6 | 1.0 | -1.0 | 1.0 | -1.0 | 84 | 2.189 | 204 |
7 | -1.0 | 1.0 | 1.0 | -1.0 | 45 | 1.343 | 385 |
8 | 1.0 | 1.0 | 1.0 | -1.0 | 112 | 1.707 | 288 |
9 | -1.0 | -1.0 | -1.0 | 1.0 | 19 | 1.257 | 114 |
10 | 1.0 | -1.0 | -1.0 | 1.0 | 146 | 2.148 | 116 |
11 | -1.0 | 1.0 | -1.0 | 1.0 | 50 | 1.592 | 244 |
12 | 1.0 | 1.0 | -1.0 | 1.0 | 92 | 1.726 | 126 |
13 | -1.0 | -1.0 | 1.0 | 1.0 | 107 | 1.203 | 72 |
14 | 1.0 | -1.0 | 1.0 | 1.0 | 172 | 2.261 | 210 |
15 | -1.0 | 1.0 | 1.0 | 1.0 | 62 | 1.434 | 234 |
16 | 1.0 | 1.0 | 1.0 | 1.0 | 82 | 1.848 | 154 |
17 | 0.0 | 0.0 | 0.0 | 0.0 | 75 | 1.726 | 223 |
18 | 0.0 | 0.0 | 0.0 | 0.0 | 70 | 1.782 | 219 |
19 | 0.0 | 0.0 | 0.0 | 0.0 | 89 | 1.753 | 226 |
This experimental plan can be easily saved in .xlsx format: the raw doe data is stored in the first sheet, and the levels data is stored in a separate sheet named levels.
[6]:
f2b4.save_excel('../../data/f2b4.xlsx')
ImportXLSX assists in importing data from an Excel file, which could be filled in by hand or generated by explann as described above.
[7]:
from explann.dataio import ImportXLSX
f2b4_from_excel = ImportXLSX(
    path='../../data/f2b4.xlsx',
    levels_sheet='levels'
)
f2b4_from_excel.data
[7]:
  | Index | U | A | P | Y | F | CM | B
---|---|---|---|---|---|---|---|---
1 | 1 | -1 | -1 | -1 | -1 | 39 | 1.328 | 170 |
2 | 2 | 1 | -1 | -1 | -1 | 87 | 1.699 | 122 |
3 | 3 | -1 | 1 | -1 | -1 | 48 | 1.332 | 473 |
4 | 4 | 1 | 1 | -1 | -1 | 71 | 1.979 | 511 |
5 | 5 | -1 | -1 | 1 | -1 | 43 | 1.458 | 156 |
6 | 6 | 1 | -1 | 1 | -1 | 84 | 2.189 | 204 |
7 | 7 | -1 | 1 | 1 | -1 | 45 | 1.343 | 385 |
8 | 8 | 1 | 1 | 1 | -1 | 112 | 1.707 | 288 |
9 | 9 | -1 | -1 | -1 | 1 | 19 | 1.257 | 114 |
10 | 10 | 1 | -1 | -1 | 1 | 146 | 2.148 | 116 |
11 | 11 | -1 | 1 | -1 | 1 | 50 | 1.592 | 244 |
12 | 12 | 1 | 1 | -1 | 1 | 92 | 1.726 | 126 |
13 | 13 | -1 | -1 | 1 | 1 | 107 | 1.203 | 72 |
14 | 14 | 1 | -1 | 1 | 1 | 172 | 2.261 | 210 |
15 | 15 | -1 | 1 | 1 | 1 | 62 | 1.434 | 234 |
16 | 16 | 1 | 1 | 1 | 1 | 82 | 1.848 | 154 |
17 | 17 | 0 | 0 | 0 | 0 | 75 | 1.726 | 223 |
18 | 18 | 0 | 0 | 0 | 0 | 70 | 1.782 | 219 |
19 | 19 | 0 | 0 | 0 | 0 | 89 | 1.753 | 226 |
If the levels_sheet argument is passed, the loaded object will contain an additional table with the parsed levels.
[8]:
f2b4_from_excel.parsed_data
[8]:
  | Index | U | A | P | Y | F | CM | B
---|---|---|---|---|---|---|---|---
1 | 1 | 0.45 | 0.7 | 0.40 | 0.130 | 39 | 1.328 | 170 |
2 | 2 | 0.15 | 2.1 | 0.40 | 0.130 | 87 | 1.699 | 122 |
3 | 3 | 0.45 | 2.1 | 0.40 | 0.130 | 48 | 1.332 | 473 |
4 | 4 | 0.15 | 0.7 | 1.10 | 0.130 | 71 | 1.979 | 511 |
5 | 5 | 0.45 | 0.7 | 1.10 | 0.130 | 43 | 1.458 | 156 |
6 | 6 | 0.15 | 2.1 | 1.10 | 0.130 | 84 | 2.189 | 204 |
7 | 7 | 0.45 | 2.1 | 1.10 | 0.130 | 45 | 1.343 | 385 |
8 | 8 | 0.15 | 0.7 | 0.40 | 0.380 | 112 | 1.707 | 288 |
9 | 9 | 0.45 | 0.7 | 0.40 | 0.380 | 19 | 1.257 | 114 |
10 | 10 | 0.15 | 2.1 | 0.40 | 0.380 | 146 | 2.148 | 116 |
11 | 11 | 0.45 | 2.1 | 0.40 | 0.380 | 50 | 1.592 | 244 |
12 | 12 | 0.15 | 0.7 | 1.10 | 0.380 | 92 | 1.726 | 126 |
13 | 13 | 0.45 | 0.7 | 1.10 | 0.380 | 107 | 1.203 | 72 |
14 | 14 | 0.15 | 2.1 | 1.10 | 0.380 | 172 | 2.261 | 210 |
15 | 15 | 0.45 | 2.1 | 1.10 | 0.380 | 62 | 1.434 | 234 |
16 | 16 | 0.30 | 1.4 | 0.75 | 0.255 | 82 | 1.848 | 154 |
17 | 17 | 0.30 | 1.4 | 0.75 | 0.255 | 75 | 1.726 | 223 |
18 | 18 | 0.30 | 1.4 | 0.75 | 0.255 | 70 | 1.782 | 219 |
19 | 19 | 0.00 | 0.0 | 0.00 | 0.000 | 89 | 1.753 | 226 |
We can now use the data to build our factorial model. The functions attribute is a dictionary whose keys are function names and whose values are function equations; the syntax follows the `patsy <https://patsy.readthedocs.io/en/latest/>`__ standard. Here U * A * P * Y stands for all main effects and interaction terms between the 4 variables.
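To see what such a formula expands to, one can build the design matrix directly with patsy (a sketch on a tiny frame of coded levels; not part of the explann workflow itself):

```python
# Sketch: expanding the patsy formula "U * A * P * Y" into its design
# matrix columns. The expansion yields the intercept, 4 main effects,
# 6 two-way, 4 three-way and 1 four-way interaction: 16 columns in total.
import pandas as pd
from patsy import dmatrix

df = pd.DataFrame({name: [-1.0, 1.0] for name in ["U", "A", "P", "Y"]})
design = dmatrix("U * A * P * Y", df)
print(design.design_info.column_names)
print(len(design.design_info.column_names))  # 16
```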
[9]:
from explann.models import FactorialModel
fm2b4 = FactorialModel(
    data=f2b4_from_excel.data,
    functions={
        "Fso": "F ~ U * A * P * Y + I(U**2) + I(A**2) + I(P**2) + I(Y**2)",
        "CMso": "CM ~ U * A * P * Y + I(U**2) + I(A**2) + I(P**2) + I(Y**2)",
        "Bso": "B ~ U * A * P * Y + I(U**2) + I(A**2) + I(P**2) + I(Y**2)",
        "F": "F ~ U * A * P * Y",
        "CM": "CM ~ U * A * P * Y",
        "B": "B ~ U * A * P * Y"
    }
)
The object fm2b4 stores all the fitted models (ordinary least squares) listed in functions; any given model can be retrieved by indexing with its function name.
[10]:
fm2b4['F']
[10]:
<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7fcd1570edf0>
An overview of the results can be viewed using the summary method. The method also accepts a function name as argument; if None is passed, a dictionary of summaries is returned.
[11]:
fm2b4.summary()
[11]:
{'Fso': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: F R-squared: 0.992
Model: OLS Adj. R-squared: 0.931
Method: Least Squares F-statistic: 16.16
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.0598
Time: 19:16:43 Log-Likelihood: -49.032
No. Observations: 19 AIC: 132.1
Df Residuals: 2 BIC: 148.1
Df Model: 16
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 78.0000 5.686 13.717 0.005 53.534 102.466
U 27.0625 2.462 10.991 0.008 16.468 37.657
A -8.4375 2.462 -3.427 0.076 -19.032 2.157
U:A -8.0625 2.462 -3.274 0.082 -18.657 2.532
P 9.6875 2.462 3.934 0.059 -0.907 20.282
U:P -2.9375 2.462 -1.193 0.355 -13.532 7.657
A:P -4.6875 2.462 -1.904 0.197 -15.282 5.907
U:A:P 5.6875 2.462 2.310 0.147 -4.907 16.282
Y 12.5625 2.462 5.102 0.036 1.968 23.157
U:Y 4.6875 2.462 1.904 0.197 -5.907 15.282
A:Y -11.3125 2.462 -4.594 0.044 -21.907 -0.718
U:A:Y -8.1875 2.462 -3.325 0.080 -18.782 2.407
P:Y 4.8125 2.462 1.955 0.190 -5.782 15.407
U:P:Y -7.5625 2.462 -3.071 0.092 -18.157 3.032
A:P:Y -9.3125 2.462 -3.782 0.063 -19.907 1.282
U:A:P:Y -0.6875 2.462 -0.279 0.806 -11.282 9.907
I(U ** 2) 0.1719 1.549 0.111 0.922 -6.493 6.837
I(A ** 2) 0.1719 1.549 0.111 0.922 -6.493 6.837
I(P ** 2) 0.1719 1.549 0.111 0.922 -6.493 6.837
I(Y ** 2) 0.1719 1.549 0.111 0.922 -6.493 6.837
==============================================================================
Omnibus: 19.262 Durbin-Watson: 2.036
Prob(Omnibus): 0.000 Jarque-Bera (JB): 38.617
Skew: 1.278 Prob(JB): 4.12e-09
Kurtosis: 9.500 Cond. No. 1.96e+30
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
[3] The smallest eigenvalue is 2.1e-59. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
""",
'CMso': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: CM R-squared: 0.999
Model: OLS Adj. R-squared: 0.992
Method: Least Squares F-statistic: 146.2
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.00682
Time: 19:16:43 Log-Likelihood: 62.359
No. Observations: 19 AIC: -90.72
Df Residuals: 2 BIC: -74.66
Df Model: 16
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.7537 0.016 108.457 0.000 1.684 1.823
U 0.2881 0.007 41.152 0.001 0.258 0.318
A -0.0364 0.007 -5.195 0.035 -0.066 -0.006
U:A -0.0933 0.007 -13.319 0.006 -0.123 -0.063
P 0.0239 0.007 3.410 0.076 -0.006 0.054
U:P 0.0327 0.007 4.678 0.043 0.003 0.063
A:P -0.0610 0.007 -8.712 0.013 -0.091 -0.031
U:A:P -0.0331 0.007 -4.731 0.042 -0.063 -0.003
Y 0.0271 0.007 3.874 0.061 -0.003 0.057
U:Y 0.0240 0.007 3.428 0.076 -0.006 0.054
A:Y 0.0027 0.007 0.393 0.732 -0.027 0.033
U:A:Y -0.0819 0.007 -11.694 0.007 -0.112 -0.052
P:Y -0.0210 0.007 -2.999 0.096 -0.051 0.009
U:P:Y 0.0231 0.007 3.303 0.081 -0.007 0.053
A:P:Y 0.0491 0.007 7.016 0.020 0.019 0.079
U:A:P:Y 0.0472 0.007 6.749 0.021 0.017 0.077
I(U ** 2) -0.0243 0.004 -5.515 0.031 -0.043 -0.005
I(A ** 2) -0.0243 0.004 -5.515 0.031 -0.043 -0.005
I(P ** 2) -0.0243 0.004 -5.515 0.031 -0.043 -0.005
I(Y ** 2) -0.0243 0.004 -5.515 0.031 -0.043 -0.005
==============================================================================
Omnibus: 13.078 Durbin-Watson: 3.023
Prob(Omnibus): 0.001 Jarque-Bera (JB): 33.486
Skew: 0.110 Prob(JB): 5.35e-08
Kurtosis: 9.500 Cond. No. 1.96e+30
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
[3] The smallest eigenvalue is 2.1e-59. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
""",
'Bso': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: B R-squared: 1.000
Model: OLS Adj. R-squared: 0.999
Method: Least Squares F-statistic: 1291.
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.000774
Time: 19:16:43 Log-Likelihood: -29.439
No. Observations: 19 AIC: 92.88
Df Residuals: 2 BIC: 108.9
Df Model: 16
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 222.6667 2.028 109.819 0.000 213.943 231.391
U -7.3125 0.878 -8.329 0.014 -11.090 -3.535
A 78.1875 0.878 89.055 0.000 74.410 81.965
U:A -24.8125 0.878 -28.261 0.001 -28.590 -21.035
P -10.8125 0.878 -12.315 0.007 -14.590 -7.035
U:P 8.4375 0.878 9.610 0.011 4.660 12.215
A:P -25.8125 0.878 -29.400 0.001 -29.590 -22.035
U:A:P -20.5625 0.878 -23.420 0.002 -24.340 -16.785
Y -64.9375 0.878 -73.963 0.000 -68.715 -61.160
U:Y 0.0625 0.878 0.071 0.950 -3.715 3.840
A:Y -47.4375 0.878 -54.031 0.000 -51.215 -43.660
U:A:Y -17.4375 0.878 -19.861 0.003 -21.215 -13.660
P:Y 19.5625 0.878 22.281 0.002 15.785 23.340
U:P:Y 13.3125 0.878 15.163 0.004 9.535 17.090
A:P:Y 21.5625 0.878 24.559 0.002 17.785 25.340
U:A:P:Y 8.3125 0.878 9.468 0.011 4.535 12.090
I(U ** 2) 0.2552 0.552 0.462 0.689 -2.121 2.632
I(A ** 2) 0.2552 0.552 0.462 0.689 -2.121 2.632
I(P ** 2) 0.2552 0.552 0.462 0.689 -2.121 2.632
I(Y ** 2) 0.2552 0.552 0.462 0.689 -2.121 2.632
==============================================================================
Omnibus: 13.909 Durbin-Watson: 2.640
Prob(Omnibus): 0.001 Jarque-Bera (JB): 34.047
Skew: -0.435 Prob(JB): 4.04e-08
Kurtosis: 9.500 Cond. No. 1.96e+30
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
[3] The smallest eigenvalue is 2.1e-59. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
""",
'F': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: F R-squared: 0.992
Model: OLS Adj. R-squared: 0.954
Method: Least Squares F-statistic: 25.70
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.0107
Time: 19:16:43 Log-Likelihood: -49.091
No. Observations: 19 AIC: 130.2
Df Residuals: 3 BIC: 145.3
Df Model: 15
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 78.5789 1.851 42.463 0.000 72.690 84.468
U 27.0625 2.017 13.420 0.001 20.645 33.480
A -8.4375 2.017 -4.184 0.025 -14.855 -2.020
U:A -8.0625 2.017 -3.998 0.028 -14.480 -1.645
P 9.6875 2.017 4.804 0.017 3.270 16.105
U:P -2.9375 2.017 -1.457 0.241 -9.355 3.480
A:P -4.6875 2.017 -2.324 0.103 -11.105 1.730
U:A:P 5.6875 2.017 2.820 0.067 -0.730 12.105
Y 12.5625 2.017 6.230 0.008 6.145 18.980
U:Y 4.6875 2.017 2.324 0.103 -1.730 11.105
A:Y -11.3125 2.017 -5.610 0.011 -17.730 -4.895
U:A:Y -8.1875 2.017 -4.060 0.027 -14.605 -1.770
P:Y 4.8125 2.017 2.386 0.097 -1.605 11.230
U:P:Y -7.5625 2.017 -3.750 0.033 -13.980 -1.145
A:P:Y -9.3125 2.017 -4.618 0.019 -15.730 -2.895
U:A:P:Y -0.6875 2.017 -0.341 0.756 -7.105 5.730
==============================================================================
Omnibus: 14.090 Durbin-Watson: 2.047
Prob(Omnibus): 0.001 Jarque-Bera (JB): 27.071
Skew: 0.726 Prob(JB): 1.32e-06
Kurtosis: 8.664 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""",
'CM': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: CM R-squared: 0.986
Model: OLS Adj. R-squared: 0.917
Method: Least Squares F-statistic: 14.24
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.0250
Time: 19:16:43 Log-Likelihood: 35.898
No. Observations: 19 AIC: -39.80
Df Residuals: 3 BIC: -24.69
Df Model: 15
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.6718 0.021 79.166 0.000 1.605 1.739
U 0.2881 0.023 12.520 0.001 0.215 0.361
A -0.0364 0.023 -1.581 0.212 -0.110 0.037
U:A -0.0932 0.023 -4.052 0.027 -0.166 -0.020
P 0.0239 0.023 1.037 0.376 -0.049 0.097
U:P 0.0328 0.023 1.423 0.250 -0.040 0.106
A:P -0.0610 0.023 -2.651 0.077 -0.134 0.012
U:A:P -0.0331 0.023 -1.439 0.246 -0.106 0.040
Y 0.0271 0.023 1.179 0.323 -0.046 0.100
U:Y 0.0240 0.023 1.043 0.374 -0.049 0.097
A:Y 0.0028 0.023 0.119 0.912 -0.070 0.076
U:A:Y -0.0819 0.023 -3.558 0.038 -0.155 -0.009
P:Y -0.0210 0.023 -0.913 0.429 -0.094 0.052
U:P:Y 0.0231 0.023 1.005 0.389 -0.050 0.096
A:P:Y 0.0491 0.023 2.135 0.122 -0.024 0.122
U:A:P:Y 0.0473 0.023 2.053 0.132 -0.026 0.120
==============================================================================
Omnibus: 19.927 Durbin-Watson: 0.346
Prob(Omnibus): 0.000 Jarque-Bera (JB): 20.834
Skew: 2.121 Prob(JB): 2.99e-05
Kurtosis: 5.884 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""",
'B': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: B R-squared: 1.000
Model: OLS Adj. R-squared: 0.999
Method: Least Squares F-statistic: 1867.
Date: Wed, 16 Aug 2023 Prob (F-statistic): 1.80e-05
Time: 19:16:43 Log-Likelihood: -30.403
No. Observations: 19 AIC: 92.81
Df Residuals: 3 BIC: 107.9
Df Model: 15
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 223.5263 0.692 322.991 0.000 221.324 225.729
U -7.3125 0.754 -9.696 0.002 -9.713 -4.912
A 78.1875 0.754 103.677 0.000 75.787 80.588
U:A -24.8125 0.754 -32.901 0.000 -27.213 -22.412
P -10.8125 0.754 -14.337 0.001 -13.213 -8.412
U:P 8.4375 0.754 11.188 0.002 6.037 10.838
A:P -25.8125 0.754 -34.227 0.000 -28.213 -23.412
U:A:P -20.5625 0.754 -27.266 0.000 -22.963 -18.162
Y -64.9375 0.754 -86.107 0.000 -67.338 -62.537
U:Y 0.0625 0.754 0.083 0.939 -2.338 2.463
A:Y -47.4375 0.754 -62.902 0.000 -49.838 -45.037
U:A:Y -17.4375 0.754 -23.122 0.000 -19.838 -15.037
P:Y 19.5625 0.754 25.940 0.000 17.162 21.963
U:P:Y 13.3125 0.754 17.652 0.000 10.912 15.713
A:P:Y 21.5625 0.754 28.592 0.000 19.162 23.963
U:A:P:Y 8.3125 0.754 11.022 0.002 5.912 10.713
==============================================================================
Omnibus: 31.293 Durbin-Watson: 2.398
Prob(Omnibus): 0.000 Jarque-Bera (JB): 77.186
Skew: -2.374 Prob(JB): 1.73e-17
Kurtosis: 11.658 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""}
[12]:
from explann.plot import ParetoPlot
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3,2, figsize=(9,10))
ax = ax.flatten()
pp_fm2b4 = ParetoPlot(fm2b4)
pp_fm2b4.plot(['Fso', 'F', 'CMso', 'CM' ,'Bso', 'B'], ax=ax)
plt.tight_layout()
[13]:
fm2b4.print_equation()
[13]:
{'Fso': 'F = 78.0000 + 27.0625 * U + 12.5625 * Y - 11.3125 * AY',
'CMso': 'CM = 1.7537 + 0.2881 * U - 0.0364 * A - 0.0933 * UA + 0.0327 * UP - 0.0610 * AP - 0.0331 * UAP - 0.0819 * UAY + 0.0491 * APY + 0.0472 * UAPY - 0.0243 * I(U ** 2) - 0.0243 * I(A ** 2) - 0.0243 * I(P ** 2) - 0.0243 * I(Y ** 2)',
'Bso': 'B = 222.6667 - 7.3125 * U + 78.1875 * A - 24.8125 * UA - 10.8125 * P + 8.4375 * UP - 25.8125 * AP - 20.5625 * UAP - 64.9375 * Y - 47.4375 * AY - 17.4375 * UAY + 19.5625 * PY + 13.3125 * UPY + 21.5625 * APY + 8.3125 * UAPY',
'F': 'F = 78.5789 + 27.0625 * U - 8.4375 * A - 8.0625 * UA + 9.6875 * P + 12.5625 * Y - 11.3125 * AY - 8.1875 * UAY - 7.5625 * UPY - 9.3125 * APY',
'CM': 'CM = 1.6718 + 0.2881 * U - 0.0932 * UA - 0.0819 * UAY',
'B': 'B = 223.5263 - 7.3125 * U + 78.1875 * A - 24.8125 * UA - 10.8125 * P + 8.4375 * UP - 25.8125 * AP - 20.5625 * UAP - 64.9375 * Y - 47.4375 * AY - 17.4375 * UAY + 19.5625 * PY + 13.3125 * UPY + 21.5625 * APY + 8.3125 * UAPY'}
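These equations operate on coded levels and can be evaluated directly; for instance, the reduced 'Fso' equation predicts the center-point response from its intercept alone (coefficients copied from the printed equation; illustrative only):

```python
# Evaluating the significant-terms equation for F at coded levels
# (coefficients copied from the 'Fso' equation printed above).
def F_reduced(U, A, P, Y):
    return 78.0 + 27.0625 * U + 12.5625 * Y - 11.3125 * A * Y

# at the center point all coded levels are zero, so the prediction is the
# intercept, 78.0, close to the observed center runs (75, 70, 89)
print(F_reduced(0.0, 0.0, 0.0, 0.0))
```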
build_significant_models builds the models containing only the terms that are significant within a significance level \(\alpha\).
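The underlying idea can be sketched with plain statsmodels (synthetic data; this is not explann's actual implementation): fit the full model, inspect the p-values, and refit keeping only the terms below \(\alpha\).

```python
# Sketch of significance filtering: fit OLS, keep terms with p < alpha,
# refit the reduced model (synthetic data; illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
data = pd.DataFrame({"U": rng.normal(size=40), "A": rng.normal(size=40)})
data["F"] = 3.0 * data["U"] + rng.normal(scale=0.1, size=40)  # A has no effect

model = smf.ols("F ~ U + A", data).fit()
keep = [term for term, p in model.pvalues.items()
        if p < 0.05 and term != "Intercept"]
reduced = smf.ols("F ~ " + " + ".join(keep), data).fit()
print(keep)  # U is certainly retained; A is very likely dropped
```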
[14]:
sig_fm2b4=fm2b4.build_significant_models(alpha=0.05)
sig_fm2b4.summary()
[14]:
{'Fso': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: F R-squared: 0.644
Model: OLS Adj. R-squared: 0.573
Method: Least Squares F-statistic: 9.063
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.00115
Time: 19:16:45 Log-Likelihood: -85.472
No. Observations: 19 AIC: 178.9
Df Residuals: 15 BIC: 182.7
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 78.5789 5.616 13.993 0.000 66.609 90.549
U 27.0625 6.120 4.422 0.000 14.019 40.106
Y 12.5625 6.120 2.053 0.058 -0.481 25.606
A:Y -11.3125 6.120 -1.849 0.084 -24.356 1.731
==============================================================================
Omnibus: 3.592 Durbin-Watson: 2.130
Prob(Omnibus): 0.166 Jarque-Bera (JB): 1.637
Skew: -0.592 Prob(JB): 0.441
Kurtosis: 3.815 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""",
'CMso': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: CM R-squared: 0.974
Model: OLS Adj. R-squared: 0.942
Method: Least Squares F-statistic: 30.17
Date: Wed, 16 Aug 2023 Prob (F-statistic): 2.87e-05
Time: 19:16:45 Log-Likelihood: 29.977
No. Observations: 19 AIC: -37.95
Df Residuals: 8 BIC: -27.57
Df Model: 10
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.7537 0.044 39.456 0.000 1.651 1.856
U 0.2881 0.019 14.971 0.000 0.244 0.333
A -0.0364 0.019 -1.890 0.095 -0.081 0.008
U:A -0.0933 0.019 -4.845 0.001 -0.138 -0.049
U:P 0.0328 0.019 1.702 0.127 -0.012 0.077
A:P -0.0610 0.019 -3.170 0.013 -0.105 -0.017
U:A:P -0.0331 0.019 -1.721 0.124 -0.078 0.011
U:A:Y -0.0819 0.019 -4.254 0.003 -0.126 -0.037
A:P:Y 0.0491 0.019 2.553 0.034 0.005 0.094
U:A:P:Y 0.0473 0.019 2.455 0.040 0.003 0.092
I(U ** 2) -0.0243 0.012 -2.006 0.080 -0.052 0.004
I(A ** 2) -0.0243 0.012 -2.006 0.080 -0.052 0.004
I(P ** 2) -0.0243 0.012 -2.006 0.080 -0.052 0.004
I(Y ** 2) -0.0243 0.012 -2.006 0.080 -0.052 0.004
==============================================================================
Omnibus: 2.073 Durbin-Watson: 1.950
Prob(Omnibus): 0.355 Jarque-Bera (JB): 1.028
Skew: -0.013 Prob(JB): 0.598
Kurtosis: 1.861 Cond. No. 2.87e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.82e-32. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
""",
'Bso': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: B R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 2661.
Date: Wed, 16 Aug 2023 Prob (F-statistic): 3.23e-07
Time: 19:16:45 Log-Likelihood: -30.425
No. Observations: 19 AIC: 90.85
Df Residuals: 4 BIC: 105.0
Df Model: 14
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 223.5263 0.600 372.531 0.000 221.860 225.192
U -7.3125 0.654 -11.184 0.000 -9.128 -5.497
A 78.1875 0.654 119.579 0.000 76.372 80.003
U:A -24.8125 0.654 -37.948 0.000 -26.628 -22.997
P -10.8125 0.654 -16.536 0.000 -12.628 -8.997
U:P 8.4375 0.654 12.904 0.000 6.622 10.253
A:P -25.8125 0.654 -39.477 0.000 -27.628 -23.997
U:A:P -20.5625 0.654 -31.448 0.000 -22.378 -18.747
Y -64.9375 0.654 -99.315 0.000 -66.753 -63.122
A:Y -47.4375 0.654 -72.550 0.000 -49.253 -45.622
U:A:Y -17.4375 0.654 -26.669 0.000 -19.253 -15.622
P:Y 19.5625 0.654 29.919 0.000 17.747 21.378
U:P:Y 13.3125 0.654 20.360 0.000 11.497 15.128
A:P:Y 21.5625 0.654 32.977 0.000 19.747 23.378
U:A:P:Y 8.3125 0.654 12.713 0.000 6.497 10.128
==============================================================================
Omnibus: 31.153 Durbin-Watson: 2.404
Prob(Omnibus): 0.000 Jarque-Bera (JB): 76.327
Skew: -2.365 Prob(JB): 2.67e-17
Kurtosis: 11.605 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""",
'F': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: F R-squared: 0.924
Model: OLS Adj. R-squared: 0.847
Method: Least Squares F-statistic: 12.08
Date: Wed, 16 Aug 2023 Prob (F-statistic): 0.000499
Time: 19:16:45 Log-Likelihood: -70.868
No. Observations: 19 AIC: 161.7
Df Residuals: 9 BIC: 171.2
Df Model: 9
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 78.5789 3.361 23.377 0.000 70.975 86.183
U 27.0625 3.663 7.388 0.000 18.776 35.349
A -8.4375 3.663 -2.303 0.047 -16.724 -0.151
U:A -8.0625 3.663 -2.201 0.055 -16.349 0.224
P 9.6875 3.663 2.645 0.027 1.401 17.974
Y 12.5625 3.663 3.430 0.008 4.276 20.849
A:Y -11.3125 3.663 -3.088 0.013 -19.599 -3.026
U:A:Y -8.1875 3.663 -2.235 0.052 -16.474 0.099
U:P:Y -7.5625 3.663 -2.065 0.069 -15.849 0.724
A:P:Y -9.3125 3.663 -2.542 0.032 -17.599 -1.026
==============================================================================
Omnibus: 0.222 Durbin-Watson: 1.718
Prob(Omnibus): 0.895 Jarque-Bera (JB): 0.374
Skew: -0.200 Prob(JB): 0.830
Kurtosis: 2.441 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""",
'CM': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: CM R-squared: 0.858
Model: OLS Adj. R-squared: 0.829
Method: Least Squares F-statistic: 30.16
Date: Wed, 16 Aug 2023 Prob (F-statistic): 1.34e-06
Time: 19:16:45 Log-Likelihood: 13.772
No. Observations: 19 AIC: -19.54
Df Residuals: 15 BIC: -15.77
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.6718 0.030 55.244 0.000 1.607 1.736
U 0.2881 0.033 8.737 0.000 0.218 0.358
U:A -0.0933 0.033 -2.828 0.013 -0.164 -0.023
U:A:Y -0.0819 0.033 -2.483 0.025 -0.152 -0.012
==============================================================================
Omnibus: 3.181 Durbin-Watson: 1.725
Prob(Omnibus): 0.204 Jarque-Bera (JB): 1.658
Skew: -0.704 Prob(JB): 0.437
Kurtosis: 3.336 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""",
'B': <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: B R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 2661.
Date: Wed, 16 Aug 2023 Prob (F-statistic): 3.23e-07
Time: 19:16:45 Log-Likelihood: -30.425
No. Observations: 19 AIC: 90.85
Df Residuals: 4 BIC: 105.0
Df Model: 14
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 223.5263 0.600 372.531 0.000 221.860 225.192
U -7.3125 0.654 -11.184 0.000 -9.128 -5.497
A 78.1875 0.654 119.579 0.000 76.372 80.003
U:A -24.8125 0.654 -37.948 0.000 -26.628 -22.997
P -10.8125 0.654 -16.536 0.000 -12.628 -8.997
U:P 8.4375 0.654 12.904 0.000 6.622 10.253
A:P -25.8125 0.654 -39.477 0.000 -27.628 -23.997
U:A:P -20.5625 0.654 -31.448 0.000 -22.378 -18.747
Y -64.9375 0.654 -99.315 0.000 -66.753 -63.122
A:Y -47.4375 0.654 -72.550 0.000 -49.253 -45.622
U:A:Y -17.4375 0.654 -26.669 0.000 -19.253 -15.622
P:Y 19.5625 0.654 29.919 0.000 17.747 21.378
U:P:Y 13.3125 0.654 20.360 0.000 11.497 15.128
A:P:Y 21.5625 0.654 32.977 0.000 19.747 23.378
U:A:P:Y 8.3125 0.654 12.713 0.000 6.497 10.128
==============================================================================
Omnibus: 31.153 Durbin-Watson: 2.404
Prob(Omnibus): 0.000 Jarque-Bera (JB): 76.327
Skew: -2.365 Prob(JB): 2.67e-17
Kurtosis: 11.605 Cond. No. 1.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""}
[15]:
sig_fm2b4.anova('F')
[15]:
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| U | 1.0 | 11718.062500 | 11718.062500 | 54.585296 | 0.000042 |
| A | 1.0 | 1139.062500 | 1139.062500 | 5.306002 | 0.046733 |
| U:A | 1.0 | 1040.062500 | 1040.062500 | 4.844838 | 0.055241 |
| P | 1.0 | 1501.562500 | 1501.562500 | 6.994606 | 0.026706 |
| Y | 1.0 | 2525.062500 | 2525.062500 | 11.762293 | 0.007513 |
| A:Y | 1.0 | 2047.562500 | 2047.562500 | 9.537994 | 0.012964 |
| U:A:Y | 1.0 | 1072.562500 | 1072.562500 | 4.996231 | 0.052249 |
| U:P:Y | 1.0 | 915.062500 | 915.062500 | 4.262561 | 0.068965 |
| A:P:Y | 1.0 | 1387.562500 | 1387.562500 | 6.463569 | 0.031588 |
| Residual | 9.0 | 1932.069079 | 214.674342 | NaN | NaN |
Factorial \(2^3\)
Levels for factorial \(2^3\) experimental design.
Experimental results for factorial \(2^3\).
Building a model with explann
[16]:
from explann.doe import TwoLevelFactorial
from explann.models import FactorialModel
from explann.dataio import ImportString, ImportXLSX
from explann.plot import ParetoPlot
import matplotlib.pyplot as plt
f2b3 = TwoLevelFactorial(
variables = {
'U': (0.3, 0.8),
'A': (1.20, 4.00),
'Y': (0.08, 0.25)
},
central_points=3
)
f2b3_results = ImportString(
data=
"""F,CM,B
158,3995,835
202,4291,1540
137,3642,1021
240,4895,1702
191,4311,1311
221,4520,1717
141,4726,1345
175,5144,1728
225,4715,1623
234,4565,1687
250,4743,1713
""",
delimiter=',')
f2b3.append_results(results=f2b3_results.data)
f2b3.save_excel('../../data/f2b3.xlsx')
f2b3_from_excel = ImportXLSX(
path = '../../data/f2b3.xlsx',
levels_sheet = 'levels'
)
fm_f2b3 = FactorialModel(
data = f2b3_from_excel.data,
functions = {
"Fso" : "F ~ U * A * Y + I(U**2) + I(A**2) + I(Y**2)",
"CMso" : "CM ~ U * A * Y + I(U**2) + I(A**2) + I(Y**2)",
"Bso" : "B ~ U * A * Y + I(U**2) + I(A**2) + I(Y**2)",
"F" : "F ~ U * A * Y",
"CM" : "CM ~ U * A * Y",
"B" : "B ~ U * A * Y"}
)
pp_f2b3 = ParetoPlot(fm_f2b3)
fig, ax = plt.subplots(3,2, figsize=(9,10))
pp_f2b3.plot(['Fso', 'F', 'CMso', 'CM', 'Bso', 'B'], ax=ax)
plt.tight_layout()
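Behind each Pareto plot, a bar is an effect estimate; on a coded two-level design the effect of a term is its contrast divided by half the number of factorial runs. A self-contained sketch on synthetic data (standard run order and response values assumed for illustration):

```python
import itertools
import numpy as np

# Coded 2^3 design in standard order; columns correspond to U, A, Y
X = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
y = 100 + 10 * X[:, 0] + 5 * X[:, 1]      # synthetic response

# Effect of factor j = contrast (X_j . y) divided by N/2
effects = X.T @ y / (len(y) / 2)
print(effects.tolist())                   # [20.0, 10.0, 0.0]
```

Note that a factor's effect is twice its regression coefficient on the coded scale, which is why effect-based Pareto charts and coefficient tables rank terms identically.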
Central Composite Design
Levels for the central composite experimental design.
Experimental results for the CCD.
[146]:
from explann.doe import CentralCompositeDesign
from explann.models import FactorialModel
from explann.dataio import ImportString, ImportXLSX
from explann.plot import ParetoPlot
import matplotlib.pyplot as plt
ccd = CentralCompositeDesign(
variables = {
'U': (0.07, 2.33),
'Y': (0.00, 0.59)
},
center=(0,3)
)
ccd_results = ImportString(
data=
"""F,CM,B
158,4029,727
171,4354,1119
166,5302,1080
244,5513,1101
208,5460,1085
255,7105,1435
148,4481,743
263,6529,1213
250,5364,1390
269,5524,1499
261,5793,1420
""",
delimiter=',')
ccd.append_results(results=ccd_results.data)
ccd.save_excel('../../data/ccd.xlsx')
ccd_from_excel = ImportXLSX(
path = '../../data/ccd.xlsx',
levels_sheet = 'levels'
)
fm_ccd = FactorialModel(
data = ccd_from_excel.data,
functions = {
#"Fso" : "F ~ U * Y + I(U**2) + I(Y**2)",
#"CMso" : "CM ~ U * Y + I(U**2) + I(Y**2)",
#"Bso" : "B ~ U * Y + I(U**2) + I(Y**2)",
"Fso" : "F ~ U * Y + np.power(U,2) + np.power(Y,2)",
"CMso" : "CM ~ U * Y + np.power(U,2) + np.power(Y,2)",
"Bso" : "B ~ U * Y + np.power(U,2) + np.power(Y,2)",
"F" : "F ~ U * Y",
"CM" : "CM ~ U * Y",
"B" : "B ~ U * Y"},
levels = ccd_from_excel.levels,
)
pp_ccd = ParetoPlot(fm_ccd)
fig, ax = plt.subplots(3,2, figsize=(9,10))
pp_ccd.plot(['Fso', 'F', 'CMso', 'CM' ,'Bso', 'B'], ax=ax)
plt.tight_layout()
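For reference, the coded run matrix of a rotatable two-factor CCD can be built by hand; the axial distance \(\alpha = 2^{k/4} = \sqrt{2} \approx 1.41\) is where the \(\pm 1.41\) bounds used in the optimization below come from. A minimal sketch, independent of explann:

```python
import itertools
import numpy as np

k = 2                                  # factors: U and Y
alpha = 2 ** (k / 4)                   # rotatable axial distance, sqrt(2) ~ 1.41
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
axial = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
center = np.zeros((3, k))              # three replicated centre runs
runs = np.vstack([factorial, axial, center])
print(runs.shape)                      # (11, 2) -- matches the 11 ccd_results rows
```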
[147]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cm # for a scatter plot
from mpl_toolkits.mplot3d import Axes3D
def plot_surface(x, y, z, model, n_pts=10, other_params={}, labels:dict=None, ax=None, cmap='viridis', scaled=False):
name_z = z #'Fso'
name_x = x #'U'
name_y = y #'Y'
if not scaled:
X = np.linspace( model.data[name_x].min(), model.data[name_x].max(), n_pts)
Y = np.linspace( model.data[name_y].min(), model.data[name_y].max(), n_pts)
else:
X = np.linspace( model.levels[name_x].min(), model.levels[name_x].max(), n_pts)
Y = np.linspace( model.levels[name_y].min(), model.levels[name_y].max(), n_pts)
x, y = np.meshgrid(X, Y)
try:
variables = pd.DataFrame({name_x:x.ravel(), name_y:y.ravel(), **other_params})
except:
variables = pd.DataFrame([{name_x:x.ravel(), name_y:y.ravel(), **other_params}])
if not scaled:
z = model.predict(name_z, variables).values.reshape(x.shape)
else:
z = model.predict_rescaled(name_z, variables).values.reshape(x.shape)
if ax is None:
fig, ax = plt.subplots(figsize=(6,6),subplot_kw={"projection": "3d"})
else:
fig = plt.gcf()
ax.set_box_aspect(None, zoom=0.8)
pax = ax.plot_surface(x, y, z, cmap=cmap, edgecolor='black', linewidth=0.5, alpha=0.6, antialiased=True)
#ax.contour(x, y, z, cmap=cmap, linestyles='solid', alpha=1)
ax.contourf(x, y, z, zdir='z', offset=ax.get_zlim()[0], cmap=cmap, alpha=0.5, antialiased=True)
ax.contourf(x, y, z, zdir='x', offset=ax.get_xlim()[0], cmap=cmap, alpha=0.5, antialiased=True)
ax.contourf(x, y, z, zdir='y', offset=ax.get_ylim()[1], cmap=cmap, alpha=0.5, antialiased=True)
if labels is not None:
ax.set(**labels)
ax.zaxis.set_rotate_label(True)
ax.xaxis.set_rotate_label(True)
ax.yaxis.set_rotate_label(True)
zlabel = labels['zlabel'] if labels is not None and 'zlabel' in labels else None
fig.colorbar(pax, ax=ax, location='top', fraction=0.04, pad=-0.05, label=zlabel)
[148]:
fig, ax = plt.subplots(2,3, figsize=(15,9),subplot_kw={"projection": "3d"})
ax = ax.flatten()
for i,(key,val) in enumerate(fm_ccd.functions.items()):
plot_surface(x='Y', y='U', z=key,
model=fm_ccd,
n_pts=10,
ax=ax[i],
other_params=dict(),
labels=dict(xlabel='Y', ylabel='U', zlabel=key), cmap='inferno', scaled=True)
Optimization Using the Desirability Function
A desirability function \(D\) is used to optimize over the two independent variables at the same time.
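The overall desirability is the geometric mean of the individual desirabilities \(d_i\), and the optimizer maximizes it by minimizing its complement:

```latex
D = \left(\prod_{i=1}^{n} d_i\right)^{1/n}, \qquad \min_{U,\,Y}\; 1 - D
```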
The function \(d_i\) was defined as
[285]:
@latexify.with_latex
def d_i(Y, Ymin=Ymin, Ymax=Ymax, r=1):
if Y <= Ymin:
return 0
elif Y > Ymin and Y < Ymax:
return ((Y-Ymin)/(Ymax-Ymin))**r
else:
return 1
d_i
[285]:
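Written out, `d_i` is the one-sided (larger-is-better) desirability corresponding to the function above:

```latex
d_i(Y) =
\begin{cases}
0, & Y \le Y_{\min} \\[4pt]
\left(\dfrac{Y - Y_{\min}}{Y_{\max} - Y_{\min}}\right)^{r}, & Y_{\min} < Y < Y_{\max} \\[4pt]
1, & Y \ge Y_{\max}
\end{cases}
```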
[269]:
from scipy.optimize import minimize
import latexify
responses = ['Fso','CMso','Bso']
model = fm_ccd
Ymax = 7000
Ymin = 200
def D(independent_vars):
d_list = []
for Yi in responses:
d_list.append( d_i(model[Yi].predict(dict(U=independent_vars[0],Y=independent_vars[1]))[0]) )
d_array = np.array(d_list)
n = len(d_array)
return 1-(np.prod(d_array)**(1/n))
x0 = (0, 0)
bounds = [(-1.41,1.41),(-1.41,1.41)]
optimum = minimize(D, x0, method='SLSQP', bounds=bounds)
optimum
[269]:
message: Optimization terminated successfully
success: True
status: 0
fun: 0.8802039525109903
x: [ 6.344e-01 4.816e-01]
nit: 10
jac: [ 1.291e-04 2.973e-05]
nfev: 30
njev: 10
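The same minimize-\(1-D\) pattern can be sketched without explann, with one hypothetical quadratic response standing in for the fitted models (the surface shape and coefficients are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

Ymin, Ymax = 200.0, 7000.0

def d_i(Y, r=1):
    # one-sided "larger is better" desirability
    if Y <= Ymin:
        return 0.0
    if Y < Ymax:
        return ((Y - Ymin) / (Ymax - Ymin)) ** r
    return 1.0

def response(u, y):
    # hypothetical concave surface with its maximum at (0.5, 0.3)
    return 6000 - 800 * (u - 0.5) ** 2 - 600 * (y - 0.3) ** 2

def objective(x):
    return 1.0 - d_i(response(*x))     # minimizing 1 - D maximizes D

res = minimize(objective, x0=(0.0, 0.0), method='SLSQP',
               bounds=[(-1.41, 1.41), (-1.41, 1.41)])
print(res.x)                           # converges near (0.5, 0.3)
```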
ANOVA table and lack of fit
To check model validity, an ANOVA table and a lack-of-fit test can be used.
[270]:
fm_ccd.anova('Fso')
[270]:
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| U | 1.0 | 3099.522852 | 3099.522852 | 4.600914 | 0.084782 |
| Y | 1.0 | 7419.724833 | 7419.724833 | 11.013797 | 0.021038 |
| U:Y | 1.0 | 1056.250000 | 1056.250000 | 1.567891 | 0.265894 |
| np.power(U, 2) | 1.0 | 918.771390 | 918.771390 | 1.363819 | 0.295530 |
| np.power(Y, 2) | 1.0 | 7192.080882 | 7192.080882 | 10.675884 | 0.022263 |
| Residual | 5.0 | 3368.377315 | 673.675463 | NaN | NaN |
[271]:
fm_ccd.lack_of_fit('Fso', alpha=0.05)
[271]:
| | Source_of_Variation | df | sum_sq | mean_sq | F | F_table | p |
|---|---|---|---|---|---|---|---|
| 0 | Regression | 3.0 | 11575.497685 | 3858.499228 | 2.352901 | 4.346831 | 0.158352 |
| 1 | Residual | 7.0 | 11479.229588 | 1639.889941 | NaN | NaN | NaN |
| 2 | Lack_of_Fit | 5.0 | 11297.229588 | 2259.445918 | 24.829076 | 19.296410 | 0.039167 |
| 3 | Pure_Error | 2.0 | 182.000000 | 91.000000 | NaN | NaN | NaN |
| 4 | Total | 10.0 | 23054.727273 | NaN | NaN | NaN | NaN |
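The lack-of-fit figures can be reproduced by hand: pure error comes from the three centre-point replicates of F in the ccd results (250, 269, 261), and the remainder of the residual sum of squares is attributed to lack of fit. A sketch using the residual line of the table above:

```python
import numpy as np
from scipy import stats

# centre-point replicates of F from the ccd results
center_reps = np.array([250.0, 269.0, 261.0])
ss_pure_error = np.sum((center_reps - center_reps.mean()) ** 2)   # 182.0
df_pe = len(center_reps) - 1                                      # 2

# residual line of the lack-of-fit table
ss_residual, df_residual = 11479.229588, 7

ss_lof = ss_residual - ss_pure_error      # 11297.229588
df_lof = df_residual - df_pe              # 5
F_lof = (ss_lof / df_lof) / (ss_pure_error / df_pe)
p_lof = stats.f.sf(F_lof, df_lof, df_pe)
print(F_lof, p_lof)
```

A significant lack-of-fit statistic (here p < 0.05) indicates that the model's residuals are larger than replicate-to-replicate noise alone can explain.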
Optimization Results
In the original paper, the authors found a desirability function value of 0.87 at their optimum point, with the corresponding cellulase activities totaling a production of 8085 U/L. In this work the desirability function value was 0.88, with the optimum found at U = 1.71 g/L (urea) and Y = 0.40 g/L (yeast extract). This corresponds to a total cellulase activity of 8106 U/L; the partial value of each activity metric is computed below.
[280]:
fm_ccd.decode_variables(variables=dict(U=optimum.x[0], Y=optimum.x[1]))
[280]:
{'U': 1.7069155936557219, 'Y': 0.39546634255213986}
[281]:
F=fm_ccd.predict(function='Fso', variables=fm_ccd.decode_variables(variables=dict(U=optimum.x[0], Y=optimum.x[1])))
F
[281]:
0 244.92823
dtype: float64
[282]:
C=fm_ccd.predict(function='CMso', variables=fm_ccd.decode_variables(variables=dict(U=optimum.x[0], Y=optimum.x[1])))
C
[282]:
0 6623.108285
dtype: float64
[283]:
B=fm_ccd.predict(function='Bso', variables=fm_ccd.decode_variables(variables=dict(U=optimum.x[0], Y=optimum.x[1])))
B
[283]:
0 1238.00179
dtype: float64