What are mixture experiments?

Design of experiments (DOE) provides many tools for studying physical systems. All share the goal of efficiently identifying how specific variables (factors) control the properties (responses) of a system. Factorial designs and fractional factorial designs are two well-known approaches, but they assume that factors can be varied independently. In many chemical systems, this assumption fails.

Mixtures are one such case. The response depends on the relative proportions of the ingredients, not on their absolute quantities. This constraint applies to pharmaceutical formulations, glass development, polymer blends, food products — any system where the components' proportions must sum to 1. Mixture designs are built specifically for these systems.

A simplex example

The following example, adapted from John Cornell’s “A Primer on Experiments with Mixtures,” shows what a simple mixture design and analysis looks like. Suppose you produce a polymer-based yarn used for draperies and want to improve its toughness.

Your current formulation is a blend of polyethylene (PE) and polypropylene (PP). A colleague suggests that replacing the polypropylene with polystyrene (PS) might help. You want to know if and how this substitution affects the yarn’s mechanical properties, and you want an efficient way to find out.

Planning and designing the experiments

First, you need to choose a property that you can measure — your response variable $y$. You decide to take a closer look at the elongation of fibers produced with different polymer blends, which measures how much you can lengthen your fibers before they break.1

Next, you need to plan your experiments (in DOE terminology: you need to plan your experimental runs). How are you going to vary the proportions of PE, PP, and PS? You need to ensure that you cover the experimental region while minimizing the number of experiments that you’ll run.

Mixture designs are built for situations where the components must sum to a constant — typically 100% or 1. The experimental space is represented using simplex-based geometries (triangular for 3 components, tetrahedral for 4, etc.) and each experimental run is optimally placed inside the simplex, ensuring that the compositional space is properly explored.

The simplest mixture design is the Simplex Lattice Design (SLD). In this type of design, each factor (in this case, PE, PP, and PS) can take $m+1$ values ($0, 1/m, 2/m, \dots, m/m$), where $m$ is the degree of the lattice. A level is the specific value that a factor takes in a given run. The value of $m$ is also important for later modeling, because it defines the order of the polynomial that can be fitted to the data: $m=1$ means you’ll only be able to fit first-order polynomials, $m=2$ second-order, $m=3$ third-order, and so on.

You decide to keep it simple and explore only pure or binary blend compositions, meaning that each factor takes three levels ($m=2$): 1 for a pure component, 0.5 for the binary blends, and 0 for blends that do not include that component. This allows you to fit a second-order polynomial and see how pairs of components interact.
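This lattice is easy to generate programmatically. A minimal Python sketch (the function name `simplex_lattice` is illustrative, not from any DOE library):

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(q, m):
    """Return all q-component blends whose proportions are
    multiples of 1/m and sum to exactly 1."""
    levels = [Fraction(i, m) for i in range(m + 1)]
    return [pt for pt in product(levels, repeat=q) if sum(pt) == 1]

# {3, 2} lattice: 3 components, m = 2 -> pure components and 50:50 binary blends
for pt in simplex_lattice(q=3, m=2):
    print(tuple(float(x) for x in pt))
# prints 6 points: the 3 pure components and the 3 binary blends
```

Exact fractions avoid the floating-point pitfall where proportions fail the sum-to-one check by a rounding error.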

Calculating the proportions and plotting them, the design looks like this:

SLD design
Simplex Lattice Design (SLD) for a three-component system. The triangle shows the entire compositional space defined by the three components (polyethylene, polypropylene, and polystyrene). The red dots indicate the experimental runs required to map the entire space.

Analyzing the results

You prepare the 6 polymer blends, spin them into yarns, and measure their elongation. Using shorthand notation — $x_1$ for polyethylene, $x_2$ for polystyrene, $x_3$ for polypropylene — and averaging three replicates per run, the results are:

| Design point | $x_1$ | $x_2$ | $x_3$ | Average elongation |
| --- | --- | --- | --- | --- |
| 1 | 1 | 0 | 0 | 11.7 |
| 2 | 1/2 | 1/2 | 0 | 15.3 |
| 3 | 0 | 1 | 0 | 9.4 |
| 4 | 0 | 1/2 | 1/2 | 10.5 |
| 5 | 0 | 0 | 1 | 16.4 |
| 6 | 1/2 | 0 | 1/2 | 16.9 |

At this point, you’ll want to fit a model to this data so you can predict the behavior of the system anywhere in the compositional space. There is one problem, though. Consider a standard polynomial with an intercept $\beta_0$:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots $$

Where $\beta_1$, $\beta_2$, and $\beta_3$ are the regression coefficients of each component. The intercept $\beta_0$ is the predicted response at the point where $x_1 = x_2 = x_3 = 0$. However, this point violates the mixture constraint, since the sum of all components $\sum_i x_i$ must equal 1. There is no point in mixture space where all $x_i = 0$, so the intercept $\beta_0$ has no physical meaning.2
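You can see the problem numerically: because the component columns sum to 1 in every row, an intercept column of ones is linearly dependent on them and the design matrix loses rank. A quick check (a sketch, assuming NumPy is available):

```python
import numpy as np

# The six runs of the simplex lattice design: columns x1, x2, x3
X = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
])

# The intercept column equals x1 + x2 + x3 in every row,
# so adding it cannot increase the rank
X_int = np.column_stack([np.ones(len(X)), X])
print(np.linalg.matrix_rank(X))      # 3
print(np.linalg.matrix_rank(X_int))  # still 3 -> singular for 4 coefficients
```

Four coefficients but a rank-3 matrix means standard least squares has no unique solution.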

The solution is to use Scheffé polynomials, special polynomial forms that respect the mixture constraint. For a ternary mixture system, the second-order Scheffé polynomial is:

$$ y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 $$

Here, $\beta_1$, $\beta_2$, and $\beta_3$ represent the expected response when component 1, 2, or 3 respectively comprises 100% of the mixture, whereas the terms $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$ represent binary interactions. These latter terms have physical meaning and represent synergistic or antagonistic effects between components.

The fitted model becomes:

$$ y = 11.7\,x_1 + 9.4\,x_2 + 16.4\,x_3 + 19.0\,x_1 x_2 + 11.4\,x_1 x_3 - 9.6\,x_2 x_3 $$
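With six runs and six coefficients the fit is exact, so the coefficients above can be recovered directly with ordinary least squares — no DOE library needed. A sketch in NumPy:

```python
import numpy as np

# Design points (x1, x2, x3) and average elongations from the table
runs = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
])
y = np.array([11.7, 15.3, 9.4, 10.5, 16.4, 16.9])

# Second-order Scheffé model matrix: x1, x2, x3, x1*x2, x1*x3, x2*x3
x1, x2, x3 = runs.T
X = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 1))  # ≈ 11.7, 9.4, 16.4, 19.0, 11.4, -9.6
```

Note that the model matrix has no intercept column, so it stays full rank and the least-squares solution is unique.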

With six experiments, you now have a model that predicts the elongation of any blend of the three polymers, not just the ones you measured. Plotting the model over the simplex shows the full picture:

Polymer response surface
Response surface modeled on the collected data

This type of visualization is called a response surface. It shows how a response (the elongation) changes across the entire compositional space. Each point within the triangle represents a unique three-component composition, while the distances from the three sides correspond to the proportions of each component. The contour lines show predicted elongation values, revealing trends that would not be obvious from the raw data alone.
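A surface like this can be drawn without specialized software. A sketch using Matplotlib (assuming NumPy and Matplotlib are installed; the barycentric-to-cartesian mapping, grid density, and output file name are illustrative choices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import matplotlib.tri as tri

def elongation(x1, x2, x3):
    """Fitted second-order Scheffé model for the yarn example."""
    return (11.7 * x1 + 9.4 * x2 + 16.4 * x3
            + 19.0 * x1 * x2 + 11.4 * x1 * x3 - 9.6 * x2 * x3)

# Sample the simplex, then map barycentric -> 2-D cartesian coordinates
pts = [(i / 60, j / 60, 1 - i / 60 - j / 60)
       for i in range(61) for j in range(61 - i)]
x1, x2, x3 = np.array(pts).T
cart_x = x2 + 0.5 * x3           # places the three vertices on an
cart_y = (np.sqrt(3) / 2) * x3   # equilateral triangle

triang = tri.Triangulation(cart_x, cart_y)
plt.tricontourf(triang, elongation(x1, x2, x3), levels=12, cmap="viridis")
plt.colorbar(label="Predicted elongation")
plt.axis("off")
plt.savefig("response_surface.png", dpi=150)
```

`tricontourf` handles the triangular domain directly, so no masking of an out-of-simplex rectangle is needed.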

The results show that your colleague’s intuition was partially right. The PE-PS combination has a strong synergistic effect: the binary blend achieves 15.3 elongation, 45% higher than the simple average of the pure components (10.55). But in absolute terms, PE-PP blends still win. The highest elongation values (16-17 range) occur along the PE-PP edge; the best measured blend is the 50:50 PE:PP mixture (16.9 at design point 6), and the fitted model places the maximum (about 17.4) near a 30:70 PE:PP blend. PS-PP combinations show antagonistic behavior, particularly in the lower-right region of the triangle.

The model provides clear guidance for optimization. If maximum elongation is the primary goal, aim for PE-PP blends with roughly 15-45% PE and minimal PS content. If other considerations require PS, keep it below 20% while maintaining high PE content to take advantage of the PE-PS synergy. The relatively flat response surface around the PE-PP optimum also suggests these formulations will be robust to minor mixing variations during production.
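The optimum can be located by brute force: evaluate the fitted model on a fine grid over the simplex and keep the best point (plain Python, no libraries; the 1% step size is an arbitrary choice):

```python
def elongation(x1, x2, x3):
    """Fitted second-order Scheffé model for the yarn example."""
    return (11.7 * x1 + 9.4 * x2 + 16.4 * x3
            + 19.0 * x1 * x2 + 11.4 * x1 * x3 - 9.6 * x2 * x3)

best, best_blend = float("-inf"), None
for i in range(101):          # PE percentage
    for j in range(101 - i):  # PS percentage
        x1, x2 = i / 100, j / 100
        x3 = 1 - x1 - x2      # PP takes the remainder
        val = elongation(x1, x2, x3)
        if val > best:
            best, best_blend = val, (x1, x2, x3)

print(best_blend, round(best, 2))
# best blend ≈ 29% PE, 0% PS, 71% PP, with predicted elongation ≈ 17.38
```

For a smooth quadratic over a small simplex a 1% grid is more than fine; for higher-dimensional mixtures you would switch to a constrained optimizer instead of exhaustive search.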

Where to learn more

This example covered the simplest case; several important topics were left out. The following resources cover these and more.

Note

I am not affiliated with any of the following publishers or software companies.

Books

Software

There are two main options for running mixture analyses, each with different tradeoffs between ease-of-use and cost:

  1. A proprietary DOE program with a graphical interface.
  2. R or Python with dedicated libraries.

Commercial solutions

Several GUI-based statistical platforms with built-in DOE functionality support mixture designs. Each has a learning curve, but they are generally straightforward to use once set up.

I avoid this kind of software for two reasons:

  1. They’re expensive. For example, at the time of writing a single one-year license subscription for Design-Expert costs 1140 USD, while a single one-year license subscription for Minitab is priced at 1851 USD.
  2. They provide black-box solutions. Being closed-source means you can’t inspect what a certain function in the program actually does. This might not be a concern for you now, but it might be in the future when you need to understand exactly how a result was obtained.

Free coding alternatives

Everything these programs do can be replicated in code, with greater flexibility and full reproducibility. For mixture designs, the two main options are R (R Project) and Python (Python.org).

R was built for statistics; Python is a general-purpose language that has become a standard for data analysis. The tradeoff is straightforward: full control over the analysis, at the cost of learning to program. Both provide dedicated libraries that cover most of what commercial software offers.

The polymer yarn example used six runs to map an entire ternary compositional space. Traditional one-factor-at-a-time (OFAT) approaches would need far more experiments and still miss the interaction effects that drove the key findings.


  1. This is where domain knowledge matters. Measuring the fibers’ color would tell you nothing about toughness. The choice looks obvious here, but picking the right response becomes non-trivial for more complex or completely new systems. ↩︎

  2. The mixture constraint creates perfect multicollinearity. If you know the values of $q-1$ components in a $q$-component mixture, the last component is completely determined: $x_q = 1 - \sum_{i=1}^{q-1} x_i$. This means that the design matrix with an intercept column is singular, and therefore the coefficients cannot be estimated through standard least squares. ↩︎