What are mixture experiments?

Design of experiments (DOE) gives us many tools to study physical systems. All these tools share the goal of efficiently identifying and characterizing how specific variables (also known as factors) control the properties (also known as responses) of a system. Factorial designs and fractional factorial designs are two well-known examples of experimental approaches, but many chemical systems don’t play well with these strategies.

Mixtures present one such situation. In these systems, the response depends on the relative proportions of the ingredients, not on their absolute quantities. This constraint applies to many scenarios from pharmaceutical formulations to glass development — basically any system where the components’ proportions must sum to 1.

For these systems, we can use mixture designs. These designs are less discussed than conventional factorial methods, but they can be very effective for studying many chemical systems.

A simplex example

The following example from “A Primer on Experiments with Mixtures” by John Cornell shows what a simple mixture design and analysis looks like. Suppose you’re in the textile business and one of your products is a polymer-based yarn. This type of yarn is used to make draperies, and you want to develop a new formulation that improves their toughness.

So far, you have always produced your yarns using only one type of polymer blend. The blend is a mixture of polyethylene (PE) and polypropylene (PP), but one of your colleagues suggests that substituting polypropylene with polystyrene (PS) might improve toughness. Clearly, you want to know if and how this substitution affects the yarn’s mechanical properties, but what would be an efficient way to get a clear understanding of this effect?

Planning and designing the experiments

First, you need to choose a property that you can measure — your response variable \(y\). You decide to take a closer look at the elongation of fibers produced with different polymer blends, which measures how much you can lengthen your fibers before they break.1

Next, you need to plan your experiments (in DOE terminology: you need to plan your experimental runs). How are you going to vary the proportions of PE, PP, and PS? You need to ensure that you cover the experimental region while minimizing the number of experiments that you’ll run.

Here’s where mixture designs come into play. These designs are specifically developed for situations where the components must sum to a constant — typically 100% or 1. The experimental space is represented using simplex-based geometries (triangular for 3 components, tetrahedral for 4, etc.) and each experimental run is optimally placed inside the simplex, ensuring that the compositional space is properly explored.

The simplest mixture design is the Simplex Lattice Design (SLD). In this type of design, each factor (in this case, PE, PP, and PS) can take \(m+1\) equally spaced values (\(0, 1/m, 2/m, \dots, m/m\)), where \(m\) is the degree of the lattice. A level is nothing more than the specific value that a factor takes in a given run, so each factor has \(m+1\) levels. The value of \(m\) is also important for later modeling, because it defines the order of the polynomial that can be fitted to the data: \(m=1\) means you’ll only be able to fit first order polynomials, \(m=2\) second order, \(m=3\) third order, and so on.

You decide to keep it simple and explore only pure or binary blend compositions, meaning that each factor is going to take three levels (\(m=2\)): 1 for the pure component, 0.5 for the binary blends, and 0 for blends that do not include that component. This approach will allow you to fit a second order polynomial and get an idea of how two components interact with each other.
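Enumerating the lattice points of an SLD is simple enough that you don’t need a dedicated DOE library. Here’s a minimal sketch in Python (the function name `simplex_lattice` is my own, not a standard API):

```python
from itertools import product

def simplex_lattice(q, m):
    """Enumerate the {q, m} simplex-lattice points: all compositions
    whose proportions are multiples of 1/m and sum to 1."""
    points = []
    for combo in product(range(m + 1), repeat=q):
        if sum(combo) == m:  # keep only compositions on the simplex
            points.append(tuple(c / m for c in combo))
    return points

# {3, 2} lattice: three pure components plus three 50:50 binary blends
for point in simplex_lattice(3, 2):
    print(point)
```

For three components and \(m=2\) this yields exactly the six runs of the design above; in general a \(\{q, m\}\) lattice contains \(\binom{q+m-1}{m}\) points.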

After calculating the relative proportions of each component and plotting them, your experimental design looks like the following:

SLD design
Simplex Lattice Design (SLD) for a three-component system. The triangle shows the entire compositional space defined by the three components (polyethylene, polypropylene, and polystyrene). The red dots indicate the experimental runs required to map the entire space.

Analyzing the results

Time to execute. You go to the lab, prepare your 6 polymer blends, spin them into yarns and measure their elongation. Writing out the name of each component quickly gets boring, so you introduce some notation: \(x_{1}\) for polyethylene, \(x_{2}\) for polystyrene, and \(x_{3}\) for polypropylene. You repeat each run three times and average the measured elongation, obtaining the following results:

| Design point | \(x_{1}\) | \(x_{2}\) | \(x_{3}\) | Average elongation |
| --- | --- | --- | --- | --- |
| 1 | 1 | 0 | 0 | 11.7 |
| 2 | 1/2 | 1/2 | 0 | 15.3 |
| 3 | 0 | 1 | 0 | 9.4 |
| 4 | 0 | 1/2 | 1/2 | 10.5 |
| 5 | 0 | 0 | 1 | 16.4 |
| 6 | 1/2 | 0 | 1/2 | 16.9 |

At this point, you’ll want to fit a model to this data that can be used to predict the behavior of the system. There is one problem, though. Consider a standard polynomial with an intercept \(\beta_{0}\):

$$ y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{3} + \dots $$

Where \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{3}\) are the regression coefficients of each component. In mixture space, when all components approach zero we have \(x_{1} \rightarrow 0\), \(x_{2} \rightarrow 0\), \(x_{3} \rightarrow 0\). However, this violates the mixture constraint, since the sum of all components \(\sum_{i} x_{i}\) must equal 1. There’s literally no point in mixture space where all \(x_{i} = 0\), so the intercept \(\beta_{0}\) has no physical meaning.2
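You can verify this numerically: under the mixture constraint, appending an intercept column to the model matrix makes it rank-deficient, because the column of ones is exactly the sum of the component columns. A quick NumPy check, using the six design points from this example:

```python
import numpy as np

# The six design points (x1, x2, x3); each row sums to 1
X = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
])

# Intercept column + component columns: the intercept equals x1 + x2 + x3,
# so the four columns are linearly dependent
A = np.column_stack([np.ones(len(X)), X])
print(np.linalg.matrix_rank(A))  # 3, not 4: the normal equations are singular
```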

To solve this problem, you are going to use Scheffé polynomials. These are special polynomial forms that respect the mixture constraint. For a ternary mixture system, the second-order Scheffé polynomial is:

$$ y = \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{3} + \beta_{12}x_{1}x_{2} + \beta_{13}x_{1}x_{3} + \beta_{23}x_{2}x_{3} $$

Here, \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{3}\) represent the expected response when component 1, 2, and 3 respectively comprise 100% of the mixture, whereas the terms \(\beta_{12}\), \(\beta_{13}\) and \(\beta_{23}\) represent binary interactions. These latter terms have physical meaning and represent synergistic or antagonistic effects between components.

Fitting this model to the data gives:

$$ y = 11.7 \cdot x_{1} + 9.4 \cdot x_{2} + 16.4 \cdot x_{3} + 19.0 \cdot x_{1}x_{2} + 11.4 \cdot x_{1}x_{3} -9.6 \cdot x_{2}x_{3} $$
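With six runs and six coefficients, the fit is an exact least-squares solve. A sketch using NumPy, with the design points and averaged elongations taken from the table above:

```python
import numpy as np

# Design points (x1, x2, x3) and averaged elongations from the table
X = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
])
y = np.array([11.7, 15.3, 9.4, 10.5, 16.4, 16.9])

# Second-order Scheffé model matrix: x1, x2, x3, x1*x2, x1*x3, x2*x3
# (no intercept column, consistent with the mixture constraint)
A = np.column_stack([
    X[:, 0], X[:, 1], X[:, 2],
    X[:, 0] * X[:, 1], X[:, 0] * X[:, 2], X[:, 1] * X[:, 2],
])

beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # betas: 11.7, 9.4, 16.4, 19.0, 11.4, -9.6
```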

This is already a great result, because with just six experiments you obtained a model that describes the elongation for any binary blend obtained by mixing the three polymers. But there’s more. You can go one step further and plot the model over the simplex, like so:

Polymer response surface
Response surface modeled on the collected data

This type of visualization is called a response surface, and it lets you see at a glance how a response (here, the elongation) changes throughout the entire compositional space. Each point within the triangle represents a unique three-component composition, with the distances from the three sides corresponding to the proportions of each component. The contour lines show predicted elongation values, highlighting trends that wouldn’t be obvious from the raw experimental data.
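Since the fitted Scheffé model is just a polynomial in the proportions, you can evaluate it at any composition. A minimal sketch, using the coefficients fitted earlier (the function name is my own):

```python
def elongation(x1, x2, x3):
    """Predict elongation from the fitted second-order Scheffé model."""
    assert abs(x1 + x2 + x3 - 1.0) < 1e-9, "proportions must sum to 1"
    return (11.7 * x1 + 9.4 * x2 + 16.4 * x3
            + 19.0 * x1 * x2 + 11.4 * x1 * x3 - 9.6 * x2 * x3)

print(elongation(0.5, 0.0, 0.5))  # 50:50 PE:PP blend, ~16.9 (design point 6)
```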

You can see that your colleague was onto something: substituting polypropylene with polystyrene does help, but only under specific conditions. The highest elongation values (in the 16–17 range) still occur along the PE-PP edge, with the maximum near the 50:50 PE:PP blend (16.9 elongation at design point 6).

The PE-PS combination also shows strong synergistic effects: the binary blend achieves 15.3 elongation, which is 45% higher than the simple average of the pure components (10.55). However, PS-PP combinations show antagonistic behavior, particularly in the lower-right region of the triangle.

What’s important is that this model provides clear guidance for optimizing your yarn properties. If maximum elongation is your primary goal, aim for compositions between 40-60% PE and 40-60% PP with minimal PS content. If other considerations require PS incorporation, keep it below 20% while maintaining high PE content to leverage the beneficial PE-PS interactions. The relatively flat response surface around the PE-PP optimum also suggests these formulations will be more robust to minor mixing variations during production.

Where to learn more

The previous example barely scratched the surface of everything there is to know about mixture designs and models; many important topics were left out.

If you’re interested in learning more about this topic and designing your own mixture experiments, below you’ll find some resources to get you started.

Note

I am not affiliated with any of the following publishers or software companies.

Books

Software

Let’s say you are ready to crunch some numbers. Where do you start? You have two main options, each with different tradeoffs between ease-of-use and cost:

  1. Buy a license for a proprietary DOE program.
  2. Use R or Python and do some programming.

Commercial solutions

If you’re looking for an all-in-one solution with a graphical interface, you have several options, such as Design-Expert and Minitab.

If you’ve never used any of these programs, think of them as Microsoft Excel on steroids. As with everything else, there’s a learning curve to overcome with each of them, but they are typically straightforward to use.

I am not a fan of using this kind of software for two fundamental reasons:

  1. They’re expensive. For example, at the time of writing a single one-year license subscription for Design-Expert costs 1140 USD, while a single one-year license subscription for Minitab is priced at 1851 USD.
  2. They provide black-box solutions. Being closed-source means you can’t inspect what a certain function in the program actually does. This might not be a concern for you now, but it might be in the future when you need to understand exactly how a result was obtained.

Free coding alternatives

All functionality provided by paid software can be replicated using programming languages. Coding your own analyses gives you greater flexibility and better reproducibility, since you’re not locked into proprietary software. When it comes to mixture designs, there are two main options that I am aware of: R (see the R Project) and Python (see Python.org).

Both R and Python are free, open-source, and versatile programming and scripting languages. R was developed specifically for statistics and data analysis, while Python is a general-purpose language that has become the de facto standard for data analysis, thanks to its ease of use and relatively simple syntax.

The upside of using one of these languages for your experimental design and analysis is that you are in full control over what you do and how. The downside is that you need to learn how to program, which takes time and for many people might not be that straightforward to do (hence why there are paid programs available).

However, you don’t actually need much to get started. Both languages provide dedicated libraries that let you do as much as (if not more than) what is possible in specialized software, from generating experimental designs to analyzing results.

Conclusions

Mixture experiments provide a powerful framework for optimizing formulations where component proportions are the key variables. The Scheffé polynomials and simplex designs might initially seem complex, but the practical benefits are clear: fewer experiments, better understanding of component interactions, and clear direction for further development.

The key is to start experimenting with these methods on your own systems. The polymer yarn example shows that even a simple six-run design can give insights that would be difficult to obtain through traditional one-factor-at-a-time (OFAT) approaches.

Remember that the choice of response variables and experimental constraints will largely determine the success of your optimization efforts. Domain knowledge remains central for interpreting results and translating statistical models into practical guidelines.


  1. This is where domain knowledge becomes crucial. It would make very little sense to measure the fibers’ color, when all you really care about is how tough they are. The choice might look obvious in this example, but picking the right response quickly becomes non-trivial when you need to study more complex (or completely new) systems. ↩︎

  2. The mixture constraint creates perfect multicollinearity. If you know the values of \(q-1\) components in a \(q\)-component mixture, the last component is completely determined: \(x_{q} = 1 - \sum_{i=1}^{q-1} x_{i}\). This means that the design matrix is singular, and therefore the coefficients cannot be estimated through standard least squares. ↩︎