Introduction
Invention and Innovation
‘Invention’ is the generation of new knowledge. ‘Innovation’ is the application of new knowledge for benefit. Alternatively, invention turns money into knowledge, and innovation turns knowledge into money. This paper is about innovation in mineral processing, and in particular how to make judgements about the success or otherwise of incremental innovation.
Three broad forms of innovation can usefully be distinguished:
Paradigm change—this alters the landscape to such an extent that all mining projects affected by the change increase significantly in value, and some previously unviable projects become viable. Paradigm change is rare and high-risk, and is sometimes driven by crisis. Examples include the invention of flotation in the early 20th century, and the development of CIP and CIL processing for gold in the mid- to late-20th century.
Inventive change—invention of new devices and systems leads to improved processing performance in specific applications on a moderate scale. Such developments have innovation time constants typically around 15–20 years, require substantial investment over long periods, and are also high-risk. Impacts are lower than those of paradigm change but the risks are similar. Examples include the HPGR, stirred mills such as the IsaMill, the Jameson flotation cell, the Reflux classifier, the MLA and QEMSCAN automated mineralogy systems, and the JKMRC process simulators JKSimMet and JKSimFloat, which were preceded by unit process simulation packages such as Modsim and Microsim at the University of the Witwatersrand.
Incremental change—this uses existing systems and technologies and finds ways of making them work better. It is the commonest form of innovation, is low-risk, costs the least, and generates small to modest returns. Time constants are months to a few years. Examples include different flotation reagents or SAG mill liner profiles, and implementing optimising process control.
All these forms of innovation operate all the time but over greatly differing time scales, investment levels and effectiveness of outcome. Mining companies have historically struggled with how to encourage, manage and even recognise these different forms of innovation. One large mining company some years ago publicly eschewed incremental change in favour of a policy of major investment in paradigm change, a policy subsequently reversed after substantial expenditure with small return.
If no formal attempt is made to improve operations incrementally, then almost certainly process performance will decline due to changes in ore type, equipment condition and many other factors. It takes effort just to stand still.
The cost of incremental improvement is low (and rarely capital-intensive) and the probability of success is high, unlike other forms of innovation.
Focus on Incremental Gains
The mining and metallurgical processing industry is capital-intensive and is associated with lengthy project schedules in the development of an ore resource, together with an appropriate mineral processing flowsheet to treat the ore effectively across its life of mine. The delivery of optimum metallurgical performance, or “entitlement”, for this resource amounts to managing the asset as a business, or a going concern. Once the operations have been successfully commissioned—a much anticipated event—there is invariably a programme at site to improve the metallurgical process through incremental change, as described above, by way of different reagents, flotation circuit layouts, grinding strategies etc. [6]. These efforts address small, or marginal, gains in concentrate grade and recovery which of themselves can amount to significant financial gains for the asset, because these gains are made against marginal, not total, operating costs.
There are two parts to this strategy of pursuing performance entitlement. The first is the formulation of improved treatments at laboratory scale, for example by formulating a mixed collector suite for improved selectivity, concentrate grade and paymetals recovery (for example, [3, 7]). The second is the successful demonstration of these gains at operating, or plant, scale (for example, [9]). In this paper we discuss the second part, i.e. the design and execution of plant trials for the demonstration of small performance gains, and especially the setting of the hurdle rate, or confidence level.
This ongoing improvement programme is easily challenged by the variation, or variance, in plant data collected from a standard operating platform. This reality clouds the issue of pursuing small performance gains, since many fear that the small difference will be obscured by the noise in the “before” and “after” data. Fortunately, the field of statistics was developed for this very reason, and by suitable test design at both laboratory and plant scales, together with correct data interpretation, clear proof of small but real differences in performance can be obtained. Within this scope lies the all-important question of setting the hurdle rate, or significance level, at which the decision should be made.
Measuring the Differences
In all innovation projects, whether incremental or otherwise, points will be reached in the innovation timeline where a decision is required as to whether to permanently implement a change. The decision is usually based on data collected in laboratory experiments or plant trials whose purpose is to assess whether the change is beneficial, and if so the quantum of the performance improvement realised. A cost-benefit analysis against company investment criteria will then determine whether the change should be implemented. Here we will assume that a plant trial of a new condition is to be conducted, though the principles we discuss are equally applicable to laboratory experiments or pilot plant testing. A typical example would be the test of a new flotation reagent, with its performance being compared against that of the current reagent.
It is critically important to design and conduct these experiments according to proper statistical criteria to ensure that the correct decision is reached as soon as possible and with the minimum amount of resources expended. The correct protocols for such experiments have been described elsewhere [4, 5, 9–12]. They are based on the powerful concept of the hypothesis test which provides a formal context for quantifying the risk of making the wrong decision. In such tests the number of repeats (repeated pairs, in the case of paired experiments) is key to a successful outcome. If the right number of repeats is conducted, calculated from a simple formula, this will protect against two risks: incorrectly concluding that a process improvement has occurred when it has not, and incorrectly concluding that no improvement has occurred when in fact it has. The first mistake (called by statisticians a ‘Type 1 error’) leads to the implementation of a change that has no benefit and may even be adverse, if only in cost. The second mistake (called a ‘Type 2 error’) leads to a failure to implement a beneficial change, i.e. an opportunity lost.
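The exact form of the sample-size formula depends on the design chosen and is given in the references cited above. As an illustration only, the sketch below (in Python) uses one common version, the normal-approximation formula for a paired comparison; every numerical input is an assumed placeholder rather than a value from any trial described in this paper.

```python
# Minimal sketch: approximate number of paired repeats needed to detect a
# target improvement, using the normal-approximation formula
#   n = ((z_{1-alpha} + z_{1-beta}) * sigma_d / delta)^2
# All numbers below are illustrative assumptions, not values from the paper.
from math import ceil
from scipy.stats import norm

delta = 1.0    # smallest recovery improvement (%) worth detecting (assumed)
sigma_d = 2.0  # standard deviation of paired differences (%) (assumed)
alpha = 0.05   # acceptable Type 1 risk (one-sided test for an improvement)
beta = 0.10    # acceptable Type 2 risk (i.e. 90% power)

z_alpha = norm.ppf(1 - alpha)
z_beta = norm.ppf(1 - beta)
n = ceil(((z_alpha + z_beta) * sigma_d / delta) ** 2)
print(f"Approximate number of paired repeats required: {n}")
```

With these assumed inputs about 35 paired repeats would be required; note that halving the smallest difference worth detecting roughly quadruples the number of repeats, which is why the target effect size should be chosen carefully before the trial begins.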
It is important to understand that the inherent uncertainty in all data collected under whatever circumstances guarantees that these risks can never be zero. The trick is to get them down to an acceptable level, which is a professional not a statistical decision.
These principles are well known in statistical methodology and have been successfully used in most numerate disciplines for decades. Mineral engineers are now also starting to understand their power and value in guiding the management and implementation of beneficial innovation, though it is not yet in our profession’s DNA as it should be.
Choosing the Hurdle Rate
Suppose a properly designed plant trial of a new flotation reagent has been completed and analysed. The outcome might be summarised in four statements of the following kind:
1. We are 97% confident that there has been a real improvement in flotation recovery using the new reagent.
2. The best estimate of the improvement in recovery with the new reagent is 2% (the mean difference between the old and new reagents).
3. The uncertainty in this value, expressed as a 95% confidence interval, is ±1%. Thus we are 95% confident that the true value lies in the range 1–3%.
4. We are 95% confident that the improvement is at least 1.5%.
Each of these four conclusions throws a different light on the outcome of the trial, and taken together they comprise everything that a decision-maker needs to know: do we have an improvement (yes), what is the likely magnitude (2%), and what is the uncertainty in that value (±1%)? The last of the four statements is in a sense the most powerful. It represents the worst case scenario—we can say that the improvement is at least 1.5% with a chosen level of confidence (95% in this case). If 1.5% is enough to pass the company’s investment criteria, then in a sense the company is obliged to implement the change in order to act in the shareholders’ best interests (other things being equal). A good statistical design and analysis has de-risked the decision.
Note that we have concluded that the range of uncertainty in the effect is 1–3%, but that the improvement is at least 1.5% (a lower limit). These statements are not incompatible. They flow from the statistical principles of one- and two-sided risks. In the former case we have no interest in the upper limit, so restricting ourselves to a single (lower) limit yields a higher, less conservative lower bound at the chosen confidence level. In the latter case we want to know both limits simultaneously, so the result is necessarily less precise. Taken together, these conclusions are extraordinarily powerful, but the correct protocols must be followed in order to exploit this power. Doing so is not difficult.
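As a minimal sketch of the one- and two-sided distinction, the fragment below computes both kinds of limit from the same summary statistics. The numbers are assumed placeholders, not the trial results quoted above; the point it illustrates is that the one-sided lower limit always sits above the two-sided lower limit at the same confidence level.

```python
# Sketch of one-sided vs two-sided confidence limits using the t distribution.
# The summary statistics below are assumed for illustration only.
from math import sqrt
from scipy.stats import t

n = 25        # number of paired observations (assumed)
d_bar = 2.0   # mean recovery improvement, % (assumed)
s_d = 2.4     # standard deviation of the paired differences, % (assumed)
se = s_d / sqrt(n)
df = n - 1

# Two-sided 95% confidence interval: both limits of interest
half_width = t.ppf(0.975, df) * se
print(f"Two-sided 95% CI: {d_bar - half_width:.2f} to {d_bar + half_width:.2f} %")

# One-sided 95% lower confidence limit: the 'worst case'
lower = d_bar - t.ppf(0.95, df) * se
print(f"One-sided 95% lower limit: {lower:.2f} %")
```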
Three distinct choices of risk, or confidence level, therefore have to be made:
- In planning the experiment, we need to know the acceptable risks of making Type 1 and Type 2 errors so that we can determine the appropriate sample size (number of repeats).
- In interpreting the result, we need to decide on a confidence level hurdle rate by which we judge whether the effect is statistically significant or not. Do we have a benefit or not?
- We need to choose the confidence levels with which to quote the two-sided confidence interval on the effect measured in the experiment, and the lower confidence limit (the ‘worst case’).
None of these confidence levels need to be the same, and their choice is essentially arbitrary. There is no ‘right answer’. It is up to the experimenter to choose them. But a large body of practice over many decades has led to some conventions which form a useful guide and are generally followed.
Recall that ‘confidence’ is essentially the complement of ‘risk’ in this language. If we have a high confidence in an effect, there is a correspondingly low risk that it is in fact ephemeral. Risk is measured by a P-value (‘P’ for probability) which is calculated in the statistical analysis of the data, such as a t-test. P is the risk of being wrong in concluding that an effect exists when in fact it does not (the Type 1 error). So if the analysis returns P = 0.05, there is a 5% risk that we will be wrong in concluding that a real effect exists. (In the rigorous language of the hypothesis test, if we did the experiment many times, a result such as ours would only arise on 5% of occasions if there was no effect). So we can reject the null hypothesis that there is no effect and conclude that in fact there is an effect, with a risk of only 5% of being wrong. Remember that the risk (P) will never be zero. It’s just a question of whether the risk is low enough to be accepted.
100(1 − P)% is the confidence with which we can draw this conclusion. If P = 0.05, then we are 95% confident that the effect is real and has not simply arisen by chance as a consequence of the ubiquitous experimental error. But of course there is still a 5% chance that we are wrong and in fact the apparent improvement is an illusion. The choice of P-value (or confidence level) for making decisions is always a compromise between risk and reward. If we choose too high a confidence level (too low a risk), then it will be difficult to make useful decisions as the hurdle rate is simply too high. Conversely, if we choose a low confidence level (high risk), then we are going to make more mistakes.
The most frequently used confidence level for judging the significance of an effect (the second of the dot points above) is 95%, and we have found this to be appropriate in mineral processing optimisation. 95% leaves us with a 1-in-20 chance of incorrectly concluding that an improvement has occurred when in fact it has not. So if we stick to this value all the time then on average we will be wrong 5% of the time. We just have to hope that when we are, the consequence is not career-limiting.
Two factors appear to account for the widespread adoption of the 95% level:
- The huge influence of RA Fisher, who recommended it in a paper in 1926 [2].
- The fact that it seems to work, based on evidence from decades of use in a wide range of scientific and technical disciplines.
Because of its history, it is important to understand what Fisher actually said and why. Sir Ronald Fisher (1890–1962) was one of the greatest statisticians of the 20th century, who made fundamental contributions to statistical method but who also had a deep understanding of the practical problems faced by experimenters, gained during his 14 years as a statistical adviser to the Rothamsted agricultural research station in England. He was a prime mover in the development of hypothesis testing , invented the analysis of variance (ANOVA) and the F-test (named after him), and published the first and perhaps most influential book on experimental design [1], which went through 14 editions over 45 years.
In the 1926 paper he imagined two 1-acre test plots, one of which had received a manure fertiliser and the other had not. The fertilised plot yielded a crop 10% greater than the un-fertilised plot. The question then is: what reason is there to attribute the improvement to the fertiliser? Perhaps the improvement had simply arisen by chance, due to other uncontrolled factors and simple variation (experimental error). Fisher then said: “If the experimenter could say that in twenty years experience with uniform treatment the difference in favour of the acre treated with manure had never before touched 10%, the evidence would have reached a point which may be called the verge of significance; for it is convenient to draw the line at about the level at which we can say: ‘Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials’. This level, which we may call the 5% point, would be indicated… by the greatest chance deviation observed in twenty successive trials… If one in twenty does not seem high enough odds, we may… draw the line at one in fifty (the 2% point) or one in a hundred (the 1% point). Personally, the writer prefers to set a low standard of significance at the 5% point, and ignore entirely all results which fail to reach this level.” He then explains how the experimental data itself can be used to calculate the chance of obtaining a ‘significant’ result given the natural variation in the data (the ‘error’), rather than waiting 500 years (his figure) to accumulate the necessary experience.
Two points are worth noting from this passage:
- He is comfortable with the 5% risk point (95% confidence), based on his own experience.
- He regards this as a “low standard of significance”, and recommends that as a form of risk insurance we should then “ignore entirely all results which fail to reach this level”. The literal interpretation of this statement is that if the P-value from the hypothesis test is 0.051 (94.9% confidence) then we should regard the result as non-significant, but if P = 0.049 (95.1% confidence) then we should conclude that a change has actually occurred. So it is black or white—no grey allowed.
In view of the accumulated experience with using the 95% level of confidence, one could argue that we would have to have a remarkably good reason not to use it, and we have found it generally appropriate in mineral processing plant trials. The choice of confidence level essentially depends on the context of the decision. If there is no downside to the change (e.g. no extra cost) then one might relax the hurdle rate to 90%.
Case Studies
We will now illustrate these ideas with some real case studies in mineral processing plants. Full details are reported and available in the respective original publications.
Raglan Concentrator 2010
In this trial the base-case collector PAX was compared with the alternative collector PIBX in alternating blocks of plant operation; the tables below summarise the results by treatment group.
Summary of test results by treatment group—base metals
Trial block | Dates | Nickel feed | Nickel conc | Nickel tails | Copper feed | Copper conc | Copper tails |
---|---|---|---|---|---|---|---|
Base case PAX | 20 Feb–21 Apr | 2.77 | 17.18 | 0.41 | 0.80 | 4.54 | 0.21 |
PIBX n = 20 | 23 Apr–6 May | 2.58 | 17.56 | 0.35 | 0.74 | 4.60 | 0.17 |
PAX n = 22 | 9–19 May | 2.71 | 17.07 | 0.43 | 0.76 | 4.35 | 0.20 |
PIBX n = 20 | 30 May–12 Jun | 2.62 | 17.89 | 0.36 | 0.73 | 4.54 | 0.17 |
PAX n = 26 | 15 Jun–12 Jul | 2.97 | 17.23 | 0.39 | 0.77 | 4.23 | 0.17 |
Summary of test results by treatment group—precious metals
Trial block | Dates | Platinum feed | Platinum conc | Platinum tails | Palladium feed | Palladium conc | Palladium tails |
---|---|---|---|---|---|---|---|
Base case PAX | 20 Feb–21 Apr | 0.90 | 3.22 | 0.51 | 2.18 | 13.03 | 0.39 |
PIBX n = 20 | 23 Apr–6 May | 0.90 | 3.36 | 0.46 | 2.03 | 13.26 | 0.33 |
PAX n = 22 | 9–19 May | 0.95 | 3.22 | 0.57 | 2.26 | 13.69 | 0.39 |
PIBX n = 20 | 30 May–12 Jun | 0.82 | 3.20 | 0.44 | 2.03 | 12.85 | 0.29 |
PAX n = 26 | 15 Jun–12 Jul | 0.85 | 2.75 | 0.47 | 2.09 | 11.25 | 0.29 |
Observed differences in recovery
Paymetal | Recovery by PAX (%) | Recovery by PIBX (%) | Difference (%) |
---|---|---|---|
Ni | 87.46 | 88.50 | 1.04 |
Cu | 79.56 | 80.57 | 1.01 |
Pt | 49.58 | 52.35 | 2.67 |
Pd | 85.65 | 87.44 | 1.79 |
Testing with ANOVA
Paymetal | Trial F | Tail area | Comments |
---|---|---|---|
Ni | 17.39 | <0.1% | Significant |
Cu | 14.73 | <0.1% | Significant |
Pt | 28.13 | <0.1% | Significant |
Pd | 37.89 | <0.1% | Significant |
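As an aside on mechanics, the trial F values and tail areas in the table above come from an analysis of variance on the blocked trial data. The fragment below is a simplified two-group sketch of such a test using scipy's one-way ANOVA; the recovery arrays are fabricated placeholders and are not the Raglan data.

```python
# Minimal sketch of a one-way ANOVA comparison of two treatment groups.
# The arrays below are fabricated placeholders, not the Raglan trial data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
pax_recovery = rng.normal(loc=87.5, scale=1.0, size=48)   # assumed base-case blocks
pibx_recovery = rng.normal(loc=88.5, scale=1.0, size=40)  # assumed treatment blocks

f_stat, p_value = f_oneway(pax_recovery, pibx_recovery)
print(f"Trial F = {f_stat:.2f}, tail area (P) = {p_value:.4f}")
# A tail area below the chosen hurdle (e.g. 0.05) indicates a significant
# difference between the treatment groups.
```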
Eland Concentrator 2011
Summary of plant trial data
Collector | Feed grade (g/t PGE) | Conc grade (g/t PGE) | Tails grade (g/t PGE) | Chrome in Conc (%) | Grind (% −75 µm) |
---|---|---|---|---|---|
PIBX | 3.30 | 159.3 | 1.00 | 2.04 | 77.8 |
Exp 820 | 2.97 | 183.6 | 0.95 | 1.77 | 77.9 |
The tailings grades were used as the basis for ANOVA testing, producing a trial F value of 2.86 against a critical value of 2.39 at the 90% confidence level. So the observed drop in tailings grade with the mixed collector Exp 820 was significant with at least 90% confidence.
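For readers wanting to reproduce this kind of check, the sketch below compares a trial F value against the critical F value at a chosen confidence level; the degrees of freedom are assumed for illustration and are not taken from the Eland analysis.

```python
# Sketch: comparing a trial F statistic against the critical F value at a
# chosen confidence level. Degrees of freedom are assumed for illustration
# and are not the Eland trial values.
from scipy.stats import f

trial_f = 2.86    # trial F value reported in the text
df_between = 1    # number of treatments minus one (assumed)
df_within = 60    # residual degrees of freedom (assumed)

critical_f = f.ppf(0.90, df_between, df_within)  # 90% confidence level
print(f"Critical F at 90%: {critical_f:.2f}")
print("Significant" if trial_f > critical_f else "Not significant")
```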
Trial of a Flotation Enhancement Device in a Base Metal Concentrator
Data from flotation reagent plant trial
Statistic | Old reagent | New reagent |
---|---|---|
Recovery (%) | 69.02 | 70.83 |
Recovery increase (%) | 1.81 | |
Significance | P = 0.004, confidence = 99.6% | |
2-sided 95% CI on increase (± %) | 1.33 | |
Lower 95% limit of increase (%) | 0.69 | |
Given the data, these calculations take about a minute in Excel. Following the sequence of conclusions identified earlier: the mean improvement was 1.8% recovery; we are 99.6% confident that this is not zero (i.e. there has been an improvement); the 95% confidence limits on the improvement are ±1.3% (so the interval is 0.5–3.1%); and we are 95% confident that the improvement is at least 0.7%. If the 0.7% pays for any increase in cost, and there are no deleterious downstream implications, then the decision must be made to switch to the new reagent. A properly conducted and analysed trial has provided the quantitative risk management required for informed decision-making.
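For those not working in Excel, the same sequence of conclusions can be reproduced in a few lines of Python from the summary statistics of a paired trial. In the sketch below the sample size and the standard deviation of the differences are assumed placeholders (they are not reported above), so the printed values will not exactly match the table.

```python
# Sketch of the four-conclusion sequence for a paired reagent trial.
# The sample size and standard deviation are assumed for illustration;
# only the mean improvement is taken from the table above.
from math import sqrt
from scipy.stats import t

n = 30        # number of paired periods (assumed)
d_bar = 1.81  # mean recovery improvement, % (from the table above)
s_d = 3.2     # standard deviation of paired differences, % (assumed)
se = s_d / sqrt(n)
df = n - 1

# 1. Is the improvement real? P-value for the null hypothesis of no difference
t_stat = d_bar / se
p_value = 2 * t.sf(abs(t_stat), df)
print(f"P = {p_value:.3f}, confidence = {100 * (1 - p_value):.1f}%")

# 2. Best estimate of the improvement
print(f"Mean improvement: {d_bar:.2f}%")

# 3. Two-sided 95% confidence interval on the improvement
half_width = t.ppf(0.975, df) * se
print(f"95% CI: {d_bar - half_width:.2f} to {d_bar + half_width:.2f}%")

# 4. One-sided 95% lower limit (the 'worst case')
lower = d_bar - t.ppf(0.95, df) * se
print(f"95% lower limit: {lower:.2f}%")
```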
Discussion
The Value of Hypothesis Testing
At this point, we would like to introduce some important discussion regarding the value of hypothesis testing, and clarify some of the more common misunderstandings that we see in the mineral processing discipline. In hypothesis testing, we estimate the likelihood that the observed difference arose by chance; if this is sufficiently likely, we discredit the idea that the two treatments produced different results. The choice of “sufficiently likely” is a key decision that must be made appropriately in the context of the plant trial. But even if this is set at a very low level, e.g. less than 1%, there is still a 1 in 100 chance that the observed differences did not arise from the different treatments. In other words, there is never a zero chance of getting the analysis wrong. It is simply the amount of chance that we can tolerate that must be carefully defined and understood. However, whatever the deficiencies of hypothesis testing, its proper use makes a big difference to project outcomes by supporting good decisions. The failure to use appropriate hypothesis testing in a plant trial often leads to an indifferent outcome at huge expense.
Another major point is the “power” of the data set to detect a change in performance. Small data sets differ greatly from large ones in that the former are vulnerable to uncertainties in parameter estimation, e.g. of the mean and standard deviation. In other words, large data sets are “robust” because the large number of data allows simpler and more accurate parameter estimation. Even though famous statisticians such as Bessel devised corrections for small data sets, as did Sichel (1966) for the lognormal distribution, we should always aim for larger data sets in plant trials, especially by designing in an amount of redundancy. Generally, 30 observations seems to be the dividing line between small and large data sets; and for a good plant trial from which clear and defensible conclusions may be drawn, a three-month operating data set with designed “on” and “off” switching, yielding 180–270 data points (depending on whether two or three shifts are worked each day), is preferred. Formulae exist to calculate appropriate sample sizes for particular experimental designs [8, 13].
In the analysis of the test data, the observed difference in plant performance (the ‘effect’) should be accompanied by its confidence limits. This is consistent with the above discussion on the non-zero risk that resides in any plant trial data. The confidence limits should be declared at their selected confidence levels, e.g. 95% (with a 5% probability of being wrong). For cases where a performance improvement is sought, the lower confidence limit is also called the “worst case scenario” (i.e. it is the least performance gain that we expect to demonstrate).
In all of this, the investigator must still apply his or her common sense to the outcome from the data. The statistical testing can only be a part of the decision-making; once through the calculations with a significant result, the investigator should be drilling down into the process and asking why the difference was observed, deriving a mechanism through expert knowledge to explain the change in the process. In other words, statistics helps us to state “what” happened, and common sense and logical analysis help us explain “why” it happened. Once both of these are in place, the plant can move on and implement the tested change as a new process. However, ongoing monitoring of the improved performance should be in place to ensure that the change is sustainable. The investigator should also beware of small improvements that may be statistically significant but are too small to be of practical value; if the sample size is large enough, almost any change can be judged to be statistically significant.
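That final caveat is easy to demonstrate: with a large enough data set, an effect far too small to be of practical value will still clear the 95% hurdle. The sketch below simulates this with deliberately exaggerated, assumed numbers.

```python
# Sketch: a tiny, commercially trivial effect becomes 'statistically
# significant' once the sample size is large enough. All values are assumed.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
before = rng.normal(loc=85.00, scale=1.0, size=20000)  # recovery %, old condition
after = rng.normal(loc=85.05, scale=1.0, size=20000)   # only 0.05% higher on average

t_stat, p_value = ttest_ind(after, before)
print(f"Observed difference: {after.mean() - before.mean():.3f}% recovery")
print(f"P = {p_value:.2g}")  # almost certainly below 0.05, despite the trivial effect
```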
Concluding Remarks
Incremental improvements are an important part of the metallurgist’s armoury in the task of generating shareholder value. Proving them to be real and not ephemeral requires the data to be collected and analysed properly. These procedures are not difficult. The hypothesis test allows the significance of the observed improvement to be judged against the inevitable process noise, and placing confidence limits on the improvement allows an informed cost-benefit analysis to be made to support the decision as to whether to adopt the change or not. Any other approach is non-optimal, and, one could argue, unprofessional.
Acknowledgements
The authors would like to thank their many friends and colleagues in the profession, as well as their friends in pure and applied mathematics, for their helpful discussions and suggestions.