Oil Analysis Accuracy

What is the accuracy of an oil analysis? Does anyone have any documentation from Blackstone or SPEEDiagnostix or any other oil analysis lab showing the accuracy for each element tested, e.g. +/- 1 ppm for iron, etc.?
 
What you raise in your question are actually several topics rolled into one.

There's a concept called "Gauge Repeatability and Reproducibility" (R&R). This takes into account not just the machines, but the human interactions with the machines. Though the ICP machine is automated, the samples still have to be prep'd, calibrated, validated, maintained and operated by humans. There are a lot of variables in this process.

As a generalization, based on my visit to Blackstone Labs about a decade ago, the ICP machines themselves are actually quite accurate. But the data they put out is only reported in whole numbers (5 ppm; 10 ppm; 2 ppm; etc.). It's unclear to me how fine the underlying measurement resolution is; that went unanswered at the time. I don't believe any of the labs have formal public "accuracy" statements.

I always wanted to do a true DOE R&R for a lab, but BS at the time didn't have the time to allot to such a long testing protocol. I'd want to do at least 45 samples; min 3 operators and min 15 unique samples. I'd be fascinated to find out how good (or bad) each service is.
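A gauge R&R along those lines can be sketched in a few lines of Python. This is a minimal illustration with invented (synthetic) readings, not real lab data: 3 operators each measure 15 unique samples once (45 readings), and we split the variation into an operator-to-operator component (reproducibility) and residual noise (a repeatability proxy). The operator biases and noise level are assumptions for the demo.

```python
# Minimal gauge R&R sketch on synthetic data (NOT real lab results).
# Layout follows the 3-operator x 15-sample design suggested above,
# one reading per operator per sample (45 readings total).
import random
from statistics import mean, pstdev

random.seed(42)
OPERATORS, SAMPLES = 3, 15

true_level = [5 + i for i in range(SAMPLES)]   # assumed "true" Fe levels, 5..19 ppm
op_bias = [-0.4, 0.0, 0.5]                     # hypothetical per-operator offsets, ppm
readings = [[true_level[s] + op_bias[o] + random.gauss(0, 0.3)
             for s in range(SAMPLES)] for o in range(OPERATORS)]

# Reproducibility: spread of the operator means (operator-to-operator effect).
op_means = [mean(row) for row in readings]
reproducibility = pstdev(op_means)

# Repeatability proxy: residual spread after removing sample and operator means.
grand = mean(op_means)
sample_means = [mean(readings[o][s] for o in range(OPERATORS)) for s in range(SAMPLES)]
residuals = [readings[o][s] - sample_means[s] - op_means[o] + grand
             for o in range(OPERATORS) for s in range(SAMPLES)]
repeatability = pstdev(residuals)

print(f"reproducibility (operator effect): {reproducibility:.2f} ppm")
print(f"repeatability (residual noise):    {repeatability:.2f} ppm")
```

A full MSA-style study would also use repeated trials per operator-sample cell so repeatability and the operator-sample interaction can be separated; with one trial per cell, the residual term lumps them together.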
 
This is exactly what I am curious about. If they report 5 ppm of iron, is that +/- 1 ppm or +/- 5 ppm, and how reproducible are the results (rhetorical)? This would be important to know in order to determine whether or not the results between one oil and another are statistically significant. They seem to imply it is +/- 1 ppm, but without documentation, it's just an assumption.
 
What is the accuracy of an oil analysis? Does anyone have any documentation from Blackstone or SPEEDiagnostix or any other oil analysis lab showing the accuracy for each element tested, e.g. +/- 1 ppm for iron, etc.?
Short answer: not very. The industry standard for EPA-certified labs is no more than +/- 10%. Given the cost of a UOA and the throughput of oil analysis labs, expecting better than 10% is unrealistic. They can't afford the time or cost to get optimum results for each element, although 10% at all but the lower concentrations is achievable with just a bit of care.

Every element is going to have a different accuracy and precision based on the instrument conditions for each element and the concentration of the element. Visualize a funnel with a dowel centered through it. The dowel represents the true analytical value. The wide end is the +/- % for the lowest concentration reported. The +/- % tapers down as the concentration increases until the precision becomes a relatively constant value (ideally 10% or less) above a certain concentration.
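The funnel picture can be put into numbers with a common error model: a fixed noise floor plus a proportional term, combined in quadrature. The floor and percentage below are illustrative assumptions, not published lab specs; the point is only that relative uncertainty is large at low concentration and tapers toward the proportional term as concentration rises.

```python
# Sketch of the "funnel": total uncertainty = fixed instrument floor
# plus a proportional term, combined in quadrature (illustrative numbers).
FLOOR_PPM = 0.5   # assumed constant noise floor, ppm
REL = 0.05        # assumed 5% proportional component

def uncertainty_ppm(conc_ppm: float) -> float:
    """Combined 1-sigma uncertainty: floor and proportional terms in quadrature."""
    return (FLOOR_PPM ** 2 + (REL * conc_ppm) ** 2) ** 0.5

for c in (1, 5, 20, 100):
    u = uncertainty_ppm(c)
    print(f"{c:>4} ppm -> +/- {u:.2f} ppm ({100 * u / c:.0f}% relative)")
```

With these assumed numbers, a 1 ppm reading carries roughly 50% relative uncertainty while a 100 ppm reading is close to the 5% proportional floor, which is the tapering funnel described above.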

As has been stated, it's extremely complicated and expensive to calculate those numbers, and it's not practical or necessary for the intended scope of an oil analysis. Before retiring I was a research chemist doing atmospheric chemistry analysis. Every analytical run that I did had all of the QA/QC samples needed to fully calculate the accuracy and precision of the instrument throughout the reported concentration range. There were often more QA/QC samples than unknowns. Every single value that I reported had an "uncertainty" value associated with it: the +/- value (error bars) for each value. It was calculated based on the accuracy and precision of the instrument at that specific concentration on that analytical run. Theoretical values from the manual or literature weren't used; I calculated those numbers from the real-time performance of the instrument during each analytical run. You don't get that for $35.

I used to exasperate my boss every time he'd come in and ask what the detection limit for X was when preparing a proposal. I always told him I wouldn't know until I'd run the samples.

Here's a thread posted by Pablo where he sent the same sample to four different labs. It's pretty ugly from where I sit.
https://bobistheoilguy.com/forums/threads/20k-mile-amsoil-asl-4-lab-comparison.15383/

Ed
 
This is exactly what I am curious about. If they report 5 ppm of iron, is that +/- 1 ppm or +/- 5 ppm, and how reproducible are the results (rhetorical)? This would be important to know in order to determine whether or not the results between one oil and another are statistically significant. They seem to imply it is +/- 1 ppm, but without documentation, it's just an assumption.

Honestly, send the same sample off twice with a few weeks in between and see how different the results are. I'm betting up to 20%.
 
Honestly, send the same sample off twice with a few weeks in between and see how different the results are. I'm betting up to 20%.

What happens if one is taking a vitamin or iron supplement and contaminates the sample? 🤣
 
A note of clarity for this discussion:
Do not confuse accuracy with calibration; these and other concepts would need to be well defined prior to judging anything in this effort.

Accuracy is often an implied concept which includes many things I already mentioned above. A machine can be capable of repeating its results to a very high degree, but it may not be calibrated well enough for the intended effect.

Example 1: Think of a rifle with a scope. If the rifle can make a nice, tight group of less than one MOA, but places its bullets two inches high and right of the intended POI, that indicates that the accuracy is very high, but the calibration is off.

Example 2: Think of a weight scale. In the effort to lose weight, a person may weigh him/herself every morning. The scale may not be calibrated well, but it could be fairly accurate. Hence, it may not tell you the true total weight, but it would be able to "accurately" display the weight lost. If the scale says you started at 220 lbs and dropped to 200 lbs, that's a 20 lb loss. Conversely, if you started at a true 215 lbs and dropped to a true 195 lbs, it's still a 20 lb loss. The true value is set by calibration, but the amount of loss is the accuracy of measurement.
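The scale example is simple arithmetic: a constant calibration offset cancels when you take a difference, so trends are trustworthy even from a mis-calibrated (but repeatable) instrument. A tiny sketch, with the 5 lb bias as an assumed value:

```python
# A constant calibration offset drops out of differences: weight *lost*
# is measured correctly even on a scale that reads consistently high.
OFFSET = 5.0  # assumed constant bias: scale reads 5 lb high

def scale_reading(true_weight: float) -> float:
    return true_weight + OFFSET

start, end = 215.0, 195.0                          # true weights, lb
loss_true = start - end                            # 20.0 lb
loss_measured = scale_reading(start) - scale_reading(end)
print(loss_true, loss_measured)                    # both 20.0
```

The same logic is why trending a UOA series from one lab can be useful even without knowing the lab's absolute calibration.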

The fact that Pablo sent some samples into various labs (many years ago) and got varied results should be in no way surprising.

Whatever lab someone chooses to use, it's important to stick with that lab for your UOAs to try and reduce the variation of variable inputs.


And before anyone whines about the inherent inaccuracy of UOAs from unknown levels of various inputs, I'd ask you to tell me another method of judging wear that costs $30 or less and takes very little effort to collect at the OCI. The alternatives for measuring wear, such as component measurement upon engine tear-down, are WAY more intrusive, WAY more time consuming, and WAY more prone to measurement errors because of R&R problems. I often hear people say that tear-down analysis is the only reliable way to measure wear, and frankly those folks are deluded by theory they've never seen applied in a lab. For several years, I was the supervisor in a quality analysis lab, and I dealt with machine and instrument calibration tracking for our entire facility on a daily basis. The topic of R&R is of the utmost concern, and most folks just assume that their favorite/preferred analysis is the "best" with no understanding of the challenges behind each methodology.

Are UOAs perfect? Nope, but nothing is. But they are a reasonably reliable means of measuring wear that has a low cost and low effort requirement. Nothing else comes close to the ROI of a UOA.

 
Accuracy and precision are two different things and are measured in two different ways. Accuracy (how well shots cluster on a target regardless of location on the target) vs. how close the shots are to the bullseye (precision). For the lab, you measure precision using standard/certified reference materials. This tells you if 4.5 ppm Fe is really 4.5 ppm Fe - the question here. Accuracy is typically handled through duplicates and ideally, all of this is done blind so the lab doesn't know which samples are standards or duplicates.

So sure, send BS 6 samples of the same oil (is it really the same? How was it collected?) and see how they plot up. In this case, these duplicates are also measuring how well the sampling method represents the oil. You can also have a lab do their own dups, where they would take a single sample, homogenize and split it to get duplicate results. Standards would need to be constructed using a new oil with some added analyte (Fe) in a known concentration and tested. Typically, you have the lab in question run dozens of these standards to develop the mean/SDs to then measure against. You can also "round-robin" these standards out to other labs for comparison, obviously assuming the exact same method is used. Labs using equipment like ICP will also have their own internal calibration and QA/QC standards they should be running at some determined interval. I'm sure BS can provide that info if requested, like any good lab, to give end-users confidence that 4.5 is....4.5.
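The "run dozens of standards to develop the mean/SDs to then measure against" step is a classic control-chart check. A hedged sketch with invented CRM results: build a mean +/- 2 SD band from historical runs of a certified reference material, then flag any later CRM result that falls outside it.

```python
# QA/QC control-band sketch (all values invented, not a real lab's data):
# repeated runs of a certified reference material (CRM) define a
# mean +/- 2 SD band; later CRM results outside the band are flagged.
from statistics import mean, stdev

crm_history = [4.4, 4.6, 4.5, 4.7, 4.3, 4.5, 4.6, 4.4, 4.5, 4.6]  # ppm Fe
m, s = mean(crm_history), stdev(crm_history)
lo, hi = m - 2 * s, m + 2 * s

def in_control(result_ppm: float) -> bool:
    """True if a new CRM result sits inside the 2-sigma control band."""
    return lo <= result_ppm <= hi

print(f"control band: {lo:.2f} - {hi:.2f} ppm")
print(in_control(4.5), in_control(5.2))
```

With roughly 2-sigma limits you expect about 1 in 20 in-control results to fall outside the band by chance, so real QA programs usually pair this with wider "action" limits and run rules before rejecting a batch.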

I make my living at this point helping clients with questions like this for mineral exploration datasets (I'm an independent consulting geologist), and QA/QC results are a big part of whether I can sign off on their Mineral Resources as a Competent Person, so you as investors can be confident that the company's XYZ tons at ABC grade is, in fact, as reported. For all the UOAs I've done, I've given zero consideration to QA/QC because it's just not that critical for this purpose in my opinion, but it raises some questions.
 
What you raise in your question are actually several topics rolled into one.

There's a concept called "Gauge Repeatability and Reproducibility" (R&R). This takes into account not just the machines, but the human interactions with the machines. Though the ICP machine is automated, the samples still have to be prep'd, calibrated, validated, maintained and operated by humans. There are a lot of variables in this process.

As a generalization, based on my visit to Blackstone Labs about a decade ago, the ICP machines themselves are actually quite accurate. But the data they put out is only reported in whole numbers (5 ppm; 10 ppm; 2 ppm; etc.). It's unclear to me how fine the underlying measurement resolution is; that went unanswered at the time. I don't believe any of the labs have formal public "accuracy" statements.

I always wanted to do a true DOE R&R for a lab, but BS at the time didn't have the time to allot to such a long testing protocol. I'd want to do at least 45 samples; min 3 operators and min 15 unique samples. I'd be fascinated to find out how good (or bad) each service is.
You left out precision. Duplicate samples only give the accuracy component... so they are repeatable at 4.5 ppm +/- 0.3 ppm, but the sample is actually 10 ppm. Not good. I'm sure BS and other labs utilize some sort of standard reference materials to calibrate their equipment, but it would be interesting to see their QA/QC data/procedures for all of their testing equipment/tests. I know I had a viscosity value come back way out of line and I had them repeat the test and it returned a more reasonable result... which was correct?
 
Accuracy and precision are two different things and are measured in two different ways. Accuracy (how well shots cluster on a target regardless of location on the target) vs. how close the shots are to the bullseye (precision). For the lab, you measure precision using standard/certified reference materials. This tells you if 4.5 ppm Fe is really 4.5 ppm Fe - the question here. Accuracy is typically handled through duplicates and ideally, all of this is done blind so the lab doesn't know which samples are standards or duplicates.

So sure, send BS 6 samples of the same oil (is it really the same? How was it collected?) and see how they plot up. In this case, these duplicates are also measuring how well the sampling method represents the oil. You can also have a lab do their own dups, where they would take a single sample, homogenize and split it to get duplicate results. Standards would need to be constructed using a new oil with some added analyte (Fe) in a known concentration and tested. Typically, you have the lab in question run dozens of these standards to develop the mean/SDs to then measure against. You can also "round-robin" these standards out to other labs for comparison, obviously assuming the exact same method is used. Labs using equipment like ICP will also have their own internal calibration and QA/QC standards they should be running at some determined interval. I'm sure BS can provide that info if requested, like any good lab, to give end-users confidence that 4.5 is....4.5.

I make my living at this point helping clients with questions like this for mineral exploration datasets (I'm an independent consulting geologist), and QA/QC results are a big part of whether I can sign off on their Mineral Resources as a Competent Person, so you as investors can be confident that the company's XYZ tons at ABC grade is, in fact, as reported. For all the UOAs I've done, I've given zero consideration to QA/QC because it's just not that critical for this purpose in my opinion, but it raises some questions.
Even taking one or multiple samples whilst draining from a sump can vary. On industrial engines, the cooling loop has a sample point. The collection is taken with the engine running at full temperature and under typical load conditions …
 
Even taking one or multiple samples whilst draining from a sump can vary. On industrial engines, the cooling loop has a sample point. The collection is taken with the engine running at full temperature and under typical load conditions …
Exactly.
 
You left out precision. Duplicate samples only give the accuracy component... so they are repeatable at 4.5 ppm +/- 0.3 ppm, but the sample is actually 10 ppm. Not good. I'm sure BS and other labs utilize some sort of standard reference materials to calibrate their equipment, but it would be interesting to see their QA/QC data/procedures for all of their testing equipment/tests. I know I had a viscosity value come back way out of line and I had them repeat the test and it returned a more reasonable result... which was correct?

I think we are discussing a similar concept using slightly different terms.

Using your word "precision" is akin to me using "calibration". (I think we agree on "accuracy".) We can combine our two different words by stating that effort is put into making an instrument more "precise" by "calibrating" it (adjusting a rifle scope for windage and elevation, for example). "Precise" describes the closeness of the grouping to the intended target, and calibrating is the action taken to ensure that desired result. Unless I misunderstand you, I think we're just using different words for the same concept: adjusting the instrument to give a reliably predictable result in the expected range. "Calibration" is the act of adjusting the instrument to obtain the "precision" desired.


And yes, reference standards are typical in most labs. These standards can help understand when the "precision" is either on or off target.
 
Quick point of clarification… I understand there is potential variation and error introduced by the customer taking the oil sample and the human performing the analysis, BUT the instruments the labs use should already have known error values for each element they are capable of measuring, established by the equipment manufacturer. If I send a sample in (assuming no human error) I would like to know how accurate the results are capable of being. The way the results are reported, they seem to imply +/- 1 ppm of accuracy for each element; else how could they in good conscience report, say, 6 ppm if the accuracy was in fact only +/- 5 ppm? However, BS's fuel dilution is already a subject of hot debate, so I think it is entirely reasonable to seek the actual accuracy for all the other measurements. Without this, we are assuming 6 ppm really means 6 ppm, but it COULD just be marketing. I don't believe it is, but we should be able to get this info and end speculation.

Also, for precision vs accuracy https://en.wikipedia.org/wiki/Accuracy_and_precision
 
I disagree with the Wiki definitions; I see that as backwards of what is typically practiced. And it seems (if I have read TiGeo's post correctly), he may agree with me. But if not, it doesn't make either of us wrong. It just is an opportunity to adjust our terms and definitions such that agreement is in place to move forward in discussion.


For the lab geeks out there ... I think this helps us have the conversation:

- Accuracy describes the stdev of the data; the lower the stdev, the more accurate the grouping; smaller is better, generally indicating low variation
- Precision is the definition we use to describe how close that grouping is to the intended target, which is typically an agreed reference standard or desired result
- Calibration is the act of adjusting the precision to move the group towards its reference standard
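The three bullet points above can be put into numbers. Note that the labels in this sketch deliberately follow the definitions in this post (spread of the data as "accuracy", closeness to the reference as "precision"), which the post itself notes is the reverse of the common textbook usage; the reference value and readings are invented.

```python
# Numeric version of the three definitions above, using THIS post's labels
# (spread -> "accuracy", closeness to reference -> "precision").
# Reference value and readings are invented for illustration.
from statistics import mean, pstdev

REFERENCE = 10.0                        # agreed reference standard, ppm (assumed)
readings = [11.1, 11.3, 11.2, 11.0, 11.4]

spread = pstdev(readings)               # "accuracy": stdev of the grouping
offset = mean(readings) - REFERENCE     # "precision" error: distance to target
print(f"spread (stdev): {spread:.2f} ppm, offset from reference: {offset:.2f} ppm")

# "Calibration": remove the known offset to move the group onto the target.
calibrated = [r - offset for r in readings]
print(f"offset after calibration: {mean(calibrated) - REFERENCE:.2f} ppm")
```

Calibration shifts the group onto the reference but leaves the spread unchanged, which is exactly the tight-group-off-target rifle example earlier in the thread.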


What is most important is that there is agreement about what the terms mean, so that consistent conversation can take place.
 
For the lab geeks out there ... I think this helps us have the conversation:

- Accuracy describes the stdev of the data; the lower the stdev, the more accurate the grouping; smaller is better, generally indicating low variation
- Precision is the definition we use to describe how close that grouping is to the intended target, which is typically an agreed reference standard or desired result
- Calibration is the act of adjusting the precision to move the group towards its reference standard

Does that seem reasonable?

This is how I learned it.

Accuracy is how close a given set of measurements (observations or readings) are to their true value.

Precision is how close the measurements are to each other.

In the shooting example, accuracy would be whether or not you are zeroed to your intended target, and precision would be the MOA.
 
Again, as far as the labs go, I don't know any of them that have a formal public statement about the accuracy/precision reporting.

We could "assume" that because labs report to the whole number (1,2,3, ...), then they are measuring in the tenths (.1, .2, .3, .....). But without confirmation, it's just a guess.
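The whole-number reporting itself adds a quantization band on top of any instrument error: if the lab rounds to the nearest ppm, a reported "5" could have come from anywhere in roughly [4.5, 5.5). A tiny sketch, assuming round-half-up (the labs haven't confirmed their rounding rule):

```python
# What whole-number reporting implies: several different underlying
# readings all report as the same whole ppm, a built-in +/- 0.5 ppm
# quantization. Assumes round-half-up; the labs' actual rule is unknown.
def reported(raw_ppm: float) -> int:
    """Round-half-up to the nearest whole ppm."""
    return int(raw_ppm + 0.5)

print(reported(4.5), reported(4.9), reported(5.49))  # all report as 5
```

So even a perfect instrument reporting in whole numbers carries at least +/- 0.5 ppm of reporting uncertainty at these low concentrations.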

Feel free to reach out and ask those labs what the ICP reporting threshold means; I think many of us would like to know.
 
Lastly, there is a distinct and important difference of concept here ...

How well some product or process puts out a result is different from how some other device or instrument measures that result.

The Wiki link a few posts back appears to be defining the results from a process or product. It's not defining how that is measured.

IOW ... How "accurate" a gun is at making a tight grouping, and how "precise" the scope is adjusted, is different from how "accurate" and "precise" the instrument and process are which measure those shots.


In this post, we're concerned about how well we can trust the results from the ICP machines; not how well the engine makes wear particles.
 
What you raise in your question are actually several topics rolled into one.

There's a concept called "Gauge Repeatability and Reproducibility" (R&R). This takes into account not just the machines, but the human interactions with the machines. Though the ICP machine is automated, the samples still have to be prep'd, calibrated, validated, maintained and operated by humans. There are a lot of variables in this process.

As a generalization, based on my visit to Blackstone Labs about a decade ago, the ICP machines themselves are actually quite accurate. But the data they put out is only reported in whole numbers (5 ppm; 10 ppm; 2 ppm; etc.). It's unclear to me how fine the underlying measurement resolution is; that went unanswered at the time. I don't believe any of the labs have formal public "accuracy" statements.

I always wanted to do a true DOE R&R for a lab, but BS at the time didn't have the time to allot to such a long testing protocol. I'd want to do at least 45 samples; min 3 operators and min 15 unique samples. I'd be fascinated to find out how good (or bad) each service is.
Yeah, this would be the way, but as always with a volume-based business that's not evaluated/held to a specific standard, their basic machine calibrations are all they really care about. To really get crazy on this idea, multiply your 45 samples across 5-6 testing labs and then you'd be able to look at variance not only between machines but also between companies. It would also give great insight with a Student's t-test as to whether there were statistically significant differences, or if it was just noise.

I’d like to believe the majority of differences are just noise and not statistically significant, but obviously we just have to take the labs at their word for now.
 
I think we are discussing a similar concept using slightly different terms.

Using your word "precision" is akin to me using "calibration". (I think we agree on "accuracy".) We can combine our two different words by stating that effort is put into making an instrument more "precise" by "calibrating" it (adjusting a rifle scope for windage and elevation, for example). "Precise" describes the closeness of the grouping to the intended target, and calibrating is the action taken to ensure that desired result. Unless I misunderstand you, I think we're just using different words for the same concept: adjusting the instrument to give a reliably predictable result in the expected range. "Calibration" is the act of adjusting the instrument to obtain the "precision" desired.


And yes, reference standards are typical in most labs. These standards can help understand when the "precision" is either on or off target.
Calibration is done to ensure instrument precision. The difference in using standard/certified reference materials to check a lab's calibration/precision is that you, the customer, control this by blindly inserting samples of known concentration to check/verify (trust but verify) their precision. Unless you have that information from the lab and you trust it, you can't comment on their precision and can only assume, which for a commercial lab is a reasonable thing to do.
 