The world of statistical analysis and statistical modeling is way too vast for me to try and hold a seminar here on the inter-webbie.
But I can at least shine some light on it in a cursory manner.
I am going to discuss statistical analysis: crunching data from real-world collection. Statistical modeling, by contrast, takes only a few samples and then predicts possible outcomes. I have training in both, but I prefer analysis for things like UOAs because we have a huge repository of data to use, so why not analyze real-world numbers rather than model a guesstimate?
The upside to modeling is that when samples are few and far between (or uber costly to generate), modeling can be reasonably inexpensive, because it uses far fewer samples to generate data. The downside to modeling is that if you choose the wrong confidence interval (CI), or leave things out (on purpose or by accident) that affect the real nature of the subject interaction, you can get a wildly bad data set. The upside to analysis is that it's based on true historical performance, so the "reality" is truly real-world experience. The downside to analysis is that it can consume an inordinate amount of time and money, because it requires large data sets.
For UOA data, it's real world, and it's paid for by each individual, so "micro" data takes a long time and costs a lot, but "macro" data comes from a whole host of folks, so it comes quick and its per-person cost is very low overall. This is why I use UOA macro data; it's real, it's cheap, it's quick, and it's accurate.
I will start here ...
In this image below, you see two axes; the X is sample count and the Y is a representation of the value of one standard deviation, expressed as a percentage. It's important to note that the Y axis is NOT a quantifiable magnitude, but rather a concept of the presumed real value of your sigma expression.
As you can see, with low sample quantities, the value of your standard deviation (aka stdev, or sigma) can vary WIDELY. At 10 samples, the "real" value of your stdev could be anywhere from about 0.7 to 1.8 times what you calculated. IOW, there is a huge amount of unknown variation, because even at a 95% confidence level you don't have enough data to accurately define the amount of variation in the process you study. As you get to 30 samples, it's at least reasonable. At 50 samples, it's about as good as it needs to be for most things, and adding more samples really does not gain much accuracy. Past 100 samples, the change in accuracy is nearly moot; the lines would converge at infinity, but who needs that? No one.
For more clarity, let's put some numbers to this.
If the "real" (true) stdev was 2.5 in magnitude, and you only took 10 samples, the mathematical range for your calculated stdev could be anywhere from 1.75 to 4.5. But at 30 samples it would be much tighter: 2.0 to 3.25. More samples? Even tighter. These numbers only hold at a 95% CI. If we demanded a higher confidence level, the range would widen; at a lower confidence level the range would narrow, but we'd be less sure the true value actually falls inside it.
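For those who want to check these ranges themselves, here is a small Python sketch of the standard chi-square confidence interval for a standard deviation. It uses only the standard library; the chi-square quantile comes from the Wilson-Hilferty approximation, so the bounds are approximate (they land within a couple percent of exact tables):

```python
from math import sqrt
from statistics import NormalDist

def chi2_ppf(p: float, df: int) -> float:
    """Approximate chi-square quantile via the Wilson-Hilferty method."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * sqrt(2 / (9 * df))) ** 3

def stdev_ci(s: float, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Range the true sigma could fall in, given a calculated stdev s from n samples."""
    df = n - 1
    alpha = 1 - conf
    lo = s * sqrt(df / chi2_ppf(1 - alpha / 2, df))
    hi = s * sqrt(df / chi2_ppf(alpha / 2, df))
    return lo, hi

for n in (10, 30, 50, 100):
    lo, hi = stdev_ci(2.5, n)
    print(f"n={n:3d}: true stdev likely between {lo:.2f} and {hi:.2f}")
```

At n=10 this returns roughly 1.7 to 4.6, and at n=30 roughly 2.0 to 3.4, which matches the ranges quoted above; you can watch the interval collapse as n grows.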
This is why 30 samples is a minimum needed to do ANY analysis decently. Doing only a few samples will easily tell you an average value, but it will tell you nothing credible about the amount of natural variation in the process being studied.
**************************************************************************************
With all that in mind, we can now look at the differences between micro and macro data.
Review this image:
This chart represents data for Fe samples where the OCIs were 7.5k miles.
The color coding represents different vehicles.
The x axis in the graph is sample number and the y axis is Fe ppm.
The micro column represents 30 samples from one subject vehicle.
The macro column represents 3 samples from that one subject vehicle, plus 27 others.
Now look at the Fe average and stdev.
See how the average Fe is very similar between the two, but the stdev is wider in the macro study?
The reason the stdev is wider for the macro group is that those samples represent much more diverse use: different grades, brands, viscosities, severity factors, operating temps, etc. ALL manner of daily-life variation across our continent is represented in the "macro" data, because it comes from many different sources. When you use macro data, you don't have to make adjustments for "other" influences; they are already included in the data! Those who object to seeing their UOA compared/contrasted to macro data don't realize that this is exactly why their objection is moot!
When I look at UOA data from macro sources (remember ... I have about 15k UOAs in my database now, and I have access to Blackstone's entire catalog file, as we still collaborate from time to time), I can tell you with great certainty how much variation there is across the "real world". Alaska or Arizona, steamy FL or dry SD, towing a 15k-pound RV or driving light, city or highway, short trips or long hauls, it's all in there!
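The micro-vs-macro effect is easy to demonstrate with a toy simulation. The numbers below are entirely made up for illustration (a fleet that "truly" averages about 20 ppm Fe per OCI, with hypothetical within-vehicle and between-vehicle scatter); the point is only that pooling many vehicles widens the stdev while the average stays in the same neighborhood:

```python
import random
from statistics import mean, stdev

random.seed(42)  # reproducible illustration

# Hypothetical values, chosen only to make the point:
FLEET_MEAN_PPM = 20.0     # what the "typical" vehicle averages
WITHIN_VEHICLE_SD = 2.0   # run-to-run scatter for any one vehicle
BETWEEN_VEHICLE_SD = 5.0  # vehicle-to-vehicle scatter (use, climate, severity)

# Micro: 30 UOAs, all from one subject vehicle.
subject_mean = random.gauss(FLEET_MEAN_PPM, BETWEEN_VEHICLE_SD)
micro = [random.gauss(subject_mean, WITHIN_VEHICLE_SD) for _ in range(30)]

# Macro: one UOA each from 30 different vehicles.
macro = [random.gauss(random.gauss(FLEET_MEAN_PPM, BETWEEN_VEHICLE_SD),
                      WITHIN_VEHICLE_SD) for _ in range(30)]

print(f"micro: mean={mean(micro):5.1f} ppm, stdev={stdev(micro):4.1f}")
print(f"macro: mean={mean(macro):5.1f} ppm, stdev={stdev(macro):4.1f}")
```

The macro stdev comes out noticeably larger because it combines both sources of scatter; that extra width is not noise to be removed, it IS the real-world variation.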
When I look at UOA data, using my thousands of UOAs, I can look at an engine/tranny/diff/gearbox and have a VERY good idea what the stdev truly is, because I will often have 100 or more UOAs (sometimes up to 600) for any given application. Don't misunderstand me; I don't have UOAs for every vehicle under the sun, because there are too many to track and/or many I just don't care about. I don't have data on Chinese cars, old Yugos, Argentinian trucks .... I track mainstream vehicles in North America.
What we can do, using macro analysis, is compare/contrast your one or two UOAs to the mass population. We can know what is "normal" for nearly anyone. At this point, I recommend you go read (or re-read) my Normalcy article.
https://www.bobistheoilguy.com/used-oil-analysis-how-to-decide-what-is-normal/
The difference between micro and macro data is that you don't have to spend a lot of time and money if you use macro data. You will have a wider variation expectation, but that's totally OK.
Looking at the chart above, see how the "UL" (upper limit) changes with the stdev? Broader inputs will result in broader expectations. But that is totally normal and represents real world life.
If you want to know how well your personal single vehicle runs on a particular lube, you will have to run 30 UOAs. And if you want to compare/contrast that one lube to a different lube, you have to run another 30 UOAs. At 5k miles per UOA, and 60 samples, you're up to 300k miles of experimentation! That is why micro data analysis is not really practical for Joe Bitog. But ...
You can look at your UOA against many others, and see how "normal" your situation truly is.
Anytime your results come back within three stdevs, your results are totally "normal"; they are within the expected variation of the study group and process. Anytime something is "normal", it is improbable (if not impossible) to determine which product would be "better" or "best", because the normal variation will make all products' results bounce around over time. Most of the time, our members' UOAs here are within one or two stdevs, so they are not only "normal", but actually centered quite close to average and very well "controlled".
This is why I tell folks who post only a few UOAs of their own that they cannot pronounce a "best" lube brand/grade/base-stock. They just don't have enough data to know with any certainty. There is so much variation with low sample quantities that it's foolish to do so. But when you take your data and review it against large sample quantities from others, you have a very good idea how your data stacks up against "real world" info. And, as I've proven many times, outputs are far more important than inputs. Between my Dmax UOAs and my MGM UOAs, I've shown that most of the time, synthetics and grades are moot for any moderate OCI duration. Macro data often cannot allow us to conclude that one lube is better than another, but it most certainly helps us show that neither lube is typically better than the other. When all test subjects return "normal" results, then the input conditions (syn vs dino, thick vs thin) don't have much effect. If macro data shows there is no correlation between use and trends, then causation cannot be demonstrated; it is very hard to claim causation where no correlation exists!
It is totally possible for any BITOGer to actually find out if Mobil 1 is "better" than Amsoil, or if 5w-30 is "better" than 5w-20. All it takes is 300k miles and a LOT of $$$$$. But since most of us just cannot do that, we must rely on macro data to determine normalcy bounds. If you pay close attention, you'll notice that I pick on folks who make unfounded, unproven claims; I often ask them to PROVE their assertions with real data. I don't give one single hoot about inputs; don't talk to me about vis, grade, base-stock, etc. Don't inform me about how much Ca or Mg or Phos is in the bottle; I could not possibly care less. What I care about, and what I spend my time on, is what actually comes out of the crankcase. What I have learned over many years is based on real-world data from folks like all of us, all over North America. Every one of you and others who pay for UOAs helps me decide what is "normal". Thank you! And in return, I offer my data analysis as proof of what I claim:
As long as your UOAs are within "normal" bounds, your lube did no better or worse than any other lube that also exists in those same bounds.
Again - UOAs are not perfect; they are a tool that has benefits and limitations. But so do other methods of measuring "wear". UOAs, however, are cheap, quick, and easily studied. When the equipment is in good operating condition, they are an excellent means of tracking "normal" wear trends. And, as I said before, there is good correlation between UOA wear data and other methods such as electron bombardment, component weight, etc. UOAs are not perfect, but they are not as flawed as some think.
Hope this clears things up.
Class dismissed!