If you'd like to delineate between my terminology of 'bad data' and your terminology of 'data of little use', especially in the context of a what-if scenario that you readily admit is implausible and/or unlikely, then go right ahead.
My focus here is on the quality of the data.
Typically a person seeks means and standard deviations of data sets that exhibit a normal distribution (unless you're using non-normal analysis techniques such as Box-Cox, Johnson transformation, Weibull, etc.). By definition, the data in question is skewed and can be considered a statistical flier.
Statistical fliers are never deleted from the data set. They are identified as such and removed from the analysis. The problem, though, is that you don't know how much of each wear number comes from true wear on the engine and how much comes from the manufacturing process.
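To make the "identified, not deleted" point concrete, here is a minimal sketch. The post doesn't say which test is used to identify a flier, so the 1.5 x IQR (Tukey fence) rule below is my assumption, and the iron readings are made-up numbers for illustration only, not real UOA results.

```python
from statistics import mean, quantiles

def flag_fliers(values, k=1.5):
    """Return one boolean per value: True if it falls outside the Tukey fences.

    The 1.5 x IQR rule is an assumed choice; any other outlier test could be swapped in.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical iron readings in ppm; the last one stands in for a factory-fill sample.
iron_ppm = [12, 14, 11, 13, 15, 12, 14, 48]
flags = flag_fliers(iron_ppm)

# The data set itself is untouched; only the analysis skips the flagged points.
analyzed = [v for v, is_flier in zip(iron_ppm, flags) if not is_flier]
print(f"kept {len(iron_ppm)} readings, analyzed {len(analyzed)}, mean = {mean(analyzed)} ppm")
```

Note that this only handles the bookkeeping. It can't tell you how much of the flagged reading is real wear versus manufacturing residue, which is the actual problem.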
You have zero insight as to the quality of your data. So you're darned if you leave it in (and over-report true wear) and you're darned if you take it out (and under-report true wear).
Like I said earlier, the data is confounded by uncontrolled external variables. It is bad data.
Again, I think we're mainly on the same page, but I just disagree with some of your wording.
If we wanted to know how ALL factory-fill (FF) OCIs are doing in any engine series produced (this engine, a 3.5L EB, a 3.0L Camry, etc.), then we could take a group of ALL known FF UOAs, and they could give us a "normal" distribution. If we ONLY include FF loads, the data you called skewed won't look substantially different from the rest of that group.
I get what you're saying and generally agree; the inference I take is that you are implying that the FF UOA data will be wildly different from UOA data two or three UOAs later. That is totally true and I agree with it.
But I disagree that the FF data is "bad" or useless. It just should not be included in data sets taken AFTER an engine has established its wear pattern once "broken in". If we wanted to know what is normal during "break in", we'd want to use this data. But that's not really what we're after, so this data is pretty much useless to our quest for "normal" operational wear trends.
For example, if we wanted to know the mean and stdev of height for the typical teenage male at age 16-19, it's perfectly OK to "exclude" data from when he was 6-12 years old. That younger age data is not "bad" data; it's data that does not fit the study parameters.
Again - this UOA isn't "bad" data; it's data that has no value to our desired goal. We don't exclude it because it's "bad" (erroneous or tainted); it just does not represent the portion of wear we want to understand. There is nothing wrong with purposely excluding data when you properly define the parameters of what you seek.
I agree completely that FF data is pretty much worthless for this purpose. All it will do is confirm that break-in is occurring. It should NOT be included in "normal" data sets, because we know for a fact that it does not represent "normal" wear. It would be foolish to include it in a trend line. But that does not make it "bad" data; just useless for our tasks.
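Here's a rough sketch of what "define the study parameters first, then exclude" looks like in practice. The records, the field layout, and the simple factory-fill flag are all hypothetical; a real study might instead exclude the first two or three UOAs, or use a mileage cutoff, to mark where break-in ends.

```python
from statistics import mean, stdev

# Hypothetical UOA records: (miles on engine at sample time, factory fill?, iron ppm).
samples = [
    (5000, True, 48),
    (10000, False, 15),
    (15000, False, 13),
    (20000, False, 12),
    (25000, False, 14),
]

def split_by_study(records):
    """Separate break-in (factory-fill) samples from established-wear samples."""
    break_in = [r for r in records if r[1]]
    established = [r for r in records if not r[1]]
    return break_in, established

break_in, established = split_by_study(samples)

# Only the samples that fit the study definition feed the "normal" wear statistics.
iron = [r[2] for r in established]
print(f"established wear: mean={mean(iron):.1f} ppm, stdev={stdev(iron):.2f} ppm")

# The break-in samples are not thrown away; they would answer a different question.
print(f"break-in samples set aside for a break-in study: {len(break_in)}")
```

The FF sample stays available for anyone who wants to study break-in; it simply never enters the "normal" mean, stdev, or trend line.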
NOTE: it is true that FF UOAs can show things such as fuel dilution and other parameters not directly affected by the age of the engine. Vis, FP, etc. can also be gleaned from any UOA. I generally don't give a hoot about these because they are inputs, which may or may not reveal themselves in the wear results.