If you exclude data, it's no longer a universal average.
But don't let that cloud your thinking.
It depends on what you are trying to average. Including data from engines run on anal grease and kangaroo sperm for 25,000 miles is certainly going to skew the results. Are you trying to get an average of every unit on the road, whether it was run on Hot Pocket drippings fortified with meth or Redline changed at 2K intervals? Or are you looking for averages from engines run on a specific range of lubes, of a specific viscosity, that line up more closely with your own usage profile and maintenance regimen, so you can establish a like-for-like comparison?
Worded differently:
If I'm collecting ballistic data on Savage rifles, do I include ones run over by a tank, dropped into a sandbox and then fed Chinese surplus, or do I specifically scope what is included so the data is meaningful and useful?
Averages from a large fleet of vehicles run on the same lubricant, changed at the same interval, certainly provide some valuable macro data for that equipment in that usage profile, run on that product. If your operating profile aligns with the fleet's, and you are considering a similar lubricant choice, this could be quite valuable data. If you are planning on running Supertech, doing WOT pulls to Walmart and dumping it once a month, probably not. Conversely, knowing that your engine is throwing better numbers than a sample group in which quite a few were fed the wrong viscosity and tossed a rod is pretty low value. If we exclude those and the comparison changes significantly, the scoped comparison is arguably the more useful one.
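To put rough numbers on that last point, here is a minimal sketch in Python. Everything in it is made up for illustration: the iron readings, the field names (`fe_ppm`, `viscosity`, `interval_mi`), the `in_scope` helper and the 7,500-mile cutoff are all assumptions, not real oil analysis data or anyone's actual method.

```python
from statistics import mean

# Hypothetical iron wear readings (ppm). Made-up numbers, illustration only.
samples = [
    {"fe_ppm": 12, "viscosity": "5W-30", "interval_mi": 5000},
    {"fe_ppm": 14, "viscosity": "5W-30", "interval_mi": 5000},
    {"fe_ppm": 10, "viscosity": "5W-30", "interval_mi": 6000},
    {"fe_ppm": 16, "viscosity": "5W-30", "interval_mi": 7000},
    {"fe_ppm": 95, "viscosity": "5W-30", "interval_mi": 25000},  # badly overextended interval
    {"fe_ppm": 63, "viscosity": "20W-50", "interval_mi": 5000},  # wrong viscosity
]

def in_scope(sample, viscosity="5W-30", max_interval_mi=7500):
    """Keep only samples that resemble the (assumed) usage profile."""
    return (sample["viscosity"] == viscosity
            and sample["interval_mi"] <= max_interval_mi)

# "Universal" average: every unit, no matter how it was treated.
universal_avg = mean(s["fe_ppm"] for s in samples)

# Scoped average: only units comparable to the profile of interest.
scoped = [s for s in samples if in_scope(s)]
scoped_avg = mean(s["fe_ppm"] for s in scoped)

my_fe_ppm = 15  # hypothetical reading from your own engine

print("universal average:", universal_avg)
print("scoped average:", scoped_avg)
print("beats universal average:", my_fe_ppm < universal_avg)
print("beats scoped average:", my_fe_ppm < scoped_avg)
```

With these invented numbers, the same engine looks great against the universal average and slightly worse than average against the scoped group, which is the whole argument: what the comparison tells you depends on what you let into the sample.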