...  One thing that will never be done is a real side by side test of different detectors. ...
Side-by-side tests of detectors are done all the time.  The issue is:  The interpretation of the results.
Example:  I can tell you types of detectors that will get any signal you show them .... WITH ROOM TO SPARE !   *But the devil is in the details* :  Can the operator tell you the difference between that, versus the 100 other signals he just heard in the same area ?  I.e.: if "everything sounds the same" , then what benefit have you gained ?
Agree! Many moons ago I was testing two of mine side by side because everyone was posting that one beat the other, no contest. In my ground I was getting signals from both, at the limits of their detection. NEITHER prompted me to dig! One was tone/visual ID and the other beep/dig.
Good post.  
I remember back when the Cz6 was still fairly new.  And the guys who were using them were admittedly kicking the b*tts of the Whites & Garrett guys on depth.   For beaches, open fields, etc., the difference was plain to see.  I was green with envy.   So I took my Fisher Cz6 buddy with me to a turfed park where I knew very deep silver still existed, albeit at the fringe of audible depth.   And I flagged a few suspected deepies , so that we could compare.  Lo & behold he could get any signal I showed him LIKE THE BELLS OF NOTRE DAME!   I was very impressed.   
However, the devil was in the details.  He'd move over 2 ft. in any direction, and get THE EXACT SAME SIGNAL (utterly no tone/sound differentiation) on targets everywhere else too.  I.e.: much less sense of deep vs shallow, high vs mid, bent nail vs conductor, etc....
Which is fine, I suppose, if you're relic or beach hunting.  But in a junky turfed park, where a bit of cherry picking is in order , I would think this would be a handicap.  Not that it can't be done, and/or tricks-to-be-learned, but ... just sayin' .... that there's more than raw depth when it comes to machine comparisons.