This comment/essay by *rgbatduke* on **WUWT** is well worth reading and digesting.

“this is a point that is stunningly ignored — there are a lot of different models out there, all supposedly built on top of physics, and yet no two of them give anywhere near the same results!”

A professional taking amateurs to task!

(Note: *see also his follow-up comments here and here. rgbatduke would seem to be Professor R. G. Brown of Duke University.*)

rgbatduke says:

Saying that we need to wait for a certain interval in order to conclude that “the models are wrong” is dangerous and incorrect for two reasons. First, and this is a point that is stunningly ignored, there are a *lot* of different models out there, all supposedly built on top of *physics*, and yet *no two of them give anywhere near the same results!*

This is reflected in the graphs Monckton publishes above, where the AR5 trend line is the *average* over all of these models and in spite of the number of contributors the *variance* of the models is *huge*. It is also clearly evident if one publishes a “spaghetti graph” of the individual model projections (as Roy Spencer recently did in another thread): it looks like the frayed end of a rope, not like a coherent spread around some *physics* supported result.

Note the implicit swindle in this graph: by forming a mean and standard deviation over model projections and then using the mean as a “most likely” projection and the variance as representative of the range of the error, one is treating the differences between the models as if they are *uncorrelated random variates* causing deviation around a true mean! Say what?
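The distinction drawn above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" value, the noise level, and the per-model biases are all invented numbers, and nothing here is taken from any actual climate model. It compares the mean of many runs of one model with unbiased random errors against the mean over several models that each carry a fixed structural bias.

```python
import random
import statistics

TRUE_VALUE = 1.0  # hypothetical "true" trend, arbitrary units (assumption)

def mean_error(samples):
    """Absolute distance between the sample mean and the true value."""
    return abs(statistics.fmean(samples) - TRUE_VALUE)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Case 1: many runs of ONE model whose errors really are unbiased noise.
iid = [TRUE_VALUE + rng.gauss(0.0, 0.5) for _ in range(10_000)]

# Case 2: DIFFERENT models, each with its own fixed structural bias
# (made-up numbers, all of the same sign), plus the same run-to-run noise.
biases = [0.4, 0.9, 1.3, 0.2, 1.8, 0.7]
structural = [TRUE_VALUE + b + rng.gauss(0.0, 0.5)
              for b in biases for _ in range(2_000)]

print(f"iid ensemble mean error:        {mean_error(iid):.3f}")
print(f"structural ensemble mean error: {mean_error(structural):.3f}")
print(f"mean of the built-in biases:    {statistics.fmean(biases):.3f}")
```

In the first case the error of the mean shrinks as runs are added. In the second it settles near the average of the built-in biases no matter how many runs are included, which is the sense in which the spread between structurally different models cannot be read as random scatter around a true mean.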

This is such a horrendous abuse of statistics that it is difficult to know how to begin to address it. One simply wishes to bitch-slap whoever it was that assembled the graph and ensure that they never work or publish in the field of science or statistics ever again. One cannot generate an *ensemble* of *independent and identically distributed models* that have *different code*. One might, possibly, generate a single model that generates an ensemble of predictions by using uniform deviates (random numbers) to seed “noise” (representing uncertainty) in the inputs.

What I’m trying to say is that the variance and mean of the “ensemble” of models is *completely meaningless, statistically*, because the inputs do not possess the most basic properties required for a meaningful interpretation. They are not independent, their differences are not based on a random distribution of errors, and there is no reason whatsoever to believe that the errors or differences are unbiased (given that the *only* way humans can generate unbiased *anything* is through the use of e.g. dice or other objectively random instruments).

So why buy into this nonsense by doing linear fits to a function, global temperature, that has *never in its entire history* been linear, although of course it has always been *approximately smooth*, so one can always do a Taylor series expansion in some sufficiently small interval and get a linear term that, by the nature of Taylor series fits to nonlinear functions, is *guaranteed to fail* if extrapolated as higher order nonlinear terms kick in and ultimately dominate? Why even *pay lip service* to the notion that R-squared or the p-value for a linear fit, or for a Kolmogorov-Smirnov comparison of the real temperature record and the extrapolated model prediction, has some meaning? It has none.

Let me repeat this.
*It has no meaning!* It is indefensible within the theory and practice of statistical analysis. You might as well use a ouija board as the basis of claims about the future climate history as the ensemble average of different computational physical models that do not differ by truly random variations and are subject to all sorts of omitted variable, selected variable, implementation, and initialization bias. The board might give you the right answer, might not, but good luck justifying the answer it gives on some sort of rational basis.

Let’s invert this process and actually apply statistical analysis to the distribution of model results regarding the claim that they all *correctly implement well-known physics*. For example, if I attempt to do an *a priori* computation of the quantum structure of, say, a carbon atom, I might begin by solving a single electron model, treating the electron-electron interaction using the probability distribution from the single electron model to generate a spherically symmetric “density” of electrons around the nucleus, and then performing a self-consistent field theory iteration (re-solving the single electron model for the new potential) until it converges. (This is known as the Hartree approximation.)

Somebody else could say “Wait, this ignores the Pauli exclusion principle” and the requirement that the electron wavefunction be fully antisymmetric. One could then make the (still single electron) model more complicated and construct a Slater determinant to use as a fully antisymmetric representation of the electron wavefunctions, generate the density, and perform the self-consistent field computation to convergence. (This is Hartree-Fock.)
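For readers unfamiliar with the phrase, the "iterate to self-consistency" loop mentioned above has roughly the following shape. This is a deliberately tiny toy, not a Hartree or Hartree-Fock calculation for carbon: the single level, the Fermi-like `occupation` function, and every parameter value are assumptions chosen only to show the structure of the loop (density determines potential, potential determines a new density, repeat until nothing changes).

```python
import math

def occupation(level_energy, temperature=0.1):
    """Fermi-like occupation (between 0 and 1) of a single level."""
    return 1.0 / (1.0 + math.exp(level_energy / temperature))

def self_consistent_density(bare_energy, repulsion, mixing=0.3,
                            tol=1e-10, max_iter=1000):
    """Damped fixed-point iteration for a toy mean-field problem.

    The level energy is shifted by repulsion * density, and the density
    is in turn the occupation of that shifted level.
    """
    density = 0.5  # initial guess
    for iteration in range(1, max_iter + 1):
        effective_energy = bare_energy + repulsion * density
        new_density = occupation(effective_energy)
        if abs(new_density - density) < tol:
            return new_density, iteration
        # Mix old and new densities; plain substitution can oscillate.
        density = (1.0 - mixing) * density + mixing * new_density
    raise RuntimeError("self-consistency loop did not converge")

density, steps = self_consistent_density(bare_energy=-0.3, repulsion=1.0)
print(f"converged density {density:.6f} after {steps} iterations")
```

The mixing step is a common stabilising choice in loops of this kind; real electronic-structure codes use far more elaborate versions of the same idea.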

A third party could then note that this still underestimates what is called the “correlation energy” of the system, because treating the electron cloud as a continuous distribution through which electrons move ignores the fact that *individual* electrons strongly repel and hence do not like to get near one another. Both of the former approaches underestimate the size of the electron hole, and hence they make the atom “too small” and “too tightly bound”. A variety of schemes are proposed to overcome this problem, with a semi-empirical local density functional (LDF) being probably the most successful.

A fourth party might then observe that the Universe is really *relativistic*, and that by ignoring relativity theory and doing a classical computation we introduce an error into *all* of the above (although it might be included in the semi-empirical LDF approach heuristically).

In the end, one might well have an “ensemble” of models, all of which are
*based on physics*. In fact, the *differences* are also based on physics: the physics *omitted* from one try to another, or the means used to approximate and try to include physics *we cannot include* in a first-principles computation. (Note how I sneaked a semi-empirical note in with the LDF; although one *can* derive some density functionals from first principles (e.g. the Thomas-Fermi approximation), they usually don’t do particularly well because they aren’t valid across the full range of densities observed in actual atoms.) Note well, doing the precise computation is not an option. We cannot solve the many body atomic state problem in quantum theory exactly any more than we can solve the many body problem exactly in classical theory or the set of open, nonlinear, coupled, damped, driven chaotic Navier-Stokes equations in a non-inertial reference frame that represent the climate system.

Note well that solving for the exact, fully correlated nonlinear many electron wavefunction of the humble carbon atom (or the far more complex Uranium atom) is *trivially simple* (in computational terms) compared to the climate problem. We can’t compute either one, but we can come a damn sight closer to consistently approximating the solution to the former compared to the latter.

So, should we take the
*mean* of the ensemble of “physics based” models for the quantum electronic structure of atomic carbon and treat it as the *best prediction* of carbon’s quantum structure? Only if we are very stupid or insane or want to sell something. If you read what I said carefully (and you may not have; eyes tend to glaze over when one reviews a year or so of graduate quantum theory applied to electronics in a few paragraphs, even though I left out perturbation theory, Feynman diagrams, and ever so much more :-) you will note that I cheated: I slipped in a *semi-empirical* method.

Which of these is going to be the winner? LDF, of course. Why? Because the *parameters are adjusted to give the best fit to the actual empirical spectrum of Carbon*. All of the others are going to underestimate the correlation hole, and their errors will be *systematically deviant* from the correct spectrum. Their mean will be systematically deviant, and by weighting Hartree (the dumbest reasonable “physics based approach”) the same as LDF in the “ensemble” average, you guarantee that the error in this “mean” will be *significant*.

Suppose one did not know (as, at one time, we
*did not know*) which of the models gave the best result. Suppose that nobody had actually measured the spectrum of Carbon, so its empirical quantum structure was unknown. Would the ensemble mean be reasonable then? Of course not. I presented the models in the way *physics itself* predicts improvement: adding *back* details that ought to be important that are omitted in Hartree. One cannot be certain that adding back these details will actually improve things, by the way, because it is always possible that the corrections are *not* monotonic (and eventually, at higher orders in perturbation theory, they most certainly are not!). Still, nobody would pretend that the average of a theory with an improved theory is “likely” to be better than the improved theory itself, because that would make no sense. Nor would anyone claim that diagrammatic perturbation theory results (for which there is a clear *a priori* derived justification) are necessarily going to beat semi-heuristic methods like LDF, because in fact they often do not.

What one would do in the real world is *measure the spectrum of Carbon*, compare it to the predictions of the models, and *then* hand out the ribbons to the winners! Not the other way around. And since none of the winners is going to be *exact* (indeed, for decades and decades of work, none of the winners was even particularly *close* to observed/measured spectra in spite of using supercomputers, admittedly supercomputers that were slower than your cell phone is today, to do the computations), one would then return to the drawing board and code entry console to try to do better.

Can we apply this sort of thoughtful reasoning to the spaghetti snarl of GCMs and their highly divergent results? You bet we can! First of all, we could stop pretending that “ensemble” mean and variance have any meaning whatsoever by
*not computing them*. Why compute a number that has no meaning? Second, we could take the *actual climate record* from some “epoch starting point” (one that does not matter in the long run, and we’ll *have* to continue the comparison for the long run because in any short run from any starting point noise of a variety of sorts will obscure systematic errors) and we can just compare reality to the models. We can then sort out the models by putting (say) all but the top five or so into a “failed” bin and *stop including them in any sort of analysis or policy decisioning whatsoever* unless or until they start to actually *agree* with reality.

Then *real* scientists might contemplate sitting down with those five winners and meditate upon what makes them winners (what makes them come out the closest to reality) and see if they could figure out ways of making them work even better. For example, if they are egregiously high and diverging from the empirical data, one might consider adding previously omitted physics, semi-empirical or heuristic corrections, or adjusting input parameters to improve the fit.

Then comes the hard part. Waiting. The climate is
*not* as simple as a Carbon atom. The latter’s spectrum never changes; it is a fixed target. The former is never the same. Either one’s *dynamical model* is never the same and mirrors the variation of reality, or one has to conclude that the problem is unsolved and the implementation of the physics is *wrong*, however “well-known” that physics is. So one has to wait and see if one’s model, adjusted and improved to better fit the past up to the present, actually has any predictive value.

Worst of all, one cannot *easily* use statistics to determine when or if one’s predictions are failing, because damn, climate is nonlinear, non-Markovian, chaotic, and is apparently influenced in nontrivial ways by a world-sized bucket of competing, occasionally cancelling, poorly understood factors. Soot. Aerosols. GHGs. Clouds. Ice. Decadal oscillations. Defects spun off from the chaotic process that cause global, persistent changes in atmospheric circulation on a *local* basis (e.g. blocking highs that sit out on the Atlantic for half a year) that have a huge impact on annual or monthly temperatures and rainfall and so on. Orbital factors. Solar factors. Changes in the composition of the troposphere, the stratosphere, the thermosphere. Volcanoes. Land use changes. Algae blooms.

And somewhere, that damn butterfly. Somebody needs to squash the damn thing, because trying to ensemble average a
*small sample* from a *chaotic* system is so stupid that I cannot begin to describe it. Everything works just fine as long as you average over an interval short enough that you are bound to a given attractor, oscillating away; things look predictable and then, damn, you change attractors. *Everything changes!* All the precious parameters you empirically tuned to balance out this and that for the old attractor suddenly require *new values to work*.

This is why it is actually wrong-headed to *acquiesce* in the notion that any sort of p-value or R-squared derived from an AR5 mean has any meaning. It gives up the high ground (even though one is using it for a good purpose, trying to argue that this “ensemble” fails elementary statistical tests). But statistical testing is a shaky enough theory as it is, open to data dredging and horrendous error alike, and that’s when it *really is* governed by underlying IID processes (see “Green Jelly Beans Cause Acne”). One cannot naively apply a criterion like rejection if p < 0.05, and all that means under the best of circumstances is that the current observations are improbable given the null hypothesis at 19 to 1. People win and lose bets at this level all the time. One time in 20, in fact. We make a lot of bets!

So I would recommend, modestly, that skeptics try very hard not to buy into this and redirect all such discussions to questions such as why the models are in such terrible disagreement with
*each other*, even when applied to identical *toy problems* that are far simpler than the actual Earth, and why we aren’t using empirical evidence (as it accumulates) to *reject failing models* and concentrate on the ones that come closest to working, while also *not using the models that are obviously not working* in any sort of “average” claim for future warming. Maybe they could hire themselves a Bayesian or two and get them to recompute the AR curves, I dunno.

It would take me, in my comparative ignorance, around five minutes to throw out all but the best 10% of the GCMs (which are still diverging from the empirical data, but arguably are well within the expected fluctuation range on the DATA side), sort the remainder into top-half models that should probably be kept around and possibly improved, and bottom-half models whose continued use I would *defund* as a waste of time. That wouldn’t make them actually disappear, of course, only mothball them. If the future climate ever magically popped back up to agree with them, it is a matter of a few seconds to retrieve them from the archives and put them back into use.

Of course if one does this, the GCM predicted climate sensitivity plunges from the totally statistically fraudulent 2.5 C/century to a far more plausible and *still* possibly wrong ~1 C/century, which (surprise) more or less continues the post-LIA warming trend with a small possible anthropogenic contribution. This large a change would bring out pitchforks and torches as people realize just how badly they’ve been used by a small group of scientists and politicians, how much they are the victims of *indefensible* abuse of statistics to average in the terrible with the merely poor as if they are all equally likely to be true with randomly distributed differences.

rgb
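The screening step the comment proposes (compare each model with the observed record, keep the closest few, set the rest aside) might be sketched as follows. Everything here is hypothetical: the model names, the projected and observed numbers, and the choice of root-mean-square error as the score are illustrative assumptions, not anything specified in the comment.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square difference between a projection and the record."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def screen_models(projections, observed, keep):
    """Rank models by RMSE against observations.

    Returns (kept, failed): the `keep` closest model names, best first,
    and the remaining names, which go into the "failed" bin.
    """
    ranked = sorted(projections, key=lambda name: rmse(projections[name], observed))
    return ranked[:keep], ranked[keep:]

# Made-up temperature anomalies, for illustration only.
observed = [0.00, 0.02, 0.05, 0.04, 0.07]
projections = {
    "model_a": [0.00, 0.05, 0.10, 0.15, 0.20],
    "model_b": [0.00, 0.02, 0.04, 0.06, 0.08],
    "model_c": [0.00, 0.10, 0.20, 0.30, 0.40],
    "model_d": [0.00, 0.01, 0.03, 0.05, 0.06],
}

kept, failed = screen_models(projections, observed, keep=2)
print("kept:  ", kept)
print("failed:", failed)
```

As the comment stresses, a ranking over a short record is dominated by noise, so in practice the comparison would have to be rerun as the record lengthens, with set-aside models retrievable if they later come back into agreement.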
