"Notice that HT-assist is a performance killer in 2P configurations: you remove two times 1 MB of L3-cache, which is a bad idea with 8 VM’s hitting your two CPUs."
BIOS guidance suggests that HT Assist be disabled by default on 2P systems, and enabled only for specialized workloads. So that begs the question: Were vAPUS tests performed with or without HT Assist in the 2P configuration? It was not clear.
I assume AMD-V and RVI were enabled for ALL workloads in ESX 3.5 and 4.0 (forced for 32-bit workloads.) Is this accurate? Based on the number of ESX 3.5 installations out there, this probably should be clearly stated...
I do want to take issue with your memory sizing and estimates on vCPU loading. Let me put it this way: while Nehalem-EP has better memory bandwidth and SMT threads, Opteron has access to abundant memory. Therefore, it does not make sense - for example - to be OK with enabling SMT but then constrain the benchmark to 24GB due to a Xeon memory limitation.
I would urge you to look at 48GB configurations on Xeon and Istanbul for your comparison systems. By the way, in consolidation numbers, this makes a significant reduction in $/VM with only a minor increase in per-system CAPEX.
Another interesting issue you touched on is tuning and load balance. Great job here. These are "black magic" issues that - as you noted - can have serious effects on virtualization performance (ok, scheduling efficiency.) Knowing your platform's balance point(s) is critical to performance sensitive apps but not so critical for light-load virtualization (i.e. not performance sensitive.)
It sounds like your learning - through experimentation with vAPUS - that virtualization testing does not predict similar results from "similarly configured machines" where performance testing is concerned. In fact, the "right balance" of VM's, memory and vCPU/CPU loading for one system may be on the wrong side of the inflection point for another.
All and all, a very good article.