kovacm
What is this for? Will games finally be playable on OS X on PC hardware? (Seriously asking, what is this for?)
That's what Cider is for.
Code 33% faster than the already slow GCC, and that's all. No vectorization, no parallelization anywhere: in a word, still rudimentary and backward compared to ICL.
It wasn't Apple that made AltiVec SIMD, it was Motorola.
By the way, SIMD existed long before it showed up in desktop microprocessors.
heise.de wrote: What's unique (so far) about Larrabee is that it's entirely made up of x86 processing cores. The Larrabee is likely to have 32 x86 processing cores. Here's a surprise: these processing cores are based on the design of the Pentium P54C, a 13+ year old x86 processor. This processor will be miniaturised to the 45nm fabrication process, the cores will be assisted by a 512-bit SIMD unit, and they will support 64-bit addressing. Gelsinger says that 32 of these cores clocked at 2.00 GHz could belt out 2 TFLOPS of raw computational power. That's close to that of the upcoming AMD R700. Heise also reports that this GPU could have a TDP of as much as 300W (peak).
Doesn't it say right here, http://developer.apple.com/hardwaredrivers/ve/index.html (do a search), something about auto-vectorization...?
Optimize for SSE!
Vector programming doesn't stop with PowerPC. Learn how to do SIMD vector programming for MacOS X for Intel.
void scale(float *array, int count, float factor)
{
for (int i = 0; i < count; i++) {
array[i] *= factor;
}
}
mov eax, dword ptr [array]
mov ecx, dword ptr [count]
scale:
movss xmm0, dword ptr [eax]
mulss xmm0, dword ptr [factor]
movss dword ptr [eax], xmm0
add eax, 4
sub ecx, 1
jnz scale
mov eax, dword ptr [array]
mov ecx, dword ptr [count]
movss xmm1, dword ptr [factor]
shufps xmm1, xmm1, 0            ; broadcast factor into all 4 lanes
mov edx, ecx
shr ecx, 2                      ; count / 4
and edx, 3                      ; remainder
scale:
test ecx, ecx
jz scale_tail
movups xmm0, xmmword ptr [eax]  ; movaps only if array is 16-byte aligned
mulps xmm0, xmm1                ; multiply 4 elements at once
movups xmmword ptr [eax], xmm0
add eax, 16
sub ecx, 1
jmp scale
scale_tail:
test edx, edx
jz scale_end
movss xmm0, dword ptr [eax]
mulss xmm0, xmm1
movss dword ptr [eax], xmm0
add eax, 4
sub edx, 1
jmp scale_tail
scale_end:
...
void scale(float *array, int count, float factor)
{
for (int i = 0; i < (count & -4); i += 4) {
array[i + 0] *= factor; // these four
array[i + 1] *= factor; // can be done
array[i + 2] *= factor; // in parallel
array[i + 3] *= factor; // (one mulps)
}
for (int i = (count & -4); i < count; i++) { // this is the "tail"
array[i] *= factor; // in case count is not divisible by 4
}
}
What are you even going to do with SSE there?
Welcome back, AUDIO!
A feature introduced in GCC 4.0.1 is the ability to automatically generate AltiVec (Velocity Engine) or SSE instructions for some types of scalar code.
for (int i = 0; i < N; i++) {
a[i] = i;
}
Bravo, in 2009 they've made (well, will make :D) what IBM made in 2005 (same story as with AltiVec and SSE...).
And since it scales linearly, that means in 2010 we'll be watching real-time raytracing (like the one on CB) on 20 Larrabee processors :D
And nobody has said whether Larrabee also needs a conventional x86 CPU alongside it...??
It supports x64.
As for SSE, the open question is whether the vector unit will use some extension of SSE or an entirely new instruction set (to me the former makes more sense).