The problem is that BD isn't SMT, so the SMT parking feature won't help it all that much. The main reason that feature exists is that virtual cores are not equivalent to physical cores, and all it does is make sure that all the physical cores get used first and only then the virtual ones. (Otherwise the Windows scheduler would keep bouncing threads between physical and virtual cores even when there was no need to, and physical cores would be left unused.)
With BD, though, both cores are physical and they are equivalent, both in performance and in features. That's simply a different thing, IMHO.
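To make the "physical cores first, virtual cores second" idea concrete, here is a rough sketch of that placement order. It's only an illustration of the policy as described above, not how the Windows scheduler is actually implemented; `LogicalCPU`, `placement_order` and `place` are made-up names, and the two-threads-per-core default is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogicalCPU:
    core: int     # index of the physical core
    sibling: int  # 0 = first hardware thread, 1+ = SMT sibling ("virtual core")


def placement_order(cores: int, threads_per_core: int = 2) -> List[LogicalCPU]:
    """List logical CPUs so that every physical core appears once
    before any SMT sibling is offered."""
    return [
        LogicalCPU(core, sibling)
        for sibling in range(threads_per_core)  # outer loop: sibling level
        for core in range(cores)                # inner loop: spread across cores
    ]


def place(n_threads: int, cores: int, threads_per_core: int = 2) -> List[LogicalCPU]:
    """Assign n_threads runnable threads to logical CPUs in that order.
    Threads beyond the number of logical CPUs are simply not placed here."""
    return placement_order(cores, threads_per_core)[:n_threads]


if __name__ == "__main__":
    # 4 cores, 5 runnable threads: the first four land on distinct physical
    # cores, only the fifth has to share a core with another thread.
    for cpu in place(5, cores=4):
        print(cpu)
```

On a design where the two cores of a module really are equivalent, this ordering matters much less, which is the point made above.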
Yuri: I'm not claiming it all comes down to the number of ALUs/AGUs; that's only part of the problem. Still, it's obvious that one of those little cores is weaker/smaller than a conventional core. The caches are a problem too, as are the asymmetric FPU and the 16B fetch per core / 32B per module. There's more they should be polishing...
Bulldozer is characterized by high latencies everywhere: caches, memory, instruction execution cycles, pipeline length. This runs contrary to what IBM has done with Power 7 and Intel with Sandy Bridge; both winning designs are optimized for latency. In Intel's case, even additional structures like the L0 cache are used for this purpose. In fact, past AMD designs like the K6 were fast at integer processing thanks to their low latencies.
But a high-latency design might still be a winner if it has a lot of per-core SMT threads to cover those latencies, like the Sparc T4. So AMD's allergy to traditional SMT led them to the bad decision of not using it on a per-core basis.
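A back-of-the-envelope way to see why many threads per core can cover long latencies: the toy model below assumes each thread alternates between a burst of useful work and a stall, and that the core can switch to another ready thread for free. The function names and the cycle counts are invented for illustration; they are not measurements of Bulldozer, Sparc T4 or any other chip.

```python
import math


def core_utilization(threads: int, work_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles the core does useful work when `threads` hardware
    threads each run for `work_cycles` and then wait for `stall_cycles`.
    Idealised: zero switch cost, stalls of different threads fully overlap."""
    if threads < 1 or work_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, work_cycles > 0, stall_cycles >= 0")
    return min(1.0, threads * work_cycles / (work_cycles + stall_cycles))


def threads_to_hide(work_cycles: float, stall_cycles: float) -> int:
    """Smallest thread count at which the model says the core never idles."""
    if work_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need work_cycles > 0, stall_cycles >= 0")
    return math.ceil((work_cycles + stall_cycles) / work_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: 10 cycles of work per 30 cycles of waiting.
    for n in (1, 2, 4, 8):
        print(n, "threads ->", core_utilization(n, 10, 30))
    print("threads needed:", threads_to_hide(10, 30))
```

In this model a single thread keeps the core busy only a quarter of the time, while four or more threads keep it fully busy, which is the argument for lots of SMT threads on a high-latency core. Real cores share caches and execution resources between threads, so the real gain is smaller.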
This way they have a processor that is not good at anything. It is too slow at single-threaded performance for desktops. Gamers especially may suffer from bad minimum framerates, which means a choppy experience. It's too hot for laptops (only the next generation is supposed to bring power improvements). And it has too few threads to compete with throughput processors and Power 7 on the number of users served.
I guess the L1 cache is the worst part of Bulldozer. It is small and write-through, which makes it slow at writes. There's a mention that next-generation cores will get a "10 to 15 percent speed up [...] one-third will come from IPC improvements like structure size increases". The L1 cache seems the most likely candidate for such an increase.