RealWorldTech - PhysX87: Software Deficiency
Zajímavé jsou zejména poslední dvě kapitoly článku...
Podle tohoto článku by implementace podpory SSEx instrukcí a multi-threadingu mohla zvýšit výkon CPU PhysX 2x až 3x na dnešních CPU (Nehalem/Westmere) a dokonce se vyrovnat GPU PhysX. To víceméně souhlasí se závěry, které uvádí JeGX....The truth is that there is no technical reason for PhysX to be using x87 code. PhysX uses x87 because Ageia and now Nvidia want it that way. Nvidia already has PhysX running on consoles using the AltiVec extensions for PPC, which are very similar to SSE. It would probably take about a day or two to get PhysX to emit modern packed SSE2 code, and several weeks for compatibility testing. In fact for backwards compatibility, PhysX could select at install time whether to use an SSE2 version or an x87 version – just in case the elusive gamer with a Pentium Overdrive decides to try it...
...Realistically, Nvidia could use packed, single precision SSE for PhysX, if they wanted to take advantage of the CPU. Each instruction would execute up to 4 SIMD operations per cycle, rather than just one scalar operation. In theory, this could quadruple the performance of PhysX on a CPU, but the reality is that the gains are probably in the neighborhood of 2X on the current Nehalem and Westmere generation of CPUs. That is still a hefty boost and could easily move some games from the unplayable <24 FPS zone to >30 FPS territory when using CPU based PhysX...
...Not only would this physics solver comparison reveal the differences due to x87 vs. vectorized SSE, it would also show the impact of multi-threading. A review at the Tech Report already demonstrated that in some cases (e.g. Sacred II), PhysX will only use one of several available cores in a multi-core processor. Nvidia has clarified that CPU PhysX is by default single threaded and multi-threading is left to the developer. Nvidia has demonstrated that PhysX can be multi-threaded using CUDA on top of their GPUs. Clearly, with the proper coding and infrastructure, PhysX could take advantage of several cores in a modern CPU. For example, Westmere sports 6 cores, and using two cores for physics could easily yield a 2X performance gain. Combined with the benefits of vectorized SSE over x87, it is easy to see how a proper multi-core implementation using 2-3 cores could match the gains of PhysX on a GPU...


