Q: Speaking about better library designs, do you think it would benefit you to move to AMD's tools/libraries/processes later on?
A: I won't comment on future projects, but that is now a real possibility.
Q: ...the merger with AMD presented situations that might have in some ways impacted the roll out of R600...
A: When we heard about it last summer, the chip was pretty much finished and on its way to manufacturing, or even back. ... We do interact with a lot of new engineers now ... From a time to launch, marketing, PR standpoint -- It's possible those were affected...
Q: Is there a difference in those considerations when moving from say, 90->80nm or 90->65nm?
A: Well, all physical aspects depend on the technology. This includes the elements that make up ASICs (stdcell, memories, macros, etc...). We need to re-characterize and update all libraries and models associated with those. This can mean new synthesis, new P&R rules, new macro designs, etc... This affects everything from the netlist on down, but it can even have ramifications up through the architecture, as you might need to change some functionality, re-pipeline, or change your memory usage, for example. It's a lot of work. 1/2 nodes are a little easier, as most things can be simply optically shrunk -- You have to design with that in mind, but it can be easier. Famous last words.
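To put rough numbers on the optical shrink mentioned above, here is a minimal sketch of ideal area scaling. It assumes a pure linear shrink where every dimension scales with the node name, which is an idealization and not something stated in the answer. The function name `ideal_area_scale` is hypothetical.

```python
def ideal_area_scale(old_nm, new_nm):
    """Ideal area ratio for a pure linear (optical) shrink.

    Assumption: every dimension scales by new_nm / old_nm.
    Real layouts rarely achieve this (pads, analog blocks, I/O).
    """
    return (new_nm / old_nm) ** 2


# The two transitions named in the question.
for old_nm, new_nm in ((90, 80), (90, 65)):
    scale = ideal_area_scale(old_nm, new_nm)
    print(f"{old_nm}nm -> {new_nm}nm: ideal area x{scale:.2f}")
```

Under that assumption, 90->80nm gives roughly 0.79x the area and 90->65nm roughly 0.52x. The second case is the one that needs the re-characterization work described in the answer.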
Q: Do they have a group within ATI strictly working on future technologies?
A: We sort of did at the time we started R600, but we clearly have a group for that now.
Q: Which team at ATI creates the requirements and how soon in the development process are they created? 1 year? 2 years? 3 years?!?
A: A new architecture like R6xx took over 3 years of design work. 36 months for a new design is probably a minimum, at this time, considering the sheer number of transistors (and associated design) involved. Follow-on products typically follow within 6 months or so, and don't need a huge amount of redesign. In the last year of design, very little change occurs -- Configs, features, etc... are all locked down. Only clock can change a little afterwards.
Q: A13 revision was fabricated in January, but released in May...
A: Most of the work was on the drivers -- 2 OS's, new arch, DX9, DX10 and OGL. Work is still ongoing. I'm not sure on the silicon dates, off hand. But at some point, it just made more sense to launch all at the same time.
Q: So, can we conclude, that resolving AA samples in INT type (non-HDR type) of surfaces is actually under-utilizing overall throughput capacity of moving data, inside the chip?
A: In a way, yes. But it's not simple -- Some buses are underutilized, but that doesn't mean you could easily go faster. For example, we only filter in float; but having int doesn't make it go faster, since the number of units doesn't change (we actually promote most things to float).
Q: I saw that in tech specs for HD2900 there is listed "bicubic filtering"...
A: Bicubic is supported, and was intended for some OS font features (I think). It's not exposed at this time. I don't believe that DX allows it to be exposed. It does not operate at the same rate as bilinear. Might show up in OGL.
Q: AF drop is very bad...
A: The AF hit can be more significant on R600 at this time. It should show higher quality in exchange (not on the LOD isotropy front, but on the mipmap transition front). I believe some of that will be addressed, in part, in future drivers. I can't promise a % improvement at this time.
Q: Why does the MSAA algorithm cost so many frames? The memory interface is much wider than R580, the compression algorithms are better and the ROPs can do more color/z compares than R580.
A: The actual number of units working on AA wrt R580 isn't substantially different - though the new ones are focused on 64b pixels. There are cases where moving the resolve to the shader has hurt performance -- That's typically where the frame rate is very high (i.e. > 100fps). Those are cases we were aware of and only really show up in benchmarks, not so much in real game play. It sort of caps the max fps to 200~300 or so, depending on resolution. Again, not something we felt would seriously impact real game play. It can also be mitigated by selecting the proper AA modes.
As for workloads in various apps, yes, it can be improved in some cases. Without specifics, it's hard to say. Overall, I expect the *scaling* of AA to be similar to slightly worse than R580, for "regular" pixels, but still to be higher than R580 in absolute terms.
But anytime 64b pixels will be invoked (i.e. HDR games), I expect performance of R600 to be substantially higher than R580.
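The fps cap described in this answer can be sketched with simple arithmetic. The sketch assumes the shader resolve is a fixed, purely serial per-frame cost, which is a simplification. The 4 ms figure is a hypothetical value chosen only because it lands inside the quoted 200~300 fps range, and the function names are illustrative.

```python
def fps_cap(fixed_cost_ms):
    """Upper bound on frame rate when every frame pays a fixed cost."""
    return 1000.0 / fixed_cost_ms


def fps_with_fixed_cost(base_fps, fixed_cost_ms):
    """Frame rate after adding a fixed per-frame cost to a frame
    that would otherwise render at base_fps (serial model)."""
    base_ms = 1000.0 / base_fps
    return 1000.0 / (base_ms + fixed_cost_ms)


RESOLVE_MS = 4.0  # hypothetical resolve cost, not a measured value

print("cap:", fps_cap(RESOLVE_MS), "fps")
for base in (60, 120, 500, 2000):
    print(base, "->", round(fps_with_fixed_cost(base, RESOLVE_MS), 1))
```

In this model the relative loss grows with the base frame rate and the result never exceeds the cap, which matches the observation that the effect shows up mostly in very-high-fps benchmarks.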
Q: However, you can't deny that R5xx/RV5xx are a perfect example of what you were talking about with features taking die space and going unused.
A: Nope, it's generally simply being late to the game. By then, the channel is full and the OEMs have their deals. Getting deals is simply not possible (most OEMs for mainstream and below) or very difficult (above mainstream, you need to show a huge delta). It's not a competition when you are late. R5xx was fine and price/performance was quite excellent. It was, at least, 4 months late.
Q: The implication is that R600's design and hardware attributes are fantastic and the drivers are poo-pooing all over performance.
A: I did not mean to imply that. It's a good and reasonable design. The drivers are very stable and deliver the expected performance in many cases. There are certainly quite a few left to address. If someone is expecting a 2x boost in all apps, well, I'm sorry, but it won't happen. There's a couple of places where that kind of delta is possible, and a lot where 10~30% is possible too. But we are shipping a good product at an awesome price. From a functional aspect, there's more to do too (such as more custom filters, fixing some of the video stuff, etc...)
Q: Where has R6xx demonstrated that its ALU:TEX ratios are more optimal than the competition's?
A: In yesterday's games, and even a lot of today's games, the ALU:TEX ratio selected is probably overkill. But when I look at some of the newest games, or some coming down the pipe, or even some of our demos, a ratio of 4:1 seems pale -- Shaders are coming with 10:1, 20:1 ratios -- Even with the most advanced filters, those apps are still ALU bound. That is more where we shot for.
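The ALU-bound argument above can be illustrated with a toy throughput model. It assumes the hardware retires 4 ALU ops in the time of one texture op (the 4:1 figure from the answer) and ignores latency hiding, filtering cost and scheduling. The function name and defaults are hypothetical.

```python
def bottleneck(alu_ops, tex_ops, hw_alu_per_tex=4.0):
    """Classify a shader as ALU- or TEX-bound in a toy model.

    Assumption: hw_alu_per_tex ALU ops take the same time as one
    texture op; both unit types run fully in parallel.
    """
    alu_time = alu_ops / hw_alu_per_tex  # in units of one texture op
    tex_time = float(tex_ops)
    if alu_time > tex_time:
        return "ALU"
    if tex_time > alu_time:
        return "TEX"
    return "balanced"


# Shader ALU:TEX ratios: one older-style case, then those from the answer.
for alu, tex in ((2, 1), (4, 1), (10, 1), (20, 1)):
    print(f"{alu}:{tex} ->", bottleneck(alu, tex))
```

In this model a 10:1 or 20:1 shader stays ALU-bound on 4:1 hardware, while a 2:1 shader leaves ALU capacity idle, which is the "overkill" case for older games.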
Q: Was there ever a set of simulations run comparing R600 + R580 RBEs vs the final design, and if so what were the interesting datapoints that came out?
A: Yes, there were hundreds. Yes, a few things came out in the final design, but, in general, it was as expected. Resolve was moved into the shader, so that can't really be compared.
