The Nvidia RTX 2080 has better traditional performance than its FLOPS figure tells us.
Table of Contents
- 1. Nvidia is hyped about its new RTX tech.
- 2. Nvidia wants to hype RTX to developers.
- 3. What performance increases are available without code changes by developers.
- 4. Old TFLOPS vs new TFLOPS and TIPS.
- 5. Why a new name?
- 6. What benefits we can expect from this.
- 7. Why this change.
- 8. What this (can) mean for miners vs gamers.
- 9. Cost of a GPU die relative to area.
- 10. About buying this card.
1. Nvidia is hyped about its new RTX tech.
It is clear to me that Nvidia has just realized its long-time dream. This means Nvidia is really excited about it and wants to show it to everyone. It also means that the improvements on the traditional architecture side got sidetracked in their presentation, simply because the other thing was so dominant in their own minds.
2. Nvidia wants to hype RTX to developers.
Nvidia has brought lots of features for developers to use. By concentrating on them instead of traditional performance, Nvidia hopes to get developers to implement RTX technology in new games.
3. What performance increases are available without code changes by developers.
- 40% increase in memory bandwidth.
- TFLOPS → TIPS + TFLOPS: a 30-60% improvement in computational resources.
4. Old TFLOPS vs new TFLOPS and TIPS.
Traditionally, shader code has run everything in a single pipeline that handles both floating-point and integer operations. For the 20 series, Nvidia said those operations are split across two different pipelines, where one handles floating-point operations and the other handles integer operations in the shaders. It simply means going from 1 instruction to 2 instructions per cycle, per CUDA core, where one instruction is 2 FLOPs and the other is an integer op. Getting that parallelism doesn't require any additional work from developers. Since a multiply-accumulate is TWO FLOPs, a single floating-point instruction does two operations per cycle.
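This FMA-based counting is how the headline TFLOPS figure is produced. As a sanity check, here is a minimal sketch using the published RTX 2080 specs (2944 CUDA cores, ~1.71 GHz boost clock); the function name is my own:

```python
# FMA-based FLOPS counting: each CUDA core can retire one fused
# multiply-add (FMA) per cycle, and an FMA counts as 2 FLOPs.

def peak_tflops(cuda_cores, clock_ghz, flops_per_op=2):
    """Peak single-precision throughput in TFLOPS."""
    return cuda_cores * clock_ghz * flops_per_op / 1000.0  # GFLOPS -> TFLOPS

# RTX 2080: 2944 cores at ~1.71 GHz boost
print(round(peak_tflops(2944, 1.71), 2))  # ~10.07 TFLOPS
```

This matches Nvidia's quoted ~10.1 TFLOPS for the card, which is why an integer op on the old shared pipeline cost two FLOPs' worth of throughput.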
Technically, Nvidia's statements about the architecture can be interpreted in 3 ways, of which one is most precise, one would be clearly misleading, and a third sits between those. The floating-point pipeline is capable of handling TWO FLOPs per cycle via the FMAC instruction, so previously a single integer operation took TWO FLOPs' worth of performance away; but a single integer instruction most likely isn't two operations, which puts some of Nvidia's performance claims in a different light.
| Pipelines | TFLOPS | TIPS = FLOPS | TIPS + TFLOPS = 2*TFLOPS | Likelihood |
|---|---|---|---|---|
| 1*Float + 1*Int | true | false | false | high |
| 1*(Float+Int) + 1*Int | true | true | false | high |
| Float + 2*Int | true | true | true | low |

Float = floating-point pipeline
Int = integer pipeline
I consider the last interpretation unlikely, simply because it has unnecessary hardware costs without a matching performance increase. Performance-wise, each executed integer operation previously took a slot that could have executed two FLOPs, so compared to the previous generation even the 1*Float + 1*Int case should be considered a reasonable shader improvement, in the context of how ALL GPU manufacturers have given their numbers previously.
5. Why a new name?
FLOP has a quite specific meaning: a floating-point operation. The new pipeline is not capable of handling floating point, so calling its throughput FLOPS would definitely be misleading marketing material and a basis for a lawsuit.
6. What benefits we can expect from this.
Integer operations are used in memory address calculations and in control-flow operations, so they are used in floating-point programs too.
Nvidia claimed a 50% performance increase, and that is right around my expectation. While theoretically you have twice the instruction throughput, there is never a perfect 50/50 split, which is what a 100% improvement would require; one of the pipelines will always be underutilized, and the gain depends on the balance of instructions between the pipelines.
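The reasoning above can be sketched with an idealized issue model; this is my own simplification (one instruction per pipeline per cycle, no stalls), not Nvidia's published methodology:

```python
# Idealized model: a single shared pipeline issues one instruction per
# cycle; split pipelines issue one float and one int instruction per
# cycle, so the busier pipeline becomes the bottleneck.

def dual_pipe_speedup(float_frac):
    """Speedup of split pipelines over a single shared pipeline,
    given the fraction of float instructions in the mix."""
    int_frac = 1.0 - float_frac
    single_pipe_cycles = float_frac + int_frac    # = 1.0 per instruction
    dual_pipe_cycles = max(float_frac, int_frac)  # bottleneck pipeline
    return single_pipe_cycles / dual_pipe_cycles

print(dual_pipe_speedup(0.5))   # 2.0 -> a perfect 50/50 mix gives +100%
print(dual_pipe_speedup(0.74))  # ~1.35 -> roughly Nvidia's claimed mix of
                                #   ~36 int instructions per 100 float
```

An uneven mix leaves one pipeline idle part of the time, which is why the realistic gain lands near 50% rather than the theoretical 100%.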
If the floating-point pipeline has lost its ability to handle simple integer operations, then ONLY floating-point programs see this performance increase, since one pipeline sits idle in integer-only programs such as hash calculations. I do expect that integer multiplies are still handled in the floating-point pipeline, while the rest of the integer operations are handled in the integer pipeline.
If a hash function contains a multiplication, we can expect a performance increase. So this can look like Nvidia trying to make a gaming card instead of a mining card. Of course, if the floating-point pipeline is also capable of handling integer operations, then the card is definitely superior for mining.
7. Why this change.
A 32-bit integer adder is about 700 transistors. A 32-bit multiplier is roughly the size of 32 hardware adders. This simply means that the cost of adding lots of multipliers is higher than adding lots of simple ALUs. The overall cost difference isn't quite this high, since moving data to those ALUs and multipliers also has significant costs associated with it, but it is clear that Nvidia thought the benefits outweighed the costs of adding this.
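The transistor-budget argument above is simple arithmetic; here it is spelled out, using the text's own figures (the 64-unit count is an arbitrary illustration):

```python
# Rough transistor budget for integer ALUs vs multipliers,
# using the estimates from the text.

ADDER_TRANSISTORS = 700                           # 32-bit adder
MULTIPLIER_TRANSISTORS = 32 * ADDER_TRANSISTORS   # ~32 adders' worth

n = 64  # hypothetical number of execution units to add
print(n * ADDER_TRANSISTORS)       # 44,800 transistors for 64 adders
print(n * MULTIPLIER_TRANSISTORS)  # 1,433,600 transistors for 64 multipliers
```

The ~32x gap is why dedicating a cheap integer pipeline to adds and shifts, while keeping multiplies in the existing multiplier-rich pipeline, is an attractive trade.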
This makes me think that Nvidia has a single pipeline capable of all multiplies, integer and floating-point, and a single pipeline for the remaining integer instructions. The only real unknown for me is whether they kept a simple integer ALU inside the floating-point pipeline as well. Another open question is whether the pipelines have different latencies, as a separate integer pipeline allows reducing integer latency to speed up dependent operations inside a single shader.
8. What this (can) mean for miners vs gamers.
The old Ethereum algorithm used only integer (TIPS) instructions, and no multiplication instructions except ONE. A few things matter here: it all depends on what the float pipeline is capable of handling, and on whether the algorithm is compute- or memory-limited on the GPU. If the float pipeline isn't capable of simple integer operations, it looks like Nvidia tried to improve gaming performance without giving a similar increase in mining performance. On the other hand, if both pipelines are capable of integer operations, then the card has twice the computational capability of the previous generation for mining, while also having 40% higher memory bandwidth.
9. Cost of a GPU die relative to area.
\[ DiesPerWafer = \frac{\pi \cdot (\frac{diam}{2})^2}{DieArea} - \frac{\pi \cdot diam}{\sqrt{2 \cdot DieArea}} - testDies \]
\[ dieYield = WaferYield \cdot \left(1 + \frac{defectDensity \cdot DieArea}{\alpha}\right)^{-\alpha} \]
DieCost is therefore roughly proportional to \( DieArea^4 \).
Salvaging partially broken dies has reduced costs below this, but this should tell you that doubling the area more than doubles the cost.
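The formulas above can be evaluated directly to see the superlinear cost growth. The wafer cost, diameter, defect density and alpha below are illustrative assumptions, not actual foundry figures (and testDies is omitted for simplicity):

```python
import math

# Per-die cost from the dies-per-wafer and yield formulas in the text.

def dies_per_wafer(diam_mm, die_area_mm2):
    """Gross dies per wafer: wafer area over die area, minus edge loss."""
    return (math.pi * (diam_mm / 2) ** 2 / die_area_mm2
            - math.pi * diam_mm / math.sqrt(2 * die_area_mm2))

def die_yield(defect_density, die_area_mm2, alpha=3.0, wafer_yield=1.0):
    """Negative-binomial yield model."""
    return wafer_yield * (1 + defect_density * die_area_mm2 / alpha) ** -alpha

def cost_per_good_die(wafer_cost, diam_mm, die_area_mm2, defect_density):
    good = dies_per_wafer(diam_mm, die_area_mm2) * die_yield(defect_density,
                                                             die_area_mm2)
    return wafer_cost / good

# Hypothetical: $5000 wafer, 300 mm diameter, 0.002 defects/mm^2
small = cost_per_good_die(5000, 300, 250, 0.002)  # 250 mm^2 die
big = cost_per_good_die(5000, 300, 500, 0.002)    # double the area
print(big / small)  # well above 2x -- doubling area more than doubles cost
```

With these numbers the ratio comes out around 3x: fewer dies fit per wafer and each die is more likely to catch a defect, so the two effects compound.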
10. About buying this card.
As soon as the masses realize the real performance of the new architecture, I expect them to be sold out and prices to go up. The prices are high, but they come with both a performance increase and a cost increase. The 50% increase in legacy performance, plus AI support and ray-tracing support, makes these over 50% more valuable than previous-generation cards of the same tier.
These should be great cards for people who do ray tracing outside games, since software vendors will need to add RTX support to remain competitive. I think Hollywood will snap up all the cards they can get, and architects and others who do similar visualizations need these cards as well. So demand is going to be quite high even at these prices, since for these people a slower card can cost thousands of dollars in lost productivity.
Then again, these cards are expensive and out of many people's price range, while at the same time clearly giving good value relative to the prices of 10-series cards. Personally I cannot afford one, though I would develop my programs to run on one.
Another issue is that the 7 nm process alone should give a substantial performance increase when it arrives. However, the increased defect density and increased wafer costs of a new process mean that until the defect density has been brought low enough, these cards would cost even more on that process.