When you compute FLOPS you do:

Clock Speed x Core Count x 2

The 2 is there because a fused multiply-add counts as two floating-point operations per core per cycle.

Think of the RX 480 as being like a 10-core processor running at 200 MHz, that's 2,000 MHz of total throughput, while the 1060 runs like a 5-core processor at 300 MHz for 1,500 total.

The GTX 1060 has 1280 cores; if we assume 1750 MHz, we get 4,480,000 megaflops (4.48 TFLOPS).
The RX 480 has 2304 cores; if we assume 1250 MHz, we get 5,760,000 megaflops (5.76 TFLOPS).
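
To make the arithmetic concrete, here's a minimal Python sketch of that formula (the clock speeds are the same assumed values as above, not official boost specs):

    def theoretical_gflops(cores, clock_mhz, flops_per_cycle=2):
        # Peak FP32 throughput: a fused multiply-add counts as two
        # floating-point operations per core per cycle, hence the 2.
        return cores * clock_mhz * flops_per_cycle / 1000  # MHz -> GFLOPS

    print(theoretical_gflops(1280, 1750))  # GTX 1060: 4480.0 GFLOPS (4.48 TFLOPS)
    print(theoretical_gflops(2304, 1250))  # RX 480:  5760.0 GFLOPS (5.76 TFLOPS)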

Now, FLOPS are not everything: Nvidia wins in geometry tasks, AMD wins in compute.

AMD made huge improvements in geometry with GCN 1.2; even though those cards had less compute than the 1.0 and 1.1 cards, they performed better in games because they were closer to Nvidia in geometry performance.

GCN 4 (probably going to be called 1.3 or 2.0) has huge improvements to the command processor, and we saw benchmarks showing the 470 beating the 290 in games like Dota, Overwatch, and League of Legends, and we saw the RX 480 dominate in Fallout 4.

It seems like GCN 1.0 was raw power, 1.1 added some features plus power efficiency, and 1.2 worked on memory compression & geometry improvements.

Polaris seems to have reduced CPU bottlenecks. However, there was a tradeoff: the new command scheduler took up die space, so AMD gave up 4 ACE (Asynchronous Compute Engine) units, which hurts its async compute a little bit but helps the rest of the card. I feel this was a great tradeoff for 2 reasons.

  1. If async is less of an advantage for AMD, Nvidia won't block it

  2. AMD had too much CPU overhead in DX9 games & many DX11 games.

So even if we assumed the architectures were the same (gaming performance is kind of hard to pin down to a few factors), higher core count plus lower frequency can be better.

If you compare similar architectures, like the 1060 vs the 1070, you can use the formula I stated above and calculate the performance of each card to within about a 3-5% margin of error (due to memory tasks not scaling perfectly with the rest of the cores).
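
As a rough illustration of that same-architecture scaling (the core counts and reference boost clocks below are assumed specs, so treat the exact numbers as approximate):

    # Same formula applied to two cards on the same architecture (Pascal).
    gtx_1060 = 1280 * 1709 * 2  # ~4.38 TFLOPS at an assumed ~1709 MHz boost
    gtx_1070 = 1920 * 1683 * 2  # ~6.46 TFLOPS at an assumed ~1683 MHz boost

    print(f"Predicted 1070 advantage: {gtx_1070 / gtx_1060:.2f}x")  # ~1.48x
    # Expect real game results within roughly 3-5% of this prediction,
    # since memory bandwidth doesn't scale perfectly with core count.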

Awesome explanation. Learning something new every day.

As a general rule of thumb, clock speed is a good indicator of performance within the same architecture only.

What you are confusing is the vastly different architectures used by Nvidia and AMD. Since they each go about their job in a different way, you cannot directly compare the speed at which they perform it. Even within the same vendor (AMD or Nvidia) but across different generations, performance cannot be directly compared from a MHz or clock speed perspective alone.

The reason Nvidia cards achieve higher clock speeds is in large part that they have a simpler architecture, one that does not have hardware set aside for asynchronous compute tasks. This allows them to scale the frequency up much higher and somewhat brute-force the processing.

Conversely, AMD's denser architecture, with its additional compute hardware, can do more work per clock (albeit at a slower clock rate). But this more complex hardware comes at a cost in terms of heat, power, frequency, and complexity (DX12 async compute is really required to fully utilise the hardware that has been set aside for this purpose).

This is a very high-level answer that doesn't take into consideration differences in memory interfaces, bus speed / bandwidth, ROPs, or the number of compute units and shaders, all of which affect overall performance.

It varies per GPU and CPU.

Each GPU has a specific IPC, or instructions per cycle (basically how much data it can process per clock cycle).

MHz is a rating of how many clock cycles it can do per unit of time.

So while the Nvidia card may run at a faster clock, it is doing less work per clock cycle, whereas the AMD card is doing more.

That's the short oversimplification of it.
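
A quick sketch of that idea, with made-up IPC numbers purely for illustration:

    # Hypothetical IPC values, chosen only to show the principle:
    # effective throughput = work per cycle x cycles per second.
    def effective_throughput(ipc, clock_mhz):
        return ipc * clock_mhz

    nvidia_like = effective_throughput(ipc=1.0, clock_mhz=1750)  # 1750 units
    amd_like = effective_throughput(ipc=1.6, clock_mhz=1250)     # 2000 units
    # The lower-clocked chip wins here because it does more work per cycle.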

But the real question is: can it maintain its boost clock over long durations?

You can see that with the RX 480: when you undervolt a little, it will keep its boost clock for longer.
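
You can sketch why sustained clocks matter with some hypothetical numbers (the RX 480 base/boost clocks below are assumed reference specs, and the time-at-boost fractions are invented for illustration):

    # Average effective clock over a long workload, given how much
    # of the time the card actually holds its boost clock.
    def avg_clock(boost_mhz, base_mhz, fraction_at_boost):
        return fraction_at_boost * boost_mhz + (1 - fraction_at_boost) * base_mhz

    stock = avg_clock(1266, 1120, 0.60)        # throttles often: ~1208 MHz
    undervolted = avg_clock(1266, 1120, 0.95)  # holds boost:     ~1259 MHz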