cogman10 2 hours ago

What an annoying article to read. "The AI workload of AI in a digital AI world that the AI world AI when it AIs. Also the bandwidth is higher. AaaaaaaaaIiiiiiiiiiii".

90% of the article is just finding new ways to integrate "AI" into a purely fluff sentence.

  • cogman10 an hour ago

    Ok, I should be fair, it's 4 paragraphs of fluff, 6 paragraphs of specs, then a fluff conclusion. It's almost like 2 different unrelated articles smashed into 1.

    Still makes for an annoying read.

    • ep103 an hour ago

      Sounds like the sorta thing AI would write

  • Cthulhu_ an hour ago

    AI only appears 7 times in the article's 11 paragraphs though. I mean, I'm sure it's fluffed out, my eyes glazed over and I lost interest, but still.

  • rsynnott an hour ago

    > 90% of the article is just finding new ways to integrate "AI" into a purely fluff sentence.

    I mean, to be fair, that’s half the industry right now. Hard to blame them all that much.

    • skyyler an hour ago

      >that’s half the industry right now

      Isn't that a bad thing?

Retr0id an hour ago

PAM3 is 3 levels per unit interval (~1.58 bits), not 3 bits per cycle as reported in this article. Although I suppose if you count a cycle as both edges of the clock it's 3.17 bits.
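
Rough math, as a quick Python sketch (the 3-bits-per-2-symbols mapping at the end is my understanding of how the practical encoding works, so treat that part as an assumption):

  import math

  levels = 3                       # PAM3: three voltage levels per symbol
  bits_per_ui = math.log2(levels)  # information per unit interval
  print(bits_per_ui)               # ~1.585 bits
  print(bits_per_ui * 2)           # ~3.17 bits per cycle, counting both clock edges

  # Fractional bits aren't directly usable; one way to get close is mapping
  # 3 binary bits onto 2 PAM3 symbols (3**2 = 9 >= 2**3 = 8), i.e. an
  # effective 1.5 bits per UI, or 3 bits per full clock cycle.
  print(3 / 2)                     # 1.5 bits per UI with such a code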

alberth 2 hours ago

Is I/O starvation the bottleneck with GPUs?

I didn't think it was.

  • hmottestad 2 hours ago

    Memory bandwidth is the bottleneck for LLM inference. That's my understanding at least.

    • littlestymaar an hour ago

      Isn't it only the case when inference isn't batched?

      • Tostino an hour ago

        Even in a local setting, batched inference is useful to be able to run more "complex" workflows (with multiple, parallel LLM calls for a single interaction).

        There is very little reason to optimize for just single stream inference at the expense of your batch inference performance.

    • moffkalast 2 hours ago

      That's correct, but compute matters too, to some degree. The larger the model, the more of a bottleneck memory becomes.

      There are some older HBM cards with very high bandwidth, like the Radeon Pro VII, which has 1 TB/s of bandwidth (similar to the RTX 3090 and 4090) but is notably slower at inference for smaller models since it has less compute in comparison. At least I think that was the consensus of some benchmarks people ran.

  • mmoskal 2 hours ago

    With a typical transformer on a GPU, the batch size that saturates the compute is at least in the hundreds. Otherwise (including the typical batch size of 1 for local inference) you're memory bound.
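
    Back-of-envelope in Python for where that number comes from (the GPU figures are rough placeholders, and this ignores KV-cache reads and attention FLOPs):

      # Decoding one token per sequence for a batch of size b on a dense model:
      # the weights stream from memory once per step, but the matmul FLOPs
      # scale with b, so arithmetic intensity grows linearly with batch size.
      params = 70e9                   # hypothetical 70B-parameter model
      bytes_per_step = 2 * params     # fp16 weights read once per decode step
      flops_per_seq = 2 * params      # ~2 FLOPs per parameter per token

      gpu_flops = 1e15                # ~1 Pflop/s fp16 tensor throughput (ballpark)
      gpu_bw = 3.3e12                 # ~3.3 TB/s HBM bandwidth (ballpark)

      machine_balance = gpu_flops / gpu_bw                  # FLOPs the GPU can do per byte moved
      breakeven_batch = machine_balance * bytes_per_step / flops_per_seq
      print(round(machine_balance), round(breakeven_batch)) # roughly 300 and 300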

  • corysama an hour ago

    GPUs have much more memory bandwidth than CPUs. Meanwhile, the ALU:bandwidth ratio of both GPUs and CPUs has been growing exponentially since the 90s at least. So, the FLOPs per byte required to not be starved on memory is really large at this point. We’re at the point where optimization is 90% about SRAM utilization and you worry about the math maybe at the last step.
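
    To put rough numbers on that ratio (these specs are ballpark from memory, not exact):

      # FLOPs a chip can execute per byte it can pull from DRAM ("machine balance").
      # Code that does fewer FLOPs per byte than this leaves the ALUs idle.
      chips = {
          #             ~peak FLOP/s, ~DRAM bytes/s
          "1990s CPU":   (100e6,      100e6),
          "modern CPU":  (2e12,       100e9),
          "modern GPU":  (60e12,      1e12),
      }
      for name, (flops, bw) in chips.items():
          print(f"{name}: ~{flops / bw:.0f} FLOPs per byte to stay busy")
      # prints roughly 1, 20, and 60 -- hence all the tiling/blocking to keep data in SRAM.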

  • alwayslikethis 2 hours ago

    For inference, it often is. Though for most consumer parts the bigger concern is not having enough VRAM rather than the VRAM not being fast enough. Copying from system RAM to VRAM is far slower.

ilaksh an hour ago

So it's almost twice the performance? That's great. But AI could easily use 10 times that.

Anyone heard anything about memristors being in a real large scale memory/compute product?

vdfs 2 hours ago

Not even a mention of Blockchain

  • trollbridge 2 hours ago

    Can’t attract VC money with it anymore

    • Tostino 2 hours ago

      That's good IMO. So much money wasted over the past decade.

      • mnky9800n an hour ago

        [flagged]

        • Tostino an hour ago

          I wasn't even talking about the people "investing" in crypto. Just the VC / business side of things.

          Just a massive waste, on people who had just about no plan going in other than "disrupt the status quo" and "decentralized".

blackoil 2 hours ago

If the 5090 comes with 32GB of this RAM, that should be a substantial boost over the 4090! Hope that isn't reflected in the price.

  • formerly_proven 7 minutes ago

    > Hope that isn't reflected in the price.

    lmao

    AMD has even officially announced at this point that they will not compete on high-end consumer and workstation GPUs for years to come. Intel can’t compete either (Gaudi is not general-purpose, so it has too limited an appeal for that market).

  • moffkalast 2 hours ago

    Nvidia: You're getting 2GB of VRAM and you're gonna act like you like it!

hmottestad 2 hours ago

"With this new encoding scheme, GDDR7 can transmit “3 bits of information” per cycle, resulting in a 50% increase in data transmission compared to GDDR6 at the same clock speed."

Sounds pretty awesome. I would think that it's going to be much harder to achieve the same clock speeds.

  • inportb an hour ago

    If it could really do that, then it wouldn't be DDR, right?

    • formerly_proven 6 minutes ago

      DDR just says symbols are centered on (both) edges, doesn’t say what the symbols are.

sva_ an hour ago

Trying to figure out how this compares to HBM3/e

octocop an hour ago

What does "48 Gigatransfers per second (GT/s)" mean?

  • smolder 42 minutes ago

    It reflects the data rate. Since DDR memory transfers data on both the rising and falling edges of the clock signal, DDR RAM on a 3000 MHz clock is said to make 6000 megatransfers per second in normal usage. 48 GT/s would imply a 24 GHz clock if it were normal DDR, which seems absurd.

    Edit: It seems GDDR6 is in reality "quad data rate" memory, and GDDR7 packs even more bits in per clock using PAM3 signaling, so if I'm reading this right maybe they're saying the chips can run at up to an 8 GHz base clock? 8 GHz * 6 bits per cycle * 32-bit bus / 8 bits per byte = 192 GB/s.

    Edit again: It seems I undercounted the number of bits per pin per cycle of base clock and it's more like 12 (so a 4 GHz max base clock) or more, which seems a lot more reasonable.
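
    FWIW, the per-chip number also falls out of the headline figure alone, without guessing at the base clock (assuming a "transfer" here means one bit per data pin, and the usual 32-bit-wide GDDR device):

      data_rate_per_pin = 48e9     # 48 GT/s per data pin
      pins_per_device = 32         # typical GDDR chip I/O width (assumption)
      bytes_per_sec = data_rate_per_pin * pins_per_device / 8
      print(bytes_per_sec / 1e9)   # 192.0 GB/s per chip

      # A card's total bandwidth then scales with its bus width, e.g. a
      # 384-bit bus is 12 such chips: 12 * 192 GB/s ~= 2.3 TB/s.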

grahamj 2 hours ago

Well, yeah

Any bets on when it gets renamed AIDDR? Only partly joking

  • the-rc an hour ago

    More like NeuralRAM? We have precedents. Back in the 90s, Sun and Mitsubishi came up with 3DRAM, which replaced the read-modify-write (RMW) cycle in Z-buffering and alpha blending with a single (conditional) write, moving the arithmetic into the memory chips.

  • burnte an hour ago

    DDR with Copilot!