Multi-GPU (seven RTX 3090) workstation - possible? Build critique request
I'm looking to build a new multi-GPU 3090 workstation for deep learning. My original idea was to go with a Threadripper 3960X and 4x Titan RTX, but 1) NVIDIA released the RTX 3090, and 2) I stumbled upon this ASRock motherboard with 7 PCIe 4.0 x16 slots. I'm reasonably comfortable with building PCs and DIY, but server stuff is a bit new to me and I'm worried I'm missing something obvious, hence this post.
The requirements (driven by the nature of the compute I'm doing):
- NVIDIA GPUs with the largest possible memory - but not Tesla/Quadro
- approx. 4 cores / 8 threads and 32GB of system memory per GPU
- fit as many GPUs into the rig as possible - compute across PCs is more tricky (quick sizing sketch below)
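Quick back-of-the-envelope check on what 7 GPUs implies (just the per-GPU numbers above multiplied out; the comments reference the CPU and RAM picked in the build below):

```python
# Rough sizing sketch: per-GPU targets from the requirements, scaled to 7 GPUs.
GPUS = 7
CORES_PER_GPU = 4
THREADS_PER_GPU = 8
RAM_GB_PER_GPU = 32

print(f"CPU cores needed:   {GPUS * CORES_PER_GPU}")     # 28 -> a 24c/48t 7402P gives ~3.4 cores/GPU
print(f"CPU threads needed: {GPUS * THREADS_PER_GPU}")   # 56 -> slightly short of the 4c/8t target
print(f"System RAM needed:  {GPUS * RAM_GB_PER_GPU} GB") # 224 GB -> 8x 32GB = 256 GB covers it
```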
The build:
- ASRock ROMED8-2T motherboard
  - do you think this motherboard will actually support 7x GPUs?
  - why: this is the only mobo I could find with 7x PCIe 4.0 x16 slots for a modern, affordable CPU with a lot of PCIe lanes. Based on this review.
- Alternative: Supermicro motherboard
  - why: much more reputable brand in the server space
- AMD EPYC Rome 7402P processor
  - the mobo explicitly lists it as supported, so it should work OK, no?
  - why: based on this wiki page, I looked at the Rome family (the current one) and selected the cheapest 24-core part (~3.4 cores per GPU)
- AIO CPU cooler (TR4-compatible)
  - is it going to mount on the EPYC socket?
  - why: there seem to be only two AIOs made specifically for the TR4 socket, and the second one is an Enermax with a super bad reputation. I need an AIO because a tower cooler would obstruct the GPUs in my planned case.
- Alternative: EPYC-specific air cooler
  - why: dedicated for EPYC, low profile, not water
- Samsung 32GB 3200MHz ECC DDR4 RDIMM memory
  - I don't know anything about server memory, please help?
  - why: the mobo spec says it supports RDIMM, LRDIMM and NVDIMM. This article says that for modules up to 32GB you should use RDIMM. 8x 32GB should be sufficient for all GPUs. NVDIMM seems not applicable here.
- EVGA RTX 3090 (x7 eventually, air-cooled, exact model TBD after release)
  - why: I like EVGA due to their good customer support, and air cooling because it's easy to swap cards between rigs
  - plan: the build will start with 4x 2080 Ti (which I already have), then add 1x 3090 after initial reviews, confirm there are no unexpected issues, then keep adding more (mixed-GPU sanity check below)
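As a quick sanity check during the mixed 2080 Ti / 3090 phase, something like this (assuming PyTorch with CUDA is installed) would list every visible card and its memory, so a card that dropped off the bus or a flaky riser shows up immediately:

```python
# List all visible CUDA devices with name, memory and compute capability.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA devices visible - check drivers/risers.")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i}  {props.name}  "
          f"{props.total_memory / 1024**3:.1f} GB  "
          f"sm_{props.major}{props.minor}")
```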
- LINKUP PCIe 4.0 x16 riser cables
  - zero experience with risers, so any info would be good - is 20cm OK? Is 50cm too far?
  - why: this is the only PCIe 4.0 riser cable I found
- Samsung 980 NVMe SSD (when they release it)
  - why: good experience with the 970
- 2x 1600W PSUs
  - zero experience with dual PSUs, any advice or good resources would be super appreciated!
  - plan: 1st PSU powers the mobo/CPU/3x 3090, 2nd PSU powers 4x 3090 (rough wattage budget below)
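Rough wattage sanity check on that split (assuming ~350W per 3090 at stock, 180W TDP for the 7402P and some headroom for mobo/RAM/fans/drives - nominal numbers, transient spikes not included):

```python
# Per-PSU budget for the planned split (assumed: 350 W per 3090, 180 W CPU TDP, ~150 W misc).
GPU_W, CPU_W, MISC_W = 350, 180, 150

psu1 = CPU_W + MISC_W + 3 * GPU_W   # mobo/CPU + 3x 3090
psu2 = 4 * GPU_W                    # 4x 3090
print(f"PSU 1: ~{psu1} W, PSU 2: ~{psu2} W")  # ~1380 W and ~1400 W - both close to the 1600 W rating
```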
- Two options for the case (both 6U rack mount):
  - Option 1
    - why: the mobo sits directly below the GPUs - so I can use short risers (good?), but have to use an AIO cooler (f*** me if it leaks!)
  - Option 2: Hydra II Rev. B or similar
    - why: the mobo sits in a rear compartment - so it needs longer risers (50cm, bad?), but can fit an air tower cooler (they don't leak!)
  - both are 6U racks, which are 47cm wide, so they should fit 7 triple-slot cards (7 * 6cm = 42cm)
  - heat will be a problem..., no? :X
- Case fans: 3x intake + 3x outtake. Tell me heat won't be a problem... (temperature-monitoring sketch below)
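To keep an eye on the heat during burn-in, something like this (nvidia-smi needs to be on the PATH) would log temperatures and power draw once per second:

```python
# Poll GPU temperature and power draw every second via nvidia-smi.
import subprocess, time

while True:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu,power.draw",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip(), "\n---")
    time.sleep(1)
```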
Sorry for the super long post. It would be super helpful if somebody more experienced double-checked the build. I kind of expect a one-liner reply: "won't work because of X, you silly bummer", which would be super appreciated.
All the best!
EDIT #1:
In case someone is researching this as well: 7 GPUs is not a new thing. I found more resources, more motherboards and more info. The most similar build is this one (forum thread). More builds here, here, here, here, and of course Linus's series "7 Gamers, 1 CPU" parts 1, 2, 3 (8 GPUs) and his 6-GPU rendering station. Now I really have to do some work.
EDIT #2:
I did more research and it seems this build is scrapped, or at least not suitable for a reliable production system.
The main reason is the power supply. The largest decent-quality PSUs I could find are 2000W [1, 2], which is sufficient for "only" 4x 3090. Based on my (rather minimal) electrical engineering experience, you should not bridge PSUs unless they were specifically designed to do so. A person on the homelab subreddit raised the same concern, and I spoke to two electronics engineers who both said the same thing: "might work, but don't do it". Miners use dual PSUs, but their PCIe x1 extenders might be more isolated (?).
In the end I decided to purchase 2x 2nd-gen Threadripper 2950X systems; the price is slightly higher than the single 3960X or EPYC listed above, but it seems like a much safer way to get to 8x GPUs. The CPU power, inter-GPU transfers (now over the network - see the sketch at the end of this edit) and PCIe 4.0 are non-factors for me.
Server builds are more than capable of supporting 8-10 GPUs and should work with the 2080 Ti / 3090 as far as I know; server PSUs are specifically designed to operate in parallel / redundant mode. The downside is that servers are extremely loud - too loud even for my lab backroom.
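For reference, "inter-GPU transfers over the network" between the two boxes would look roughly like this minimal PyTorch DistributedDataParallel sketch (placeholder model, data and torchrun arguments, just to illustrate the shape of it):

```python
# Minimal multi-node data-parallel sketch (PyTorch DDP). Launch on each box with e.g.:
#   torchrun --nnodes=2 --nproc_per_node=4 --node_rank=<0|1> \
#            --master_addr=<box1-ip> --master_port=29500 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL handles intra- and inter-node comms
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(10):                         # placeholder training loop
        x = torch.randn(64, 1024, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                            # gradients all-reduced across both nodes
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```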
Next time!
Comments Section
You don't want the 3090 (GA102), you want big Ampere, GA100. The PCIe version is called "A100". They are quite different actually. GA102 spends die area on things gamers need; GA100 is much bigger and spends its area on compute needs.
For some compute tasks GA102 may be fine, but for some it will be severely limited.
This is the most absurdly cool compute build I've seen in a while, wow!
You should really consider the proper server approach to this. Redundant power supplies, RAM should be bought as a kit not in single sticks for validation (and that RAM is listed as "Sample" and not in production yet). EPYC and Threadripper use the same physical socket, SP3, so cooler's good.
Really consider talking to a sub like r/homelab. While HPC isn't really their subject, they know a thing or two about putting parts in racks without killing 20K in electronics.
Edit: This is more like 30-40K in electronics, holy shit. Not even Puget has/will have a workstation like this AFAIK.
Agree, this is going to be expensive even if it's DIY. If this is for work and not home, maybe look at a proper server: https://www.supermicro.com/en/Aplus/system/4U/4124/AS-4124GS-TNR.cfm
I'm not certain about the 3090, but that box will take 8x P100/V100's so it doesn't seem unreasonable.
7 GPUs? Of course heat will be a problem!
Honestly, everything looks pretty good. The only thing I would change is the 2x 1600W PSUs - there are some PSUs now that go all the way up to 2000 watts, and that should be enough for 4 GPUs and your CPU.
At this scale you might as well get a split unit & cool the entire room down. Probably will need to anyway given how much heat this is going to pump out haha
Do note, regular American 120V outlets are limited to about 1500W each. So you will need either a dedicated 20A outlet, or to have an electrician run a second 15A circuit to a second outlet for your second power supply.
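For reference, the rough arithmetic behind that limit (assuming a standard 120V circuit and the usual 80% continuous-load rule):

```python
# Why a standard US outlet tops out around 1.4-1.5 kW for continuous loads.
VOLTS = 120
for amps in (15, 20):
    print(f"{amps} A circuit: {VOLTS * amps} W peak, "
          f"~{int(VOLTS * amps * 0.8)} W continuous (80% rule)")
# 15 A -> 1800 W peak / ~1440 W continuous; 20 A -> 2400 W / ~1920 W
```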
You should build it and send it over to me to test to see if it works