- RDMA support (RoCEv2 if Ethernet)
- Compatibility with existing adapters: ConnectX-5 EDR/100GbE (MCX555A-ECA), ConnectX-6 HDR100/100GbE (MCX653105A-ECA), ConnectX-6 HDR/200GbE (MCX653105A-HDA)
- Minimum 1.6 Tbps of bandwidth from each access switch back to the spine (or equivalent aggregation layer); see the capacity sketch after this list
- 100 us or better average latency with a 1 MB message size between the furthest-separated hosts on the fabric, as measured with ib_read_lat and ib_write_lat (see the latency-test sketch after this list)
- Mix of 100G and 200G connectivity to existing hosts, with possible 400G hosts in the future
- Must be compatible with third-party optics and cables
- Compatibility with our cluster's "pod" layout, where traditional compute, GPU, and storage/service/misc nodes are separated into dedicated pods in the datacenter
- Sufficient connectivity to serve 301 hosts in the compute pod, predominantly 100G (4x25G NRZ and 2x50G PAM4) with some 200G-capable
- Sufficient connectivity to serve 28 existing 100G/200G single-interface hosts (a mixture of 4x25G NRZ, 2x50G PAM4, and 4x50G PAM4) in the infrastructure pod, with capacity for roughly 48 hosts in total as purchased. Assume 200G for new nodes.
- Sufficient connectivity to serve 50 existing 100G/200G single-link hosts (a mixture of 2x50G PAM4 and 4x50G PAM4) in the GPU pod, with room for 64 in total at initial purchase. Assume 200G for new systems.
- Redundant spine switches
- Switches must have redundant power supplies compatible with 200-240 V, 60 Hz AC power and C14 power cables
- Switches will be oriented with their interfaces facing the rear of the racks (hot aisle); airflow must exhaust from the interface (port) side of the switch
- While the datacenter is generally kept around 22 °C, switches must tolerate 30 °C intake temperatures without issue
- CLI access to the switches over SSH (see the SSH reachability sketch after this list)
- Automation compatible with widely used tools (e.g., Ansible)
- Minimum 5-year warranty period
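
The per-leaf uplink requirement can be sanity-checked against the pod host counts above. The sketch below is a rough capacity and oversubscription calculation; the per-host speeds and the number of access switches per pod are illustrative assumptions, not part of the requirement.

```python
# Rough capacity sketch for the 1.6 Tbps per-leaf uplink requirement.
# Host counts come from the list above; per-host speeds and the number of
# access (leaf) switches per pod are illustrative assumptions, not vendor data.

PODS = {
    # pod: (host_count, assumed_gbps_per_host, assumed_leaf_switches)
    "compute":        (301, 100, 8),
    "infrastructure": (48,  200, 2),
    "gpu":            (64,  200, 2),
}

UPLINK_GBPS_PER_LEAF = 1600  # 1.6 Tbps minimum from each access switch to spine

for pod, (hosts, gbps, leaves) in PODS.items():
    downlink = hosts * gbps                  # total host-facing bandwidth in the pod
    uplink = leaves * UPLINK_GBPS_PER_LEAF   # total spine-facing bandwidth in the pod
    ratio = downlink / uplink                # >1 means the pod is oversubscribed
    print(f"{pod:15s} downlink={downlink/1000:.1f}T uplink={uplink/1000:.1f}T "
          f"oversubscription={ratio:.2f}:1")
```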
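
The latency requirement is meant to be verified with the stock perftest utilities. The sketch below assumes ib_read_lat and ib_write_lat are installed on both endpoints and that the matching server-side process has already been started on the far-end host with the same message size (e.g. `ib_read_lat -s 1048576`); the hostname and iteration count are placeholders.

```python
# Minimal sketch of the latency acceptance test using the perftest utilities.
# Assumes the server side is already running on SERVER_HOST with the same
# message size; compare the reported t_avg[usec] against the 100 us requirement.

import subprocess

SERVER_HOST = "far-end-host.example.net"   # hypothetical furthest host on the fabric
MESSAGE_SIZE = 1_048_576                   # 1 MB message size from the requirement
ITERATIONS = 1000

for tool in ("ib_read_lat", "ib_write_lat"):
    cmd = [tool, "-s", str(MESSAGE_SIZE), "-n", str(ITERATIONS), SERVER_HOST]
    print("running:", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(f"{tool} failed:\n{result.stderr}")
```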
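
As a quick check of the SSH/CLI requirement, something like the following could confirm that each switch's management interface is reachable over SSH (which is also what Ansible needs). The hostnames, credentials, and the "show version" command are placeholders; the actual CLI syntax depends on the vendor selected.

```python
# Minimal sketch of an SSH reachability check for the switch management CLI.
# Uses paramiko (pip install paramiko). Hostnames, credentials, and the
# "show version" command are placeholders, not a specific vendor's syntax.

import paramiko

SWITCHES = ["leaf01.mgmt.example.net", "spine01.mgmt.example.net"]  # hypothetical
USERNAME = "admin"
PASSWORD = "changeme"

for host in SWITCHES:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(host, username=USERNAME, password=PASSWORD, timeout=10)
        stdin, stdout, stderr = client.exec_command("show version")
        print(f"{host}: OK\n{stdout.read().decode().strip()}")
    except Exception as exc:
        print(f"{host}: SSH check failed ({exc})")
    finally:
        client.close()
```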