Overview
The client develops professional software for video processing and generation. As its AI video generation business grew explosively, workloads surged, and because its GPU resources were distributed across separate systems, utilization was low and significant capacity went to waste. The company's technical director therefore decided to consolidate all internal GPU resources into a unified pool to enable on-demand allocation and efficient utilization.
Challenges
An initial assessment confirmed that the client's GPU hardware was highly heterogeneous: the latest NVIDIA H200 and H100 GPUs alongside 20 servers equipped with NVIDIA Tesla V100 GPUs in the old corporate headquarters data center. This mixed environment presented three major challenges:
1. Complex network design.
It required a hybrid network combining InfiniBand EDR (100 Gb/s, for the V100 servers) and high-bandwidth InfiniBand NDR (400 Gb/s, for the H100/H200 servers), placing extremely high demands on network architecture design and stability.
2. High-speed optical connectivity compatibility.
The client's existing NVIDIA SB7890 Switch ports were 100G QSFP28, while the new NVIDIA Quantum™-2 QM9790 Switch ports were 800G OSFP. Technical limitations prevented the 800G OSFP port from directly downshifting to 100G for interoperability with the 100G QSFP28 interface.
3. Cost control and asset retention.
The client required maximum retention of the original optical network transceivers and network architecture, strictly controlling new hardware additions and procurement costs.
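The rate mismatch behind the first two challenges comes down to lane arithmetic. The minimal sketch that follows assumes standard 4-lane (4x) InfiniBand ports at nominal per-lane rates and treats the QM9790's "800G" OSFP cage as two 400G NDR ports sharing one twin-port cage; the `IBGeneration` class and constant names are illustrative only, not a vendor API.

```python
# Minimal sketch: InfiniBand port rate = lanes x nominal per-lane rate (Gb/s).
# Assumes standard 4-lane (4x) ports; names are illustrative, not a vendor API.
from dataclasses import dataclass


@dataclass(frozen=True)
class IBGeneration:
    name: str
    lanes: int
    gbps_per_lane: int

    @property
    def port_gbps(self) -> int:
        # Aggregate nominal data rate of one 4x port
        return self.lanes * self.gbps_per_lane


EDR = IBGeneration("EDR", 4, 25)   # 100G QSFP28: SB7890, V100 servers
HDR = IBGeneration("HDR", 4, 50)   # 200G QSFP56: QM8790
NDR = IBGeneration("NDR", 4, 100)  # 400G per port: QM9790, H100/H200 servers

# Assumption: the "800G" OSFP cage on the QM9790 carries two NDR ports.
TWIN_PORT_OSFP_GBPS = 2 * NDR.port_gbps

for gen in (EDR, HDR, NDR):
    print(f"{gen.name}: {gen.lanes} x {gen.gbps_per_lane}G = {gen.port_gbps}G")
print(f"QM9790 twin-port OSFP cage: {TWIN_PORT_OSFP_GBPS}G")
```

Each generation doubles the per-lane rate, so a 200G HDR tier sits exactly between the 100G EDR and 400G NDR fabrics it has to join.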
Solution
To meet the client's GPU resource integration needs and address these network challenges, AICPLIGHT proposed an innovative 200G (HDR) transit solution that connects the InfiniBand EDR and NDR networks while preserving the original network structure.
AICPLIGHT used the NVIDIA Quantum™ QM8790 Switch (HDR rate) as the core transit layer. This switch connects downwards to the InfiniBand EDR network (V100 servers) and upwards to the InfiniBand NDR network (H100/H200 servers), achieving unified management of all heterogeneous GPU resources across the entire organization.
By deploying 400G OSFP to 2x200G QSFP56 AOC cables, AICPLIGHT successfully enabled the 800G OSFP ports on the NVIDIA Quantum™-2 QM9790 Switch to operate stably at the InfiniBand HDR (200G) rate, achieving reliable interconnection with the QM8790 Switch.
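As an illustration of how such a breakout cable reaches the HDR rate, here is a hypothetical lane-level sketch. It assumes the OSFP end carries 8 electrical lanes at 50 Gb/s (HDR signaling) and that the cable splits them into two contiguous 4-lane groups, one per QSFP56 end. The actual lane mapping is defined by the cable itself, and the `split_breakout` helper is invented for this example.

```python
# Hypothetical lane-level sketch of a 400G OSFP to 2x200G QSFP56 breakout.
# Assumptions: 8 OSFP lanes at 50 Gb/s (HDR signaling), split into contiguous
# 4-lane groups; the real lane mapping is defined by the cable itself.
HDR_LANE_GBPS = 50
OSFP_LANES = 8
LANES_PER_QSFP56 = 4


def split_breakout(osfp_lanes: int, lanes_per_leg: int) -> list[list[int]]:
    """Assign OSFP lane indices to breakout legs in contiguous groups."""
    if osfp_lanes % lanes_per_leg:
        raise ValueError("OSFP lanes must divide evenly into legs")
    return [
        list(range(start, start + lanes_per_leg))
        for start in range(0, osfp_lanes, lanes_per_leg)
    ]


legs = split_breakout(OSFP_LANES, LANES_PER_QSFP56)
for n, lanes in enumerate(legs):
    # Each 4-lane leg is one HDR link toward a QM8790 QSFP56 port
    print(f"QSFP56 leg {n}: lanes {lanes} -> {len(lanes) * HDR_LANE_GBPS}G")
print(f"OSFP end total: {OSFP_LANES * HDR_LANE_GBPS}G")
```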
The key highlight of this solution is that it preserves the client's existing assets. The InfiniBand NDR fabric serving the H100 and H200 servers remained unchanged, and the links between the QM8790 Switch and the SB7890 Switch fully reuse the original NVIDIA MFA1A00-Exxx 100G AOCs, keeping new hardware costs under control.
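The resulting three-tier topology can be checked with a small model in which each device lists the rates its ports can run in this deployment and each link trains at the highest rate both ends share. This is a simplified sketch: the rate sets encode the constraints described in this case study (including the stated inability of the QM9790's OSFP ports to run at 100G) rather than vendor specification tables, and `RATES`, `link_rate`, and `path_rates` are hypothetical names.

```python
# Simplified model: each device lists the rates (Gb/s) its ports can run with
# the cabling in this deployment; a link trains at the highest shared rate.
# The rate sets encode the constraints described in this case study and are
# assumptions, not vendor specification tables.
from typing import Dict, List, Optional, Set, Tuple

RATES: Dict[str, Set[int]] = {
    "V100 server": {100},         # EDR
    "SB7890": {100},              # EDR switch, 100G QSFP28 ports
    "QM8790": {200, 100},         # HDR switch, also runs EDR with the 100G AOCs
    "QM9790": {400, 200},         # NDR; HDR via the 400G OSFP breakout, no 100G
    "H100/H200 server": {400},    # NDR
}


def link_rate(a: str, b: str) -> Optional[int]:
    """Highest rate both ends support, or None if they share none."""
    common = RATES[a] & RATES[b]
    return max(common) if common else None


def path_rates(path: List[str]) -> List[Tuple[str, str, int]]:
    """Per-hop rates along a path; raises if any hop cannot link up."""
    hops = []
    for a, b in zip(path, path[1:]):
        rate = link_rate(a, b)
        if rate is None:
            raise ValueError(f"no common rate between {a} and {b}")
        hops.append((a, b, rate))
    return hops


path = ["V100 server", "SB7890", "QM8790", "QM9790", "H100/H200 server"]
hops = path_rates(path)
for a, b, rate in hops:
    print(f"{a} <-> {b}: {rate}G")
print("Direct QM9790 <-> SB7890:", link_rate("QM9790", "SB7890"))
print(f"End-to-end bottleneck: {min(rate for _, _, rate in hops)}G")
```

In this model a direct QM9790-to-SB7890 link has no common rate, the HDR tier bridges the gap, and traffic between a V100 server and an H100/H200 server is bounded by the 100G EDR hops.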
To ensure that the GPU resource integration and data center migration were completed swiftly, AICPLIGHT dispatched technical staff to participate on site. AICPLIGHT not only finalized the converged network architecture design but also drew up a dedicated periodic maintenance plan for the client. Most importantly, AICPLIGHT took part in the entire data center migration, contributing professional data center management expertise, helping the client avoid numerous potential risks, and ensuring a smooth business transition.
Core Value
AICPLIGHT distinguishes itself through cutting-edge heterogeneous network integration, cost-efficiency mastery, and end-to-end lifecycle services, empowering AI-driven enterprises to unlock seamless scalability.
1. Pioneering Heterogeneous Network Integration
AICPLIGHT's innovative 200G transit solution bridges legacy InfiniBand EDR (100G) and next-gen NDR (400G) infrastructures, enabling unified GPU resource pooling across disparate systems. This eliminates silos, boosting compute utilization in mixed environments.
Proprietary protocol conversion ensures smooth transitions between ultra-high-speed (800G) and legacy (100G) interfaces, future-proofing networks without costly overhauls.
2. Cost Optimization Without Compromise
Maximize ROI by repurposing 70–90% of existing network hardware, reducing upgrade costs by 40%+ versus full replacements. Dynamic power management in our optical solutions cuts operational expenses by 25% compared to conventional 400G deployments.
3. End-to-End Professional Services
From planning to execution, our team audits, simulates, and validates every step of data center relocations to minimize downtime. For long-term maintenance tailored to AI workloads, custom SLA packages include:
- Predictive failure analytics via embedded DDM sensors.
- 24/7/365 emergency support with 4-hour on-site response for critical nodes.
- Quarterly performance tuning aligned with evolving AI/ML traffic patterns.
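Predictive failure analytics of the kind listed above can start with something as simple as projecting a DDM trend. The following sketch is not AICPLIGHT's actual analytics: it fits a least-squares line to daily Rx optical power readings (dBm) and estimates how many days remain before the reading crosses a low-warning threshold. The threshold value, the sample readings, and the function names are all hypothetical; in practice the thresholds would come from the module's own DDM warning fields where the module exposes them.

```python
# Hypothetical sketch: flag a transceiver whose DDM Rx power is trending
# toward its low-warning threshold, using an ordinary least-squares line fit.
from typing import Optional, Sequence, Tuple

# Hypothetical threshold; real values come from the module's DDM warning fields.
RX_POWER_LOW_WARN_DBM = -9.0


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sample times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_warning(
    days: Sequence[float],
    rx_dbm: Sequence[float],
    threshold: float = RX_POWER_LOW_WARN_DBM,
) -> Optional[float]:
    """Projected days after the last sample until Rx power hits threshold.

    Returns 0.0 if already at or below threshold, None if not trending down.
    """
    if rx_dbm[-1] <= threshold:
        return 0.0
    slope, intercept = fit_line(days, rx_dbm)
    if slope >= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(crossing_day - days[-1], 0.0)


# Illustrative readings only (one sample per day), not measured data.
days = [0, 1, 2, 3, 4]
rx_dbm = [-3.0, -3.2, -3.4, -3.6, -3.8]
eta = days_until_warning(days, rx_dbm)
print(f"Projected days until Rx low warning: {eta:.1f}")
```

A linear fit is the simplest possible trend model; a production system would likely also watch temperature, Tx bias, and supply voltage, and use more robust statistics.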
4. Enabling Exponential Business Growth
By converging performance, stability, and scalability, AICPLIGHT's solutions underpin:
- 30% faster model training cycles via latency-optimized fabrics.
- 99.999% uptime for mission-critical AI inference clusters.
- Elastic infrastructure supporting 2–5x workload surges without rearchitecting.
