Thursday, April 20, 9:00am - 10:00am (EDT)
Zoom Link: https://us02web.zoom.us/j/85119887982?pwd=dU9udXRoc1lJNEhQR2xwOVN3SEJxZz09
--- ABSTRACT ---
In this talk, I will present MariusGNN, a system for training Graph Neural Networks (GNNs) on a single machine. Using only one GPU, 60GB of RAM, and a large SSD, MariusGNN can learn vector representations for all 3.5B nodes (web pages) in the Common Crawl 2012 hyperlink graph, which contains 128B edges (hyperlinks between pages). To support such billion-scale graphs, MariusGNN utilizes pipelined mini-batch training and the entire memory hierarchy, including disk. This architecture requires MariusGNN to address two main challenges. First, MariusGNN optimizes GNN mini-batch preparation to make it as efficient as possible on a machine with fixed resources. Second, MariusGNN employs techniques that exploit disk storage during training without bottlenecking throughput or hurting model accuracy. This talk will highlight how MariusGNN, with a single GPU, can reach the same model accuracy up to 8x faster than existing industrial systems running on up to eight GPUs. By scaling training with disk storage, MariusGNN deployments on billion-scale graphs are up to 64x cheaper in monetary cost than those of competing systems.
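The pipelined mini-batch training mentioned above can be illustrated with a minimal sketch: a background thread prepares mini-batches (the slow CPU/disk side) while the main loop consumes them (the GPU side), so the two stages overlap. This is a generic producer-consumer illustration, not MariusGNN's actual implementation; `prepare_batch` and `train_step` are hypothetical callbacks standing in for batch loading and the GNN forward/backward pass.

```python
import queue
import threading


def pipelined_training(num_batches, prepare_batch, train_step, depth=4):
    """Overlap mini-batch preparation (CPU/disk) with training (GPU).

    A bounded queue of size `depth` lets preparation run ahead of
    training without unbounded memory growth.
    """
    batches = queue.Queue(maxsize=depth)

    def producer():
        for i in range(num_batches):
            batches.put(prepare_batch(i))  # blocks when the pipeline is full
        batches.put(None)  # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()

    processed = 0
    while (batch := batches.get()) is not None:
        train_step(batch)  # runs concurrently with preparation of later batches
        processed += 1
    return processed
```

With this structure, batch i+1 (and up to `depth` further batches) is prepared while batch i trains, hiding disk and CPU latency behind GPU compute whenever preparation is not the slower stage.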
Huxley 315
Pedro Silvestre, pmfsilvestre@gmail.com