Design Space Exploration of Graph Neural Networks for Inductive Link Prediction
April 2023
Preliminary Content
Abstract
Link prediction is a task with a variety of important applications, such as recommendation systems and data mining. Recent approaches have leveraged powerful Graph Neural Networks (GNNs) to achieve state-of-the-art performance, either by optimizing a unique representation for each node in a graph (transductive learning) or by refining the existing numeric attributes of each node (inductive learning). While these GNN-based approaches are effective, their assumptions about the underlying data can make them difficult to apply in realistic scenarios. In domains where the distribution of data may shift, such as a social media platform that constantly registers new users, transductive approaches have no optimized representations for unseen nodes and will likely struggle. Inductive approaches mitigate this problem by utilizing and refining existing node attributes, but many graphs are completely unattributed, which prevents the application of inductive GNNs. An ideal solution for link prediction in realistic settings would address these challenges by automatically and inductively learning node representations end-to-end from the structure of the graph, so that the same model remains effective in the presence of new nodes and edges, and even across entirely different graphs, without depending on fixed node representations or the presence of node attributes. In this thesis, we construct and comprehensively evaluate a family of GNN-based models for link prediction that satisfy these properties, in order to identify the key factors in building efficient, generalizable link prediction models. We propose a standardized framework consisting of several “design dimensions” based on state-of-the-art graph representation learning methods, which we evaluate on the challenging cross-graph inductive link prediction task.
We complete an in-depth investigation of the generalization abilities of models within our framework, both at the population level and broken down by design dimension, and examine global trends such as the presence of a reduced subspace of optimal models and the effects of the underlying data on generalization ability. By providing insight into which factors lead to well-performing and generalizable models, the results of our studies serve as a starting point to accelerate future work in the still-young area of inductive link prediction.