Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

Paper · arXiv 2402.15332 · Published February 23, 2024
Mechanistic Interpretability

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory (precisely, the universal algebra of monads valued in a 2-category of parametric maps) as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.
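To make the phrase "parametric maps" concrete, the following is a minimal sketch and not the paper's construction. It assumes a parametric map can be modelled as a plain function f(p, a) -> b, and that composing two such maps pairs their parameters. The names `Para`, `then`, `affine`, `relu` and `unroll` are hypothetical, and the 2-categorical structure (reparameterisations between maps) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Para:
    """A parametric map: a function apply(p, a) -> b with explicit parameter p."""
    apply: Callable[[Any, Any], Any]

    def then(self, other: "Para") -> "Para":
        """Sequential composition; the composite's parameter is the pair (p, q)."""
        return Para(lambda pq, a: other.apply(pq[1], self.apply(pq[0], a)))


def affine() -> Para:
    # Parameter p = (weight, bias); scalar input for simplicity (an assumption).
    return Para(lambda p, x: p[0] * x + p[1])


def relu() -> Para:
    # A parameter-free map takes the trivial parameter ().
    return Para(lambda _unit, x: max(0.0, x))


def unroll(cell: Para) -> Para:
    """Fold a cell (p, (state, x)) -> state over a sequence, reusing one p."""
    def run(p, state_and_inputs):
        state, inputs = state_and_inputs
        for x in inputs:
            state = cell.apply(p, (state, x))
        return state
    return Para(run)


# A two-layer network: its parameter nests the parameters of its parts.
two_layer = affine().then(relu()).then(affine())
params = (((2.0, -1.0), ()), (3.0, 0.5))
output = two_layer.apply(params, 1.5)

# A toy recurrent cell: new_state = a * state + b * x, with p = (a, b) shared
# across all time steps, in the spirit of a recurrent network.
cell = Para(lambda p, sx: p[0] * sx[0] + p[1] * sx[1])
final_state = unroll(cell).apply((0.5, 1.0), (0.0, [1.0, 2.0, 3.0]))
```

The design point of the sketch is that parameters stay explicit inputs, so composition and weight sharing are visible in the types of the parameter rather than hidden in mutable state.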

Introduction. One of the most coveted aims of deep learning theory is to provide a guiding framework from which all neural network architectures can be derived in a principled and useful way. Many elegant attempts have recently been made, offering frameworks to categorise or describe large swathes of deep learning architectures: Cohen et al. (2019); Xu et al. (2019); Bronstein et al. (2021); Chami et al. (2022); Papillon et al. (2023); Jogl et al. (2023); Weiler et al. (2023) to name a few. We observe that there are, typically, two broad ways in which deep learning practitioners describe models. Firstly, neural networks can be specified in a top-down manner, wherein models are described by the constraints they should satisfy (e.g. in order to respect the structure of the data they process). Alternatively, a bottom-up approach describes models by their implementation, i.e. the sequence of tensor operations required to perform their forward/backward pass.
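The two styles of description can be contrasted on a toy example. The sketch below is an illustration under stated assumptions, not something taken from the paper: the bottom-up view is a small Deep Sets style layer written as explicit arithmetic, and the top-down view is a check that a function commutes with every permutation of its input. The function names and the weights are hypothetical.

```python
from itertools import permutations


def layer(xs, w_self=2.0, w_pool=-0.5, bias=0.1):
    """Bottom-up view: an explicit sequence of arithmetic operations.

    Each output is w_self * x_i + w_pool * sum(xs) + bias.
    """
    pooled = sum(xs)
    return [w_self * x + w_pool * pooled + bias for x in xs]


def positional_layer(xs):
    """A counterexample: the weight depends on position, so the constraint fails."""
    return [(i + 1) * x for i, x in enumerate(xs)]


def is_permutation_equivariant(f, xs, tol=1e-9):
    """Top-down view: check f(perm(xs)) == perm(f(xs)) for every permutation."""
    ys = f(xs)
    for perm in permutations(range(len(xs))):
        permuted_in = [xs[i] for i in perm]
        permuted_out = [ys[i] for i in perm]
        out = f(permuted_in)
        if any(abs(a - b) > tol for a, b in zip(out, permuted_out)):
            return False
    return True


equivariant_ok = is_permutation_equivariant(layer, [1.0, 2.0, 3.0])
positional_ok = is_permutation_equivariant(positional_layer, [1.0, 2.0, 3.0])
```

The check enumerates all permutations, so it is only practical for very short inputs. It tests the constraint on one input rather than proving it for all inputs.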

Discussion / Conclusion. Our framework gives the correct definition of numerous variants of structured networks as universal parametric counterparts of known notions in computer science. This immediately opens up innumerable avenues for research. Firstly, we have seen that monad algebras, which generalise equivariance constraints, can be parametric and lax. As a consequence, the kinds of equivariance constraints we can learn become more general: we hypothesise neural networks that can learn not merely conservation laws (as in Alet et al. (2021)), but verifiably correct logical argument, or code. This has ramifications for code synthesis. This is made possible by our framework’s generality: by choosing polynomial functors as endofunctors we get access to containers (Abbott et al., 2003; Altenkirch et al., 2010), a uniform way to program with and reason about datatypes and polymorphic functions. By combining these insights with recent advances enabling purely functional differentiation through inductive and coinductive types (Nunes & Vákár, 2023), we open new vistas for type-safe design and implementation of neural networks in functional languages. Our framework offers a proactive path towards equitable AI systems.
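As a reminder of the sense in which monad algebras generalise equivariance, the standard group-action case can be written out as follows. This is a sketch of the non-parametric, strict case only; the symbols $\alpha$, $\beta$, $\eta$ and $\mu$ are notation assumed here for illustration.

```latex
% For a group (G, \cdot, e), the endofunctor G \times - on Set is a monad with
\eta_A(a) = (e, a), \qquad \mu_A\bigl(g, (h, a)\bigr) = (g \cdot h, a).

% An algebra for this monad is a set A with a map \alpha : G \times A \to A such that
\alpha(e, a) = a, \qquad \alpha\bigl(g, \alpha(h, a)\bigr) = \alpha(g \cdot h, a),
% which is exactly a group action of G on A.

% A homomorphism of algebras f : (A, \alpha) \to (B, \beta) satisfies
f\bigl(\alpha(g, a)\bigr) = \beta\bigl(g, f(a)\bigr) \quad \text{for all } g \in G,\ a \in A,
% which is exactly the statement that f is G-equivariant.
```

Replacing the equality in the last line with a weaker comparison, or allowing $\alpha$ and $f$ to depend on parameters, gives an informal picture of the lax and parametric variants mentioned above.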