Jack Cook, Danyal Akarca, Rui Ponte Costa, and Jascha Achterberg
Below is a short summary of the project. The full preprint is available at: https://arxiv.org/abs/2506.02813
The brain is made up of a vast set of heterogeneous regions that dynamically organize into pathways as a function of task demands. Examples of such pathways can be seen in the interactions between cortical and subcortical networks during learning. This raises the question of how exactly brain regions organize into these dynamic groups. In this work, we use an extension of the Heterogeneous Mixture-of-Experts architecture to show that heterogeneous regions do not form processing pathways by themselves, implying that the brain likely implements specific constraints that result in the reliable formation of pathways. We identify three biologically relevant inductive biases that encourage pathway formation: a routing cost imposed on the use of more complex regions, a scaling factor that reduces this cost when task performance is low, and randomized expert dropout. When comparing our resulting Mixture-of-Pathways model with the brain, we observe that the artificial pathways match how the brain uses cortical and subcortical systems to learn and solve tasks of varying difficulty. In summary, we introduce a novel framework for investigating how the brain forms task-specific pathways through inductive biases, which may make Mixture-of-Experts architectures in general more adaptive.
The brain contains heterogeneous regions that dynamically organize into processing pathways and subnetworks based on task demands, such as cortical-subcortical interactions during learning [1] and the multiple demand system for complex tasks [2].
Here we ask: (1) Under which conditions do regions in neural networks form pathways, and (2) can we use such pathway models to model the systems architecture of the brain?
We use models with Heterogeneous Mixture-of-Experts [3] layers, which dynamically route information to experts of varying complexities, as the basis for our investigations. We study models trained on 82 cognitive tasks from ModCog, an extension of NeuroGym.
Definition: Models trained on the same tasks should solve each task with similar routing patterns.
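One way to make this definition concrete is to compare, for a single task, the routing-weight vectors produced by several independently trained models. The sketch below is only an illustration: the function names and the choice of mean pairwise cosine similarity are assumptions, not the measure used in the preprint, and it assumes expert indices refer to the same expert sizes in every model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length routing-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pathway_consistency(routing_by_model):
    """Mean pairwise cosine similarity of one task's routing pattern.

    routing_by_model: one routing-weight vector per trained model
    (hypothetical input format; at least two models are required).
    """
    n = len(routing_by_model)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(
        cosine_similarity(routing_by_model[i], routing_by_model[j])
        for i, j in pairs
    )
    return total / len(pairs)
```

Under this assumed measure, values near 1 indicate that different models route the task through the same experts, while values near 0 indicate unrelated routing.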
First observation: Baseline model does not form replicable pathways without further inductive biases.
So we develop a loss function that incentivizes usage of small experts:
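A minimal sketch of such a loss, assuming a routing cost proportional to the relative size of the experts used and a scaling factor that shrinks this cost while the task loss is still high. The functional form, the `cost_coeff` parameter, and the `1 / (1 + task_loss)` scaling are illustrative assumptions, not the exact formulation from the preprint.

```python
def routing_cost_loss(task_loss, routing_weights, expert_sizes, cost_coeff=0.1):
    """Task loss plus a size-weighted routing cost (illustrative form only).

    routing_weights: per-expert routing probabilities for one input (sum to 1).
    expert_sizes: relative complexity of each expert, e.g. hidden units
                  (hypothetical values chosen by the caller).
    """
    max_size = max(expert_sizes)
    # Expected relative size of the experts in use; 1.0 means only the largest.
    cost = sum(w * s / max_size for w, s in zip(routing_weights, expert_sizes))
    # Assumed scaling factor: a high task loss (low performance) reduces the cost,
    # so large experts stay affordable while the task is not yet solved.
    scale = 1.0 / (1.0 + task_loss)
    return task_loss + cost_coeff * scale * cost
```

With this form, routing the same input through a smaller expert yields a lower total loss at equal task performance, and the penalty fades when the model is still performing poorly.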
Definition: Disabling experts outside of an active pathway should minimally impact the model’s performance.
We observe:
So we train our model with dropout applied to experts with small routing weights:
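The idea can be sketched as follows, assuming that experts whose routing weight falls below a threshold are randomly dropped during training and the remaining weights are renormalized. The `threshold` and `drop_prob` parameters and the renormalization step are assumptions made for illustration; the preprint describes the actual procedure.

```python
import random


def drop_small_experts(routing_weights, threshold=0.1, drop_prob=0.5, rng=random):
    """Randomly zero out weakly used experts, then renormalize (training only).

    threshold and drop_prob are hypothetical hyperparameters.
    """
    kept = []
    for w in routing_weights:
        # Only experts with small routing weights are candidates for dropout.
        if w < threshold and rng.random() < drop_prob:
            kept.append(0.0)
        else:
            kept.append(w)
    total = sum(kept)
    if total == 0.0:
        # Every expert was dropped; fall back to the original weights.
        return list(routing_weights)
    return [w / total for w in kept]
```

Dropping weakly used experts during training pushes the model to rely only on the experts inside the active pathway, which is what the definition above asks for.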
Resulting in:
Tasks cluster into distinct pathways based on their routing patterns across three temporal phases (pre-stimulus, stimulus, response), with our model showing rich pathway dynamics during the response phase.
1st brain comparison: Complex cortical regions solve more complex tasks.
2nd brain comparison: Complex networks learn hard tasks, then turn them into habits. Simple tasks use simple networks for learning.
Heterogeneous experts do not form dynamic, brain-like pathways automatically. Two biologically motivated inductive biases are required: (1) scaled metabolic cost penalties for using larger experts, and (2) expert dropout.
Neuroscience Outlook: Extend to recurrent connectivity between layers to model the brain's loop structures, link routing mechanisms to thalamic nuclei function, and integrate region-specific models (e.g., hippocampus).
ML Outlook: Scale complexity-aware routing to larger networks and language models. This framework can help with hardware-deployment optimization [4] and load balancing by dynamically allocating computational resources based on input complexity.
[1] Dolan & Dayan. Neuron. 2013.
[2] Duncan. Neuropsychologia. 2025.
[3] Wang, et al. ‘Heterogeneous Mixture-of-Experts’. arXiv. 2024.
[4] He, et al. ‘Expertflow’. arXiv. 2024.