
2017 NeuroAI 2

Deep Learning

  • 1943, McCulloch and Pitts: NN that could compute logical functions
  • 1949, Hebb: efficiently encode environmental statistics in an unsupervised fashion
  • 1958, Rosenblatt: NN learn incrementally via supervisory feedback
  • 1980, Fukushima: early NN models of visual processing
  • 1985, Rumelhart: backprop
  • 2006, Hinton: deep belief networks
  • 2009, Deng: introduction of large labeled datasets (ImageNet, organized around the WordNet hierarchy), inspired by research on human language
  • 2012, Hinton: Dropout regularization, motivated by the stochasticity that is inherent in neurons that fire with Poisson-like statistics
  • 2015, LeCun: sentences can be represented as vectors
  • 2016, Yamins and DiCarlo: CNNs incorporate nonlinear transduction, divisive normalization, and maximum-based pooling of inputs
    • 1959, Hubel and Wiesel: single-cell recordings from the mammalian visual cortex revealed how visual input is filtered and pooled in simple and complex cells in area V1
    • Replicates the hierarchical organization of mammalian cortical systems, with both convergent and divergent information flow in successive, nested processing layers

SOTA 2025

  • LLMs

Reinforcement Learning

  • TD methods

    • Real-time models that learn from differences between temporally successive predictions, rather than having to wait until the actual reward is delivered.
    • Of particular relevance was an effect called second-order conditioning, where affective significance is conferred on a conditioned stimulus (CS) through association with another CS rather than directly via association with the unconditioned stimulus.
    • TD learning provides a natural explanation for second-order conditioning and indeed has gone on to explain a much wider range of findings from neuroscience.
  • TD-based RL: DQNs, A3C, PPO (TD for value estimation), SAC (TD for Q-function updates)

    • DQN uses TD learning by bootstrapping from its own (target-network) predictions \(Q_{\text{target}}\) to update Q-values in real time.
    • Unlike Monte Carlo, it doesn’t need to wait for final outcomes — it learns from temporal differences between successive predictions.
    • The "real-time" aspect comes from the fact that every step generates a TD error, which is used to improve the policy immediately.
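
A minimal sketch of this one-step TD update, using a tabular Q-table rather than a deep network for clarity; all sizes and constants below are illustrative choices, not values from these notes.

```python
import numpy as np

# Tabular Q-learning sketch of the one-step TD update that DQN generalizes with a
# neural network and a separate target network. Sizes and constants are illustrative.
n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def td_update(s, a, r, s_next, done):
    # Bootstrapped target: immediate reward plus discounted value of the best next action.
    target = r if done else r + gamma * Q[s_next].max()
    td_error = target - Q[s, a]   # available at every step; no waiting for the episode to end
    Q[s, a] += alpha * td_error   # move the current prediction toward the bootstrapped target
    return td_error
```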

SOTA 2025

  • DreamerV3: Best for model-based RL from pixels
  • MuZero: Combines planning + learning without knowing environment rules
  • SAC: SOTA for continuous control
  • PPO: Widely used, stable, scalable

Attention

  • Traditionally, CNN models worked directly on entire images, with equal priority given to all pixels at the earliest stage of processing
  • The primate visual system works differently. Visual attention shifts strategically among locations and objects, centering processing resources and representational coordinates on a series of regions in turn
  • Attentional mechanisms have been a source of inspiration for AI architectures that take "glimpses" of the input image at each step, update internal state representations, and then select the next location to sample

    • One such network was able to use this selective attentional mechanism to ignore irrelevant objects in a scene, allowing it to perform well in challenging object classification tasks in the presence of clutter
    • 2014. DeepMind. Recurrent Models of Visual Attention
  • While attention is typically thought of as an orienting mechanism for perception, it can also be focused on the contents of internal memory; this insight has contributed to recent successes in machine translation and in memory and reasoning tasks (a minimal soft-attention sketch follows this list)

  • Attention mechanisms have also recently proven useful in generative models that mimic the structure of examples presented during training

    • For example, in one SOTA generative model known as DRAW, attention allows the system to build up an image incrementally, attending to one portion of a "mental canvas" at a time
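
A minimal NumPy sketch of the soft (scaled dot-product) attention read-out that underlies these translation and memory systems; the shapes and random inputs are illustrative.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Soft (scaled dot-product) attention: weight stored values by how well their
    keys match the query. Shapes: query (d,), keys (n, d), values (n, d_v)."""
    scores = keys @ query / np.sqrt(query.shape[0])   # similarity of the query to each memory slot
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over slots
    return weights @ values                           # weighted read-out

# Illustrative usage with random vectors
rng = np.random.default_rng(0)
read = soft_attention(rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(5, 16)))
```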

SOTA 2025

  • LLMs

Episodic Memory

  • Allows experiences to be encoded rapidly in a content-addressable store

    • Associated with the medial temporal lobe (including the hippocampus)
  • Animal learning is supported by complementary learning systems in the hippocampus and neocortex

    • The hippocampus acts to encode novel information after a single exposure (one-shot learning), but this information is gradually consolidated to the neocortex in sleep or resting periods that are interleaved with periods of activity. This consolidation is accompanied by replay in the hippocampus and neocortex, which is observed as a reinstatement of the structured patterns of neural activity that accompanied the learning event
    • This theory was originally proposed as a solution to the well-known problem that in conventional neural networks, correlated exposure to sequential task settings leads to interference (catastrophic forgetting)
    • The replay buffer in DQN is like a primitive hippocampus, permitting complementary learning in silico
    • Enhanced when replay of highly rewarding events is prioritized (hippocampal replay seems to favor events that lead to high levels of reinforcement)
  • DQN exhibits expert play on Atari video games by learning to transform image pixels to a policy

    • Experience replay is critical for maximizing data efficiency; it avoids the destabilizing effects of learning from consecutive correlated experiences and allows the network to learn a viable value function even in complex sequential environments (video games)
  • Episodic Control

    • Experiences stored in a memory buffer are not only useful for gradually adjusting the parameters of a deep network toward an optimal policy, as in DQN
    • They can also support rapid behavioral change based on a single experience. Neuroscience has argued for the potential benefits of episodic control, whereby rewarded action sequences can be internally re-enacted from a rapidly updatable memory store (the hippocampus). This is advantageous when only limited experience has been obtained
    • Recent AI research has drawn on these ideas to overcome the slow learning in deep RL
    • These networks store experiences (e.g., actions and reward outcomes associated with particular game screens) and select new actions based on the similarity between the current input and stored memories, taking the reward associated with previous events into account (see the nearest-neighbor sketch after this list)
    • This yields striking gains in performance over standard deep RL. Further, these systems achieve success on tasks that depend heavily on one-shot learning, where typical deep RL architectures fail
    • In the future, it will be interesting to combine the benefits of rapid episodic-like memory with those of more traditional incremental learning (see Imagination and Planning)
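
A toy sketch of the episodic-control scheme described above: store a state embedding and the return that followed each action, then act by nearest-neighbor lookup over those memories. Class and parameter names are illustrative, not taken from any specific paper's code.

```python
import numpy as np

class EpisodicMemory:
    """Toy episodic-control store: per action, remember (state embedding, return) pairs
    and value new states by averaging the returns of the k most similar memories."""
    def __init__(self, n_actions):
        self.keys = [[] for _ in range(n_actions)]     # state embeddings seen for each action
        self.returns = [[] for _ in range(n_actions)]  # returns that followed them

    def write(self, state_emb, action, episode_return):
        self.keys[action].append(np.asarray(state_emb))
        self.returns[action].append(episode_return)

    def value(self, state_emb, action, k=5):
        if not self.keys[action]:
            return 0.0
        dists = np.linalg.norm(np.stack(self.keys[action]) - state_emb, axis=1)
        nearest = np.argsort(dists)[:k]                # k most similar past situations
        return float(np.mean(np.array(self.returns[action])[nearest]))

    def act(self, state_emb):
        # Pick the action whose remembered outcomes look best for this state.
        return int(np.argmax([self.value(state_emb, a) for a in range(len(self.keys))]))
```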

SOTA 2025

  • Problems with Episodic Control

    • You can't store every experience.
    • Similar inputs don’t always lead to similar outcomes.
  • How world models solve them

    • Compression & Generalization: World models summarize and compress many experiences into learned patterns
    • Variance Reduction: world models learn structure and smooth out noise in the data
    • Predictive Imagination: World models allow simulation of counterfactuals: What if I try a different action in this situation?
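
A minimal sketch of predictive imagination with a learned world model: candidate action sequences are rolled forward in the model rather than the real environment and compared by predicted return. The dynamics and reward functions below are toy placeholders standing in for a trained model.

```python
import numpy as np

def imagine_return(dynamics, reward_fn, z0, action_sequence, gamma=0.99):
    """Roll an action sequence forward in the learned model (dynamics, reward_fn)
    and return its predicted discounted return; nothing touches the real environment."""
    z, total, discount = z0, 0.0, 1.0
    for a in action_sequence:
        total += discount * reward_fn(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total

# Counterfactual comparison: which candidate action sequence looks better in imagination?
toy_dynamics = lambda z, a: 0.9 * z + 0.1 * a        # placeholder for a learned latent transition model
toy_reward = lambda z, a: float(-np.abs(z).sum())    # placeholder for a learned reward predictor
best = max([[1.0, 1.0], [-1.0, 0.0]],
           key=lambda seq: imagine_return(toy_dynamics, toy_reward, np.zeros(3), seq))
```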

Working Memory

  • Human working memory

    • Thought to be instantiated within the prefrontal cortex and interconnected areas.
    • Classic cognitive theories: depends on interactions between a central controller and separate, domain-specific memory buffers
  • Began with RNNs displaying attractor dynamics and rich sequential behavior, directly inspired by neuroscience

  • One can see close parallels between the learning dynamics in these early, neuroscience-inspired networks and those in LSTM networks. LSTMs allow information to be gated into a fixed activity state and maintained until an appropriate output is required; the functions of sequence control and memory storage are closely intertwined rather than separate (see the gating sketch after this list)

  • Differentiable neural computer (DNC): a neural network controller that attends to, reads from, and writes to an external memory matrix.

    • This externalization allows the network controller to learn from scratch (i.e., via end-to-end optimization) to perform a wide range of complex memory and reasoning tasks that currently elude LSTMs, such as finding the shortest path through a graph-like structure
    • These types of problems were previously argued to depend exclusively on symbol processing and variable binding, and therefore to lie beyond the purview of neural networks
  • Although both LSTMs and the DNC are described here in the context of working memory, they have the potential to maintain information over many thousands of training cycles and may thus be suited to longer-term forms of memory, such as retaining and understanding the contents of a book.
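
A minimal NumPy sketch of the LSTM gating described above: input, forget, and output gates control what is written into, kept in, and read out of a maintained memory state. Weight shapes and initialization are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b hold parameters for the input (i), forget (f),
    output (o) gates and the candidate content (g)."""
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])   # how much new input to write
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])   # how much of the old state to keep
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])   # how much of the state to expose
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])   # candidate content
    c_new = f * c + i * g                           # gated, maintained memory state
    h_new = o * np.tanh(c_new)                      # output read from the maintained state
    return h_new, c_new

# Illustrative sizes: 4-d input, 3-d hidden state
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = {k: rng.normal(size=(d_h, d_in)) for k in "ifog"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "ifog"}
b = {k: np.zeros(d_h) for k in "ifog"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```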

SOTA 2025

  • Transformers (implicit memory)

Continual Learning

  • Neuroscience

    • Decreased synaptic lability (lower rates of plasticity) in a proportion of strengthened synapses, mediated by enlargements to dendritic spines that persist despite learning of other tasks.
    • Theoretical models: memories can be protected from interference through synapses that transition between a cascade of states with different levels of plasticity.
  • Elastic Weight Consolidation (EWC): slows learning on weights identified as important for previously learned tasks, mirroring the reduced synaptic lability described above (a minimal sketch of the penalty term follows)
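
A minimal sketch of the EWC penalty, assuming a diagonal Fisher information estimate from the previous task; parameter names and the regularization strength are illustrative.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1000.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.
    fisher is a diagonal Fisher information estimate from the previous task; a large
    F_i means parameter i mattered there, so it is pulled back toward old_params harder."""
    return 0.5 * lam * float(np.sum(fisher * (params - old_params) ** 2))

def continual_loss(new_task_loss, params, old_params, fisher, lam=1000.0):
    # New-task objective plus the consolidation term protecting old-task weights.
    return new_task_loss + ewc_penalty(params, old_params, fisher, lam)
```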


Intuitive Understanding of the Physical World

SOTA 2025

  • 2021. UC Berkeley. Decision Transformer: Reinforcement Learning via Sequence Modeling

    • An architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer (see the sequence-layout sketch after this list)
    • Matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks
  • 2023. Meta. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

    • Grounded in the fact that humans learn an enormous amount of background knowledge about the world just by passively observing it.
    • At a high level, the JEPA aims to predict the representation of part of an input (such as an image or piece of text) from the representation of other parts of the same input.
    • Because it does not involve collapsing representations from multiple views/augmentations of an image to a single point, the hope is for the JEPA to avoid the biases and issues associated with another widely used method called invariance-based pretraining.
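
A minimal sketch of the sequence layout the Decision Transformer framing relies on: each timestep contributes a (return-to-go, state, action) triple, and a causally masked Transformer is trained to predict the action token. The tokenization details below are illustrative.

```python
import numpy as np

def to_sequence(states, actions, rewards):
    """Lay a trajectory out the way Decision Transformer conditions on it:
    (return-to-go, state, action) at every timestep, so a causally masked model
    can predict the next action given a desired return."""
    returns_to_go = np.cumsum(rewards[::-1])[::-1]    # R_t = sum of rewards from t to the end
    tokens = []
    for g, s, a in zip(returns_to_go, states, actions):
        tokens.extend([("return_to_go", float(g)), ("state", s), ("action", a)])
    return tokens

# Illustrative three-step trajectory with a single terminal reward
seq = to_sequence(states=["s0", "s1", "s2"], actions=[0, 1, 0], rewards=[0.0, 0.0, 1.0])
```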

Efficient Learning

Transfer Learning

  • Progressive Neural Networks: an architecture for transfer learning that freezes columns trained on earlier tasks and adds lateral connections from them into a new column trained on each new task (see the sketch after this list)

  • Neuroscience

    • How humans or other animals achieve this sort of high-level transfer learning is unknown, and remains a relatively unexplored topic in neuroscience
    • At the level of neural coding, this kind of transfer of abstract structured knowledge may rely on the formation of conceptual representations that are invariant to the objects, individuals, or scene elements that populate a sensory domain but code instead for abstract, relational information among patterns of inputs (lack direct evidence)
    • One recent report: neural codes thought to be important in the representation of allocentric (map-like) spaces might be critical for abstract reasoning in more general domains
    • In the mammalian entorhinal cortex, cells encode the geometry of allocentric space with a periodic "grid" code, with receptive fields that tile the local space in a hexagonal pattern (Rowland et al., 2016)
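
A minimal sketch of one hidden layer of a progressive network: columns trained on earlier tasks are frozen, and lateral adapters feed their activations into the column being trained on the new task. Shapes and the plain-NumPy formulation are illustrative.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

def progressive_forward(x, frozen_columns, new_column, laterals):
    """One hidden layer of a progressive network.
    frozen_columns: weight matrices of columns trained on earlier tasks (never updated).
    new_column:     weight matrix being trained on the current task.
    laterals:       one adapter matrix per frozen column, feeding its activations
                    into the new column so earlier-task features can be reused."""
    old_hiddens = [relu(W @ x) for W in frozen_columns]
    h_new = new_column @ x
    for U, h_old in zip(laterals, old_hiddens):
        h_new = h_new + U @ h_old          # lateral transfer connections
    return relu(h_new)

# Illustrative shapes: 8-d input, 16-d hidden units, one previously trained column
rng = np.random.default_rng(0)
out = progressive_forward(
    rng.normal(size=8),
    frozen_columns=[rng.normal(size=(16, 8))],
    new_column=rng.normal(size=(16, 8)),
    laterals=[rng.normal(size=(16, 16))],
)
```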

SOTA 2025

Imagination and Planning

SOTA 2025

Virtual Brain Analytics

SOTA 2025

Extra