Introduction
Neural networks are powerful tools for processing sequential and structured data, but how they internally store and retrieve information remains a fascinating open question. This research project investigates how deep neural networks encode and retain information from text inputs, and how their topologies can be optimized for memory efficiency. Below, we outline the goals, methodology, and potential implications of this work.
Research Objectives
- Memory Storage Analysis: How do neural networks represent sequential text data in their hidden layers? Specifically, how is information distributed across neurons, and how does connectivity influence retention?
- Architecture Comparison: Compare different neural network types (e.g., RNNs, LSTMs, Transformers) to determine which requires the fewest neurons to reliably store input data.
- Topology Optimization: Enhance neural network architectures by refining layer connections, activation functions, or neuron behaviors to improve memory efficiency.
Methodology
1. Input Encoding
- Text-to-Float Conversion: Each character's byte value (0–255, covering ASCII and extended codes) is mapped to a unique floating-point value (e.g., scaled to [0, 1]).
- Sequential Processing: Text is fed one character at a time into 256 input neurons. For example, the word “hello” would be processed as 5 sequential inputs (see the sketch after this list).
- Illustration: ASCII Encoding Example
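A minimal sketch of the two encodings described above, assuming NumPy; the function names (`encode_scaled`, `encode_one_hot`) are placeholders of our own, not part of the project code.

```python
import numpy as np

def encode_scaled(text: str) -> np.ndarray:
    """Map each byte (0-255) to a float in [0, 1]."""
    return np.frombuffer(text.encode("latin-1"), dtype=np.uint8) / 255.0

def encode_one_hot(text: str) -> np.ndarray:
    """One character per time step, spread over 256 input neurons."""
    codes = np.frombuffer(text.encode("latin-1"), dtype=np.uint8)
    one_hot = np.zeros((len(codes), 256), dtype=np.float32)
    one_hot[np.arange(len(codes)), codes] = 1.0
    return one_hot

print(encode_scaled("hello"))         # 5 floats, one per character
print(encode_one_hot("hello").shape)  # (5, 256): 5 time steps x 256 inputs
```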
2. Network Architectures Under Study
- Recurrent Neural Networks (RNNs): Process sequences with loops to retain memory.
- Long Short-Term Memory (LSTM): Improved memory via gated cells.
- Transformers: Use self-attention for parallelized sequence modeling.
- Dense Feedforward Networks: Fully connected layers with no recurrence.
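To make the comparison concrete, here is a rough sketch of small models from each family, assuming PyTorch; the hidden size, sequence length, and dense-network layout are illustrative choices of ours, not the project's final settings.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len = 256, 32, 5

models = {
    # Recurrent: a hidden state carries memory across characters.
    "rnn": nn.RNN(input_size, hidden_size, batch_first=True),
    # Gated recurrent: forget/input gates manage longer dependencies.
    "lstm": nn.LSTM(input_size, hidden_size, batch_first=True),
    # Attention-based: one encoder layer attending over all positions.
    "transformer": nn.TransformerEncoderLayer(
        d_model=input_size, nhead=4, dim_feedforward=hidden_size,
        batch_first=True),
    # Dense feedforward: the whole sequence flattened into one vector.
    "dense": nn.Sequential(nn.Flatten(),
                           nn.Linear(seq_len * input_size, hidden_size),
                           nn.ReLU(),
                           nn.Linear(hidden_size, seq_len * input_size)),
}

# Compare rough model sizes at a comparable hidden width.
for name, model in models.items():
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:12s} {n_params:8d} parameters")
```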
3. Metrics
- Minimal Neuron Count: The smallest number of neurons required to reconstruct input text from hidden states.
- Reconstruction Accuracy: Compare outputs to original inputs after training.
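As a sketch, reconstruction accuracy could be measured as the fraction of characters recovered exactly; this simple character-level definition is our assumption, and the study may use a stricter criterion.

```python
def reconstruction_accuracy(original: str, reconstructed: str) -> float:
    """Fraction of characters reproduced exactly, penalizing length mismatch."""
    if not original and not reconstructed:
        return 1.0
    matches = sum(a == b for a, b in zip(original, reconstructed))
    return matches / max(len(original), len(reconstructed))

print(reconstruction_accuracy("hello", "hellp"))  # 0.8: 4 of 5 characters match
```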
How Information is Stored in Neural Networks
Neural networks store information through synaptic weights and activation patterns. In sequential models like RNNs, hidden states act as memory buffers. However, the exact mechanism varies:
- RNNs: Compress history into hidden states, but suffer from vanishing gradients.
- LSTMs: Use forget/input gates to manage long-term dependencies.
- Transformers: Store context via attention weights across tokens.
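The vanilla RNN update below illustrates this compression, assuming NumPy and random, untrained weights: every character is folded into the same fixed-size hidden vector.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 256, 32
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # memory buffer: one vector for the whole history
for ch in "hello":
    x = np.zeros(input_size)
    x[ord(ch)] = 1.0  # one-hot character input
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
    h = np.tanh(W_xh @ x + W_hh @ h + b)

print(h.shape)  # (32,): five characters squeezed into 32 numbers
```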
Optimizing Network Topology
To minimize neurons while preserving memory:
- Pruning: Remove redundant connections.
- Dynamic Activation Functions: Explore alternative activations, such as the periodic (sinusoidal) activations used in SIREN.
- Sparse Connectivity: Prioritize critical pathways.
- Hybrid Architectures: Combine strengths of RNNs and attention.
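As an example of the pruning step, the sketch below zeroes out the smallest 50% of a linear layer's weights using PyTorch's pruning utilities; the 50% ratio is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 32)
# Mask the 50% of weights with the smallest magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = float((layer.weight == 0).float().mean())
print(f"fraction of zeroed weights: {sparsity:.2f}")  # ~0.50
```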
Academic References
- Memory in RNNs: Hochreiter & Schmidhuber (1997). “Long Short-Term Memory.” Neural Computation, 9(8), 1735–1780.
- Attention Mechanisms: Vaswani et al. (2017). “Attention Is All You Need.” NeurIPS 2017.
- Neural Network Efficiency: Frankle & Carbin (2019). “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.” ICLR 2019.
Implications
- AI Interpretability: Understanding memory storage could demystify “black-box” models.
- Efficient Models: Optimized topologies reduce computational costs for edge devices.
- Neuroscience Parallels: Insights into biological neural memory systems.
Conclusion
This research bridges theoretical neuroscience and machine learning, offering a systematic approach to studying memory in neural networks. By quantifying memory efficiency and refining architectures, we aim to build models that are both compact and powerful.
Connect with the author: guillaume@denemlabs.com
Reviewed and published by Simon Heilles.