NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule
NVIDIA's Gated DeltaNet-2 is a linear attention layer that decouples the erase and write steps of the delta rule, giving each its own channel-wise gate so that memory can be edited more flexibly without losing information. At 1.3B parameters, it outperforms Mamba-2, Gated DeltaNet, and KDA on standard benchmarks while retaining constant-memory decoding.
MarkTechPost
Research