Blog

Parallelizing Nonlinear RNNs with the Ungulates: DEER and ELK

This blog post is a Colab notebook that shows how we can use parallel Newton methods (DEER) and parallel trust region methods (ELK) to parallelize nonlinear RNNs. It is a simple exposition of these parallel Newton methods that doubles as a quick-start guide for our codebase!
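To give a flavor of the idea before opening the notebook, here is a minimal sketch of a DEER-style Newton iteration on a toy scalar RNN. It is plain Python written for illustration, not the API of our codebase: the cell `rnn_step`, the helper `affine_scan`, and the driver `deer` are all hypothetical names. Each iteration linearizes the RNN around the current guess for the whole trajectory, then solves the resulting linear recurrence with a scan whose rounds could each run in parallel.

```python
import math

def rnn_step(h, x, w=0.9):
    """Hypothetical scalar RNN cell: h_t = tanh(w * h_{t-1} + x_t)."""
    return math.tanh(w * h + x)

def rnn_step_grad(h, x, w=0.9):
    """Derivative of rnn_step with respect to h."""
    return w * (1.0 - math.tanh(w * h + x) ** 2)

def affine_scan(a, b, h0):
    """Solve h_t = a[t] * h_{t-1} + b[t] for every t with a doubling scan."""
    # Each pair (a, b) is the affine map h -> a * h + b; composing maps is associative.
    maps = list(zip(a, b))
    shift = 1
    while shift < len(maps):
        # All compositions in one round are independent of each other.
        maps = [
            maps[t] if t < shift
            else (maps[t][0] * maps[t - shift][0],
                  maps[t][0] * maps[t - shift][1] + maps[t][1])
            for t in range(len(maps))
        ]
        shift *= 2
    return [A * h0 + B for A, B in maps]

def deer(xs, h0, tol=1e-12):
    """Newton iteration over the whole trajectory h_1..h_T (illustrative sketch)."""
    T = len(xs)
    hs = [0.0] * T  # initial guess for every state
    for it in range(T):  # after k iterations the first k states are exact
        prev = [h0] + hs[:-1]  # current guess for h_{t-1}
        jac = [rnn_step_grad(p, x) for p, x in zip(prev, xs)]
        # First-order expansion around prev: h_t = J_t * h_{t-1} + (f(prev) - J_t * prev)
        off = [rnn_step(p, x) - j * p for p, x, j in zip(prev, xs, jac)]
        new_hs = affine_scan(jac, off, h0)
        if max(abs(n - o) for n, o in zip(new_hs, hs)) < tol:
            return new_hs, it + 1
        hs = new_hs
    return hs, T

xs = [math.sin(0.7 * t) for t in range(20)]
states, num_iters = deer(xs, h0=0.0)
```

The returned `states` should agree with running `rnn_step` one step at a time, and `num_iters` is at most the sequence length. This sketch only simulates the parallelism with list comprehensions; the notebook shows the real thing.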

A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems

This research paper is not a blog post, but it serves as a gentle introduction to several fixed-point methods (Newton, Picard, and Jacobi) that have been used to parallelize a variety of state space models, including RNNs, MCMC, sampling from diffusion models, and more!

Also worth checking out is Appendix B of the paper, which serves as a gentle and user-friendly introduction to the powerful “parallel scan” algorithm. The parallel scan algorithm lets us multiply a sequence of \(T\) matrices together in \(\mathcal{O}(\log T)\) computational depth. The parallel scan is what parallelizes deep SSMs like S5 and Mamba. It is also a core ingredient of the parallel Newton methods for parallelizing nonlinear RNNs and MCMC.
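As a quick illustration of the \(\mathcal{O}(\log T)\) depth claim, here is a small sketch of a doubling (Hillis-Steele style) scan over matrix products in plain Python. The helper names `matmul` and `prefix_products` are ours for this example, and the loop only simulates the parallelism: within each round, every product depends only on the previous round's values, so all of them could be computed at once.

```python
def matmul(A, B):
    """Multiply two square matrices stored as tuples of tuples."""
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def prefix_products(mats):
    """Inclusive scan: out[t] = mats[t] @ ... @ mats[1] @ mats[0].

    Returns the prefix products and the number of rounds used.
    """
    out = list(mats)
    rounds = 0
    shift = 1
    while shift < len(out):
        # Before this round, out[t] covers the last `shift` matrices ending at t.
        # Combining it with out[t - shift] doubles that window.
        out = [
            out[t] if t < shift else matmul(out[t], out[t - shift])
            for t in range(len(out))
        ]
        shift *= 2
        rounds += 1
    return out, rounds

# Eight non-commuting 2x2 integer matrices, so the ordering really matters.
mats = [((t, 1), (1, 0)) for t in range(1, 9)]
products, rounds = prefix_products(mats)  # rounds == 3 for T = 8
```

This variant does more total work than a sequential loop (roughly \(T \log T\) matrix products instead of \(T\)) in exchange for the logarithmic depth. Work-efficient variants of the scan exist as well.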