Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving

Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, Jongse Park

International Symposium on Microarchitecture (MICRO), 2025.