ESTroM: Element-Flow Architecture For Processing Sparse Tractable Probabilistic Models

by Anjunyi Fan, Xuejie Liu, Anji Liu, Qiuping Wu, Jiaqi Yang, Yuchao Qin, Guy Van den Broeck, Yitao Liang and Bonan Yan
Abstract:
Probabilistic circuits (PCs) are an emerging and popular class of tractable probabilistic models. Their internal connections are represented as directed acyclic graphs (DAGs) of sum nodes and product nodes, which makes them parameter-efficient and expressive for probabilistic inference. Despite these algorithmic advantages, executing PCs still faces issues in deploying the graph structure. PyJuice on GPUs relies on block-sparse parallel computation, which causes a parallelism-sparsity gap, while DAG-style processing does not exploit the repetitive structure of PC internal nodes, resulting in low throughput. To address these challenges, this work proposes ESTroM, an efficient architecture that provides novel graph-element (node/edge) parallelism with sparsity-aware compilation. Based on an analysis of the computational requirements of sum and product nodes, the ESTroM core uses compressed matrices to represent the sum/product-node DAG, edge-based dataflow for product node processing, and node-based dataflow for sum node processing. With intra-core rewind and inter-core multicast optimizations, we develop a prototype ESTroM chip and a demonstration system for a PC-based neural lossless compression application. Our ablation experiments show that ESTroM is 2.11× to 3.79× faster than the state-of-the-art DAG processing unit, DPU-v2, with the same computing resources. Across various typical PC structures, ESTroM achieves a speedup of 18.7× over DPU-v2 and 3.9× over an NVIDIA RTX 4090 GPU running the PyJuice framework. For neural lossless compression, ESTroM demonstrates a 1.39× improvement in compression ratio over the industry-standard Zstandard (Zstd) algorithm at its highest compression level, while offering a 16.3× to 65.2× improvement in compression speed over Zstd on an Intel Xeon Gold 6230.
In a nutshell, this work develops novel graph element parallelism and element-flow architecture theory with practical prototype chips and systems, revealing a new hardware-perspective path for the “scaling law” of emerging tractable probabilistic models.
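For readers new to PCs, the sum/product DAG described in the abstract can be illustrated with a toy software sketch. This is not the ESTroM implementation: the circuit, its parameters, and all names (`PRODUCT_EDGES`, `SUM_ROW_PTR`, `evaluate`, and so on) are hypothetical, and the code only mimics the general idea of walking product nodes edge by edge and sum nodes node by node over a compressed (CSR-like) edge layout.

```python
# Toy probabilistic circuit over two binary variables X1 and X2.
# Node ids: 0-3 are leaves, 4-5 are product nodes, 6 is the root sum node.
# All structure and parameters are made up for illustration.

LEAF_P = [0.8, 0.3, 0.4, 0.9]   # P(X = 1) for leaves 0..3
LEAF_VAR = [0, 0, 1, 1]         # leaves 0,1 read X1; leaves 2,3 read X2

# Product nodes stored as a flat edge list of (parent, child) pairs.
PRODUCT_EDGES = [(4, 0), (4, 2), (5, 1), (5, 3)]

# Sum nodes stored in a CSR-like layout: one row per sum node.
SUM_NODES = [6]
SUM_ROW_PTR = [0, 2]
SUM_CHILD = [4, 5]
SUM_WEIGHT = [0.6, 0.4]


def evaluate(x1, x2):
    """Return the circuit's probability for the assignment (x1, x2)."""
    x = [x1, x2]
    values = [0.0] * 7

    # Leaves: Bernoulli likelihood of the observed value.
    for i, (p, var) in enumerate(zip(LEAF_P, LEAF_VAR)):
        values[i] = p if x[var] == 1 else 1.0 - p

    # Product nodes, edge by edge: each edge multiplies its child's
    # value into the parent's running product.
    for parent, _ in PRODUCT_EDGES:
        values[parent] = 1.0
    for parent, child in PRODUCT_EDGES:
        values[parent] *= values[child]

    # Sum nodes, node by node: each node reduces its own CSR row
    # into a weighted sum of its children.
    for row, node in enumerate(SUM_NODES):
        start, end = SUM_ROW_PTR[row], SUM_ROW_PTR[row + 1]
        values[node] = sum(
            SUM_WEIGHT[k] * values[SUM_CHILD[k]] for k in range(start, end)
        )

    return values[SUM_NODES[-1]]


if __name__ == "__main__":
    total = sum(evaluate(a, b) for a in (0, 1) for b in (0, 1))
    print(evaluate(1, 1), total)
```

Because the sum weights add up to one and each product node combines leaves over disjoint variables, the four assignment probabilities in this toy circuit sum to one, which is the tractability property that makes exact inference cheap.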
Reference:
Anjunyi Fan, Xuejie Liu, Anji Liu, Qiuping Wu, Jiaqi Yang, Yuchao Qin, Guy Van den Broeck, Yitao Liang and Bonan Yan. ESTroM: Element-Flow Architecture For Processing Sparse Tractable Probabilistic Models. In Proceedings of the 32nd International Symposium on High-Performance Computer Architecture (HPCA), 2026.
Bibtex Entry:
@inproceedings{FanHPCA26,
  title     = {ESTroM: Element-Flow Architecture For Processing Sparse Tractable Probabilistic Models},
  author    = {Fan, Anjunyi and Liu, Xuejie and Liu, Anji and Wu, Qiuping and Yang, Jiaqi and Qin, Yuchao and Van den Broeck, Guy and Liang, Yitao and Yan, Bonan},
  booktitle = {Proceedings of the 32nd International Symposium on High-Performance Computer Architecture (HPCA)},
  month     = 1,
  year      = {2026},
  keywords  = {conference,selective}
}