Fast simulation of a many-NPU network-on-chip for microarchitectural design space exploration

Authors

Kang, Jintaek; Yi, Changjae; Lee, Keonjoo; Lee, Seungwook; Ryu, Soojung; Ha, Soon Hoi

Issue Date
2021-01
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
Proceedings - 2021 24th Euromicro Conference on Digital System Design, DSD 2021, pp.131-138
Abstract
© 2021 IEEE. A viable solution to cope with the ever-increasing computational complexity of deep learning applications is to integrate many neural processing units (NPUs) on a chip, with a network-on-chip (NoC) as the communication fabric. Since the design space of an NoC is huge, the network topology is first selected based on the communication patterns of applications, using a high-level performance estimation method. After the network topology is selected, microarchitectural design space exploration is performed with a cycle-level NoC simulator. However, the existing NoC simulator is so slow that design space exploration of the microarchitecture is usually conducted manually in a narrow space. Since a synthetic trace is used, the simulation accuracy is also limited. To overcome these weaknesses, we present a simulation technique that is fast and accurate enough for microarchitectural design space exploration of an NoC. In the proposed technique, we use a real communication trace obtained from a many-NPU simulation that does not model the NoC. To this end, we define a trace format that serves as the interface between the many-NPU simulator and the NoC simulator. To accelerate simulation, we propose a cluster-level parallelization technique for simulating the hierarchical NoC. The key technique is to manage the timestamps of events at the cluster boundary so that no time synchronization error occurs. We also adjust the abstraction level of the simulation models to reduce the number of modules in the SystemC NoC simulation. With the proposed technique, we achieve up to a 40-fold speed-up for a 32-NPU system compared with the FlexNoC simulator.
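The abstract does not give the trace format or the timestamp-management scheme, so the following is only a generic sketch of one well-known way to replay a trace across clusters without synchronization error: conservative windowed simulation, where the boundary-link latency acts as lookahead. Every name here (`Event`, `Cluster`, `simulate`, `LINK_LATENCY`, the `(time, src, dst)` trace tuple) is a hypothetical choice for illustration and is not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field

# Assumption: a fixed latency (in cycles) for crossing a cluster boundary.
# It doubles as the lookahead that makes each window safe to simulate alone.
LINK_LATENCY = 5


@dataclass(order=True)
class Event:
    time: int                          # timestamp in cycles (only field compared)
    src: int = field(compare=False)    # source NPU id
    dst: int = field(compare=False)    # destination NPU id


class Cluster:
    def __init__(self, cid, npus):
        self.cid = cid
        self.npus = set(npus)
        self.queue = []        # pending events, kept as a min-heap on timestamp
        self.delivered = []    # (time, src, dst) for events that ended here

    def push(self, ev):
        heapq.heappush(self.queue, ev)

    def run_until(self, horizon, outbox):
        """Process every queued event whose timestamp is below the horizon."""
        while self.queue and self.queue[0].time < horizon:
            ev = heapq.heappop(self.queue)
            if ev.dst in self.npus:
                self.delivered.append((ev.time, ev.src, ev.dst))
            else:
                # Boundary crossing: the arrival timestamp is at least
                # `horizon`, so it cannot disturb the receiver's current window.
                outbox.append(Event(ev.time + LINK_LATENCY, ev.src, ev.dst))


def simulate(clusters, trace, end_time):
    """Replay (time, src, dst) trace records window by window."""
    owner = {n: c for c in clusters for n in c.npus}
    for t, src, dst in trace:
        owner[src].push(Event(t, src, dst))
    now = 0
    while now < end_time:
        horizon = now + LINK_LATENCY
        outbox = []
        # Clusters are independent inside a window, so this loop is the part
        # that could be handed to parallel workers.
        for c in clusters:
            c.run_until(horizon, outbox)
        for ev in outbox:
            owner[ev.dst].push(ev)
        now = horizon
    return {c.cid: c.delivered for c in clusters}
```

In this sketch, local delivery takes zero time and only the boundary link adds latency; a cycle-level model would replace both simplifications. The property it illustrates is that a message stamped with its arrival time before it leaves a cluster is always processed by the receiver in timestamp order, whichever cluster happens to run first within a window.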
URI
https://hdl.handle.net/10371/183770
DOI
https://doi.org/10.1109/DSD53832.2021.00029