Refine and Recycle: A Method to Increase Decompression Parallelism
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Fang, Jian | - |
dc.contributor.author | Chen, Jianyu | - |
dc.contributor.author | Lee, Jinho | - |
dc.contributor.author | Al-Ars, Zaid | - |
dc.contributor.author | Hofstee, H. Peter | - |
dc.date.accessioned | 2024-05-02T06:01:28Z | - |
dc.date.available | 2024-05-02T06:01:28Z | - |
dc.date.created | 2024-04-23 | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | 2019 IEEE 30TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2019), pp.272-280 | - |
dc.identifier.issn | 2160-0511 | - |
dc.identifier.uri | https://hdl.handle.net/10371/200539 | - |
dc.description.abstract | Rapid increases in storage bandwidth, combined with a desire for operating on large datasets interactively, drive the need for improvements in high-bandwidth decompression. Existing designs either process only one token per cycle or process multiple tokens per cycle with low area efficiency and/or low clock frequency. We propose two techniques to achieve high single-decoder throughput at improved efficiency by keeping only a single copy of the history data across multiple BRAMs and operating on each BRAM independently. A first stage efficiently refines the tokens into commands that operate on a single BRAM and steers the commands to the appropriate one. In the second stage, a relaxed execution model is used where each BRAM command executes immediately and those with invalid data are recycled to avoid stalls caused by the read-after-write dependency. We apply these techniques to Snappy decompression and implement a Snappy decompression accelerator on a CAPI2-attached FPGA platform equipped with a Xilinx VU3P FPGA. Experimental results show that our proposed method achieves up to 7.2 GB/s output throughput per decompressor, with each decompressor using 14.2% of the logic and 7% of the BRAM resources of the device. Therefore, a single decompressor can easily keep pace with an NVMe device (PCIe Gen3 x4) on a small FPGA, while a larger device, integrated on a host bridge adapter and instantiating multiple decompressors, can keep pace with the full OpenCAPI 3.0 bandwidth of 25 GB/s. | - |
dc.language | English | - |
dc.publisher | IEEE COMPUTER SOC | - |
dc.title | Refine and Recycle: A Method to Increase Decompression Parallelism | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/ASAP.2019.00017 | - |
dc.citation.journaltitle | 2019 IEEE 30TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2019) | - |
dc.identifier.wosid | 000574772800053 | - |
dc.identifier.scopusid | 2-s2.0-85072601520 | - |
dc.citation.endpage | 280 | - |
dc.citation.startpage | 272 | - |
dc.description.isOpenAccess | N | - |
dc.contributor.affiliatedAuthor | Lee, Jinho | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.subject.keywordAuthor | Snappy | - |
dc.subject.keywordAuthor | decompression | - |
dc.subject.keywordAuthor | FPGA | - |
dc.subject.keywordAuthor | Acceleration | - |
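The abstract's "recycle" stage can be illustrated with a small software sketch. This is not the authors' hardware design (which operates on per-BRAM commands in an FPGA pipeline); it is a minimal, hypothetical Python model of the core idea: each command executes immediately, and a copy command whose source bytes are not yet valid (a read-after-write hazard) is re-queued instead of stalling everything behind it. The command tuple format and function name are illustrative assumptions.

```python
# Hedged sketch, not the paper's RTL: models "recycling" of commands whose
# source data is not yet valid, rather than stalling on the RAW dependency.
from collections import deque

def decompress_with_recycle(commands, history_size):
    """commands: iterable of ('lit', pos, byte) or ('copy', dst, src, length).
    Returns (decoded bytes, number of recycled command executions)."""
    history = [None] * history_size      # None marks a byte not yet written
    queue = deque(commands)
    recycles = 0
    while queue:
        cmd = queue.popleft()
        if cmd[0] == 'lit':              # literal: write the byte directly
            _, pos, b = cmd
            history[pos] = b
        else:                            # copy: valid only if source bytes exist
            _, dst, src, length = cmd
            if any(history[src + i] is None for i in range(length)):
                queue.append(cmd)        # recycle: source not ready yet
                recycles += 1
            else:
                for i in range(length):
                    history[dst + i] = history[src + i]
    return bytes(history), recycles

# A copy that arrives before its literals succeeds on the second pass:
out, rec = decompress_with_recycle(
    [('copy', 2, 0, 2), ('lit', 0, ord('a')), ('lit', 1, ord('b'))], 4)
# out == b'abab', rec == 1
```

In the paper's design this re-queuing happens per BRAM bank after a first "refine" stage splits tokens into single-BRAM commands, which is what lets each bank run independently at high clock frequency.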
- Files in This Item:
- There are no files associated with this item.
Related Researcher
- College of Engineering
- Department of Electrical and Computer Engineering
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.