
Detailed Information

Floating-point support for coarse-grained reconfigurable architectures : 재구성형 연산 구조를 위한 부동소수점 지원

DC Field: Value
dc.contributor.advisor: 최기영
dc.contributor.author: 조만휘
dc.date.accessioned: 2017-07-13T07:04:01Z
dc.date.available: 2017-07-13T07:04:01Z
dc.date.issued: 2014-02
dc.identifier.other: 000000018611
dc.identifier.uri: https://hdl.handle.net/10371/118999
dc.description: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2014. Advisor: 최기영.
dc.description.abstract:
With the huge increase in demand for various compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. In addition, supporting floating-point operations on a coarse-grained reconfigurable architecture has become essential with the increasing demand for floating-point-intensive applications such as multimedia processing, 3D graphics, augmented reality, and object recognition.
This thesis presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. The two-dimensional array of integer processing elements in FloRA is configured at run time to perform floating-point operations as well as integer operations. More specifically, each floating-point operation is performed by two integer processing elements, one handling the mantissa and the other the exponent. The design was fabricated in a 130nm process; the total area overhead of the additional hardware for floating-point operations is about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at a 125MHz clock frequency with a 1.2V power supply. Experiments show an 11.6x speedup on average compared to an ARM9 processor with a vector floating-point unit, for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches, including XPP and Butter, the proposed architecture shows much higher performance for integer applications while maintaining about half the performance of Butter for floating-point applications.
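To make the mantissa/exponent split concrete, the following is a minimal C sketch of a single-precision multiplication carried out entirely with integer operations, with the exponent path and the mantissa path separated the way they could be assigned to two integer PEs. It is only an illustrative software model under simplifying assumptions (truncating rounding; subnormals, NaN, and infinity ignored), not FloRA's actual datapath.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Illustrative sketch: one FP multiply split into an integer exponent
     * operation and an integer mantissa operation, mirroring the idea of
     * mapping the two paths onto two integer PEs. Not FloRA's datapath;
     * truncating rounding, no subnormal/NaN/Inf handling. */
    static float fp_mul_split(float a, float b)
    {
        uint32_t ua, ub;
        memcpy(&ua, &a, sizeof ua);
        memcpy(&ub, &b, sizeof ub);

        uint32_t sign = (ua ^ ub) & 0x80000000u;              /* sign path         */
        int32_t  exp  = (int32_t)((ua >> 23) & 0xFF)           /* exponent PE: add  */
                      + (int32_t)((ub >> 23) & 0xFF) - 127;    /* biased exponents  */
        uint64_t man  = (uint64_t)((ua & 0x7FFFFFu) | 0x800000u)   /* mantissa PE:  */
                      * (uint64_t)((ub & 0x7FFFFFu) | 0x800000u);  /* multiply      */

        /* normalize: the product of two 24-bit mantissas is 47 or 48 bits wide */
        if (man & (1ull << 47)) { man >>= 24; exp += 1; }
        else                    { man >>= 23; }

        uint32_t ur = sign | ((uint32_t)(exp & 0xFF) << 23) | ((uint32_t)man & 0x7FFFFFu);
        float r;
        memcpy(&r, &ur, sizeof r);
        return r;
    }

    int main(void)
    {
        printf("%f\n", fp_mul_split(3.5f, -2.25f));  /* expect -7.875 */
        return 0;
    }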

This thesis also proposes novel techniques to enhance the utilization of integer units for high-throughput floating-point operations on a CGRA.
The approach to implementing floating-point operations on a CGRA presented in this thesis enables floating-point functionality with less area overhead than the traditional approach of employing separate floating-point units (FPUs). However, the total latency of a floating-point operation is larger than in the traditional approach, and the data dependency between the split integer operations limits further improvement in the utilization of the integer functional units within an operation. To overcome this inefficiency, two techniques are proposed in this thesis. The first is overlapping two distinct floating-point operations, which increases the utilization of the integer functional units in the architecture: integer functional units left idle by one floating-point operation can be used by another floating-point operation. The second is forwarding between two data-dependent floating-point operations, which decreases the effective latency of the floating-point operations. The basic idea is to remove unnecessary calculations, such as the formatting normally done between the two data-dependent floating-point operations. To implement overlapping and forwarding, the FSMs and control paths in each PE are modified and temporary/communication registers are added. Lightweight sub-modules such as increment units and registers for intermediate values are added to resolve resource conflicts.
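As an illustration of the forwarding idea only, the sketch below (an assumed software model, not FloRA's hardware) keeps the intermediate result of two data-dependent multiplications in an unpacked sign/exponent/mantissa form, so the pack and re-unpack "formatting" step between the two operations disappears; rounding and special values are again ignored.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Assumed structure for illustration: a value held between two
     * data-dependent operations in unpacked form, so it is forwarded
     * directly instead of being packed to IEEE format and unpacked again. */
    typedef struct {
        uint32_t sign;  /* 0 or 1 */
        int32_t  exp;   /* unbiased exponent */
        uint32_t man;   /* 24-bit mantissa with the implicit leading 1 made explicit */
    } unpacked_t;

    static unpacked_t unpack(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        unpacked_t r = { u >> 31, (int32_t)((u >> 23) & 0xFF) - 127,
                         (u & 0x7FFFFFu) | 0x800000u };
        return r;
    }

    static float pack(unpacked_t x)
    {
        uint32_t u = (x.sign << 31) | ((uint32_t)((x.exp + 127) & 0xFF) << 23)
                   | (x.man & 0x7FFFFFu);
        float f;
        memcpy(&f, &u, sizeof f);
        return f;
    }

    static unpacked_t mul_unpacked(unpacked_t a, unpacked_t b)
    {
        unpacked_t r;
        uint64_t p = (uint64_t)a.man * (uint64_t)b.man;
        r.sign = a.sign ^ b.sign;
        r.exp  = a.exp + b.exp;
        if (p & (1ull << 47)) { r.man = (uint32_t)(p >> 24); r.exp += 1; }
        else                  { r.man = (uint32_t)(p >> 23); }
        return r;
    }

    /* With forwarding, the intermediate product of a*b is consumed directly;
     * without it, the result would be packed and then unpacked again. */
    static float mul3_forwarded(float a, float b, float c)
    {
        unpacked_t t = mul_unpacked(unpack(a), unpack(b));  /* forwarded value  */
        return pack(mul_unpacked(t, unpack(c)));            /* packed only once */
    }

    int main(void)
    {
        printf("%f\n", mul3_forwarded(1.5f, -2.0f, 4.0f));  /* expect -12.0 */
        return 0;
    }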
Experiments are performed with several arithmetic functions that are widely used in floating-point applications. The base architecture and the new architecture implementing the proposed techniques are compared in terms of throughput and area overhead. The experimental results show that the proposed techniques increase throughput by 33.9% on average with an area overhead of 20.9%.
dc.description.tableofcontents:
Abstract i
Contents v
List of Figures ix
List of Tables xv
Chapter 1 INTRODUCTION 1
Chapter 2 TARGET ARCHITECTURE 7
2.1 Overall Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Reconfigurable Computing Module . . . . . . . . . . . . . . . . . 8
Chapter 3 DESIGN OF FLOATING-POINT OPERATIONS 15
3.1 Floating-point Numbers . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Representation of floating-point numbers . . . . . . . . . . 15
3.1.2 Floating-point operations . . . . . . . . . . . . . . . . . . . 19
3.2 FPU-PE Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.1 Construction of FPU-PE Cluster . . . . . . . . . . . . . . . 20
3.2.2 Construction of Array of FPU-PE Clusters . . . . . . . . . 21
3.2.3 Comparing Different FPU-PE Clusters . . . . . . . . . . . 23
3.3 Implementation of Multi-Cycle Operations . . . . . . . . . . . . 26
3.4 Implementation of Floating-Point Operations . . . . . . . . . . . 30
3.5 Implementation of Floating-Point Operations Using Shared Modules . . . 32
Chapter 4 Chip Implementation 35
4.1 Specification of Chip Implementation . . . . . . . . . . . . . . . . 35
4.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.1 Performance Comparison . . . . . . . . . . . . . . . . . . . 39
4.3.2 Power Consumption Comparison . . . . . . . . . . . . . . 42
Chapter 5 Comparison with Other Architectures 45
5.1 Preparation for the comparison . . . . . . . . . . . . . . . . . . . 45
5.2 Comparison with PACT XPP . . . . . . . . . . . . . . . . . . . . . 47
5.3 Comparison with Butter Architecture . . . . . . . . . . . . . . . . 50
5.4 Implication of the proposed architecture . . . . . . . . . . . . . . 57
Chapter 6 Enhancement Techniques 63
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.2 Conventional Approach . . . . . . . . . . . . . . . . . . . . . . . 64
6.2.1 Base Architecture . . . . . . . . . . . . . . . . . . . . . . . 64
6.2.2 Utilization of Floating-Point Operations . . . . . . . . . . 65
6.3 Proposed Enhancement Techniques . . . . . . . . . . . . . . . . . 66
6.3.1 Overlapping Technique . . . . . . . . . . . . . . . . . . . . 66
6.3.2 Forwarding Technique . . . . . . . . . . . . . . . . . . . . . 71
6.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.4.1 Performance Comparison . . . . . . . . . . . . . . . . . . . 76
6.4.2 Hardware Cost of the Proposed Techniques . . . . . . . . . 77
6.4.3 Utilization Enhancement by the Proposed Techniques . . . 80
6.5 Comparison with Other Architecture . . . . . . . . . . . . . . . . 87
Chapter 7 Conclusion 93
Bibliography 95
Abstract (in Korean) 103
Acknowledgments 105
dc.format: application/pdf
dc.format.extent: 2929851 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: 서울대학교 대학원
dc.subject: reconfigurable architecture (재구성형 구조)
dc.subject: floating point (부동소수점)
dc.subject.ddc: 621
dc.title: Floating-point support for coarse-grained reconfigurable architectures
dc.title.alternative: 재구성형 연산 구조를 위한 부동소수점 지원
dc.type: Thesis
dc.contributor.AlternativeAuthor: Manhwee Jo
dc.description.degree: Doctor
dc.citation.pages: xvi, 106
dc.contributor.affiliation: College of Engineering, Department of Electrical and Computer Engineering (공과대학 전기·컴퓨터공학부)
dc.date.awarded: 2014-02