Parallelized Implementation of Full High Definition H.264 Decoder on Embedded Multi-core
임베디드 다중 코어에서의 초고화질 H.264 복호기 병렬 구현
- College of Engineering, Department of Electrical and Computer Engineering
- Issue Date
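- Graduate School, Seoul National University
- Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, February 2013. 최진영.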
- This work addresses the problem of parallelizing an H.264/AVC (Advanced Video Coding) decoder on embedded multi-core systems. We propose a parallelization strategy for such systems that applies not only to the H.264/AVC decoder but also to general applications on embedded multi-core platforms. In addition, we propose two specific parallelization methods, dynamic load balancing and hybrid partitioning. We validate the proposed methods by implementing the H.264/AVC decoder on two embedded multi-core platforms: a dual-core system with three hardware accelerators and a quad-core system with two co-processors.
On the dual-core system, the H.264/AVC decoder is parallelized together with a few hardware accelerators. For this system, the proposed parallelization strategy selects functional partitioning, which allows a simple interface to the hardware accelerators and requires little memory for inter-core communication. We also propose a dynamic load balancing method for functional partitioning. Load balancing is achieved by dynamically mapping a few selected functions to either core at the macroblock level. The buffer level alone is sufficient to decide which core runs those functions. Because the decision criterion and mechanism are this simple, the performance cost of load balancing is negligible, and the method extends easily to systems with more cores. Experimental results show that the proposed load balancing method eliminates 82.3% of the total waiting overhead.
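The buffer-level decision described above can be illustrated with a minimal sketch. It is not the thesis implementation: the two-core producer/consumer arrangement, the function names `pick_core_for_movable` and `assign_macroblocks`, and the `high_mark` threshold of 0.5 are all assumptions made for illustration.

```python
def pick_core_for_movable(buffer_level: int, capacity: int,
                          high_mark: float = 0.5) -> int:
    """Decide, for one macroblock, which core runs the movable functions.

    Assumed setup (hypothetical): core 0 produces macroblocks into a
    buffer and core 1 consumes them. A well-filled buffer suggests
    core 1 is lagging, so core 0 keeps the extra work; a nearly empty
    buffer suggests core 0 is lagging, so core 1 takes it.
    """
    if not 0 <= buffer_level <= capacity:
        raise ValueError("buffer_level must be within [0, capacity]")
    return 0 if buffer_level >= high_mark * capacity else 1


def assign_macroblocks(buffer_levels, capacity):
    """Map the buffer level sampled at each macroblock to a core index."""
    return [pick_core_for_movable(level, capacity) for level in buffer_levels]


# Buffer levels sampled at six consecutive macroblocks (made-up values).
print(assign_macroblocks([0, 1, 4, 7, 8, 3], capacity=8))
# → [1, 1, 0, 0, 0, 1]
```

The decision reads a single integer per macroblock, which is consistent with the claim that the balancing step itself costs almost nothing.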
For the quad-core system, we apply the proposed parallelization strategy and propose a new partitioning method called hybrid partitioning. Partitioning is a key issue when mapping application software onto multi-core systems. Hybrid partitioning is a mixture of functional and data partitioning: each module is partitioned functionally or by data, depending on the module's features. Hybrid partitioning balances load between cores as effectively as data partitioning and is as memory-efficient as functional partitioning. It is also free from the macroblock-level dependency problem that data partitioning usually has in video decoding. Applying hybrid partitioning reduces waiting overhead by 86.0% compared with functional partitioning. Regarding memory usage, hybrid partitioning requires 51.2% less VLIW (Very Long Instruction Word) program memory and 62.0% less CGRA (Coarse-Grained Reconfigurable Array) program memory than data partitioning. As for SDRAM (Synchronous Dynamic Random-Access Memory) bandwidth, compared with data partitioning, hybrid partitioning uses 11.6% of the whole bandwidth budget of the 333 MHz SDRAM used in the experiments.
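One way to picture the per-module choice in hybrid partitioning is the sketch below. It is only an illustration under stated assumptions: the `Module` type, the `data_parallel` flag, the module names, the interleaved macroblock split, and the round-robin core assignment are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    # Assumption: True when macroblocks can be processed independently
    # inside this module, so it is a candidate for data partitioning.
    data_parallel: bool


def hybrid_partition(modules, num_cores, num_macroblocks):
    """Return {module name: {core index: [macroblock indices]}}."""
    plan = {}
    next_core = 0
    for module in modules:
        if module.data_parallel:
            # Data partitioning: interleave macroblocks over all cores.
            plan[module.name] = {
                core: list(range(core, num_macroblocks, num_cores))
                for core in range(num_cores)
            }
        else:
            # Functional partitioning: the whole module stays on one core.
            plan[module.name] = {next_core: list(range(num_macroblocks))}
            next_core = (next_core + 1) % num_cores
    return plan


# Hypothetical decoder modules; the flags are illustrative guesses.
modules = [Module("entropy_decode", False), Module("inverse_transform", True)]
plan = hybrid_partition(modules, num_cores=4, num_macroblocks=8)
print(plan["entropy_decode"])        # the whole module on core 0
print(plan["inverse_transform"][1])  # → [1, 5]
```

In this sketch a functionally partitioned module is loaded on a single core, while a data-partitioned module is replicated on every core, which hints at why mixing the two needs less program memory than pure data partitioning.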