
Deep Network Regularization with Representation Shaping

DC Field / Value
dc.contributor.advisor: Rhee, Wonjong
dc.contributor.author: 최대영 (Choi, Daeyoung)
dc.date.accessioned: 2019-05-07T06:34:03Z
dc.date.available: 2019-05-07T06:34:03Z
dc.date.issued: 2019-02
dc.identifier.other: 000000155185
dc.identifier.uri: https://hdl.handle.net/10371/152565
dc.description: Thesis (Ph.D.) -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Contents and Information Studies), February 2019. Advisor: Rhee, Wonjong.
dc.description.abstract: The statistical characteristics of learned representations, such as correlation and representational sparsity, are known to be relevant to the performance of deep learning methods. Learning meaningful and useful data representations by using regularization methods has also been one of the central concerns in deep learning. In this dissertation, deep network regularization using representation shaping is studied. Roughly, the following questions are answered: what are the common statistical characteristics of representations that high-performing networks share, and do the characteristics have a causal relationship with performance? To answer these questions, five representation regularizers are proposed: class-wise Covariance Regularizer (cw-CR), Variance Regularizer (VR), class-wise Variance Regularizer (cw-VR), Rank Regularizer (RR), and class-wise Rank Regularizer (cw-RR). With these regularizers, significant performance improvements were found for a variety of tasks over popular benchmark datasets. Visualization of the learned representations shows that the regularizers used in this work indeed perform distinct representation shaping. Then, with a variety of representation regularizers, several statistical characteristics of learned representations, including covariance, correlation, sparsity, dead units, and rank, are investigated. Our theoretical analysis and experimental results indicate that all the statistical characteristics considered in this work fail to show any general or causal pattern for improving performance. The mutual information quantities I(z; x) and I(z; y) are examined as well, and it is shown that regularizers can affect I(z; x) and thus indirectly influence the performance. Finally, to address the usefulness of representation regularizers, two practical ways of using them are presented: using a set of representation regularizers as a performance-tuning tool, and enhancing network compression with representation regularizers.
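
For intuition about how such a penalty enters training, the sketch below illustrates a class-wise variance penalty in the spirit of cw-VR: hidden-layer activations are grouped by their ground-truth class, and the average within-class variance is added to the task loss with a weighting coefficient. This is a minimal NumPy illustration only, not the dissertation's exact formulation; the function name cw_variance_penalty and the weight lam are assumptions made for this example.

    import numpy as np

    def cw_variance_penalty(representations, labels):
        # representations: (batch_size, num_units) activations of one hidden layer
        # labels: (batch_size,) integer class labels for the same mini-batch
        penalty = 0.0
        classes = np.unique(labels)
        for c in classes:
            z_c = representations[labels == c]                    # activations belonging to class c
            penalty += np.mean((z_c - z_c.mean(axis=0)) ** 2)     # within-class variance
        return penalty / len(classes)

    # Hypothetical usage inside a training loop:
    # total_loss = task_loss + lam * cw_variance_penalty(z_hidden, y_batch)

The class-wise covariance variant (cw-CR) would analogously penalize the off-diagonal covariance terms computed per class rather than the variances.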
dc.description.tableofcontents:
Chapter 1. Introduction 1
1.1 Background and Motivation 1
1.2 Contributions 4
Chapter 2. Generalization, Regularization, and Representation in Deep Learning 8
2.1 Deep Networks 8
2.2 Generalization 9
2.2.1 Capacity, Overfitting, and Generalization 11
2.2.2 Generalization in Deep Learning 12
2.3 Regularization 14
2.3.1 Capacity Control and Regularization 14
2.3.2 Regularization for Deep Learning 16
2.4 Representation 18
2.4.1 Representation Learning 18
2.4.2 Representation Shaping 20
Chapter 3. Representation Regularizer Design with Class Information 26
3.1 Class-wise Representation Regularizers: cw-CR and cw-VR 27
3.1.1 Basic Statistics of Representations 27
3.1.2 cw-CR 29
3.1.3 cw-VR 30
3.1.4 Penalty Loss Functions and Gradients 30
3.2 Experiments 32
3.2.1 Image Classification Task 33
3.2.2 Image Reconstruction Task 36
3.3 Analysis of Representation Characteristics 36
3.3.1 Visualization 36
3.3.2 Quantitative Analysis 37
3.4 Layer Dependency 39
Chapter 4. Representation Characteristics and Their Relationship with Performance 42
4.1 Representation Characteristics 43
4.2 Experimental Results of Representation Regularization 46
4.3 Scaling, Permutation, Covariance, and Correlation 48
4.3.1 Identical Output Network (ION) 48
4.3.2 Possible Extensions for ION 51
4.4 Sparsity, Dead Unit, and Rank 55
4.4.1 Analytical Relationship 55
4.4.2 Rank Regularizer 56
4.4.3 A Controlled Experiment on Data Generation Process 58
4.5 Mutual Information 62
Chapter 5. Practical Ways of Using Representation Regularizers 65
5.1 Tuning Deep Network Performance Using Representation Regularizers 65
5.1.1 Experimental Settings and Conditions 66
5.1.2 Consistently Well-performing Regularizer 67
5.1.3 Performance Improvement Using Regularizers as a Set 68
5.2 Enhancing Network Compression Using Representation Regularizers 68
5.2.1 The Need for Network Compression 72
5.2.2 Three Typical Approaches for Network Compression 73
5.2.3 Proposed Approaches and Experimental Results 74
Chapter 6. Discussion 79
6.1 Implication 79
6.1.1 Usefulness of Class Information 79
6.1.2 Comparison with Non-penalty Regularizers: Dropout and Batch Normalization 81
6.1.3 Identical Output Network 82
6.1.4 Using Representation Regularizers for Performance Tuning 82
6.1.5 Benefits and Drawbacks of Different Statistical Characteristics of Representations 83
6.2 Limitation 85
6.2.1 Understanding the Underlying Mechanism of Representation Regularization 85
6.2.2 Manipulating Representation Characteristics other than Covariance and Variance for ReLU Networks 86
6.2.3 Investigating Representation Characteristics of Complicated Tasks 86
6.3 Possible Future Work 88
6.3.1 Interpreting Learned Representations via Visualization 88
6.3.2 Designing a Regularizer Utilizing Mutual Information 89
6.3.3 Applying Multiple Representation Regularizers to a Network 90
6.3.4 Enhancing Deep Network Compression via Representation Manipulation 92
Chapter 7. Conclusion 93
Bibliography 94
Appendix 103
A Principal Component Analysis of Learned Representations 104
B Proofs 110
Acknowledgement 113
dc.language.iso: eng
dc.publisher: Seoul National University, Graduate School
dc.subject.ddc: 004
dc.title: Deep Network Regularization with Representation Shaping
dc.type: Thesis
dc.type: Dissertation
dc.contributor.AlternativeAuthor: Daeyoung Choi
dc.description.degree: Doctor
dc.contributor.affiliation: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Contents and Information Studies)
dc.date.awarded: 2019-02
dc.identifier.uci: I804:11032-000000155185
dc.identifier.holdings: 000000000026▲000000000039▲000000155185▲
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
