Deep Network Regularization with Representation Shaping

DC Fields
dc.contributor.advisor: Rhee, Wonjong
dc.contributor.author: 최대영
dc.date.accessioned: 2019-05-07T06:34:03Z
dc.date.available: 2019-05-07T06:34:03Z
dc.date.issued: 2019-02
dc.identifier.other: 000000155185
dc.identifier.uri: http://hdl.handle.net/10371/152565
dc.description: Doctoral dissertation -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Information Convergence), 2019. 2. Rhee, Wonjong.
dc.description.abstract: The statistical characteristics of learned representations, such as correlation and representational sparsity, are known to be relevant to the performance of deep learning methods. Learning meaningful and useful data representations through regularization has also been one of the central concerns in deep learning. In this dissertation, deep network regularization using representation shaping is studied. Roughly, the following questions are answered: what common statistical characteristics do the representations of high-performing networks share, and do those characteristics have a causal relationship with performance? To answer these questions, five representation regularizers are proposed: class-wise Covariance Regularizer (cw-CR), Variance Regularizer (VR), class-wise Variance Regularizer (cw-VR), Rank Regularizer (RR), and class-wise Rank Regularizer (cw-RR). With these regularizers, significant performance improvements were found for a variety of tasks over popular benchmark datasets. Visualization of the learned representations shows that the regularizers used in this work indeed perform distinct representation shaping. Then, with a variety of representation regularizers, several statistical characteristics of learned representations, including covariance, correlation, sparsity, dead units, and rank, are investigated. Our theoretical analysis and experimental results indicate that none of the statistical characteristics considered in this work shows any general or causal pattern for improving performance. The mutual information terms I(z; x) and I(z; y) are examined as well, and it is shown that regularizers can affect I(z; x) and thus indirectly influence the performance. Finally, two practical ways of using representation regularizers are presented to demonstrate their usefulness: using a set of representation regularizers as a performance-tuning tool, and enhancing network compression with representation regularizers.
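The class-wise variance idea described in the abstract can be sketched in a few lines: penalize, for each class, the per-unit variance of the hidden representations belonging to that class, so that same-class representations are pulled together. This is an illustrative reconstruction of the cw-VR concept only, not the dissertation's reference implementation; the function name and argument shapes are assumptions.

```python
import numpy as np

def cw_vr_penalty(representations, labels, num_classes):
    """Illustrative class-wise Variance Regularizer (cw-VR) penalty sketch.

    representations: (N, D) array of hidden-layer activations.
    labels: (N,) array of integer class labels.
    Returns the sum, over classes and units, of the within-class
    (biased, 1/N) variance of the activations.
    """
    penalty = 0.0
    for c in range(num_classes):
        z_c = representations[labels == c]  # activations of class c
        if len(z_c) < 2:
            continue  # variance is not meaningful for fewer than 2 samples
        penalty += z_c.var(axis=0).sum()  # per-unit variance, summed over units
    return penalty
```

In training, such a penalty would typically be scaled by a coefficient and added to the task loss, so gradient descent jointly minimizes the prediction error and the within-class spread of the representations.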
dc.description.tableofcontents:
Chapter 1. Introduction 1
  1.1 Background and Motivation 1
  1.2 Contributions 4
Chapter 2. Generalization, Regularization, and Representation in Deep Learning 8
  2.1 Deep Networks 8
  2.2 Generalization 9
    2.2.1 Capacity, Overfitting, and Generalization 11
    2.2.2 Generalization in Deep Learning 12
  2.3 Regularization 14
    2.3.1 Capacity Control and Regularization 14
    2.3.2 Regularization for Deep Learning 16
  2.4 Representation 18
    2.4.1 Representation Learning 18
    2.4.2 Representation Shaping 20
Chapter 3. Representation Regularizer Design with Class Information 26
  3.1 Class-wise Representation Regularizers: cw-CR and cw-VR 27
    3.1.1 Basic Statistics of Representations 27
    3.1.2 cw-CR 29
    3.1.3 cw-VR 30
    3.1.4 Penalty Loss Functions and Gradients 30
  3.2 Experiments 32
    3.2.1 Image Classification Task 33
    3.2.2 Image Reconstruction Task 36
  3.3 Analysis of Representation Characteristics 36
    3.3.1 Visualization 36
    3.3.2 Quantitative Analysis 37
  3.4 Layer Dependency 39
Chapter 4. Representation Characteristics and Their Relationship with Performance 42
  4.1 Representation Characteristics 43
  4.2 Experimental Results of Representation Regularization 46
  4.3 Scaling, Permutation, Covariance, and Correlation 48
    4.3.1 Identical Output Network (ION) 48
    4.3.2 Possible Extensions for ION 51
  4.4 Sparsity, Dead Unit, and Rank 55
    4.4.1 Analytical Relationship 55
    4.4.2 Rank Regularizer 56
    4.4.3 A Controlled Experiment on Data Generation Process 58
  4.5 Mutual Information 62
Chapter 5. Practical Ways of Using Representation Regularizers 65
  5.1 Tuning Deep Network Performance Using Representation Regularizers 65
    5.1.1 Experimental Settings and Conditions 66
    5.1.2 Consistently Well-performing Regularizer 67
    5.1.3 Performance Improvement Using Regularizers as a Set 68
  5.2 Enhancing Network Compression Using Representation Regularizers 68
    5.2.1 The Need for Network Compression 72
    5.2.2 Three Typical Approaches for Network Compression 73
    5.2.3 Proposed Approaches and Experimental Results 74
Chapter 6. Discussion 79
  6.1 Implication 79
    6.1.1 Usefulness of Class Information 79
    6.1.2 Comparison with Non-penalty Regularizers: Dropout and Batch Normalization 81
    6.1.3 Identical Output Network 82
    6.1.4 Using Representation Regularizers for Performance Tuning 82
    6.1.5 Benefits and Drawbacks of Different Statistical Characteristics of Representations 83
  6.2 Limitation 85
    6.2.1 Understanding the Underlying Mechanism of Representation Regularization 85
    6.2.2 Manipulating Representation Characteristics other than Covariance and Variance for ReLU Networks 86
    6.2.3 Investigating Representation Characteristics of Complicated Tasks 86
  6.3 Possible Future Work 88
    6.3.1 Interpreting Learned Representations via Visualization 88
    6.3.2 Designing a Regularizer Utilizing Mutual Information 89
    6.3.3 Applying Multiple Representation Regularizers to a Network 90
    6.3.4 Enhancing Deep Network Compression via Representation Manipulation 92
Chapter 7. Conclusion 93
Bibliography 94
Appendix 103
  A Principal Component Analysis of Learned Representations 104
  B Proofs 110
Acknowledgement 113
dc.language.iso: eng
dc.publisher: Seoul National University Graduate School
dc.subject.ddc: 004
dc.title: Deep Network Regularization with Representation Shaping
dc.type: Thesis
dc.type: Dissertation
dc.contributor.AlternativeAuthor: Daeyoung Choi
dc.description.degree: Doctor
dc.contributor.affiliation: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Information Convergence)
dc.date.awarded: 2019-02
dc.identifier.uci: I804:11032-000000155185
dc.identifier.holdings: 000000000026▲000000000039▲000000155185▲
Appears in Collections:
Graduate School of Convergence Science and Technology (융합과학기술대학원) > Dept. of Transdisciplinary Studies (융합과학부) > Theses (Ph.D. / Sc.D., 융합과학부)
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.