Utilizing Genetic Algorithm to LambdaMART Forests to Predict Ranking Labels Accurately
Application of Genetic Algorithms to LambdaMART Forests for Accurate Prediction of Ranking Labels (translated Korean title)
- Srinivasa Rao Satti
- College of Engineering, Dept. of Computer Science and Engineering
- Issue Date
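- Seoul National University Graduate School
- Thesis (Master's) -- Seoul National University Graduate School, College of Engineering, Dept. of Computer Science and Engineering, 2017. 8. Srinivasa Rao Satti.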
- In this thesis, the principles of genetic algorithms (GA) are applied to forests of LambdaMART to obtain more accurate ranking results. The ranking problem can be viewed as a kind of prediction-function problem, and various solutions have been proposed for it. Applying machine learning techniques has improved the ranking quality of these algorithms. One such technique is ensemble learning with decision trees, in which the trees are trained one by one and then used together to predict the result for given input values.
LambdaMART is a fusion of LambdaRank and MART (Multiple Additive Regression Trees): LambdaRank computes the gradients of the scores, and MART generates and trains multiple trees following predefined steps. LambdaMART was also the main contributor to the winning entry of the "Yahoo! Learning to Rank Challenge (2010)", although the challenge report notes that the performance of ranking solutions has reached a saturation point. However, LambdaMART may overfit its training data, meaning that after training it may fail to predict outcomes accurately on unseen data. A genetic algorithm, in turn, can provide greater search ability over the solution space, although that ability depends on the design of core operations such as crossover and mutation.
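To make the LambdaRank half of LambdaMART more concrete, the following is a minimal sketch of how per-document gradients ("lambdas") might be computed for a single query. It is an illustration under stated assumptions, not the thesis's implementation: the function name `lambda_gradients`, the `sigma` parameter, the gain `2^label - 1`, and the use of NDCG as the target metric are all choices made for this example. The sign convention is also an assumption: a positive value means the document's score should be pushed up.

```python
import math

def gain(label):
    # Assumed gain function for NDCG: 2^label - 1.
    return 2 ** label - 1

def discount(rank):
    # Position discount for a 0-based rank.
    return 1.0 / math.log2(rank + 2)

def lambda_gradients(scores, labels, sigma=1.0):
    """Return one lambda per document of a single query.

    Positive lambda: the document's score should increase.
    """
    n = len(scores)
    # Current rank of each document (0 = top), ties broken by index.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for pos, i in enumerate(order):
        rank[i] = pos

    ideal = sorted(labels, reverse=True)
    idcg = sum(gain(l) * discount(p) for p, l in enumerate(ideal))
    lambdas = [0.0] * n
    if idcg == 0:
        return lambdas  # no relevant documents, nothing to learn

    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is more relevant than j
            # |change in NDCG| if documents i and j swapped positions.
            delta = abs((gain(labels[i]) - gain(labels[j]))
                        * (discount(rank[i]) - discount(rank[j]))) / idcg
            # RankNet-style pairwise term, scaled by the metric change.
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            force = sigma * rho * delta
            lambdas[i] += force
            lambdas[j] -= force
    return lambdas
```

In the MART half, each new regression tree would then be fit to these lambdas and added to the ensemble. That step is omitted here to keep the sketch short.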
Combining this search ability with LambdaMART could improve solution quality and reduce the chance of overfitting to the training data. In this scheme, each LambdaMART forest becomes a chromosome, and multiple forests serve as the operands of the genetic operations. The scheme achieves a higher accuracy measure than the original LambdaMART and also reduces the total training time per forest.
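The idea of treating each forest as a chromosome can be sketched as follows. The abstract does not specify the genetic operators, the fitness function, or the selection rule, so everything here is a hypothetical stand-in: trees are simplified to stumps of the form `(feature_index, threshold, left_value, right_value)`, crossover is one-point over the list of trees, mutation perturbs leaf values, and fitness is pairwise ordering accuracy on a small set of labeled documents. Forests are assumed to contain at least two trees each.

```python
import random

def tree_score(tree, x):
    # Hypothetical stump: (feature_index, threshold, left_value, right_value).
    f, t, left, right = tree
    return left if x[f] <= t else right

def forest_score(forest, x):
    # Additive ensemble: the forest's score is the sum of its trees' outputs.
    return sum(tree_score(t, x) for t in forest)

def pairwise_accuracy(forest, docs):
    """Fraction of document pairs (higher label vs. lower label) ordered correctly."""
    correct = total = 0
    for xi, yi in docs:
        for xj, yj in docs:
            if yi > yj:
                total += 1
                correct += forest_score(forest, xi) > forest_score(forest, xj)
    return correct / total if total else 0.0

def crossover(a, b, rng):
    # One-point crossover over the tree lists (assumed operator).
    cut = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(forest, rng, rate=0.1, scale=0.1):
    # Perturb the leaf values of randomly chosen trees (assumed operator).
    out = []
    for f, t, left, right in forest:
        if rng.random() < rate:
            left += rng.gauss(0, scale)
            right += rng.gauss(0, scale)
        out.append((f, t, left, right))
    return out

def evolve(population, docs, generations=10, seed=0):
    """Evolve a population of forests; the fitter half survives unchanged."""
    rng = random.Random(seed)
    fitness = lambda forest: pairwise_accuracy(forest, docs)
    for _ in range(generations):
        population = sorted(population, key=fitness, reverse=True)
        parents = population[: max(2, len(population) // 2)]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rng))
        population = (parents + children)[: len(population)]
    return max(population, key=fitness)
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases. Evaluating fitness on held-out data rather than on the training data is one plausible way such a scheme could counter overfitting, though that choice is an assumption of this sketch.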