THE DISSERTATION DEFENSE FOR THE DEGREE OF Ph.D. IN COMPUTER SCIENCE FOR
Lidan Wang
Web search engines allow users to find information on almost any topic imaginable. To be successful, a search engine must return relevant information to the user in a short amount of time. However, efficiency (speed) and effectiveness (relevance) are competing forces that often counteract each other. It is often the case that methods developed for improving effectiveness incur moderate-to-large computational costs, thus sustained effectiveness gains typically have to be counter-balanced by buying more/faster hardware, implementing caching strategies if possible, or spending additional effort in low-level optimizations.
In this work, I will describe the "Learning to Efficiently Rank" framework for building highly effective ranking models for Web-scale data, without sacrificing run-time efficiency for returning results. I will introduce new classes of ranking models that have the capability of being simultaneously fast and effective, and I will demonstrate how to optimize the models for multiple metrics. The dissertation has the following contributions:
1) Given a desired tradeoff between effectiveness/efficiency, we introduce efficient linear models, which have a mechanism to directly optimize the tradeoff metric and achieve an optimal balance between effectiveness/efficiency.
2) We introduce "ranking under temporal constraints", which returns the most effective ranked results possible under a time constraint imposed on each query.
3) We propose a cascade ranking model for efficient top-K retrieval over Web-scale documents, such that ranking effectiveness and efficiency are simultaneously optimized by the cascade.
4) We propose a constrained version of the cascade that can return results within time constraints by simultaneously reducing document set size and unnecessary features.
5) We generalize the "Learning to Efficiently Rank" framework and discuss another important instance in the broader context of multi-objective learning to rank. We discuss how to construct robust ranking models -- in addition to performing well on average, the model should not perform very poorly for any individual query, relative to a baseline ranking model. We show how to jointly optimize model robustness and effectiveness.
Examining Committee:
Committee Chair: Dr. Jimmy Lin
Dean's Representative: Dr. Carol Espy-Wilson
Committee Members: Dr. Douglas Oard
Dr. Hal Daume III
Dr. Amol Deshpande
Dr. Donald Metzler