MINECORE – MINimizing the Expected COst of Review in Ediscovery
Wednesday, April 20, 2016, 11:00-11:30 am
Abstract

In civil litigation, e-discovery is the process by which the documents deemed “responsive” (i.e., relevant) to a certain topic are identified within a universe D of documents and produced by a producing party to a requesting party, unless they are also “privileged”, i.e., contain sensitive information that legally allows the producing party to withhold them. In this process the producing party may incur costs of two types: annotation costs (human annotators need to be paid for their work) and misclassification costs (failing to correctly determine the responsiveness and/or privilege of documents may damage the producing party in various ways). Relying exclusively on automatic classification would minimize annotation costs but bring about significant misclassification costs, while relying exclusively on manual classification would do the opposite, keeping misclassification costs low at the price of very high annotation costs.

We propose a risk minimization framework that strikes a balance between these two extremes. In our framework, (a) the documents are first automatically labeled for both responsiveness and privilege, and (b) a portion of the automatically labeled documents is then relabeled (i.e., validated) by human annotators, with junior annotators judging responsiveness and senior annotators judging privilege. The overall goal is to minimize the expected cost (i.e., the risk) of the entire process. Risk is minimized by optimizing, for both responsiveness and privilege, (i) the choice of which documents to relabel and (ii) the choice of how many of them to relabel.
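
As a rough illustration of the trade-off described above, and not the method presented in the talk, the following sketch handles a single binary label (say, responsiveness). It assumes the classifier outputs a calibrated probability for each document, that a human reviewer always assigns the correct label, and that every review costs the same fixed amount. All names and cost values (`Doc`, `select_for_review`, `cost_fp`, `cost_fn`, `annotation_cost`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    p_positive: float  # assumed: calibrated probability that the true label is positive


def expected_misclassification_cost(p: float, cost_fp: float, cost_fn: float) -> float:
    """Expected cost of trusting the automatic label for one document.

    Assumes the automatic label is the one with the lower expected cost.
    """
    cost_if_labeled_positive = (1.0 - p) * cost_fp  # wrong if the document is truly negative
    cost_if_labeled_negative = p * cost_fn          # wrong if the document is truly positive
    return min(cost_if_labeled_positive, cost_if_labeled_negative)


def select_for_review(docs, cost_fp, cost_fn, annotation_cost):
    """Return the documents worth sending to a human, most valuable first.

    Under the assumption that review always yields the correct label, reviewing
    a document removes its expected misclassification cost. Ranking by that
    saving answers "which documents"; keeping only those whose saving exceeds
    the annotation cost answers "how many".
    """
    def saving(doc):
        return expected_misclassification_cost(doc.p_positive, cost_fp, cost_fn)

    ranked = sorted(docs, key=saving, reverse=True)
    return [doc for doc in ranked if saving(doc) > annotation_cost]


# Hypothetical example: three documents, symmetric error costs of 10 units,
# and a review cost of 2 units per document.
docs = [Doc("a", 0.50), Doc("b", 0.90), Doc("c", 0.99)]
to_review = select_for_review(docs, cost_fp=10.0, cost_fn=10.0, annotation_cost=2.0)
review_ids = [doc.doc_id for doc in to_review]
```

In this toy setting only the most uncertain document is selected, since its expected misclassification cost exceeds the price of a review. The framework in the abstract is richer: it treats responsiveness and privilege jointly, and junior and senior annotators would presumably come at different costs.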

Bio

Jyothi Vinjumur is a PhD student at the University of Maryland, College Park, in the College of Information Studies. Her PhD advisor is Dr. Douglas Oard. Her research interests center on the use of information retrieval, information visualization, and applied machine learning techniques to help end users (e.g., lawyers) find the information they want in a cost-effective manner.

Additional information is available at http://terpconnect.umd.edu/~jyothikv/

This talk is organized by Naomi Feldman