My research focuses on Document Intelligence, where the goal is to develop Artificial Intelligence (AI) systems that can understand, interpret, and extract the information contained in semi-structured documents such as digital PDFs, forms, receipts, contracts, and infographics. In today’s digitally connected world, documents play an increasingly central role in human communication and workplace productivity. Every day, billions of documents are created, consumed, collaborated on, and edited; however, the majority of these interactions are manual or rely on rule-based, semi-automated tools. The real challenge lies in enabling structural and semantic understanding of documents so that users can extract relevant information automatically without losing critical content. For example, extracting the important dates mentioned in a contract and aligning the related events on a timeline may help lawyers work more efficiently in their day-to-day jobs. Similarly, extracting itemized pairs from scanned receipts can make accounting easier. My research brings together the semantic (document-level information extraction) and structural (document image analysis) aspects of document intelligence to advance user productivity tools. My core thesis is that real-world documents are characteristically long, span multiple paragraphs, are semi-structured, and require both contextualization over long sequence lengths and multi-hop reasoning. I tackle these challenges by proposing novel multimodal (linguistic-spatial-visual) deep learning methods for document structure extraction (LayerDoc, WACV ’23) and exploring their applications to language-guided document editing (DocEdit, AAAI ’22). I then explore combining Transformer language models and graph neural networks to solve document-level information extraction tasks in unstructured text for natural language inference (DocInfer, EMNLP ’22).
Puneet is a Ph.D. candidate in CS at the University of Maryland, College Park, advised by Prof. Dinesh Manocha. His research focuses on document information extraction and manipulation, and on long-context multimodal understanding. He completed his Master's in Computer Science at UMD in 2021 and his Bachelor of Engineering (B.E.) in Computer Engineering at NSIT (Delhi University). He has previously interned at Dataminr, Adobe Research, and Meta AI (previously Facebook).