Information retrieval has for decades focused on finding digital documents, including documents that were born digital and those that have been digitized. But there are also enormous collections of physical documents, on paper or microfilm for example, that are not likely to be fully digitized in our lifetimes. The U.S. National Archives and Records Administration (NARA), for instance, presently holds 11.7 billion pages, only about 2% of which are in either digital or digitized form. NARA is just one among many thousands of archival repositories; there are more than 26,000 such repositories in the United States alone. Access to the culturally important materials that these repositories curate is at present mediated largely through high-level descriptions of entire collections that have been written by archivists, along with detailed descriptions of how some of those collections are organized. In this talk, we will describe a project in which we seek to build on that descriptive work, both by leveraging the limited amount of digitization that has been performed and by assembling descriptions of archival content from published materials such as journal articles or books. We’ll describe two sets of experiments. In the first, we asked whether, for U.S. State Department documents stored at NARA, we could guess which box to look in to satisfy a query, having digitized just a few documents from each box. In the second, we asked whether we could find citations to archival materials in scholarly literature. We’ll use the results of these experiments to motivate our broader research program, in which we seek to model the content of unseen documents based on multiple sources of evidence about other documents in the same collection, and to enrich that evidence by helping scholars who are working in archives to expand what we know about the contents of those repositories.
This is joint work with David Doermann, Katrina Fenlon, Diana Marsh and Yoichi Tomiura.