Disclosure avoidance in open science: Addressing sociotechnical privacy challenges with a human-centered approach
Thursday, July 9, 2026, 2:00-3:00 pm
Abstract

Social, behavioral, and health scientists are increasingly expected to de-identify and publish data about research participants in order to bolster reproducibility, empower meta-analysis, and create transparency. However, sharing data puts research participants at risk of harm, and disclosure avoidance (also referred to as de-identification) is a difficult task that fundamentally lacks objectively correct solutions for balancing privacy and utility. In this dissertation, I confront the inherent tensions of disclosure avoidance from a practitioner-centered perspective, focusing on scientists who collect and publish data about people. Through this work, I ultimately aim to help scientists—as well as the policymakers, repository curators, research participants, and others involved in the production and publication of data—make more informed decisions that account for privacy, utility, and ethics.

First, to better understand how scientists are currently informed about threats and strategies, I conducted a thematic analysis of 38 recent online de-identification guides. I characterize techniques and attacks, and I identify some concerning patterns around the definition of key terms, the coverage of threats, and the usability of guides. Next, to investigate how scientists navigate the tensions surrounding de-identification in practice, I conducted semi-structured interviews with 24 scientists who have de-identified data for publication. I find that scientists account for important risks, but they address them through manual and social processes rather than systematic assessments of risk across the dataset. I explore why scientists take this approach and highlight three main barriers to stronger de-identification of research data related to threat modeling, incentives, and tools. Finally, to explore the design of software tools that implement systematic disclosure avoidance methods, I conducted an exploratory user study of two tools with 23 experienced scientists. I describe how scientists interpret the outputs of these tools, focusing on how they decide whether the de-identified data has acceptable utility and privacy. I highlight the pressing need for tools to help scientists understand disclosure risk, and I recommend specific disclosure avoidance workflows and interfaces for communicating outcomes that support scientists’ decision-making processes.

To conclude my dissertation, I present a roadmap for bridging the gap between disclosure avoidance theory and practice, by developing new methods and tools and by integrating these resources into scientific infrastructure.

Bio

Wentao Guo is a PhD candidate in computer science at the University of Maryland. As a human-centered security and privacy researcher, he is interested in how people think and behave in a world filled with complex technological threats. His research focuses particularly on how security and privacy tasks are operationalized by practitioners including social and medical scientists, tech journalists, and software developers.

This talk is organized by Sarah Tucker.