
Thomson Paris Research Lab, Batiment Campus, 46 Quai A. Le Gallo, 92648 Boulogne-Billancourt Cedex
A network data set represents entities and the connections between them. Network data can describe a variety of domains: a social network describes individuals connected by personal relationships; an information network might describe a set of articles connected by citations. Network data is extremely valuable to analysts seeking to study the structure and function of networks and the processes that occur in them. Network analysts study the influence of individuals in organizations, disease transmission in communities, the operation of computer networks, and the emergent behavior of physical and biological systems. While network data can now be collected at unprecedented scale, it often describes relationships that are sensitive. Releasing the data can result in unacceptable disclosures, and privacy concerns are constraining network science. In this talk, I will describe threats to anonymity posed by published networks, and recent work on resisting these threats. I will focus on the threat of structural re-identification, in which an individual's local relationships can reveal that individual's identity even when names (and other identifiers) are removed from the network. Re-identification risk depends on the power of the adversary and on the naturally occurring structural diversity in the graph. I will describe models of adversary knowledge and evaluate their impact on anonymity using both empirical results on real networks and theoretical analysis of random graphs. Finally, I will describe an anonymization technique based on graph clustering which can accurately preserve global properties of networks while protecting against anonymity threats. Joint work with M. Hay, D. Jensen, G. Miklau, P. Weis.
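To make the idea of structural re-identification concrete, here is a minimal sketch under a deliberately simple assumption: the adversary knows only the degree (number of connections) of the target. This adversary model, the toy edge list, and the function names are illustrative assumptions, not the models or data from the talk. Nodes that share a degree are indistinguishable to such an adversary, so the size of each node's candidate set is a rough measure of its anonymity.

```python
from collections import Counter, defaultdict


def degree_candidate_sets(edges):
    """Map each node to the set of nodes sharing its degree.

    Assumption: the adversary knows only the target's degree, so every
    node with that degree is an equally plausible match. Nodes with no
    edges do not appear in an edge list and are ignored here.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Group nodes by degree; each group is one candidate set.
    groups = defaultdict(set)
    for node, d in degree.items():
        groups[d].add(node)

    return {node: groups[d] for node, d in degree.items()}


def fraction_uniquely_identified(edges):
    """Fraction of nodes whose candidate set contains only themselves."""
    candidates = degree_candidate_sets(edges)
    unique = sum(1 for c in candidates.values() if len(c) == 1)
    return unique / len(candidates)


# Hypothetical toy network with names already replaced by labels.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
candidates = degree_candidate_sets(edges)
risk = fraction_uniquely_identified(edges)
```

In this toy graph, node "a" is the only node of degree 3 and node "e" the only node of degree 1, so both are re-identified by degree alone, while "b", "c", and "d" hide among each other. A stronger adversary (for example, one who also knows the degrees of a target's neighbors) would generally shrink the candidate sets further.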
Laurent Massoulie