Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data

Abstract

Social network data are often constructed by incorporating reports from multiple individuals. However, it is not obvious how to reconcile discordant responses from individuals. There may be particular risks with multiply-reported data if people’s responses reflect normative expectations – such as an expectation of balanced, reciprocal relationships. Here, we propose a probabilistic model that incorporates ties reported by multiple individuals to estimate the unobserved network structure. In addition to estimating a parameter for each reporter that is related to their tendency to over- or under-report relationships, the model explicitly incorporates a term for mutuality, the tendency to report ties in both directions involving the same alter. Our model’s algorithmic implementation is based on variational inference, which makes it efficient and scalable to large systems. We apply our model to data from 75 Indian villages collected with a name-generator design, and to data from a Nicaraguan community collected with a roster-based design. We observe strong evidence of mutuality in both datasets, and find that its strength varies by relationship type. Consequently, our model estimates networks with reciprocity values that are substantially different from those resulting from standard deterministic aggregation approaches, demonstrating the need to consider such issues when gathering, constructing, and analysing survey-based network data.
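
To make the comparison with deterministic aggregation concrete, here is a minimal sketch of how the choice of aggregation rule alone can change a network's reciprocity. It is not the model proposed in the paper. It assumes two common deterministic rules, union (keep a tie if either person reports it) and intersection (keep a tie only if both report it), which the abstract does not name. The function names and the toy data are hypothetical.

```python
# Illustrative sketch only: the "union" and "intersection" rules and all names
# here are assumptions for illustration, not the paper's probabilistic model.

def aggregate(sent, received, rule):
    """Combine two reports of each directed tie i -> j.

    sent[i][j] = 1 if i reports giving to j (sender's report of i -> j).
    received[i][j] = 1 if j reports receiving from i (receiver's report of i -> j).
    """
    if rule == "union":
        combine = lambda a, b: int(bool(a) or bool(b))
    elif rule == "intersection":
        combine = lambda a, b: int(bool(a) and bool(b))
    else:
        raise ValueError(f"unknown rule: {rule}")
    n = len(sent)
    return [[combine(sent[i][j], received[i][j]) for j in range(n)]
            for i in range(n)]


def reciprocity(adj):
    """Fraction of directed ties i -> j whose reverse tie j -> i also exists."""
    n = len(adj)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    ties = sum(adj[i][j] for i, j in pairs)
    if ties == 0:
        return 0.0
    mutual = sum(adj[i][j] * adj[j][i] for i, j in pairs)
    return mutual / ties


# Toy reports for three people (hypothetical data).
sent = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
received = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]

union_net = aggregate(sent, received, "union")
intersection_net = aggregate(sent, received, "intersection")

print("union reciprocity:", reciprocity(union_net))
print("intersection reciprocity:", reciprocity(intersection_net))
```

On this toy input the union network keeps three ties, two of them reciprocated, while the intersection network keeps a single unreciprocated tie, so the same reports yield very different reciprocity depending on the rule.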

Publication
Journal of the Royal Statistical Society Series A: Statistics in Society, qnac004
Caterina De Bacco
CyberValley Research Group Leader

My research focuses on understanding, optimizing and predicting relations between the microscopic and macroscopic properties of complex large-scale interacting systems.

Martina Contisciani
PhD student

My research focuses on the analysis of network data using statistical tools. My background is in Theoretical and Applied Statistics, and I am interested in discovering new techniques, approaches and perspectives for analyzing data. I have been working on a project focused on modeling covariate information in community detection algorithms, and I am investigating the conditional independence assumption underlying statistical inference on network data.

Hadiseh Safdari
Postdoctoral researcher

My current research revolves around inference and modeling in networks. More precisely, we aim to relax the independence assumptions in generative models by deploying hidden variables, and establishing analytical approximations to make the inference problem tractable.

Diego Baptista Theuerkauf
PhD student

My research focuses on analyzing graph-based approximations of solutions to optimal transportation problems. We use biologically-inspired models to find transport plans for many different routing frameworks.
