NetSys Datasets

Wearing Many (Social) Hats: How Different are Your Different Social Network Personae?

This paper investigates whether, when users create profiles in different social networks, those profiles are redundant expressions of the same persona or are adapted to each platform. Using the personal webpages of 116,998 users on, we identify and extract matched user profiles on several major social networks including Facebook, Twitter, LinkedIn, and Instagram. We find evidence for distinct site-specific norms, such as differences in the language used in the text of the profile self-description, and the kind of picture used as profile image. ...

Paper Dataset

Predicting Pinterest: Automating a Distributed Human Computation.

Every day, millions of users save content items for future use on sites like Pinterest, by "pinning" them onto carefully categorised personal pinboards, thereby creating personal taxonomies of the Web. This paper seeks to understand Pinterest as a distributed human computation that categorises images from around the Web. ...

Paper Dataset

Social Bootstrapping: How Pinterest and Social Communities Benefit by Borrowing Links from Facebook.

How does one develop a new online community that is highly engaging to each user and promotes social interaction? A number of websites offer friend-finding features that help users bootstrap social networks on the website by copying links from an established network like Facebook or Twitter. This paper quantifies the extent to which such social bootstrapping is effective in enhancing a social experience of the website. ...

Paper Talk slides Dataset

Sharing the Loves: Understanding the How and Why of Online Content Curation.

This paper looks at how and why users categorise and curate content into collections online, using datasets containing nearly all the relevant activities from during January 2013, and in December 2012. In addition, a user survey of over 25 Pinterest and 250 users is used to obtain insights into the motivations for content curation and corroborate results. ...

Paper Blog Dataset

Wi-Stitch: Content Delivery in Converged Edge Networks.

Wi-Fi, the most commonly used access technology at the very edge, supports download speeds that are orders of magnitude faster than the average home broadband or cellular data connection. Furthermore, it is extremely common for users to be within reach of their neighbours' Wi-Fi access points. Given the skewed nature of interest in content items, it is likely that some of these neighbours are interested in the same items as the users. We sketch the design of Wi-Stitch, an architecture that exploits these observations to construct a highly efficient content sharing infrastructure at the very edge, and show through analysis of a real workload that it can deliver substantial (up to 70%) savings in network traffic. The Wi-Stitch approach can be used both by clients of fixed-line broadband and by mobile devices obtaining indoor access in converged networks. ...

Paper Dataset

Illuminating an Ecosystem of Partisan Websites.

This paper aims to shed light on alternative news media ecosystems that are believed to have influenced opinions and beliefs by false and/or biased news reporting during the 2016 US Presidential Elections. We examine a large, professionally curated list of 668 hyper-partisan websites and their corresponding Facebook pages, and identify key characteristics that mediate the traffic flow within this ecosystem. We uncover a pattern of new websites being established in the run-up to the elections and abandoned afterwards. Such websites form an ecosystem, creating links from one website to another, and by "liking" each other's Facebook pages. ...

Paper Dataset

Tweeting MPs: Digital Engagement between Citizens and Members of Parliament in the UK

Disengagement and disenchantment with the Parliamentary process is an important concern in today's Western democracies. Members of Parliament (MPs) in the UK are therefore seeking new ways to engage with citizens, including being on digital platforms such as Twitter. In recent years, nearly all (579 out of 650) MPs have created Twitter accounts, and have amassed huge followings comparable to a sizable fraction of the country's population. This paper seeks to shed light on this phenomenon by examining the volume and nature of the interaction between MPs and citizens. We find that although there is an information overload on MPs, attention on individual MPs is focused during small time windows when something topical may be happening relating to them...

Paper Dataset

Characterising Third Party Cookie Usage in the EU after GDPR

The recently introduced General Data Protection Regulation (GDPR) requires that consent be obtained from individuals when collecting information online that could be used to identify them. Among other things, this affects many common forms of cookies, and users in the EU have been presented with notices asking for their approval of data collection. This paper examines the prevalence of third party cookies before and after GDPR by using two datasets: accesses to top 500 websites according to, and weekly data of cookies placed in users' browsers by websites accessed by 16 UK and China users across one year...

Paper Dataset

Stop tracking me Bro! Differential Tracking of User Demographics on Hyper-Partisan Websites

Websites with a hyper-partisan, left- or right-leaning focus offer content that is typically biased towards the expectations of their target audience. Such content often polarizes users, who are repeatedly primed to specific (extreme) content, usually reflecting hard party lines on political and socio-economic topics. Though this polarization has been extensively studied with respect to content, it is still unknown how it associates with the online tracking experienced by browsing users, especially when they exhibit certain demographic characteristics...

Paper Dataset

Characterizing User Content on a Multi-lingual Social Network

Social media has been on the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake-news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there have been a very limited number of representative studies on a large section of the democratic world, including the largest, multilingual and multicultural democracy: India. ...

Paper Dataset

Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians

Wikipedia is a major source of information providing a large variety of content online, trusted by readers from around the world. Readers go to Wikipedia to get reliable information about different subjects, one of the most popular being living people, and especially politicians. While a lot is known about the general usage and information consumption on Wikipedia, less is known about the life-cycle and quality of Wikipedia articles in the context of politics. The aim of this study is to quantify and qualify content production and consumption for articles about politicians, with a specific focus on UK Members of Parliament (MPs). ...

Paper Dataset

Under the Spotlight: Web Tracking in Indian Partisan News Websites

India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis of the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of Reporters Without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. ...

Paper Dataset

Differential Tracking Across Topical Webpages of Indian News Media

Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislation in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personally identifiable information leakage methods that first and third parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on Global North markets such as the EU and the USA. Large markets such as India, which accounts for 20% of the world population and has no explicit privacy laws, have not been studied in this regard. ...

Paper Dataset

Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case Study of WhatsApp

WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, spam on WhatsApp is an important issue. Despite this, the distribution of spam via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. This paper addresses this gap by studying spam on a dataset of 2.6 million messages sent to 5,051 public WhatsApp groups in India over 300 days. First, we characterise spam content shared within public groups and find that nearly 1 in 10 messages is spam. We observe a wide selection of topics ranging from job ads to adult content, and find that spammers post both URLs and phone numbers to promote material. Second, we inspect the nature of spammers themselves. ...

Paper Dataset

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

Online forums that allow participatory engagement between users have been transformative for public discussion of important issues. However, debates on such forums can sometimes escalate into full-blown exchanges of hate or misinformation. An important tool in understanding and tackling such problems is the ability to infer the argumentative relation of whether a reply is supporting or attacking the post it is replying to. This so-called polarity prediction task is difficult because replies may be based on external context beyond a post and the reply whose polarity is being predicted. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walk techniques to capture the wider context of a discussion thread...

Paper Dataset