I’ve been fascinated by the public furore over the Facebook emotional contagion research published this month in the Proceedings of the National Academy of Sciences. I’m an internet ethics junkie and, genuinely, am astonished by the response. Perhaps I react this way because I’ve had occasion to think and speak so much about this that I forget how little other people know about what actually goes on to protect the individual when this kind of work is done inside the ivory tower.

So here’s the ethics section from my PhD, which considers many of these same issues in the context of my own work, which was complicated and (yes) online, and involved a sample of approximately 47,000 users of the virtual world Second Life between 2005 and 2007. It tackles informed consent, anonymity, contagion, big data and social network analysis.

Ethical considerations
Online research and sociometry add complications to the study of human subjects that are not already addressed in existing ethical guidelines (Ess & AoIR ethics working committee, 2002). First, online settings like Second Life challenge the concept of identity; anonymity renders the offline self unknown, but the online self is pseudonymous. This has implications for sociometric data gathering techniques and outcomes that use non-anonymised identity indicators to create the tools of analysis. Second, both online and social network research present challenges for gaining consent from secondary sources, those people implicated as relationships in the Second Life network. Third, whole network analyses that use automated data extraction techniques challenge identity ownership for the participants, the virtual world owners and the investigator. These issues are discussed in this section.

Sociometric surveys ask specific questions designed to determine the makeup of relationships (e.g., Who are your three best friends?) or the flow of expertise (e.g., Who would you go to for information on X?). These questions challenge the ethical prerequisites of anonymity and confidentiality.
In traditional social science, this can be assured using common precautions; for example, attribute information can be collected without the need to know participants’ identities. Further, as Klovdahl (2005) explains, personal network information and contacts’ attributes can also be collected in such a way as to assure the anonymity of the respondent.

However, Social Network Analysis is predicated on the identities of respondents and the identities of those people they list as friends/acquaintances/sources of information, and therefore the conditions of anonymity become problematic. Anonymity is not possible in the network collection paradigm as identifying information is used to facilitate the development of the network map (Nyblom, Borgatti, Roslakka, & Salo, 2003; Borgatti & Molina, 2005). As Kadushin (2005) argues, “the collection of names of either individuals or social units is not incidental to the research but its very point” (p. 141).

Social network analysts argue that the lack of anonymity in data collection should not deter social network research from going forward, but the onus for establishing the ethical use of sensitive identifying information is on the researcher. Of particular importance is the removal of any identifying features or attributes from the data as soon as possible, and certainly before dissemination of findings. While this has relevance for raw, “as collected” data in databases, it also has implications for network maps. As Nyblom and colleagues (2003) explain, the network graphs used to deduce important information such as position and structure have “a 1 to 1 correspondence with that person’s filled out questionnaire, completely revealing that person’s responses” (p. 341).
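The correspondence Nyblom and colleagues describe can be made concrete: a respondent’s row in the network’s adjacency matrix is exactly their list of nominated contacts, so a labelled matrix or graph republishes each questionnaire. Here is a minimal sketch in Python, using invented names and toy responses (not data from the study):

```python
# Hypothetical survey responses to "Who are your three best friends?"
# All names below are invented for illustration.
responses = {
    "Ava": ["Ben", "Cai", "Dee"],
    "Ben": ["Ava", "Dee", "Eli"],
    "Cai": ["Ava", "Ben", "Eli"],
}
actors = sorted(set(responses) | {c for cs in responses.values() for c in cs})

# adjacency[i][j] == 1 iff actor i nominated actor j
index = {name: i for i, name in enumerate(actors)}
adjacency = [[0] * len(actors) for _ in actors]
for respondent, contacts in responses.items():
    for contact in contacts:
        adjacency[index[respondent]][index[contact]] = 1

# A labelled row of the matrix reveals that respondent's entire
# questionnaire: the 1-to-1 correspondence Nyblom et al. warn about.
row = adjacency[index["Ava"]]
recovered = [name for name in actors if row[index[name]]]
assert recovered == ["Ben", "Cai", "Dee"]
```

This is why stripping names from the published map is not enough on its own: if node labels (or attributes that function as labels) survive, each row of the matrix can be read back as a completed questionnaire.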

Although participants in Second Life were not identified by their offline names, pseudonyms function as real names in the online community. Offline, researchers would change the names of participants to protect respondents from potential harm; however, some researchers have assumed that the pseudonym used in the virtual community can be used identifiably in research (Dibbell, 1999; Walsh, 2004). The results can be both embarrassing and harmful to actors’ social reputations in their communities.

In addition, Internet methodologists have described virtual communities as extremely sensitive; a breach in trust can destabilise the foundations upon which an online group rests, and some research and media activity has caused the undoing of previously thriving online interactions (Whiteman, 2007; White, 2002). Goal-oriented online communities and social networking sites have a stronger sense of stability than social virtual worlds because they are predicated upon the pre-determined goal systems bestowed by their designers or upon relationships developed offline. Social virtual worlds rest upon the social collectives that exist within their boundaries. Ethical transgressions can result in power shifts, and mass migrations to other sites in protest of research activity can change the fabric of the community.

In this thesis, all pseudonyms were anonymised. Analyses referred only to unique user IDs, and no identifying features were included that could serve as clues for insiders or for outsiders unfamiliar with the social context.

Secondary sources
Participation in Social Network Analysis is not confined to the primary participants. It is reliant upon the non-anonymous named contacts provided by respondents to social network surveys. While this provides valuable information for analysing who relates to whom, ethics committees can argue that the named parties are now participants in the research who have not given their informed consent.

The argument hinges on the true ownership of responses to sociometric surveys: Borgatti and Molina (2003) explain that information gleaned about secondary sources is the primary respondent’s perception of a relationship s/he has with another person, “which is clearly something respondents have a right to do: every respondent owns their own perceptions” (p. 339).

Further support for this position comes from Klovdahl (2005), who explains that under US Institutional Review Boards’ (IRBs’) definitions of human subjects, people named by participants who do not themselves participate do not fall into this category, particularly when attribute information is gleaned from publicly available sources. These perspectives rely upon parallels with observational research, like participant observation and ethnography, arguing that collecting data on individuals is not ipso facto unethical; otherwise, studies that utilise these methods would need to gather consent from all parties observed.

However, it is plausible to argue that a relationship comprises two actors, and that neither party can ethically report on it without the consent of the other (Borgatti & Molina, 2003). Further, while perceptions of a contact are owned by the primary respondent, Klovdahl (2005) argues that information divulged in the course of a sociometric interview or survey may be believed to be private by the secondary source, and so these parties are human subjects under the IRB’s Common Rule. It then becomes the researcher’s responsibility to obtain consent from all parties included in the analysis (primary respondents and contacts) or to obtain waivers of consent.
Gaining consent in egocentric, large-scale social network studies like those proposed in this thesis was unfeasible because of the sheer number of potential contacts arising from survey responses, who may or may not have been contactable. Klovdahl (2005) suggests that there is no convenient way of obtaining consent from parties listed by participants unless the collection method is a snowball sample.

In this research, the approach taken was that participants owned the rights to report on their contacts. However, as part of the sampling strategy (see p. 66), a random sample of the contacts generated in the sociometric surveys was contacted to inform them of their involvement in this research and to offer them an opportunity to participate. This was limited by the practicalities of the contact method in Second Life, which offered little space for typed messages. Potential participants were directed to the consent forms at the beginning of the online surveys, which offered an overview of the research and an opt-out clause.

The problem remained with those parties who could not be contacted or who, when approached for consent, ignored the request. Unless potential participants explicitly opted out of the entire study, non-response did not mean that their names were excluded from this thesis’ sociometric analysis. In other words, if others included a non-respondent’s name on a list of contacts, s/he was represented in the resulting relationship matrix. Klovdahl (2005) argues that in studies of this scale it is possible to waive consent requirements by meeting four criteria: the research must not involve greater than minimal risk; the research must not be practical without a waiver; waiving consent must not adversely affect participants’ rights; and pertinent information must be fed back to the community in an anonymous format that does not implicate any single individual in the network (Bruckman, 2002). These criteria were met in this study, as participation was voluntary, with an opt-out option at any time. Participants who chose to opt out were granted true opt-out: the participant and all records of his or her connections (including those produced by other participants) were removed from the final analysis (Borgatti & Molina, 2005) within 14 days of receipt of the request (Klovdahl, 2005).
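True opt-out means deleting not only the participant’s own responses but every tie that mentions them, including nominations made by other respondents. A minimal sketch of that removal step in Python, using an invented edge-list representation and made-up IDs (not the study’s actual code):

```python
# Hypothetical relationship matrix stored as a set of directed edges
# (nominator_id, nominee_id); IDs are opaque user IDs, not avatar names.
edges = {
    ("u01", "u02"), ("u02", "u01"),
    ("u03", "u02"), ("u03", "u05"),
}

def honour_opt_out(edges, user_id):
    """Drop every edge that involves user_id, whether the user
    reported the tie themselves or was nominated by someone else."""
    return {(a, b) for (a, b) in edges if user_id not in (a, b)}

# u02 opts out: their own reports AND others' nominations of them go.
edges = honour_opt_out(edges, "u02")
# Only the tie that never mentioned u02 survives.
assert edges == {("u03", "u05")}
```

Filtering on both ends of each edge is the crucial design choice: removing only the opt-out’s own survey responses would leave them represented in the matrix wherever others had named them.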

The study operated under a confidentiality agreement, and no identifying information was released. Further, it was impractical to obtain consent from every account holder in the large dataset of contacts and/or from the non-traceable parties in the virtual world. Finally, generalised findings were fed back to the community at the SSRL and via the research website and blog.

Special ethical considerations for automated data collection techniques
The use of Linden Lab data from the databases of the commercial company inspired new ethical questions. The Linden Lab Terms of Service (ToS) explicitly stated that the company collected personal information about its customers and that it did not disclose personal information to third parties without the customers’ permission. Some of this data was knowingly generated by the customers themselves, and other information was generated from their actions during their use of the Second Life application; however, the data collection techniques used in this study required closer inspection of the privacy issues surrounding automatic data extraction, data warehousing and data access, for four reasons.

The first issue concerned ownership of online community account data (Estivill-Castro & Brancovitch, 1999). Companies collect information about their consumers for commercial purposes, but in virtual worlds, where people readily exchange social and economic information, ownership of personal data is exchanged for the service in the End-User License Agreement to which customers must consent in order to create an account. Individuals may understand and accept company use of their personal data to an extent, but privacy concerns arise when this data is used for secondary purposes for which the individual has not provided authorisation. Businesses like Linden Lab may claim a right to mine the data, as described in Item 6.2 of their Terms of Service, but their customers may still have viewed this as an infringement of their personal privacy.

Second, automatically extracted data becomes ethically sensitive when the interests of the data users (e.g., businesses and organizations) are not balanced with the interests of the data subjects (e.g., customers). Fule and Roddick (2004) argue that this becomes an ethical issue when the results of analysis are used in decision-making that affects the people concerned, or when the mining compromises privacy.

The data content that was gained from Linden Lab included a combination of the personal data that new customers provided at account creation (e.g., demographics) and Second Life usage data (e.g., network relationships, community involvement). It was accessed from the data warehouse where account information for each customer was stored.

Third, there was concern that this kind of content could be confounded, in terms of both research validity and usefulness, when the data accessed was granulated or obscured in such a way that the recipient would be unable to gain any insight from it (Van Wel & Royakkers, 2004). For this reason, Linden Lab kept avatar names associated with the data until they could be matched with existing user IDs, or new ones could be created for them. Any association between these IDs and the avatar names was then destroyed by the primary researcher, as were the original files that associated the avatar names with the account content.
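The pattern described here, matching avatar names to opaque user IDs and then destroying the link, is a standard pseudonymisation step. A minimal illustration in Python, with invented avatar names and fields (the study’s actual tooling is not documented here):

```python
import uuid

# Hypothetical extract: avatar names paired with account content.
records = [
    {"avatar": "SomeAvatar Resident", "region_hours": 120},
    {"avatar": "OtherAvatar Resident", "region_hours": 45},
]

# Assign each avatar a fresh opaque user ID, stripping the name.
id_for = {}
for record in records:
    name = record.pop("avatar")          # remove the identifying name
    id_for.setdefault(name, uuid.uuid4().hex)
    record["user_id"] = id_for[name]

# Once matching is complete, the name-to-ID lookup table is destroyed
# so the anonymised records can no longer be linked back to avatars.
id_for.clear()

assert all("avatar" not in r for r in records)
```

The important property is that the records remain linkable to each other (the same avatar always maps to the same ID while the table exists) but, once the lookup table is destroyed, not linkable back to the community identity.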

Finally, the issue of informed consent arose in automatic data extraction. The OECD’s Principles on Data Collection (2009) stipulate that subjects must give their written consent or the data may not be processed. Indeed, the Second Life account holders implicated in the automatic data extraction process were not informed that their data was being used for the purposes of this research. However, Linden Lab’s Privacy Policy stated, “Access to your personal information is limited to those Linden Lab employees who require the information in order to provide products or services to you or perform their jobs.” At the request of Linden Lab, the primary researcher was employed as a contractor during the period of data collection to provide services to the company that utilised the network analyses undertaken in Study 3. The purpose of this employment was to pursue the research questions of this thesis. This access was considered acceptable to the Linden Lab team, and within the bounds of their Privacy Policy.

NOTE: This was published in 2009, and so this is now ancient thinking. For more up-to-date guidelines, look to the British Psychological Society or the Association of Internet Researchers.