Social Archiving

Sometimes It is OK to Forget

For digital archival, there is a core motivation that pushes the too few of us to action: digital artifacts often, counter-intuitively, decay more quickly and less noticeably than more traditional physical items (vases, paintings, etc.). And, well, digital archivists want to keep that from happening.

One of the more difficult concepts for me, as a fledgling archivist, is fully understanding that not everything should be preserved. It is, at first, such an unintuitive concept. The question becomes: "What should we keep?" This often culminates in the timeless struggle of assigning value and, much like with physical archives, doing so with a too-casual ignorance of the cultures you are targeting. Oddly antagonistic to this struggle, digital archival in many cases, particularly of static websites, has become quite easy and has largely been automated by Rhizome's Webrecorder and the infrastructure behind the Wayback Machine. This automation puts archival in the hands of anyone with the technical knowledge to operate these tools. Anyone, then, can archive a community without being a member or having the consent of its members, which is something we need to keep in mind.

At first glance, social networks are very alluring to preserve. They have very obvious social value. That is especially true of services such as Twitter and Facebook, which are infamously used by public figures, particularly politicians. Preserving discourse around such prominent figures, and not allowing it to be altered, is a laudable goal. Surely, we would love to have a similar record of the candid day-to-day lives of historical figures. What kind of 18th-century flame war might have been revealed if Jefferson and Hamilton had, beyond what the record already shows, the same access to archival as Trump?

What is Mastodon?

Yet new social networks have sprung up in protest of these larger services and to fulfill community needs. GNU Social, formerly StatusNet, is a federated social network. This is a class of software you can host and administer yourself to provide a Twitter- or Facebook-like service to any community. These servers are operated independently (decentralized) and may have their own rules and codes of conduct. However, they can optionally connect to each other (federate) via a common protocol (ActivityStreams/OStatus/ActivityPub) to allow conversations to propagate among them. Other such services include Diaspora, my own, and most recently Mastodon and Pleroma.

When To Forget

What is new to this discussion is how these smaller instances are accelerating our timetables. Discussing the idea of a failure of Twitter or Facebook is important, but such a failure is not especially likely. These small communities, on the other hand, pop up suddenly and may disappear just as quickly, since they are often run by a single person with no stable financial backing. Yet the fire behind salvaging the larger social networks bleeds into the motivation for preserving the smaller ones, in spite of their being exceptionally different. These places are disappearing! We're losing the social record! Then again, "What should we keep?"

Mastodon and Pleroma instances are generally community-driven. Often they revolve around an existing community or a topic, such as one for witches and their allies who dislike queer bigotry, or even one for folks interested in digital preservation. These groups, unlike the bigger networks, often form around shared experiences and cultures, with their own norms and social expectations. This has been particularly true for vulnerable, and often very young, queer social circles.

This post specifically considers a recent push, by folks using tools provided by the Archive Team, to archive a Mastodon instance that has recently shut down. This server has a known queer population, including some who are, by law, minors, who are often barred from making accounts on mainstream networks. The archive is consuming the data in an automated fashion, offering only the ability to opt out, and it is unclear how the data is then deleted if one does. The archival push was a surprise to the administrators, who call the move "absolutely not acceptable" and note they will be taking action. So, the question here is: "Do we need to keep this?"

Here I apply my I Am Not A Lawyer stamp, as I cannot make any confident statements about the actual legality of archival in any form. I can only say with utter confidence that archival is a wholly uncomfortable and seemingly dangerous task that often takes place between two parties of unequal power. Yet archives are increasingly bold when dealing with the gray-area nature of copyright. The Internet Archive, an institution described as not directly affiliated with the Archive Team yet which seemingly controls it, has had famous success in creating exceptions to the DMCA, the United States law that governs digital copyright. Yet I wonder whether the fact that this content is personally owned, rather than owned by a business, could deal a perhaps justified blow to archives that consume social network content. These are not video games, or productivity tools with bizarre and prohibitive copy protection. These are written works of a personal nature created by thousands of people. Beyond statements made by influential public figures, it is hard to answer the question "Is this worth keeping?" with a confident "Yes."

To that end, there are arguments around the Right to be Forgotten, the idea that some data should be deleted so that it cannot be used to trace any particular person. Another viewpoint to consider is social archival as a personal privacy issue, specifically in light of the GDPR, a 2016 European Union regulation that governs consent for, and the limits on, storing personal data. With these, governments are clearly becoming concerned with the long-term storage of personal data and the vague terms under which it is used. While the social network itself can navigate consent with its patrons, an archive that consumes that consented data cannot do so through an opt-out process. This gives me pause, and it makes the value of preservation lower than the cost of doing it unethically. "Should we preserve this?" Well. No. We should not. We gain nothing but the satisfaction of preservation for its own sake.

Even if you consider the narrow point of view that "the data was on a public forum" or "they should know everything on the internet is public," it does not quickly solve the ethical dilemma. There is the simple rebuttal that "public" is not, and should never be, the same concept as "permanent." Nor is "public" so cleanly the antithesis of "private," as there are multiple levels of public interaction (e.g. Twitter visibility versus a Mastodon instance of a few dozen people). There is also the increasingly bizarre idea that one can exist without being on the public internet at all. Using it to justify archival means pushing the idea that nobody can exist without being on a permanent internet. This is a world that I find somewhat bleak.

Yet the most pressing consideration is that many of these patrons are under the age of legal consent. In that case, we cannot legally preserve their content, because they cannot consent to it regardless. Furthermore, they belong to a vulnerable class, potentially more than one, and may not fully understand the consequences. The destruction of a vulnerable community's social network is not a hindrance, since they can still speak, but a permanent record might be utterly silencing. We cannot, in good conscience, make this choice for them.

The Archive Team

That leads us to consider who actually is making this choice.

In this case, it would be the Archive Team. According to their website, they are "a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage." There is some charm and urgency in this rogue self-identification. In fact, their tooling is called Warrior, and they call themselves "warriors," which again highlights this intense idea of going into battle. When you consider that public data on the internet is predicted to exceed 100 ZiB, yet there are rarely long-term plans to preserve it, the urgency is certainly understandable. It feels insurmountable! It feels, indeed, like a difficult but worthy battle! However, not every preservation effort or target is equal.

The front page of the Archive Team's website, which reads: "History is our future... and we've been trashing our history. Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: IT DOESN'T HAVE TO BE THIS WAY. This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction."

After I commented on social media about how archivists sometimes inappropriately justify archiving social networks, and called out the Internet Archive, Jason Scott, founder of the Archive Team and pictured (cropped out) above, chimed in simply to say that he "get[s] the ARGUMENT"; the emphasis is his, implying this critique may be frustratingly common. Taking the opportunity, I asked him to give his stance on the opt-out archival of social networks such as Mastodon. He responded as such:

Indeed, they have done quite a bit of good: they have aided the effort to archive 1990s-era web hosting services such as Angelfire and GeoCities, divisive realms of both piracy and public data distribution such as The Pirate Bay, digital news articles, and have done some work toward archiving more modern large-scale social networks such as LiveJournal and Tumblr. Yet "good" is often a debatable term. They also seem to use archival as a means to defeat censorship via a sort of centralized Streisand Effect. For example, their effort to archive banned portions of Reddit will certainly attract just as much controversy as the bans themselves.


The Reddit wiki page on the Archive Team wiki.

In the coördination effort detailed above, they note how they fear the potential chilling effect of banning subreddits such as /r/fatpeoplehate, a forum of over 150,000 members devoted solely to publicly mocking and harassing fat people. It was, somehow controversially, banned at the same time as several Reddit forums with names I do not feel comfortable writing out, all of which are listed on Wikipedia's Controversial Reddit Communities page. They also cite events such as the banning of /r/gore and /r/watchpeopledie after a mass shooting, and of /r/piracy, as signs of Reddit's impending doom and, conversely, the firing of then-CEO Ellen Pao as beneficial.

Yet averting doom is part of the Archive Team mission. They are rogue archivists, after all; the implication is that they don't ask for permission and they do the dirty work. Otherwise, we would have an apocalyptic world without a record of what life was like in the early 21st century. As Jason Scott said, the mission is to preserve at-risk communities, although the question remains: "What is at risk?" When the effort is motivated by the banning of the more hateful groups, it is less about preserving the lost speech of the past than about protesting a lack of free speech in the present. In this case, the archival motivation is political, which seems fair in a space where censorship is seen as its antithesis: a political destruction. In the end, it is not completely without merit, yet it is also not the best answer to "What should we keep?" or, most generously, "What should be in our public archive?"

If archiveteam is to leave it alone then it needs to not be shutdown at a whim of somebody higher up. I understand what you are saying it is a breach of privacy but a lot of people post stuff they want saved here. ... Also let me remind you ArchiveTeam is a ROGUE band of archivists.

Mastodon/Pleroma Thread

Going back, how do we answer each question? Is this something we should keep? What is its value? Is it an "at risk" community? If the "at risk" definition is to hold here as "valued and disappearing," then it means the archivists see it as a picture of the underserved: the queer community. Perhaps it is this definition that is assigning value and motivation to the effort. That is, queer discourse is treated as a taboo spectacle, much like fat shaming and the ban-magnet Reddit incel community. Capturing the taboo is the role of rogues. The Archive Team's status as "rogue" was even cited as justification by one of the people in support of the archival effort. It seems to hand-wave the ethics away: an "ask for forgiveness, not permission" style of archiving. Sure, it is just one or two people, yet that's all it takes. The tooling is easy enough to pilot. And the process is opt-out, meaning individuals who make no effort to archive their own content may not realize that it has been archived for them anyway. They would not be notified, as somebody speaking for the effort said when asked whether people were notified that they could opt out:

but what exactly should we have done? Pinged every user on and asked them to opt out? All we did was download public data and put it in the Wayback Machine.

Second Individual in Mastodon/Pleroma Thread

In the end, Jason Scott did ban those working on the archival effort. I think this is a good sign, although the reasoning seems to be more about the way they misrepresented their authority than about the ethics at play. In fact, he emphasizes the rude, rogue-like nature of their collective, while noting that there needs to be some humility as well. He notes that the debate is worthwhile, and I obviously agree. I look forward to that debate continuing, as this situation will undoubtedly repeat itself at some point.

Still, my takeaway from this discussion is that it is not an uncommon idea among Archive Team warriors that preservation is good largely for its own sake: that any lost community is an impactful loss, and thus any effort is necessary effort and every resistance a hindrance. Moving forward, that worries me. These are, in a very real sense, at-risk queer minors, and that should give us pause. Such resistance should prompt reflection on what it means to preserve the "at risk," and how doing so may put people in the path of risk instead. My fear is that resistance becomes not a humbling experience that hones a more honest and virtuous archival practice, but the motivation itself. The question mutates into "How dare anyone tell us we shouldn't keep it?" I mean, who else would do it? Well, the community preserved their own history on a wiki, which serves as a digital tombstone that, interestingly, could be a place where their community grows again.

Building Archive-Aware Social Networks (and Social-Aware Archives)

My points essentially stop there: we have to put more thought, and potentially more resistance, toward archival to be better archivists. However, since I have written a federated social network, contributed to the standardization of its protocols, and written tools and papers around digital archival, I am in an apt position to talk about some potential best practices for each side. Therefore, it seems right to close with some proactive approaches and guidelines for each, or links to people with much more experience.

Archivists have been developing new methods of gathering and conveying consent with regard to folks' awareness of, and active involvement in, the archival or public propagation of their work. Ed Summers' article Designing for Consent, a must-read for any social network admin, developer, and/or archivist, has a great discussion of the ramifications of archiving social media. Through their Documenting the Now (DocNow) project and their experience archiving the social media discourse around Ferguson activism, they experimented with and defined a guideline for determining consent. This Social Humans system encourages active engagement with the archival process by presenting people with a set of opt-in choices for how their data can be used or altered. It was originally presented to me by designer Alexandra Dolan-Mescal at the recent Personal Digital Archiving conference in Pittsburgh.

Archival Consent Model by Alexandra Dolan-Mescal and the DocNow project. From Designing for Consent.

It is appropriate to make use of these and any other reasonable indicators of consent. As the article describes, there is a growing body of knowledge about the psychology of interfaces and their ultimate effect on people's choices and feelings about their data. When this is done well, we can indeed keep a record of discourse that is ethical and representative without resorting to vacuuming up any content presumed to be public.

This is a responsibility, I feel, that archives share with social network administrators and programmers. For developers, it may be increasingly important to allow consent markers to be attached to social profiles or individual posts, indicating what type of long-term preservation or usage people desire for their content. And then, naturally, archivists must respect these wishes and acknowledge that not doing so is an ethical lapse.
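To make the idea of consent markers a little more concrete, here is a minimal Python sketch. The label names (ARCHIVE, ATTRIBUTE, LOCATION, MEDIA) are hypothetical stand-ins, not the actual Social Humans vocabulary, and the Profile/Post types are assumptions rather than any real network's data model. A post either carries its own markers or inherits its author's profile-wide default, and an archivist checks that every permission they need was actually granted.

```python
from dataclasses import dataclass
from enum import Flag
from typing import Optional


class Consent(Flag):
    """Hypothetical consent markers (not the official Social Humans labels)."""
    NONE = 0
    ARCHIVE = 1     # may be kept long-term by an archive
    ATTRIBUTE = 2   # may be shown alongside the author's name
    LOCATION = 4    # location metadata may be kept
    MEDIA = 8       # attached images or video may be kept


@dataclass
class Profile:
    handle: str
    # Nothing is granted unless the person says so.
    default_consent: Consent = Consent.NONE


@dataclass
class Post:
    author: Profile
    text: str
    # None means "inherit the profile-wide default".
    consent: Optional[Consent] = None

    def effective_consent(self) -> Consent:
        if self.consent is None:
            return self.author.default_consent
        return self.consent


def may_archive(post: Post, wanted: Consent = Consent.ARCHIVE) -> bool:
    """Archivist-side check: every requested permission must have been granted."""
    return (post.effective_consent() & wanted) == wanted


alice = Profile("alice", default_consent=Consent.ARCHIVE)
print(may_archive(Post(alice, "hello")))                                      # True
print(may_archive(Post(alice, "just between us", consent=Consent.NONE)))      # False
print(may_archive(Post(alice, "hello"), Consent.ARCHIVE | Consent.LOCATION))  # False
```

The design choice worth noting is the default of `Consent.NONE`: silence is never read as permission, and a per-post marker always overrides the profile.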

The next complication is deciding on a reasonable default. The Social Humans labels are great when every patron of your network takes them seriously. Yet if they are ignored or never used, what is the default? For me, if you want an opt-in preservation model, your default should be deletion. I am more and more convinced that, particularly for more vulnerable communities, deletion is actually quite a reasonable default policy. Let people mark their accounts or posts to be preserved instead. Existing tools such as TweetDelete, TwitWipe, TweetEraser, TweetDeleter, and Cardigan already offer this kind of automatic deletion to people who opt in to it.
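A deletion-by-default policy can be as simple as a periodic sweep. The following Python sketch assumes a hypothetical `StoredPost` record with a `preserve` flag and an arbitrary 30-day retention window; neither comes from any existing tool, and a real network would presumably let its community choose the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredPost:
    post_id: str
    created: datetime
    preserve: bool = False  # True only if the author opted in to preservation


# Hypothetical retention window, chosen only for illustration.
RETENTION = timedelta(days=30)


def posts_to_delete(posts, now, retention=RETENTION):
    """Deletion is the default: anything older than the retention window is
    scheduled for removal unless its author explicitly asked to keep it."""
    cutoff = now - retention
    return [p.post_id for p in posts if not p.preserve and p.created < cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
posts = [
    StoredPost("old", now - timedelta(days=45)),
    StoredPost("kept", now - timedelta(days=45), preserve=True),
    StoredPost("recent", now - timedelta(days=3)),
]
print(posts_to_delete(posts, now))  # ['old']
```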

The existence of so many competing tools to automatically delete content shows how important archival resistance is to so many people. Things such as microblog posts and tweets are, to many, meant to be fleeting, ephemeral thoughts. Their value exists only in an instant: there one moment, ruminated upon, and then lost. And that's fine.

Using a JavaScript front-end can thwart simple forms of automated archival. From ArchiveTeam's Reddit wiki page.

On the topic of archival resistance, social networks have recently been considering the problem of the rogues and warriors out there. The Archive Team often laments on their wiki when sites use JavaScript in ways that make scraping text content more difficult. It often means that someone must manually target the specific website in order to pull down and mirror the data. That will merely delay the effort, but if there is a tight window in which preservation can happen before the data disappears, the delay can be beneficial.
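As a rough illustration of why a JavaScript front-end slows down simple archival, here is a Python sketch of a naive scraper built on the standard library's `html.parser`. Like a crawler that does not execute JavaScript, it collects visible text and never runs the page's scripts. Both pages and the `/api/timeline` endpoint are made up for the example; a determined archivist can still call such an API directly, which is why this only delays the effort.

```python
from html.parser import HTMLParser


class TextScraper(HTMLParser):
    """Naive scraper: keeps visible text and skips <script> bodies,
    roughly what a crawler that doesn't run JavaScript would see."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if not self._in_script and data.strip():
            self.chunks.append(data.strip())


def scrape(html: str) -> list:
    parser = TextScraper()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Server-rendered page: the post text is right there in the HTML.
SERVER_RENDERED = "<html><body><article>Just had the best tea.</article></body></html>"

# JavaScript front-end: the HTML is an empty shell; posts arrive later from an API.
JS_SHELL = (
    "<html><body><div id='app'></div>"
    "<script>fetch('/api/timeline').then(r => r.json()).then(render);</script>"
    "</body></html>"
)

print(scrape(SERVER_RENDERED))  # ['Just had the best tea.']
print(scrape(JS_SHELL))         # []
```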

More pertinent to the future may be applying cryptography and more modern distributed systems designs to federated platforms. I'll get a tad geeky here, but it is quite possible to encode consent hints into how the data is propagated and stored. Posts intended for preservation can, sure, be transmitted as you normally would: in plain sight. Your normal public posts could expire, so that they can no longer be retrieved after a certain point. Anonymized posts could keep their content retrievable while their metadata remains inaccessible. And so on.
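The three behaviors above can be sketched as a single retrieval rule on the serving side. This Python version is only an illustration under assumptions: the `Level` names and the 14-day expiry are invented for the example, and a real federated server would also have to enforce the same rules on copies it has already sent to other servers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Level(Enum):
    """Hypothetical consent levels mirroring the three examples in the text."""
    PRESERVE = "preserve"    # plain sight, indefinitely
    EPHEMERAL = "ephemeral"  # ordinary public post that expires
    ANONYMOUS = "anonymous"  # content retrievable, metadata withheld


@dataclass
class Post:
    content: str
    author: str
    created: datetime
    level: Level


EXPIRY = timedelta(days=14)  # assumed window for ephemeral posts


def retrieve(post: Post, now: datetime) -> Optional[dict]:
    """What a remote server or crawler is allowed to get back, if anything."""
    if post.level is Level.EPHEMERAL and now - post.created > EXPIRY:
        return None  # past its window: no longer retrievable
    if post.level is Level.ANONYMOUS:
        return {"content": post.content}  # the data, without who or when
    return {"content": post.content, "author": post.author,
            "created": post.created.isoformat()}


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old = now - timedelta(days=30)
print(retrieve(Post("hi", "alice", old, Level.EPHEMERAL), now))  # None
print(retrieve(Post("hi", "alice", old, Level.ANONYMOUS), now))  # {'content': 'hi'}
```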

Much of this is best done by separating data and metadata and building upon a system that requires capabilities, a kind of digital key, in order to retrieve one, none, or both parts. Since location and media have their own consent labels, social systems should likewise separate these pieces in their data stores and make each optionally accessible in their routes and protocols. And note: people can change their minds! The system should allow the metaphorical lock to be changed, so that actors must request new keys and may be denied them.
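One way to picture this is a toy capability store, sketched here in Python with only the standard library. It is an assumption-laden illustration, not how Spritely, Tahoe-LAFS, or any ActivityPub server actually works: keys are HMAC tags over a post ID and a part name, each part (content, metadata, location, media) is stored separately, and "changing the lock" means rotating the per-post secret so every previously issued key stops working. The `CapabilityStore` class and its methods are hypothetical.

```python
import hashlib
import hmac
import secrets


class CapabilityStore:
    """Toy capability store: each part of a post is stored separately and
    released only to a caller holding the matching key. Rotating a post's
    secret invalidates every key issued for it before."""

    PARTS = ("content", "metadata", "location", "media")

    def __init__(self):
        self._parts = {}   # (post_id, part) -> stored value
        self._secret = {}  # post_id -> current secret

    def put(self, post_id, **parts):
        for name, value in parts.items():
            if name not in self.PARTS:
                raise ValueError(f"unknown part: {name}")
            self._parts[(post_id, name)] = value
        self._secret[post_id] = secrets.token_bytes(32)

    def issue(self, post_id, part):
        """Mint a key (capability) for one part of one post."""
        msg = f"{post_id}:{part}".encode()
        return hmac.new(self._secret[post_id], msg, hashlib.sha256).hexdigest()

    def get(self, post_id, part, key):
        if not hmac.compare_digest(self.issue(post_id, part), key):
            raise PermissionError(f"no valid capability for {part}")
        return self._parts[(post_id, part)]

    def change_lock(self, post_id):
        """The author changed their mind: all old keys stop working."""
        self._secret[post_id] = secrets.token_bytes(32)


store = CapabilityStore()
store.put("p1", content="hello from the garden", location="(exact coordinates)")
content_key = store.issue("p1", "content")
print(store.get("p1", "content", content_key))  # hello from the garden

try:
    store.get("p1", "location", content_key)    # a content key is not a location key
except PermissionError as err:
    print(err)                                  # no valid capability for location

store.change_lock("p1")
try:
    store.get("p1", "content", content_key)     # the old key is now refused
except PermissionError as err:
    print(err)                                  # no valid capability for content
```

Holding a key for one part never unlocks another, which is the property that lets location and media follow their own consent labels.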

In the end, designs such as these allow for varied levels of "public" while signaling to archivists that preservation widens the audience and weakens the intention. This is why evolutions in social web protocols that may include cryptographic nomadic identity and secure nomadic data (e.g. Spritely/Crystal, borrowing from Tahoe-LAFS), where data can be propagated but not read by intermediaries, are so exciting: they take decisions out of administrators' hands and place them into the hands of the people themselves. If we build our systems with consent as a forethought, we mitigate many future problems that result from the lack of a mechanism to describe a person's intentions.

There is still merit in the rogue nature of archives at the fringes (and they should remain at the fringe). If you see a community whose digital space is about to disappear, it may still be altruistic to preserve it on their behalf. I'm not taking a hard and fast stance or making a strong judgment on these actions. Yet, ethically, it is not your role to decide what is kept and what is not when it comes to such personal content. Still, if you are such a rogue, it seems fair that you do what you can to take "no notice," which is how Jason Scott defines "at risk," and turn it into "more notice." Keep that copy of their community, keep it private, and let members retrieve that data for their personal archives. Then, well, delete it. Not every archive needs to be public.




If you'd like to comment, just send me an email at or on either Twitter or via my Mastodon profile. I would love to hear from you! Any opinions, criticism, etc are welcome.


If you'd like to make a donation, I don't know what is best for that. Let me know.


All content off of this domain, unless otherwise noted or linked from a different domain, is licensed as CC0