The Distributed Social Network Paradox

Posted on : 19-09-2012 | By : Bozho | In : Opinions


When Twitter announced a few weeks ago that it was effectively closing its platform, I put a lot of thought into possible alternatives to Twitter. The obvious choice is a distributed social network, where there is no single owner of the service, so nobody can decide to enforce rules and cut API access. The two most popular distributed networks are Diaspora and Status.net. Here is how they work, in short:

Anyone can decide to launch an instance (node) of the software. The instance stores its owner's data, and possibly the data of everyone who registers on that node. When a user registered on one node needs to follow, communicate, or otherwise interact with someone registered on another node, a federation protocol handles that (Status.net uses the OStatus standard). So you can, in fact, register anywhere, even on your own server, and still communicate with the rest of the world.

That sounds great, right? But even setting latency aside, I had one question, and I didn't like the answer. Imagine you launch a status.net node and let all your friends register there. Then your service becomes popular with a wider audience, and a lot more people join. You have two options: turn it into a revenue stream (you'd have to come up with a business model, probably ads), or simply say "I don't have time for this, I'm shutting it down." If you choose the latter, all of your users' data is gone. True, every user can download their data, and you can keep it around for a couple of months before deleting it, but it's a hassle: they'd need to jump to another node, register, and import their data. That is OK for tech-savvy people, but it certainly can't go mainstream. You can't have all of Twitter move to status.net nodes, because the bigger ones will need a business model to support themselves, and the smaller ones will be dying every now and then, leaving tons of unhappy users who are unwilling, or don't know how, to move their data around. It's still better than having a single vendor, but it's not exactly "distributed" if you have three huge nodes. And these nodes might at some point decide to restrict the network, because they hold the data in one place. "Yes, but people can move away from them." People can even move away from Facebook: they can download all their data and import it somewhere else.
There's a difference, of course: you can move your status.net data to another node and still communicate with everyone else, and that's why status.net and Diaspora may turn out to be a good model. But the fact that small nodes just die and lose their data bothered me. (I can't leave out that status.net is not user-friendly at all; this is my second failed attempt to use it.)
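
To make the failure mode concrete, here is a minimal Python sketch of a federated setup. It rests on loud assumptions: `Node`, `Network`, `register`, `export`, `import_notices` and `shut_down` are hypothetical names, the `user@host` addressing only mimics what federated networks use, and none of it is status.net's or Diaspora's actual code. It shows that a user's data lives on exactly one node, so when that node's owner shuts it down, the only way out is a manual export and re-import under a new address.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical instance run by one person; not status.net's actual code."""
    host: str
    accounts: dict = field(default_factory=dict)  # username -> list of notices
    online: bool = True

    def register(self, username: str) -> str:
        self.accounts[username] = []
        return f"{username}@{self.host}"  # federated address: user@host

    def post(self, username: str, text: str) -> None:
        self.accounts[username].append(text)

    def export(self, username: str) -> list:
        # The manual backup a user must remember to take before the node goes away.
        return list(self.accounts[username])

    def import_notices(self, username: str, notices: list) -> None:
        self.accounts[username].extend(notices)

    def shut_down(self) -> None:
        # The owner has had enough: everything stored only here disappears.
        self.accounts.clear()
        self.online = False


class Network:
    """Resolves user@host addresses to nodes, standing in for the federation protocol."""

    def __init__(self, *nodes: Node):
        self.nodes = {n.host: n for n in nodes}

    def timeline(self, address: str) -> list:
        username, host = address.split("@", 1)
        node = self.nodes.get(host)
        if node is None or not node.online:
            raise LookupError(f"{address} is unreachable")
        return node.accounts[username]


small, big = Node("small.example"), Node("big.example")
net = Network(small, big)

alice = small.register("alice")
small.post("alice", "hello from a small node")
backup = small.export("alice")  # only if Alice thinks of it in time
small.shut_down()

try:
    net.timeline(alice)
    reachable = True
except LookupError:
    reachable = False

# Manual migration: register again elsewhere, under a new address, and re-import.
new_alice = big.register("alice")
big.import_notices("alice", backup)
```

Anyone who followed `alice@small.example` would also have to find and re-follow the new address, which is part of the hassle described above.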

At that point I thought that there is an even more distributed way of doing social networks, and I even considered turning welshare into such a piece of software. The point is to have the data distributed (replicated), just as in a database cluster, except that the cluster is internet-wide rather than confined to a single network. Various NoSQL databases have good approaches to this, Cassandra being the one I like most: you can add and remove nodes at any time, and, given enough replicas, the data always remains somewhere. This way, even if you decide to shut down your server, your users will be able to log in at a different URL (to which you point or redirect them) without any additional steps. Each node can then develop its own business model, and the system will thrive. The difference from the approach above is that no data is lost, and regular users won't even notice they are using a distributed network. Optionally, you can even replicate the data to cloud storage providers like Dropbox.
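
The Cassandra-style idea can be sketched as a token ring with a replication factor. This is a minimal Python sketch under stated assumptions: `Ring`, `put`, `get` and the node names are hypothetical, placement follows a simple "next `rf` distinct nodes clockwise" rule, and real Cassandra does much more (virtual nodes, hinted handoff, repair, streaming data away from a decommissioned node). The point it shows is narrow: with three replicas, one node disappearing without warning does not lose the data.

```python
from __future__ import annotations

import bisect
import hashlib


def token(name: str) -> int:
    # Position on the ring. MD5 is used only as a stable, well-spread hash here.
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    """Simplified token ring: each key lives on the next `rf` distinct nodes clockwise."""

    def __init__(self, rf: int = 3):
        self.rf = rf
        self.tokens: list[int] = []                 # sorted node positions
        self.owners: dict[int, str] = {}            # position -> node name
        self.store: dict[str, dict[str, str]] = {}  # node name -> its local data

    def add_node(self, name: str) -> None:
        t = token(name)
        bisect.insort(self.tokens, t)
        self.owners[t] = name
        self.store[name] = {}

    def remove_node(self, name: str) -> None:
        # Abrupt shutdown: nothing is streamed to other nodes before leaving.
        t = token(name)
        self.tokens.remove(t)
        del self.owners[t]
        del self.store[name]

    def replicas(self, key: str) -> list[str]:
        start = bisect.bisect(self.tokens, token(key))
        count = min(self.rf, len(self.tokens))
        return [self.owners[self.tokens[(start + i) % len(self.tokens)]] for i in range(count)]

    def put(self, key: str, value: str) -> None:
        for node in self.replicas(key):
            self.store[node][key] = value

    def get(self, key: str) -> str | None:
        # Any surviving replica that still holds the key can answer.
        for node in self.replicas(key):
            if key in self.store[node]:
                return self.store[node][key]
        return None


ring = Ring(rf=3)
for i in range(1, 6):
    ring.add_node(f"node-{i}")

ring.put("alice:profile", "Alice, likes Cassandra")
holders = ring.replicas("alice:profile")
ring.remove_node(holders[0])  # one replica's owner shuts down without warning
```

Note that every node in `holders` stores the value in plaintext, so each of their owners can read it; that is exactly the privacy problem behind the paradox below.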

And here comes the paradox. You can't have a fully distributed social network, because you'll have to distribute (replicate) the data, and potentially everyone will have access to it. If a node grows big, it will hold user data from many other nodes, which turns into a privacy nightmare. So unless your users are fine with anyone being able to read all their data, you can't replicate it. The problem lies with the data: it is good that you can own your data with status.net and Diaspora, but the reality is that the regular user doesn't want to manage their data. They just want to give it to a provider and let the provider manage it, and the provider can then run aggregate queries on it and serve ads.

The good thing is that a network like Twitter doesn't actually have any privacy (apart from protected accounts), so implementing such a distributed data mechanism is still an option. But as far as I can see, it's not an option for a full-featured social network: you need a single keeper of the data, and you can't democratize access to it if privacy is a concern.

Comments (4)

Bozho, I think you showed the most logical way of thinking about the problem and until the last two paragraphs, I completely agree.

But what is the problem with cryptography? Skype also shares some data across nodes, and there are no privacy problems.

I've (of course) thought about encrypting the data. However, the case with Skype is a bit different. They use a proprietary symmetric algorithm, and the point is that nobody knows how to decrypt the data. With a distributed social network you can't rely on that, because it obviously has to be open source. Each node needs to be able to decrypt the data in order to display it, which means a node owner can decrypt every bit of information that gets replicated to his premises.

I mostly agree that these problems exist, but I don't agree that they are as insurmountable as you say. I run my own status.net instance, so that is where my understanding comes from.

Small instances have more than two options for scaling; I've seen this first-hand. They can remain selective about membership to keep the load manageable, and if they do shut down, they can do so gracefully rather than abruptly, as closed services tend to do when they die.

I think the degree of distribution can be increased further, but it is already higher than you think: all public notices are already stored at every instance that subscribes to a given instance, which leaves only account data, and that is rather small. There is an export feature for all of your data in status.net, though it isn't perfect.
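
The push model described in this comment can be sketched as follows. This is a hypothetical Python model, not status.net code (in OStatus the actual delivery runs over PubSubHubbub); `Instance`, `register`, `subscribe`, `post` and `shut_down` are illustrative names. It shows why public notices outlive their origin: every subscribing instance already holds a copy, while data that was never pushed, such as the account itself, exists only at the origin.

```python
from collections import defaultdict


class Instance:
    """Hypothetical instance: its own users' notices plus copies pushed from elsewhere."""

    def __init__(self, host: str):
        self.host = host
        self.accounts = {}                   # local username -> account data (never pushed)
        self.own = defaultdict(list)         # local username -> notices authored here
        self.remote = defaultdict(list)      # "user@host" -> copies received by push
        self.subscribers = defaultdict(set)  # local username -> subscribing instances

    def register(self, username: str, email: str) -> None:
        self.accounts[username] = {"email": email}

    def subscribe(self, follower: "Instance", username: str) -> None:
        self.subscribers[username].add(follower)

    def post(self, username: str, text: str) -> None:
        self.own[username].append(text)
        # Push model: each subscribing instance stores its own copy of the public notice.
        for follower in self.subscribers[username]:
            follower.remote[f"{username}@{self.host}"].append(text)

    def shut_down(self) -> None:
        # Only what was stored exclusively at the origin is lost.
        self.accounts.clear()
        self.own.clear()


origin = Instance("tiny.example")
one, two = Instance("one.example"), Instance("two.example")
origin.register("bob", "bob@mail.example")
origin.subscribe(one, "bob")
origin.subscribe(two, "bob")
origin.post("bob", "a public notice")
origin.shut_down()
```

After the shutdown, both subscribing instances still hold Bob's notice, while his account record is gone, which matches the comment's point that what remains to be solved is mostly account data.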

The main problem is migrating to a new instance seamlessly, but this is a matter of designing and implementing such a feature; nothing in the current infrastructure makes it inherently impossible.

Re: encrypting personal data, it's a matter of keys. You would encrypt various pieces of data with different keys depending on how sensitive they are, and grant access to specific users or instances by revealing to them a key that decrypts a given subset of the data. It's a bit odd to say, "Each node needs to be able to decrypt the data in order to display it." A node that has been given permission, in the form of a key, can display the data that key unlocks, but nothing says that you must reveal all your keys to every node that stores your data. This is the basis for zero-trust systems like Tahoe-LAFS, and a social network could be implemented on top of it, or in a similar fashion.
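
The key-per-subset idea can be sketched in Python. Strong assumptions apply: the sensitivity levels, host names and the `granted` table are hypothetical, and `ToyCipher` is a stdlib-only stand-in (an HMAC-SHA256 keystream with an encrypt-then-MAC tag) written only so the example runs without third-party packages; a real system should use a vetted AEAD cipher such as AES-GCM from an audited library. Every storage node receives all of the ciphertext, but each one can read only the subsets whose keys it was granted.

```python
import hashlib
import hmac
import secrets


class ToyCipher:
    """Illustration only: HMAC-SHA256 counter-mode keystream plus an encrypt-then-MAC tag.
    Not a vetted construction; use an audited AEAD such as AES-GCM for real data."""

    def __init__(self, key: bytes):
        # Derive separate encryption and authentication keys from one secret.
        self.enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
        self.mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        blocks = []
        for counter in range((length + 31) // 32):
            block_input = nonce + counter.to_bytes(8, "big")
            blocks.append(hmac.new(self.enc_key, block_input, hashlib.sha256).digest())
        return b"".join(blocks)[:length]

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        stream = self._keystream(nonce, len(plaintext))
        body = bytes(p ^ k for p, k in zip(plaintext, stream))
        tag = hmac.new(self.mac_key, nonce + body, hashlib.sha256).digest()
        return nonce + body + tag

    def decrypt(self, blob: bytes) -> bytes:
        nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(self.mac_key, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("wrong key or tampered data")
        stream = self._keystream(nonce, len(body))
        return bytes(c ^ k for c, k in zip(body, stream))


# One key per sensitivity level (hypothetical categories).
keys = {level: secrets.token_bytes(32) for level in ("public", "friends", "private")}
profile = {
    "public": b"Alice, Sofia",
    "friends": b"birthday: 1 April",
    "private": b"draft message to Bob",
}

# Every node that stores Alice's data gets all of the ciphertext,
replica = {level: ToyCipher(keys[level]).encrypt(data) for level, data in profile.items()}

# but each node is handed only the keys it has been granted.
granted = {
    "friend-node.example": ("public", "friends"),
    "stranger-node.example": ("public",),
}
node_keys = {node: {lvl: keys[lvl] for lvl in levels} for node, levels in granted.items()}


def readable(node: str) -> dict:
    # Decrypt only the subsets this node holds keys for; the rest stays opaque.
    held = node_keys[node]
    return {lvl: ToyCipher(held[lvl]).decrypt(blob) for lvl, blob in replica.items() if lvl in held}
```

Revoking a grant would mean re-encrypting that subset under a fresh key and redistributing it, which this sketch leaves out.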

I want to emphasize one thing: being proprietary does not enhance the security of a system; it's the keys that are important. Skype is actually a problem in that there is evidence that Microsoft CAN decrypt your data, and it has changed its policies to route all Skype calls through servers under its control.

Thanks for your comments; it's good to have insight from an actual status.net deployer. Incidentally, this evening I spoke to another one, and things look a bit better than I thought, though there is a lot of work to be done on usability.
