Tim Berners-Lee: we must regulate tech firms to prevent ‘weaponised’ web

TBL is rightly indignant and concerned about the fact that “what was once a rich selection of blogs and websites has been compressed under the powerful weight of a few dominant platforms.” The Web, according to Berners-Lee, is at great risk of degenerating into a few big versions of CompuServe or AOL sucking up most of the bandwidth of the Internet, and most of the attention of its inhabitants. In an open letter, he outlines the dangers of putting so much power into the hands of those who either see it as a burden or actively exploit it for evil.

I really, really hate Facebook more than most, because it aggressively seeks to destroy all that is good about the Web, and it is ruthlessly efficient at doing so, regardless of the human costs. Yes, let’s kill that in any way that we can, because it is actually and actively evil, and shows no sign of getting any nicer. I am somewhat less concerned that Google gets 87% of all online searches (notwithstanding the very real dangers of a single set of algorithms shaping what we find), because most of Google’s goals are well aligned with those of the Web. The more openly people share and link, the better it gets, and the more money Google makes. It is very much in Google’s interest to support an open, highly distributed, highly connected Web, and the company is as keen as everyone else to avoid the dangers of falsehoods, bias, and the spread of hatred (which are among the very things that Facebook feeds upon). Thanks to its strong market position and careful hiring practices, it is also more capable of avoiding them than pretty much anyone else. Google rightly hates Facebook (and others of its ilk) not just because it is a competitor, but because it removes things from the open Web, probably spreads lies more easily than truths, and so reduces Google’s value.

I am somewhat bothered that the top 100 sites (according to Wikipedia, based on Alexa and SimilarWeb results) probably get far more traffic than the next few thousand put together, and that the long tail pretty much flattens to approximately zero after that. However, that’s an inevitable consequence of the design of the Web (it’s a scale-free network subject to power laws), and ‘approximately zero’ may actually translate to hundreds of thousands or even millions of people, so it’s not quite the skewed mess that it seems. It is, as TBL observes, very disturbing that big companies with big pockets purchase potential competitors and stifle innovation, and I agree that (like all monopolies) they should be regulated, but there’s no way they are ever going to get everything or everyone, at least without the help of politicians and evil legislation, because it’s a really long tail.
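To make the power-law point a little more concrete, here is a minimal sketch, assuming (purely for illustration) that traffic follows a simple Zipf-style law in which a site's share is proportional to 1/rank. The exponent, the one-million-site total, and the function name `zipf_shares` are all hypothetical choices, not measurements from Alexa, SimilarWeb, or anyone else.

```python
def zipf_shares(n_sites, exponent=1.0):
    """Return each rank's share of total traffic under an assumed Zipf law.

    Share of the site at rank r is proportional to 1 / r**exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, n_sites + 1)]
    total = sum(weights)
    return [w / total for w in weights]


N_SITES = 1_000_000  # hypothetical size of the Web, for illustration only

shares = zipf_shares(N_SITES)

top_100 = sum(shares[:100])        # ranks 1 to 100
next_3000 = sum(shares[100:3100])  # ranks 101 to 3100, "the next few thousand"
long_tail = sum(shares[3100:])     # everything after that

print(f"top 100 sites:      {top_100:.1%} of traffic")
print(f"next 3,000 sites:   {next_3000:.1%} of traffic")
print(f"rest of the tail:   {long_tail:.1%} of traffic")
print(f"last site's share:  {shares[-1]:.2e}")
```

Under these assumed numbers, each individual site far down the tail gets a vanishingly small share, yet the tail as a whole still adds up to more than the top 100 combined. Real traffic will not follow such a tidy curve, but the shape of the argument is the same.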

It is also very interesting that even the top 10 – according to just about all the systems that measure such things – includes the unequivocally admirable and open Wikipedia itself, and also Reddit which, though now straying from its fully open model, remains excellently social and open. In different ways, both give more than they take.

It is also worth noting that there are many different ways to calculate rank. Moz.com (based on the Mozscape web index of 31 billion domains and 165 billion pages) has a very different view of things, for instance, in which Facebook doesn’t even make it to the domains listing, and is way below WordPress and several others in the popular pages list, which is a direct result of it being a closed and greedy system. Quantcast’s perspective is somewhat different again, albeit only focused on US sites, which are a small but significant portion of the whole.

Most significantly, and to reiterate the point because it is worth making, the long tail is very long indeed. Regardless of the dangers of a handful of gigantic platforms casting their ugly shadows over the landscape, I am extremely heartened by the fact that, now, over 30% of all websites run on WordPress, which is both open source and very close to the distributed ideal that TBL espouses, allowing individuals and small communities to stake their claims, make a space, and link (profusely) with one another, without lock-in, central control, or inhibition of any kind. That 30% puts any one of the big monoliths, including Facebook, very far into the shade. And, though WordPress’s nearest competitor (Joomla, also open source) accounts for a ‘mere’ 3% of all websites, there are hundreds if not thousands of similar systems, not to mention a huge number of pages (50% of the total, according to W3Techs) that people still roll for themselves.

Yes, the greedy monoliths are extremely dangerous and should, where possible, be avoided, and it is certainly worth looking into ways of regulating their activities, nationally and internationally, as many governments are already doing and should continue to do. We must ever be vigilant. But the Web continues to grow, and to diversify, regardless of their pernicious influence, because it is far bigger than all of them put together.

Facebook has a Big Tobacco Problem

A perceptive article listing some of Facebook’s evils and suggesting an analogy between the tactics used by Big Tobacco and those used by the company. I think there are a few significant differences. Big Tobacco is not one company bent on profit no matter what the cost. Big Tobacco largely stopped claiming it was doing good quite a long time ago. And Big Tobacco only kills and maims people’s bodies. Facebook is aiming for the soul. The rest is just collateral damage.

Facebook’s days may be numbered as UK youth abandon the platform

The end of Facebook couldn’t come soon enough, but we’ve been reading headlines not unlike this for around a decade, yet still its malignant tumour in the lungs of the Web grows, sucking the air out of all good things.

Despite losses in the youth market (not only in the UK), as the article notes, Facebook has deep pockets and is metastasizing at a frightening rate. Instagram and WhatsApp are only the most prominent recent growths, and no doubt far from the last. Also, the main tumour itself is still evolving, backed by development funding that staggers belief. It would take a lot to cure us of this awful thing. On the optimistic side, however, Metcalfe’s Law works just as well in reverse as going forward. Networks can gain value very rapidly as they grow, but they can lose it just as fast as they shrink. Perhaps these small losses will be the start of a cascade. Let’s hope so.
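To put a rough number on the reverse effect, here is a minimal sketch, assuming the common reading of Metcalfe's Law in which a network's value is proportional to the number of possible pairwise connections between its users. The function names and the user counts are hypothetical, chosen only to show the arithmetic.

```python
def metcalfe_value(users):
    """Number of possible pairwise connections among `users` people."""
    return users * (users - 1) // 2


def value_lost(users_before, users_after):
    """Fraction of network value lost when the user count drops."""
    before = metcalfe_value(users_before)
    after = metcalfe_value(users_after)
    return (before - after) / before


# A hypothetical network of 1,000 people loses 10% of its members...
print(f"users lost: 10%, value lost: {value_lost(1000, 900):.1%}")
# ...and then half of them.
print(f"users lost: 50%, value lost: {value_lost(1000, 500):.1%}")
```

On this reading, losing a tenth of the users costs nearly a fifth of the possible connections, and losing half costs about three quarters, which is why a modest exodus could snowball.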



Signal: now with proper desktop apps

Signal is arguably the most open, and certainly the most secure, privacy-preserving instant messaging/video or voice-calling system available today. It is open source, ad-free, standards-based, simple, and very well designed. Though not filled with bells and whistles, for most purposes it is a far better alternative to Facebook-owned WhatsApp or other near-competitors like Viber, FaceTime, Skype, etc., especially if you have any concerns about your privacy. As with all such things, Metcalfe’s Law means that its value increases with every new user added to the network. It’s still at the low end of the uptake curve, but you can help to change that – get it now and tell your friends!

Like most others of its ilk it hooks into your cellphone number rather than a user name but, once you have installed it on your smartphone, you can associate that number (via a simple 2D barcode) with a desktop client. Until recently it only supported desktop machines via a Chrome browser (or equivalent – I used Vivaldi) but the new desktop clients are standalone, so you don’t have to grind your system to a halt or share data with Google to install it. It is still a bit limited when it comes to audio (simple messaging only) and there still appears to be no video support (which is available on smartphone clients) but this is good progress.

The return of the weblog – Ethical Tech

Blogs have evolved a bit over the past 20 years or so, and diversified. The always terrific Ben Werdmuller here makes the distinction between thinkpieces (what I tend to think of as vaguely equivalent to keynote presentations at a conference, less than a journal article, but carefully composed and intended as a ‘publication’) and weblogging (kind of what I am doing here when I bookmark interesting things I have been reading, or simply a diary of thoughts and observations). Among the surprisingly large number of good points that he makes in such a short post is that a weblog is best seen as a single evolving entity, not as a bunch of individual posts:

Blogging is distinct from journalism or formal writing: you jot down your thoughts and hit “publish”. And then you move on. There isn’t an editorial process, and mistakes are an accepted part of the game. It’s raw.

A consequence of this frequent, short posting is that the product isn’t a single post: it’s the weblog itself. Your website becomes a single stream of consciousness, where one post can build on another. The body of knowledge that develops is a reflection of your identity; a database of thoughts that you’ve put out into the world.

This is in contrast to a series of thinkpieces, which are individual articles that live by themselves. With a thinkpiece, you’re writing an editorial; with a blog, you’re writing the book of you, and how you think.

This is a good distinction. I also think that, especially in the posts of popular bloggers like Ben, the blog comprises not just the posts but also the comments, trackbacks, and pings that develop around them, as well as tweets, pins, curations, and connections made in other social media. Ideas evolve in the web of commentary and become part of the thing itself. The post is a catalyst and attractor, but it is only part of the whole, at least when it is popular enough to attract commentary.

This distributed and cooperative literary style can also be seen in other forms of interactive publication and dialogue – a Slashdot or Reddit thread, for instance, can sometimes be an incredibly rich source of knowledge, as can dialogue around a thinkpiece, or (less commonly) the comments section of online newspaper articles. What makes the latter less commonly edifying is that their social form tends to be that of the untarnished set, perhaps with a little human editorial work to weed out the more evil or stupid comments: basically, what matters is the topic, not the person. Untarnished sets are a magnet for trolls, and their impersonal nature that obscures the individual can lead to flaming, stupidity, and extremes of ill-informed opinion that crowd out the good stuff. Sites like Slashdot, StackExchange, and Reddit are also mostly set-based, but they use the crowd and an algorithm (a collective) to modulate the results, usually far more effectively than human editors, as well as to provide shape and structure to dialogues, so that dialogues become useful and informative. At least, they do when they work: none are close to perfect (though Slashdot, when used well, is closer than the rest because its algorithms and processes are far more evolved and far more complex, and individuals have far more control over the modulation) but the results can often be amazingly rich.

Blogs, though, tend to develop the social form of a network, with the blogger(s) at the centre. It’s a more intimate dialogue, more personal, yet also more public as they are almost always out in the open web, demanding no rituals of joining in order to participate, no membership, no commitment other than to the person writing the blog. Unlike dedicated social networks there is no exclusion, no pressure to engage, no ulterior motives of platforms trying to drive engagement, less trite phatic dialogue, more purpose, far greater ownership and control. There are plenty of exceptions that prove the rule and plenty of ways this egalitarian structure can be subverted (I have to clean out a lot of spam from my own blogs, for instance) but, as a tendency, it makes blogs still very relevant and valuable, and may go some way to explaining why around a quarter of all websites now run on WordPress, the archetypal blogging platform.

Instagram uses 'I will rape you' post as Facebook ad in latest algorithm mishap

Another in a long line of algorithm fails from the Facebook stable, this time from Instagram…

"I will rape you" post from Instagram used for advertising the service

This is a postcard from our future when AI and robots rule the planet. Intelligence without wisdom is a very dangerous thing. See my recent post on Amazon’s unnerving bomb-construction recommendations for some thoughts on this kind of problem, and how it relates to attempts by some researchers and developers to use learning analytics beyond its proper boundaries.


The Ghost in the Machines of Loving Grace | Library Babel Fish

An article from Barbara Fister about the role and biases of large providers like Google and Facebook in curating, sorting, and filtering their content, usefully contrasted with academic librarians’ closely related but importantly different roles. Unlike a library, such systems (and especially Facebook) are not motivated to provide things that are in the interests of the public good. As Fister writes:

“The thing is, Facebook literally can’t afford to be an arbiter. It profits from falsehoods and hype. Social media feeds on clicks, and scandalous, controversial, emotionally-charged, and polarizing information is good for clicks. Things that are short are more valuable than things that are long. Things that reinforce a person’s world view are worth more than those that don’t fit so neatly and might be passed over. Too much cruft will hurt the brand, but too little isn’t good, either. The more we segment ourselves into distinct groups through our clicks, the easier it is to sell advertising. And that’s what it’s about.”

These are not new points but they are well stated and well situated. I particularly like the point that lies and falsehoods are not a reason to censor a resource in and of themselves. We need the ugliness in order to better understand and value the beauty, and we need the whole story, not filtered parts of it that suit the criteria of some arbitrary arbiter. As Fister writes:

“There’s a level of trust there, that our students can and will approach a debate with genuine curiosity and integrity. There’s also a level of healthy distrust. We don’t believe it’s wise to leave decisions about truth and falsehood up to librarians.”

Indeed. She also has good things to say about personalization:

“If libraries were as personalized, you would wave your library card at the door and enter a different library than the next person who arrives. We’d quickly tidy away the books you haven’t shown interest in before; we’d do everything we could to provide material that confirms what you already believe. That doesn’t seem a good way to learn or grow. It seems dishonest.”

Exactly so. She does, though, tell us about how librarians do influence things, and there’s only a fine and fuzzy (but significant) line between this and the personalization she rejects:

“Newer works on the topic will be shelved nearby that will problematize the questionable work and put it in context.”

I’m not sure that there is much difference in kind between this approach to influencing students and the targeted ads of Google or Facebook. However, there is a world of difference in the intent. What the librarian does is about sense making, and it accords well with one of the key principles I described in my first book: providing signposts, not fenceposts. To have control, people must first have choices, but they also need to know why those choices are worth making. Organizing relevant works together on the shelf helps students to make informed choices, scaffolding the research process by showing alternative perspectives. Offering relevant ads, though it might be dishonestly couched in terms of helping people to find the products they want, is not about helping them with what they want to do, but about exploiting them to encourage them to do what you want them to do, for your own benefit, not theirs. That’s all the difference in the world.

That difference in intent is one of the biggest differentiators between a system like the Landing and a general-purpose public social media site, and that’s one big reason why it could never make any sense for us to replace the Landing with, say, a Facebook group (a suggestion that still gets aired from time to time, on the utterly mistaken assumption that they duplicate each other’s functionality). The Landing is a learning commons, a network of people that, whatever they might be doing here, share an intent to learn, where people are valued for what they bring to one another, not for what they bring to the owners and shareholders of the company that runs the site. Quite apart from other issues around ownership, privacy and functionality, that’s a pretty good reason to keep it.


Wisdom of the Confident: Using Social Interactions to Eliminate the Bias in Wisdom of the Crowds

A really interesting paper on making crowds smarter. I find the word ‘confident’ in the title a bit odd because it seems (and I may have misunderstood) that the researchers are actually trying to measure independent thinking rather than confidence. As far as I can tell, this describes a method for separating sheep (those more influenced by others) from goats (those making more independent decisions), at least when you have a sequence of decisions/judgments to work with. The reason it bothers me is that sheep can be confident too (see the US election or Brexit, for example).

We know that crowds can be wise if and only if the agents in the crowd are unaware of the decisions of other agents. If there’s a feedback loop (more accurately, I believe, if there is an insufficiently delayed feedback loop) then you wind up with stupid mobs, driven by preferential attachment and similar dynamics. This is a big problem in many political systems that allow publication of polls and early results. However, some people are, for one reason or another, less influenced by the crowd than others. It would be useful to be able to aggregate their decisions while ignoring those that simply follow the rest, in order to achieve wiser crowds. That’s what the method described here seeks to do.
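The basic idea of aggregating only the independent judgments can be sketched in a few lines. This is a toy illustration, not the paper's model: the `independence` scores are simply assumed to be given (working them out from a sequence of judgments is the hard part that the paper addresses), and the names, threshold, and numbers are all hypothetical.

```python
from statistics import median

# Hypothetical judgments of a quantity whose true value is 100.
# Each entry is (estimate, independence score between 0 and 1).
# The low-independence agents have herded around an early wrong guess of 150.
judgments = [
    (90, 0.9), (105, 0.8), (110, 0.9), (95, 0.7),                # independents
    (150, 0.1), (148, 0.2), (152, 0.1), (149, 0.2), (151, 0.1),  # followers
]


def crowd_estimate(judgments):
    """Median of every estimate, regardless of independence."""
    return median(estimate for estimate, _ in judgments)


def independent_estimate(judgments, threshold=0.5):
    """Median of estimates from agents at or above an independence threshold."""
    kept = [est for est, independence in judgments if independence >= threshold]
    if not kept:
        raise ValueError("no judgments meet the independence threshold")
    return median(kept)


print("whole crowd:      ", crowd_estimate(judgments))
print("independents only:", independent_estimate(judgments))
```

In this contrived data the whole-crowd median is dragged towards the herd, while the median of the independents alone lands on the true value. Real data would be far messier, and the result depends entirely on how well independence is scored.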

The paper is more concerned with describing its model than with describing or analyzing the experiment itself, which is a pity as I’d like to know more about the populations used and tasks performed, and whether it really is discriminating confident from independent behaviour. I’ve also done some work in this area and have written about how useful it would be to automatically identify independent thinkers, and to use their captured behaviour instead of that of the whole crowd to make decisions, but I have never implemented that because, in real life, this is quite hard to do. In this experiment, it seems quite possible that the ‘independent’ people might simply have been those that knew more about the domain. That’s great if we are using a sequence of captured data from the same domain (in this case, length of country borders) because we get results from those that know rather than those that guess. But it won’t transfer when the domain changes even slightly: knowing the length of the Swiss border might not well predict knowledge of, say, the length of the Nigerian border, though I guess it might improve things slightly because those that care about such things would be better represented in the sample.

It would take a fair bit of evidence, I suspect, to identify someone as a context-independent independent thinker. Given enough time, though, it could be done, it would be well worth doing, and this model might provide the means to do it. I’d like to see it applied in a real context. There are less lengthy and less privacy-invading alternatives. For instance, we might capture both a rating/value/judgement/whatever and some measure of confidence. Some kinds of prediction market capture that sort of data and, because of the personal stake in it, might achieve better results when we do not have a long history of data to analyze. Whether and to what extent confidence is related to independence, and whether the results would be better, remains to be discovered, of course – there’s a good little research project to be done here – but it would be a good start.
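One simple way to use a captured confidence measure is a confidence-weighted average. The following is a minimal sketch under that assumption only: it is not how prediction markets actually aggregate, and the function name and example values are hypothetical.

```python
def confidence_weighted_mean(judgments):
    """Average the estimates, weighting each by its self-reported confidence.

    `judgments` is a list of (estimate, confidence) pairs, where confidence
    is a non-negative number (for example, between 0 and 1).
    """
    total_confidence = sum(confidence for _, confidence in judgments)
    if total_confidence == 0:
        raise ValueError("at least one judgment needs non-zero confidence")
    weighted_sum = sum(estimate * confidence for estimate, confidence in judgments)
    return weighted_sum / total_confidence


# Two hypothetical judgments: one made confidently, one little more than a guess.
example = [(100, 0.9), (200, 0.1)]
print(confidence_weighted_mean(example))
```

Whether self-reported confidence is a good weight is exactly the open question: a confident sheep would count for more here than a diffident goat.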

Commons In A Box

Landing-like software from CUNY, based on BuddyPress, intended to provide a learning commons with relatively little effort or configuration. It’s a nice bit of packaging, slick, with good collaboration tools and a simple, activity-stream-oriented social network. Commons in a Box is definitely worth looking at if you need a site to support a bottom-up social community or network, and you don’t have a wealth of resources to put into building your own.

I came across this software because it is being used in the University of Brighton’s newly reborn community site at https://community.brighton.ac.uk which, until it was killed off last year, used to run on Elgg. I remain a fan of Elgg for building such things: it has a lot more options than BuddyPress available by default, richer access control, and a much more elegant technological design that makes customization more robust and flexible. Still, Commons In A Box seems to be a great simple solution that just works without demanding much effort, and that, thanks to its WordPress foundations, could be customized to do pretty much anything you’d want a bit of social software to do.

Address of the bookmark: http://commonsinabox.org/