
Facebook: 1. Information Leakage on Platforms

In the light of The Guardian’s investigative reporting, sourced from the whistleblower at Cambridge Analytica, it appears that almost all of the internet has been upset with Facebook. After an extended silence of several days, Zuckerberg finally went on a semi-apology tour of “exclusive” interviews about the “breach” of trust or data (or both, or neither).

In many ways, we are witnessing a perfect storm of confusion involving politics, modern notions of privacy, the role of tech in our lives,  the ethics of data-sharing/micro-targeting, and organizational leadership.

Instead of ruminating on the above issues, I focus on three specific aspects of the Facebook kerfuffle over a set of three blog posts. (This post focuses entirely on the first issue.)

  1. Platform designs and information leakage
  2. The birth and fall of networks
  3. The complicity of gatekeepers, including policymakers and academia

INFORMATION LEAKAGE

Data vs. Users in Leaky Networks (or “It’s a feature, not a bug!”)

Fundamentally, all open platforms are “leaky”.  (I use the term leakage in a benign sense of “organic information transmission” and not as a normative judgment). All platforms face pressure to grow their user base and the scale of data. However, platforms tend to treat agents (users or service providers) and data quite differently.

(a) Platforms strive to reduce friction in the entry of agents into a network (that is why sign-ups are super easy) while making it harder for agents to quit. (I will try to cover this point in the post that follows.)

(b) Data, on the other hand, is treated quite differently. Adding data to the network is easy, particularly because agents are on-boarded quickly. Opening up or replicating data outside the network is also generally easy. For example, platforms routinely allow users to forward a link to someone outside the platform, or allow posts to be searchable outside the network and “open” to the web. Furthermore, the platform has no interest or incentive to delete data. (Of course, regulation can shape incentives on data deletion. The EU and the US have taken different approaches, placing varying emphasis on the Right to Be Forgotten and the Right to Erasure.)

Because of the current incentive structure of the platforms, data “leakage” will happen. The leakage typically occurs in two forms: inter-network (across networks) and inter-temporal (across time).

Inter-network leakage occurs through the transfer of data to platforms outside the network. The Cambridge Analytica data usage is an example of inter-network leakage. (Data also moved across when users played games such as FarmVille on Facebook.) Sometimes, this leakage is relatively mundane; it takes place, for instance, when someone downloads a picture from the network and shares it outside with a person not on Facebook.

Inter-temporal leakage occurs when one’s data leaks to future agents who did not share time on the network (the haunting of the past, so to speak). A user can join a network and explore someone’s deep past on the network indefinitely. In real life, such an exploration would break social norms. These count as leakages too, since information crosses the platform’s varying states of regulatory control over time.

That’s why these leakages are hard to control. Such leakages have happened before and will continue to happen.

Additionally, since networks are stochastic, “firewalls” to prevent leakages are only probabilistically impregnable in a leaky network. So regulation or technical solutions can only ameliorate the extent of leakage, not eliminate it entirely.
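
To make the point concrete, here is a minimal simulation sketch, entirely my own toy model (the block rate, number of sharing opportunities, and trial count are made-up parameters): a “firewall” blocks each sharing attempt with some probability, yet the chance of eventual leakage still climbs toward one as exposure grows.

```python
import random

def leak_probability(block_rate: float, share_attempts: int, trials: int = 10_000) -> float:
    """Estimate the chance that a piece of content leaks at least once when a
    'firewall' blocks each sharing attempt independently with probability
    block_rate. Toy model only."""
    leaks = 0
    for _ in range(trials):
        if any(random.random() > block_rate for _ in range(share_attempts)):
            leaks += 1
    return leaks / trials

# Even a 99%-effective firewall leaks almost surely once the content is exposed
# to enough sharing opportunities. Analytically,
#   P(leak) = 1 - block_rate ** share_attempts,
# which approaches 1 for any block_rate < 1 as exposure grows.
for attempts in (10, 100, 1000):
    print(attempts, round(leak_probability(0.99, attempts), 3))
```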

What makes the “leakage” on Facebook different? Let’s look at  Platform Asymmetries.

Many platforms are mistaken for symmetric two-sided markets, but this is clearly not true. In particular, the symmetry of the matching is highly overblown. To illustrate the concept, I drew a simple schematic comparing some well-known platforms. (The figures are not an exact representation of reality, but a representation of the concepts.)

Let’s consider platforms that match agents who are service providers (on the top) with agents who are users (on the bottom). The black arrows indicate what the agents supply and what they receive on the platform. The red arrows indicate the flow of revenues. (We will soon see that this is where the asymmetry of incentives presents itself.)

Before comparing, it is worth quickly noting that a bigger matching platform is preferred by both sides, i.e., service providers like to have more users, and users like to have a bigger choice of service providers. (Of course, there are operational complexities in matching and data analysis, but those are second-order difficulties.)

A. Uber  

As ride-sharing services, Uber’s or Lyft’s main asset is the availability of cars, which increases as more drivers participate because they know there are a lot of users on the platform. Users generate cash, which can be shared with drivers; hence, the red arrows show a split. Uber focuses on getting drivers and users onto the platform by facilitating quick matches. (I am ignoring some other complex underlying asymmetries in matching.)

Note that riders generate cash for the platform, so Uber needs to keep them happy, sometimes at the expense of drivers having to wait longer so that they are around when riders need them. Hence, almost all of the economic troubles Uber has had have been with the drivers (tipping, allocation of revenues, insurance issues, background checks).

To summarize, the main assets are cars, and Uber shapes driver locations to keep riders happy.

B. Google 

I am going to base the argument mainly on Google search results, but it will apply readily to Google Analytics, Maps, etc.

Again, in the case of the Google search platform, the asset is the wealth of information stored in the form of indexed pages. Restaurants host websites, newspapers write reports, repairmen receive ratings, and Google indexes them all on the web.

Users search for results (e.g., what’s a good dinner place? Where to look for financial news? Where to find a repairman for washing machines?). Google provides highly accurate results for those queries through its fast, efficient search algorithm. Over time, service providers who want to be found on the web realize the value in paying for a better position in the search results.

Again, note that the revenue is generated from the service providers, through sponsored auctions, and Google has an incentive to position the paid links better, at the expense of users getting “organic” results. It is also in the interest of firms that are poorly positioned in the organic listing to pay for links and get a better position so that more users find them. In fact, the worse their position in the organic listing, the stronger their incentive.

All of this means that Google uses some of its real estate on the front page to host some not-so-good links. Impressions and click-throughs generate money for the platform, but positioning such results inescapably dilutes the user search experience. Google now clearly marks such links as sponsored results, but we did not arrive at the present state without controversy.
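
Here is a toy sketch of that incentive, purely my own simplification (the names, relevance scores, bids, and slot rule are invented for illustration; this is not Google’s actual auction): reserving top slots for bidders raises revenue while lowering the average relevance delivered to users.

```python
# Toy model of mixing sponsored and organic search results.
# Note that the least relevant firms are the ones with the strongest incentive to bid.
results = [
    {"name": "great_bistro",   "relevance": 0.95, "bid": 0.0},
    {"name": "ok_bistro",      "relevance": 0.70, "bid": 0.0},
    {"name": "mediocre_chain", "relevance": 0.40, "bid": 2.5},
    {"name": "tourist_trap",   "relevance": 0.20, "bid": 3.0},
]

def organic_page(results, slots=3):
    """Rank purely by relevance."""
    return sorted(results, key=lambda r: r["relevance"], reverse=True)[:slots]

def sponsored_page(results, slots=3, paid_slots=1):
    """Reserve the top paid_slots positions for the highest bidders and fill
    the rest organically. Returns the page and the ad revenue."""
    paid = sorted((r for r in results if r["bid"] > 0),
                  key=lambda r: r["bid"], reverse=True)[:paid_slots]
    rest = sorted((r for r in results if r not in paid),
                  key=lambda r: r["relevance"], reverse=True)
    return paid + rest[: slots - len(paid)], sum(r["bid"] for r in paid)

def avg_relevance(page):
    return sum(r["relevance"] for r in page) / len(page)

organic = organic_page(results)
sponsored, revenue = sponsored_page(results)
print("organic:  ", [r["name"] for r in organic],
      "relevance", round(avg_relevance(organic), 2), "revenue 0")
print("sponsored:", [r["name"] for r in sponsored],
      "relevance", round(avg_relevance(sponsored), 2), "revenue", revenue)
# Revenue goes up, average relevance (the user experience) goes down.
```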

To summarize, the main assets on Google are the indexed results. Google “modifies” the display of its search results to the users in order to keep the (paying) sponsors in.

C. Facebook

Unlike Uber/Lyft (users need to go from A to B) and Google (users want to find something on the web), Facebook users are not really looking for any specific information when they get on the platform.

They are on the platform because of FOMO (fear of missing out). (This is an important feature that explains several issues; I will revisit why it matters in the next post on Facebook.) Essentially, the users are on the platform to spend time. Maybe they are bored, or they want to see what others, usually their “friends” or family, are up to. More recently, Facebook has begun to focus on directing users to network trends: what’s trending, what’s happening, and what the world is talking about now.

This strategy rests on the main feature of Facebook’s users: the user requirement on Facebook is nebulous.

Now, look at the revenue stream for Facebook. Google can simply position search result links to please the paying sponsors (those with winning bids in the auction). But for Facebook, revenues come from firms that place ads for new products, movie promotions, and so on. Their (i.e., the ad suppliers’) goal is to figure out which social group to promote to. Hence, Facebook has a direct incentive to make user network data useful for advertising and targeting.

As a result, the asset traded on Facebook is user-network data (not car rides or search positions, but the social sub-group), mined through clicks, likes, and comments.

As mentioned earlier, unlike on the Google platform, where users are searching for something specific, Facebook’s users are just looking to spend time. This user behavior introduces another kink in Facebook’s strategy of letting the ad supplier access the network: instead of directly advertising to all members of the network, which is ineffective, it is better to go “social”.

It is better for Facebook and the ad agencies to randomly find “seed” users in the sub-group, and then organically inform everyone in the sub-group, through social means, of what their friends are up to. (“Hey, look, your friend Jack loves Tide-pods”, or “Grandpa just took the Cambridge personality quiz”.)
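
A back-of-the-envelope sketch of this “seed and spread” idea, again entirely illustrative (the graph, names, and single-hop spread are my own toy assumptions, not Facebook’s actual targeting machinery):

```python
import random

# Tiny illustrative social graph: who is friends with whom (all names invented).
friends = {
    "jack":    ["emma", "grandpa", "maria"],
    "emma":    ["jack", "maria"],
    "grandpa": ["jack"],
    "maria":   ["jack", "emma", "noah"],
    "noah":    ["maria"],
}
target_subgroup = {"jack", "emma", "grandpa", "maria"}  # group the advertiser wants to reach

def seed_and_spread(friends, subgroup, n_seeds=1, promo="Tide-pods"):
    """Pick random 'seed' users inside the sub-group, then show each seed's
    friends a social story instead of a direct ad. Toy model only."""
    seeds = random.sample(sorted(subgroup), n_seeds)
    stories = [(friend, f"Your friend {seed} loves {promo}")
               for seed in seeds
               for friend in friends.get(seed, [])]
    return seeds, stories

seeds, stories = seed_and_spread(friends, target_subgroup)
print("seeds:", seeds)
for recipient, message in stories:
    print(f"show {recipient}: {message}")
# One seed yields several organic-looking impressions across the sub-group,
# which is the point of going "social" rather than broadcasting an ad.
```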

Social comparisons and nudges give the users some relief from ennui and “fill” their time by providing information on what’s happening in their social network. But what’s really happening is that the platform is shaping their experience through their “friends”.

So, all platforms trade assets and leak data, but the Facebook platform trades data and leaks assets. This “leakage” is not a breach; it is simply a result of how platforms work.
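
To condense the comparison in one place, here is the asymmetry written out as a small data structure; it is my own paraphrase of the three sections above, not the original schematic:

```python
from dataclasses import dataclass

@dataclass
class Platform:
    traded_asset: str    # what the platform actually matches and monetizes
    revenue_source: str  # which side the cash comes from (the red arrows)
    side_squeezed: str   # whose experience gets shaped to keep the revenue flowing

platforms = {
    "Uber":     Platform("rides / availability of cars", "riders (fares split with drivers)",
                         "drivers (wait times, pay terms)"),
    "Google":   Platform("indexed pages / search results", "sponsors (ad auctions)",
                         "users (sponsored slots on the results page)"),
    "Facebook": Platform("user-network data (clicks, likes, comments)", "advertisers (targeting)",
                         "users (feed shaped through their 'friends')"),
}

for name, p in platforms.items():
    print(f"{name}: trades {p.traded_asset}; paid by {p.revenue_source}; squeezes {p.side_squeezed}")
```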

In the next post, I will write about the next two points: an 80s-movie framework to think about Facebook and, in my view, the disappointing role of gatekeepers.

UPDATE (April 4, 2018): Two days after I wrote the above post, Facebook revealed that Cambridge Analytica may have had data on up to 87 million users. If you agree with the post, this should not be surprising: leakages are in the nature of the platform.

UPDATE 2 (April 6, 2018): James Fallows at The Atlantic posts notes from a Google Technology Advocate; the advice pretty much aligns with my description of platforms in this post.

If you like what you read, please subscribe to the blog or follow on Twitter.

 

