Can data science save social media?

The unfettered internet is too often used for malicious purposes and is frequently woefully inaccurate. Social media — especially Facebook — has failed miserably at protecting user privacy and blocking miscreants from sowing discord.

That’s why CEO Mark Zuckerberg was just forced to testify about user privacy before both houses of Congress. And now governmental regulation of Facebook and other social media appears to be a fait accompli.

At this key juncture, the crucial question is whether regulation, in concert with Facebook’s promises to aggressively mitigate its weaknesses, can correct the privacy abuses while still fulfilling Facebook’s goal of giving people the power to build transparent communities and bringing the world closer together.

The answer is maybe.

What has not been said is that Facebook must embrace data science methodologies initially created in the bowels of the federal government to help protect its two billion users. Simultaneously, Facebook must still enable advertisers, its sole source of revenue, to get the user data required to justify their expenditures.

Specifically, Facebook must promulgate and embrace two methodologies: homomorphic encryption (HE), known in high-level security circles and often considered the “Holy Grail” of cryptography, and data provenance (DP). HE would enable Facebook, for example, to generate aggregated reports about its users’ psychographic profiles so that advertisers could still accurately target groups of prospective customers without knowing their actual identities.

Meanwhile, data provenance – the process of tracing and recording true identities, the origins of data and its movement between databases – could unearth the true identities of Russian perpetrators and other malefactors or at least flag data of unknown provenance, adding much-needed transparency in cyberspace.

Both methodologies are extraordinarily complex. IBM and Microsoft, in addition to the National Security Agency, have been working on HE for years, but the technology has suffered from significant performance challenges. Progress is being made, however. IBM, for example, has been granted a patent on a particular HE method – a strong hint it’s seeking a practical solution – and last month proudly announced that its rewritten HE library now works up to 75 times faster. Maryland-based ENVEIL, a startup staffed by the former NSA HE team, says it has broken the performance barriers required to produce a commercially viable version of HE, benchmarking millions of times faster than IBM in tested use cases.

How Homomorphic Encryption Would Help Facebook

HE is a technique for operating on encrypted data and drawing useful conclusions from it without decrypting it, which protects the source of the information throughout. It is useful to Facebook because its massive inventory of personally identifiable information is the foundation of the economics underlying its business model. The more comprehensive the datasets about individuals, the more precisely advertising can be targeted.

HE could keep Facebook information safe from hackers and inappropriate disclosure, but still extract the essence of what the data tells advertisers. It would convert data into encrypted strings of numbers, do math with these strings, and then decrypt only the results to get the same answer it would if the data weren’t encrypted at all.
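
The encrypt, do-the-math, decrypt sequence described above can be sketched in a few lines of Python. This is only an illustration, not what Facebook, IBM or ENVEIL actually run: it uses the textbook Paillier scheme, which supports addition only (it is "partially" rather than fully homomorphic), and the primes are far too small to be secure. The function names and the click-through example are assumptions made for the sketch.

```python
import math
import secrets


def generate_keypair(p, q):
    """Build a toy Paillier keypair from two distinct primes (insecurely small)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                     # standard simple choice of generator
    mu = pow(lam, -1, n)          # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(public_key, message):
    """Encrypt a small non-negative integer (must be less than n)."""
    n, g = public_key
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random blinding value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq


def add_encrypted(public_key, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the SUM of the plaintexts."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


def decrypt(public_key, private_key, ciphertext):
    """Recover the plaintext; only the private-key holder can do this."""
    n, _ = public_key
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


if __name__ == "__main__":
    public_key, private_key = generate_keypair(1009, 1013)
    # Hypothetical data: 1 means "this user who clicked the ad later bought in a store".
    purchase_flags = [1, 0, 1, 1, 0]
    encrypted_flags = [encrypt(public_key, flag) for flag in purchase_flags]

    # The analyst adds up the flags without ever seeing an individual value.
    encrypted_total = encrypt(public_key, 0)
    for c in encrypted_flags:
        encrypted_total = add_encrypted(public_key, encrypted_total, c)

    print(decrypt(public_key, private_key, encrypted_total))
```

The party doing the addition handles only ciphertexts; the one number that is ever decrypted is the aggregate, which is the kind of group-level figure an advertiser needs.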

A particularly promising sign for HE emerged last year, when Google revealed a new marketing measurement tool that relies on this technology to allow advertisers to see whether their online ads result in in-store purchases.

Unearthing this information requires analyzing datasets belonging to separate organizations, notwithstanding the fact that these organizations pledge to protect the privacy and personal information of the data subjects. HE skirts this by generating aggregated, non-specific reports about the comparisons between these datasets.

In pilot tests, HE enabled Google to successfully analyze encrypted data about who clicked on an advertisement in combination with another encrypted multi-company dataset containing credit card purchase records. With this data in hand, Google was able to provide reports to advertisers summarizing the relationship between the two databases to conclude, for example, that five percent of the people who clicked on an ad wound up purchasing in a store.

Data Provenance

Data provenance has a markedly different core principle. It’s based on the fact that digital information is atomized into 1’s and 0’s with no intrinsic truth. The dual digits exist only to disseminate information, whether accurate or wildly fabricated. A well-crafted lie can easily be indistinguishable from the truth and distributed across the internet. What counts is the source of these 1’s and 0’s. In short, is it legitimate? What is the history of the 1’s and 0’s?

The art market, as an example, deploys DP to combat fakes and forgeries of the world’s greatest paintings, drawings and sculptures. It uses DP techniques to create a verifiable chain of custody for each piece of artwork, preserving the integrity of the market.

Much the same thing can be done in the online world. For example, a Facebook post referencing a formal statement by a politician, with an accompanying photo, would have provenance records directly linking the post to the politician’s press release and even the specifics of the photographer’s camera. The goal – again – is ensuring that data content is legitimate.
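
One common way to build such a chain of custody is to link records together with cryptographic hashes, so that altering any earlier record breaks every later link. The Python sketch below illustrates that idea only; the record fields, function names and sample sources are hypothetical, and a real provenance system would also need digital signatures to prove who wrote each record.

```python
import hashlib
import json


def record_hash(body):
    """Hash a record body deterministically (keys sorted so order never matters)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain, source, content):
    """Return a new chain with one more record linked to the previous record's hash."""
    body = {
        "source": source,
        "content_digest": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "prev_hash": chain[-1]["hash"] if chain else None,
    }
    entry = dict(body, hash=record_hash(body))
    return chain + [entry]


def verify_chain(chain):
    """Check that every record is intact and points at the record before it."""
    prev_hash = None
    for entry in chain:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["prev_hash"] != prev_hash or record_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    # Hypothetical history of a post: camera -> press release -> social media post.
    chain = []
    chain = append_record(chain, "photographer-camera", "raw photo bytes")
    chain = append_record(chain, "press-office", "formal statement text")
    chain = append_record(chain, "social-post", "post quoting the statement")
    print(verify_chain(chain))

    # Rewriting an earlier record's source invalidates the chain.
    tampered = [dict(entry) for entry in chain]
    tampered[1]["source"] = "unknown"
    print(verify_chain(tampered))
```

A reader or platform that receives the chain can rerun the verification independently, which is what makes the history checkable rather than merely asserted.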

Companies such as Wal-Mart, Kroger, British-based Tesco and Swedish-based H&M, an international clothing retailer, are using or experimenting with new technologies to provide provenance data to the marketplace.

Let’s hope that Facebook and its social media brethren begin studying HE and DP thoroughly and implement them as soon as feasible. Other strong measures, such as the upcoming implementation of the European Union’s General Data Protection Regulation, which will use a big stick to secure personally identifiable information, should essentially be cloned in the U.S. What is best, however, is a combination of multiple avenues to enhance user privacy and security, while hopefully preventing breaches in the first place. Nothing less than the long-term viability of social media giants is at stake.

