Category: Privacy

Privacy: It’s Harder Than We Thought

John Leslie King, W.W. Bishop Professor of Information, University of Michigan

Chapter 5 of Gotlieb and Borodin’s (1973) Social Issues in Computing was titled “Information Systems and Privacy.”  This was a compelling issue at the time (Hoffman, 1969; Miller, 1971; Westin, 1970, 1971; Westin and Baker, 1972).  My early academic work was in this area (Mossman and King, 1975).  I thought privacy was the “issue of the future,” little imagining that it would still be “the issue of the future” four decades later.  In the early 1970s the focus was on “databanks,” large collections of personal data held mainly by government.  Today the concern is with personal data held by private companies.  The concerns might have shifted, but privacy remains salient (Rule, 1973, 2007).  Why, after so many years of serious discussion, is privacy still a top issue in computing and society?  Because dealing with privacy is harder than we thought.

The challenge of privacy in the computing era can be understood by comparing it to another hard and persistent problem in social computation: computer-assisted natural language processing, especially what was once called machine translation (MT).  Both MT and privacy appear relatively easy to solve, but looks are deceiving: both are hard.  Examining privacy in the mirror of MT helps us understand the challenge, and might help us calibrate our expectations: we should expect to keep learning about such problems rather than expect simple and permanent solutions to them.

MT reaches back to efforts by Voltaire and others to construct universal languages, but the goal of MT (Fully Automatic High Quality Translation, or FAHQT) took on new importance following World War II.  There were many technical documents yet to be translated from German into English, and the Cold War would soon create the need to translate Russian into English.  Moreover, the power of digital computers had been established during the war, especially in the “translation” work of code-breakers.

A 1949 essay by Warren Weaver titled, simply, “Translation,” triggered the hope that FAHQT could be achieved by MT (Weaver, 1949).  The U.S. government began devoting substantial sums to the pursuit of this dream.  Breakthroughs in linguistics (e.g., Chomsky’s 1957 Syntactic Structures) fueled the optimism.  Distinguished computer scientists and linguists predicted that the challenges of MT would be overcome within ten or fifteen years.  We now know that the problem was much harder than anyone thought at the time.

As early as 1960, MT optimism was being challenged from within (e.g., Bar-Hillel, 1960).  In 1966 a report by the National Academy of Sciences strongly criticized the MT field and its lack of progress (ALPAC, 1966).  Funding for MT was dramatically reduced, and major investments did not re-emerge until the late 1980s and the advent of statistical MT based on information theory (Macklovitch, 1996).  Hope for effective MT never abated because the potential benefits of FAHQT are too compelling to ignore.  Nearly 60 years after the initial burst of optimism we can see some progress, but progress has come slowly.
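The “information theory” behind that statistical turn can be stated compactly.  What follows is a minimal sketch of the noisy-channel formulation commonly associated with statistical MT, using illustrative notation rather than any particular system’s: treat a foreign sentence f as though it were an English sentence e that has been garbled in transmission, and translate by decoding,

\[ \hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(e)\,P(f \mid e), \]

where P(e) is a language model of fluent English and P(f | e) is a translation model estimated from parallel text.  Weaver’s 1949 memorandum anticipated this framing, famously likening a Russian text to an English one “coded” in strange symbols.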

It is tempting to draw the usual lessons of Fate and Hubris from the story of MT.  To be fair, some proponents of FAHQT were over-zealous about what could be done, and by when.  But the real lesson of MT was that human handling of natural language is far more complicated and sophisticated than anyone had realized.  The proponents of MT fell prey to a natural mistake: assuming that the commonplace must be easy to understand.  The dream of FAHQT failed in its mission but succeeded in teaching us how amazing human capability in natural language really is.

The mirror of MT says something about the persistence of the privacy problem.  Language evolves over time; before English spelling and grammar were standardized in the 19th century, it was customary for even talented people (e.g., William Shakespeare) to spell and use words however they thought best.  A founder of American literature, Mark Twain, declared that he had little respect for good spelling (Twain, 1925).  Yet a single, ideographic written language has long spanned much of Asia, in contrast to a wide variety of spoken languages that are mutually incomprehensible.

The relatively recent U.S. controversy over “ebonics” (Ramirez et al., 2005) pitted those who favor teaching children in a vernacular they recognize against those who believe that all instruction should be in standardized American English.  Different languages evolve at different speeds, and one size does not fit all.  Scholars often seek a simplifying model to explain complicated things, but natural language is much more complicated than it first appears.  The simplifying assumptions behind MT did not work out, perhaps in part because MT researchers assumed that the primary goal of natural language is to be understood.  Language is often used to obscure, misdirect, or otherwise confound understanding.  And that leaves aside the whole issue of poetry.

Privacy also evolves.  Hundreds of millions of users give social networking sites a permanent, royalty-free license to use personal information for almost any purpose.  Privacy advocates complain about such licenses, and occasionally win concessions from social networking companies or changes in public policy.  Still, users continue to post information that would seldom have seen the light of day in an earlier era.  They mock the old aphorism attributed to Thomas Fuller: “Fools’ names, like fools’ faces, are often seen in public places.”  While some social networkers have learned the hard way that prospective employers, future in-laws, and college admissions officers visit such sites, sensitive personal information continues to be posted.  The number of people using such services continues to grow, and the amount of personal information “out there” grows accordingly.

Evolution in privacy is not uniform: some topics are treated differently from others.  Forty years ago few people in the United States would have disclosed same-sex preference for fear of being arrested and prosecuted; harsh laws against homosexuals were common.  Today many people disclose same-sex preferences openly on the Web.  Disclosure of personal information about health and income has not evolved in the same way.  Consider the common affliction of cancer.  It was almost never discussed publicly four decades ago; Betty Ford and Happy Rockefeller shocked the public when they openly announced in 1974 that they had breast cancer.  By 2012 the breast-cancer charity Susan G. Komen for the Cure was front-page news for weeks.  Yet such health information (e.g., the admission that one has cancer) is still not discussed routinely on social networking sites.  Why not?

A possible culprit is confusion between privacy and reciprocity.  Health and life insurance in the United States have long discriminated against people with “pre-existing conditions.”  Individuals with cancer hide that information from powerful insurance companies that seek to exclude cancer patients from coverage.  Although discussed as a privacy issue, this is fundamentally about power.  Similarly, people with high incomes are reluctant to disclose personal financial information on the Web for fear of disapprobation or worse (e.g., kidnapping for ransom, property theft, identity theft).  This, too, is described as a privacy issue even when the real motivations are reciprocity and power.

Even more interesting is a possible change in what constitutes “public” information.  Following the December 2012 elementary school shooting in Newtown, Connecticut that left 20 children and six adults dead, The Journal News, a small newspaper serving New York’s Lower Hudson Valley, published an interactive map with the names and addresses of registered gun owners.  Such information is public by law in New York State, but many gun owners complained that the newspaper had violated their privacy and made them targets of thieves who might steal firearms.  Employees of the newspaper were subjected to threats, and the names of the schools their children attended were made public.  This escalation was less about privacy than about reciprocity.

The issue of computing and privacy is still with us four decades after Gotlieb and Borodin raised it because addressing it effectively is harder than we thought.  The mirror of MT is relevant here: both natural language and privacy are moving targets, and both draw much of their instantaneous meaning from context that is very difficult to capture in computers.  The difficulty of achieving FAHQT might have predicted the difficulty of dealing with computing and privacy, but that link was not made.

As hard as MT proved to be, dealing effectively with privacy is harder.  Unlike natural language, privacy is often confused with other things.  The main confusion involves reciprocity, a related but distinct issue.  The word “privacy” is often an allusion to a complicated power relationship between an individual and others.  Changing technology also affects MT and privacy differently: it enables improvements in MT but complicates privacy.  New technology can make it appear that “everyone already knows,” and the growth of social networking might make disclosure of previously sensitive personal information the “new normal.”  New technology such as Google Maps can also cause information about gun registration that was formerly considered “public” to be declared “private” in the interest of preserving privacy.

Computing can reveal much about social issues by drawing attention to them and changing the way we attempt to deal with them.  Machine translation revealed how marvelous human natural language processing is.  The study of computing and privacy similarly shows how complicated privacy is.  “Social impacts” of computers are seldom linear and predictable, and by studying social issues in computing we often learn how little we know about social issues.

References

ALPAC (1966) Language and Machines: Computers in Translation and Linguistics.  A report of the Automatic Language Processing Advisory Committee.  Washington, DC: National Academy of Sciences (available online as of January 1, 2013 at http://www.nap.edu/openbook.php?record_id=9547).

Bar-Hillel, Y. (1960) ‘The present status of automatic translation of languages’, Advances in Computers 1 (1), pp. 91-163.

Chomsky, N. (1957) Syntactic Structures.  The Hague: Mouton.

Gotlieb, C.C. and Borodin, A. (1973) Social Issues in Computing.  New York: Academic Press.

Hoffman, L.J. (1969) Computers and Privacy: A Survey.  ACM Computing Surveys, 1(2), pp. 85-103.

Macklovitch, E. (1996) “The Future of MT is now and Bar-Hillel was (almost entirely) right.”  In Koppel, M. and Shamir, E. (Eds.), Proceedings of the Fourth Bar-Ilan Symposium on Foundations of Artificial Intelligence, June 22-25, 1995.  Cambridge: MIT Press, pp. 137-148.

Miller, A.R. (1971) The Assault on Privacy.  Ann Arbor: University of Michigan Press.

Mossman, F.I. and King, J.L. (1975) “Municipal Information Systems: Evaluation of Policy Related Research. Volume V: Disclosure, Privacy, and Information Policy, Final Report.”  Washington, DC: National Technical Information Service PB-245 691/1.  Reprinted in Kraemer, K.L. and King, J.L. (Eds.) (1977) Computers and Local Government, Volume 2: A Review of Research.  New York: Praeger.

Ramirez, J.D., Wiley, T.G., de Klerk, G., Lee, E. and Wright, W.E. (Eds.) (2005) Ebonics: The Urban Education Debate (2nd Edition).  Tonawanda, NY: Multilingual Matters, Ltd.

Rule, J.B. (1973) Private Lives and Public Surveillance: Social Control in the Computer Age.  New York: Schocken.

Rule, J.B. (2007) Privacy In Peril: How We Are Sacrificing a Fundamental Right in Exchange for Security and Convenience. New York: Oxford University Press.

Twain, M. (1925) The Writings of Mark Twain, compiled by C.D. Warner and A.B. Paine.  New York: G. Wells, p. 68.

Weaver, W. (1949) “Translation.”  A memorandum reproduced in Locke, W.N. and Booth, A.D. (Eds.) (1955) Machine Translation of Languages: Fourteen Essays.  Cambridge: MIT Press, pp. 15-23.

Westin, A.F. (1970) Privacy and Freedom.  London: Bodley Head.

Westin, A.F. (1971) Information Technology in a Democracy.  Cambridge: Harvard University Press.

Westin, A.F. and Baker, M.A. (1972) Databanks in a Free Society: Report of the Project on Computer Databanks of the Computer Science and Engineering Board, National Academy of Sciences.  New York: Quadrangle.

John Leslie King is W.W. Bishop Professor of Information and former Dean of the School of Information and former Vice Provost at the University of Michigan. He joined the faculty at Michigan in 2000 after twenty years on the faculties of computer science and management at the University of California, Irvine.  He has published more than 180 academic and professional books and research papers from his research on the relationship between changes in information technology and changes in organizations, institutions, and markets.  He has been Marvin Bower Fellow at the Harvard Business School, distinguished visiting professor at the National University of Singapore and at Nanyang Technological University in Singapore, and Fulbright Distinguished Chair in American Studies at the University of Frankfurt.  From 1992 to 1998 he was Editor-in-Chief of the INFORMS journal Information Systems Research, and he has served as associate editor of many other journals.  He has been a member of the Board of the Computing Research Association (CRA) and has served on the Council of the Computing Community Consortium, run by the CRA for the National Science Foundation.  He has been a member of the Advisory Committees for the National Science Foundation’s Directorates for Computer and Information Science and Engineering (CISE) and Social, Behavioral and Economic Sciences (SBE), as well as the NSF Advisory Committee for Cyberinfrastructure (ACCI).  He holds a PhD in administration from the University of California, Irvine, and an honorary doctorate in economics from Copenhagen Business School.  He is a Fellow of the Association for Information Systems and a Fellow of the American Association for the Advancement of Science.