Artificial intelligence is the practice of using a machine such as a neural network to say things about data. Much of the time, what is said is a simple affair, like classifying pictures into cats and dogs.
Increasingly, though, AI scientists are posing questions about what the neural network “knows,” if you will, beyond what is captured in simple goals such as classifying pictures or generating fake text and images.
It turns out there’s a lot left unsaid, even if computers don’t really know anything in the sense a person does. Neural networks, it seems, can retain a memory of specific training data, which could expose the individuals whose data ends up in the training set to violations of privacy.
For example, Nicholas Carlini, formerly a student at UC Berkeley’s AI lab, approached the problem of what computers “memorize” about training data, in work done with colleagues at Berkeley. (Carlini is now with Google’s Brain unit.) In July, in a paper provocatively titled, “The Secret Sharer,” posted on the arXiv pre-print server, Carlini and colleagues discussed how a neural network could retain specific pieces of data from a collection of data used to train the network to generate text. That has the potential to let malicious agents mine a neural net for sensitive data such as credit card numbers and social security numbers.
Those are exactly the pieces of data the researchers discovered when they trained a language model using so-called long short-term memory neural networks, or “LSTMs.”
The LSTM network is what’s known as a “generative” neural net, meaning that it is designed to produce original text that’s like human writing once it has been fed millions of examples of human writing. It’s a generator of fake text, in other words. Given an input sentence from a person, the trained network produces original writing in response to the prompt.
The network is supposed to do this by forming original sentences based on a model of language it has compiled, rather than simply repeating strings of text to which it has been exposed.
“Ideally, even if the training data contained rare-but-sensitive information about some individual users, the neural network would not memorize this information and would never emit it as a sentence completion,” write Carlini and colleagues.
But it turns out those rare, unusual text strings are still in there, somewhere, in the network.
“Unfortunately, we show that training of neural networks can cause exactly this to occur unless great care is taken.”
In addition to the formal paper, Carlini posted a blog about the work on August 13th on the Berkeley AI web page.
To test their hypothesis, they spiked the training data with a single unique string, “My social security number is 078-05-1120.” When they then typed the prompt “My social security number is 078-” into the trained model, they found that the network “yields the remainder of the inserted digits ‘05-1120’.”
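The effect can be sketched with a toy stand-in for the researchers’ LSTM: a character-level lookup model trained on text spiked with one unique canary string. Everything below is illustrative, not the paper’s code; the actual experiment used a real recurrent network.

```python
# A toy illustration of unintended memorization: a character-level model
# trained on text "spiked" with a single secret canary string.
from collections import defaultdict

def train(corpus, order=8):
    # Record which character follows each length-`order` context.
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(corpus) - order):
        model[corpus[i:i + order]][corpus[i + order]] += 1
    return model

def complete(model, prompt, n=20, order=8):
    # Greedy decoding: always emit the most frequent next character.
    out = prompt
    for _ in range(n):
        context = out[-order:]
        if context not in model:
            break
        out += max(model[context], key=model[context].get)
    return out

# Ordinary training text plus a single planted secret, as in the experiment.
data = ("the quick brown fox jumps over the lazy dog. " * 50
        + "My social security number is 078-05-1120. ")
model = train(data)
print(complete(model, "My social security number is 078-"))
```

Because the canary occurs only once in the corpus, greedy decoding has exactly one path to follow from the prompt and reproduces the secret digits verbatim, which is the essence of the memorization the paper measures.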
They further validated their findings by using an existing data set that contains real secrets: the collection of emails gathered in the investigation of the notorious, failed energy company Enron. Once the LSTM network was trained on the email data, they used a tree search over the continuations the model rated most likely, and were able to extract real credit card and social security numbers.
The authors are quick to point out that it’s not clear how effective any misuse of this phenomenon could be in the wild since it presumes a certain knowledge of the data set to begin with. But the disturbing notion that neural networks may memorize the odd data point gains wider treatment in another paper this year that refers to Carlini & Co.’s work.
Vitaly Feldman, a colleague of Carlini’s at Google Brain, wrote in June that memorization of individual data points is essential to the way many statistical approaches, neural networks included, generalize from training data to unseen, or test, data.
In “Does Learning Require Memorization? A Short Tale about a Long Tail,” Feldman wrote that memorization is an inherent property of a variety of statistical approaches, including simple mainstays of statistics such as “k nearest neighbors” and “support vector machines.” The reason, Feldman theorized, is that any data distribution contains many data points that are “outliers” in a “long tail” of the data. One would think those outliers could be safely ignored. However, the model needs to retain these rare occurrences of data points to function properly.
As he puts it, “observing a single point sampled from some subpopulation increases the expectation of the frequency of the subpopulation under the posterior distribution,” and as a result, “that increase can make this expectation significant making it necessary to memorize the label of the point.”
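Feldman’s long-tail argument can be illustrated numerically. In the hypothetical sketch below (not his analysis), a finite sample is drawn from a Zipf-like distribution over many subpopulations; a large share of the subpopulations that show up at all show up exactly once, so getting their labels right means remembering those single examples.

```python
# Hypothetical illustration of the "long tail": with heavy-tailed
# subpopulation frequencies, many observed subpopulations are singletons.
import random
from collections import Counter

random.seed(0)
K = 10_000                                    # number of subpopulations
weights = [1.0 / k for k in range(1, K + 1)]  # Zipf-like frequencies
sample = random.choices(range(K), weights=weights, k=5_000)

counts = Counter(sample)
singletons = sum(1 for c in counts.values() if c == 1)
print(f"{singletons} of {len(counts)} observed subpopulations "
      f"appear exactly once")
```

A learner that discards these one-off examples as noise gives up accuracy on exactly the subpopulations they represent, which is Feldman’s point about why memorization is necessary.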
Feldman, citing Carlini & Co.’s work, addresses the matter of privacy head-on. He notes that the only systems that can be assured not to memorize individual data points are those designed for what’s called “differential privacy.” But such statistical models never achieve as high an accuracy rate as the models that don’t explicitly guarantee privacy.
“Despite significant recent progress in training deep learning networks with differential privacy, they still lag substantially behind the state-of-the-art results trained without differential privacy,” writes Feldman.
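The differentially private training Feldman refers to typically means DP-SGD, the recipe of Abadi and colleagues: clip each example’s gradient so that no individual can dominate an update, then add calibrated noise. Here is a minimal sketch of that one step; the function name, clipping norm, and noise multiplier are assumed values for illustration, not a library API.

```python
# Illustrative sketch of a single DP-SGD update step:
# per-example gradient clipping followed by Gaussian noise.
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, sigma=1.1, seed=0):
    rng = random.Random(seed)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down any example whose gradient norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n, d = len(clipped), len(clipped[0])
    # Average the clipped gradients and add noise proportional to clip_norm.
    return [sum(c[j] for c in clipped) / n
            + rng.gauss(0, sigma * clip_norm) / n
            for j in range(d)]

noisy = dp_sgd_step([[3.0, 4.0], [0.1, 0.2], [0.0, -1.0]])
print(noisy)
```

The clipping is what prevents memorization of any single data point, and it is also why accuracy suffers: an outlier’s gradient, the very signal Feldman says must be retained, is exactly what gets clipped away.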
Feldman cautions that a lot of empirical work needs to be done to validate his theoretical findings. And the presence of memorization in deep learning needs to be explored more. “Understanding of these mechanisms in the context of DNNs remains an important and challenging problem,” he writes.
Both Carlini’s and Feldman’s work echoes other reports this year about what the neural network knows that doesn’t show up in its output. For example, Rowan Zellers and his colleagues at the Allen Institute for AI and the Paul G. Allen School of Computer Science & Engineering showed that generative models of text, such as OpenAI’s GPT-2, pick words from a “sweet spot” in the long tail of word frequency of any natural language. The model “knows,” in a sense, about lots of other word combinations, but doesn’t generally employ them to produce text.
And work on image recognition this year by Benjamin Recht and colleagues at UC Berkeley showed that state-of-the-art deep learning systems for image recognition run into trouble when tested on slightly varying versions of test data. Their hypothesis in that paper is that the neural networks “have difficulty generalizing from ‘easy’ to ‘hard’ images.” That seems to agree with Feldman’s point about differential privacy: a model barred from memorizing, as differential privacy requires, stumbles when it encounters “hard” examples of the data, such as “outliers or atypical ones.”
These studies will further complicate the debate over what is happening in the so-called black box of a neural network.
MIT researchers Logan Engstrom and colleagues earlier this year explored the phenomenon of “adversarial examples” in a provocative paper titled “Adversarial Examples Are Not Bugs, They Are Features.” Adversarial examples are subtle modifications of input data that can trick a machine learning model into classifying the data incorrectly. The researchers found that they could tamper with small details in the data that seem irrelevant and fool the computer. That’s because those small details are not irrelevant; they contribute to the functioning of the neural network.
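The basic trick behind adversarial examples can be sketched with the classic “fast gradient sign” method of Goodfellow and colleagues, shown here on a toy logistic classifier. The weights, inputs, and step size below are made up for illustration.

```python
# Illustrative fast-gradient-sign attack on a tiny logistic classifier:
# nudge each input feature in the direction that increases the loss.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm(w, x, y, eps=0.5):
    # Gradient of the logistic loss with respect to the input is (p - y) * w.
    p = predict(w, x)
    return [xi + eps * math.copysign(1.0, (p - y) * wi)
            for wi, xi in zip(w, x)]

w = [1.0, -2.0, 0.5]   # a toy "trained" classifier (assumed weights)
x = [0.4, -0.2, 0.6]   # correctly classified as the positive class
x_adv = fgsm(w, x, y=1.0)
print(predict(w, x), predict(w, x_adv))
# confidence for the positive label falls from about 0.75 to about 0.34
```

The perturbation is tiny and structureless to a human eye, but because the model’s decision depends on exactly those feature directions, the small details flip its answer.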
“Another implication of our experiments is that models may not even need any information which we as humans view as ‘meaningful’ to do well (in the generalization sense) on standard image datasets,” write Engstrom and colleagues in a recent follow-up discussion to that paper.
It all seems to come back to what is retained by the neural network versus what it is allowed to express. Researchers Zhenglong Zhou and Chaz Firestone of Johns Hopkins University’s Department of Psychological & Brain Sciences expressed that nicely in a paper published in March in Nature Communications. They found that when a neural network classifier misses the mark and misidentifies the object in an adversarially perturbed image, the error is in some sense a result of the computer not being allowed to fully express everything it observes in that image.
As the authors write in their conclusion, “Whereas humans have separate concepts for appearing like something vs. appearing to be that thing — as when a cloud looks like a dog without looking like it is a dog or a snakeskin shoe resembles a snake’s features without appearing to be a snake, or even a rubber duck shares appearances with the real thing without being confusable for a duck — CNNs [convolutional neural networks, the main form of image recognition program] are not permitted to make this distinction, instead being forced to play the game of picking whichever label in their repertoire best matches an image (as were the humans in our experiments).”
That suggests a rich, expanding field for researchers in the apparent dark matter of deep learning’s black box.