Machine learning is pretty complex, so we’ve been experimenting with ways to visualize what’s happening. There’s a core concept in machine learning called high-dimensional space. Here’s one way to wrap your head around this concept. You can think about people as being high-dimensional. For example, take famous scientists. You can think about when they were born, where they were born, their fields of study. Each of these is like a dimension of that person. These dimensions become difficult to untangle when you think about different people. Because someone might be similar in some ways, but very different in others. But this is the kind of thing you can use machine learning for.
With machine learning, the computer isn’t told the meaning of these dimensions. It just sees them as numbers. And it sees each set of numbers as a data point. But by looking across all of these dimensions at once, it’s able to place related points closer together in a high-dimensional space.
Here’s a concrete example where words are treated as high-dimensional data points. The important thing to remember is that we haven’t told the computer the meaning of words. Instead, we’ve shown it millions of sentences as examples of how words get used. Here’s a visualization of the results. We’re looking at a subset of words that the computer has learned about. Each dot represents one word. Each word is a data point with 200 dimensions. Using a technique called T-SNE, the computer clusters words together that it considers related. And clusters form based on meaning, even though we’ve never thought at the meaning of words. Here is a cluster of numbers, months of the year, words related to space, people’s names, cities, and so on. We can also look closely at smaller sets of words. If we search piano, we can run T-SNE only on words related to piano. We get clusters of composers, genres, musical instruments, and more.
And this approach doesn’t just work for words. For example, you can also treat an image as a high-dimensional data point. Here’s a data set where lots of people wrote digits between 0 and 9. People write in all kinds of ways. So the question is, instead of us needing to manually code rules for all the ways people write, could a machine figured out itself using machine learning? Each image is 784 pixels. The computer treats each pixel as a dimension. Again, using T-SNE, it clusters these images in a high-dimensional space. We’ve color coded them so that it’s easier for us to see what’s going on. And you can see groups of digits clustering together. It’s learned something about the meaning of these digits. These visualization techniques we’ve been exploring can be useful for all kinds of things. That’s why we’re working on open sourcing all of this as part of TensorFlow, so that anyone can use these tools to explore their data.