On my way back home today (much earlier than usual), I started thinking about learning methods. Learning is one of the most interesting things you can think of on the computer science side of the world, but it is also one of the most traveled paths. Everybody wants to teach their computer to be a little smarter, instead of expecting it to just repeat what it is told.
So, with all this already done, why did I decide to think about it? Do I have an answer to the machine learning problem? Yeah, right! I never have answers, but I do have questions and the will to read papers and pursue things that make my evenings more meaningful. And what I’m looking at today is generative models.
As with all research, you have to start by defining what you mean by the names you use. So, the generative models I’m talking about are the ones in which the system generates inputs for itself. The idea behind them is that you learn by doing. Not necessarily by actually doing it, but by rehearsing it inside your world model, your brain. Actually, we are very good at that! We can even understand intangibles, like other people’s emotions, by mapping their experiences and facial expressions onto what we would do, determining what we would be feeling if we did it, and thus what the other person should be feeling.
Another interesting example: why are people usually scared during scary movies, or sick during bloody scenes? It’s because we constantly try to understand what we see by applying it to ourselves, so we do feel scared, and we do feel the sickness of a pain that isn’t there.
So, back to computers: I believe (like many other researchers who have tackled this problem) that one of the key methods for robust learning is to allow our learners to replay and internalize what happens. I’m not talking about just any learning here; there are many ways for computers to learn, some of them very good.
This is much easier said than done, actually. It’s very easy to think of learning in the normal way: synchronous. You present a case, and potentially the answer or a hint about the answer, and you let the learner take one step towards learning the model. Then you present the next one, and so on. The problem with generative models is that the “will to learn” has to be an action of the learner. The learner should determine what it wants to learn and maybe generate what it thinks it should learn.
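To make the contrast concrete, here is a minimal sketch of the two loops. Everything in it is an assumption made for illustration: `TableLearner`, `teacher_driven` and `learner_driven` are hypothetical names, and the "learner" only memorizes answers, so this shows who drives the loop, not how real learning would happen.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class TableLearner:
    """Hypothetical toy learner: it just memorizes the answers it is given."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}

    def step(self, question: int, answer: int) -> None:
        # One learning step: store the answer for this question.
        self.memory[question] = answer

    def choose_question(self, candidates: Iterable[int]) -> Optional[int]:
        # The learner's "will to learn": pick the first question
        # it cannot answer yet, or None if it knows them all.
        for question in candidates:
            if question not in self.memory:
                return question
        return None


def teacher_driven(learner: TableLearner,
                   examples: Iterable[Tuple[int, int]]) -> None:
    # Synchronous learning: the teacher decides what is shown, in what order.
    for question, answer in examples:
        learner.step(question, answer)


def learner_driven(learner: TableLearner,
                   candidates: List[int],
                   teacher: Callable[[int], int],
                   budget: int) -> List[int]:
    # Learner-driven loop: the learner decides what to ask the teacher next.
    asked: List[int] = []
    for _ in range(budget):
        question = learner.choose_question(candidates)
        if question is None:
            break
        learner.step(question, teacher(question))
        asked.append(question)
    return asked
```

In the first function the teacher owns the loop; in the second the same `step` is used, but the choice of the next example belongs to the learner.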
This post is already getting much longer than anybody should handle, so I’ll try to make it easier and think of an example. Let’s say that you want to teach a computer to play Sudoku.
In supervised learning, the “teacher” shows a Sudoku puzzle and then a solution (which can be a step towards the solution, or a piece of the puzzle with a step towards the solution). Then it shows another puzzle and a solution. It keeps showing different puzzles and solutions (well, sometimes it can repeat a puzzle to make sure the learner takes another step towards the solution of that puzzle) until it decides to stop, show some new puzzles, and ask for their solutions to see if the learner has learned.
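As a rough illustration of "a step towards the solution", one puzzle and its solution can be unrolled into several (state, next move) examples. This is only a sketch under my own assumptions: the 4×4 grid, the use of 0 for an empty cell, the row-major fill order and the name `training_steps` are all made up for the example.

```python
from typing import Iterator, List, Tuple

Grid = List[List[int]]          # 0 marks an empty cell
Move = Tuple[int, int, int]     # (row, column, value)


def training_steps(puzzle: Grid, solution: Grid) -> Iterator[Tuple[Grid, Move]]:
    """Yield (state, move) pairs, filling empty cells in row-major order."""
    state = [row[:] for row in puzzle]
    for r, row in enumerate(puzzle):
        for c, value in enumerate(row):
            if value == 0:
                move = (r, c, solution[r][c])
                # Hand out a copy so later moves do not alter earlier examples.
                yield [line[:] for line in state], move
                state[r][c] = solution[r][c]


PUZZLE: Grid = [
    [1, 0, 3, 4],
    [3, 4, 0, 2],
    [2, 1, 4, 0],
    [0, 3, 2, 1],
]
SOLUTION: Grid = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]

for state, move in training_steps(PUZZLE, SOLUTION):
    print(state, "->", move)
```

Each pair is one thing the teacher could show: the board as it stands, and the next correct step.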
Reinforcement learning is actually a type of supervised learning. Its focus is on delayed gratification: you let the computer try a couple of things, and then you zap it if it’s not doing very well or give it candy if it’s doing well. Another possibility is not to provide the next correct step, but just to say whether the step is right or wrong. It feels much more like the way nature teaches animals, but it is limited to what being told right or wrong can teach you. My Ph.D. research started with reinforcement learning techniques, and they are slow to learn and usually not very robust (well, if you can claim robustness for something that converges in way too many iterations).
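A toy way to see how little a bare right/wrong signal carries: the learner below has to guess the value of a single cell and only hears "right" or "wrong". The function name `guess_until_right` and the fixed guessing order are assumptions for this sketch; it is not a real reinforcement learning algorithm, just the feedback channel in isolation.

```python
from typing import Callable, Tuple


def guess_until_right(is_right: Callable[[int], bool],
                      size: int = 4) -> Tuple[int, int]:
    """Try values 1..size in order; return (value, number_of_guesses)."""
    guesses = 0
    for value in range(1, size + 1):
        guesses += 1
        if is_right(value):
            # "Candy": the teacher confirms the guess.
            return value, guesses
        # "Zap": all the learner finds out is that this value is wrong.
    raise ValueError("the teacher rejected every value")


# The teacher knows the cell holds a 3 but only answers right or wrong.
value, guesses = guess_until_right(lambda v: v == 3)
print(value, guesses)
```

Being shown the answer would cost one example; here the same cell can cost up to `size` tries, which hints at why this style of feedback tends to be slow.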
In unsupervised learning, you allow the computer to see the different games and let it find patterns in them by itself. Then it can use these patterns to solve other games. It’s usually also based on showing the learner a set of examples, but without saying anything about them. It’s interesting, but it’s usually very limited in what it can be applied to. I’m not sure it would create a good Sudoku player.
With generative models, you can start with any of these methods. But then you allow the learner to pass a whole new puzzle back to the teacher and ask for a solution, or to request a recall of a specific puzzle, or even to stop looking at puzzles and try to predict what the next puzzle would be. Prediction is actually a very interesting consequence of these types of approaches. You are no longer really trying to answer a question like A + B = ?; you are now looking at things like A + ? = C. You know what C should be because of your learning, but now you are trying to find other Bs that satisfy the same model. Then you try to look at other As. And then you vary C and look again. You build the model by constructing the question and not the answer.
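The "A + ? = C" idea can be sketched with a stand-in model. In the code below, `learned_model` is plain addition, standing in for whatever the learner has internalized, and `fill_b` and `questions_for` are hypothetical helper names; the point is only that the same model is queried from the answer side instead of the question side.

```python
from itertools import product
from typing import Iterable, List, Tuple


def learned_model(a: int, b: int) -> int:
    # Assumption: a stand-in for the learner's internal model.
    return a + b


def fill_b(a: int, c: int, domain: Iterable[int]) -> List[int]:
    """A + ? = C: every B in the domain that the model maps to C."""
    return [b for b in domain if learned_model(a, b) == c]


def questions_for(c: int, domain: Iterable[int]) -> List[Tuple[int, int]]:
    """? + ? = C: every (A, B) question whose answer would be C."""
    return [(a, b) for a, b in product(domain, repeat=2)
            if learned_model(a, b) == c]


print(fill_b(2, 5, range(10)))       # the Bs that complete 2 + ? = 5
print(questions_for(3, range(4)))    # all questions that lead to 3
```

Varying `a`, then `c`, and looking again is the rehearsal loop described above, just over a model small enough to enumerate.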
Again, as you must have already realized, I quickly left the realm of Sudoku. So you can’t try to implement what I’ve just written here. Yes, and I’m aware that nobody even thought of doing it besides me, and I haven’t actually implemented anything myself; I’ve just written a lot of notes in OmniOutliner about the questions I’m trying to answer. And, of course, there are no answers to them yet. Things like:
- How to make a learner use a 4×4 Sudoku as a learning ground for a 9×9?
- Should the learner actually learn position and movement too? E.g., should it interact with the outside world with requests like “show me the element to the right of the element I’ve just seen”?
- Should learning involve separate learning modules for bad and good examples?
- How much can you predict before seeing an example? (How much should you learn from the instruction manual? Sort of like the ontology duality of intent/extent.)
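On the 4×4 versus 9×9 question, one thing that can at least be written down is that the rules themselves do not depend on the size. The sketch below checks a finished grid with the box width as a parameter, so the same code covers both. The names `is_valid_solution` and `pattern_grid` are made up for this example, and `pattern_grid` is just a well-known shifting pattern used to get a valid grid to test with. It says nothing about how a learner would transfer what it learned, only that the target is the same.

```python
from typing import List

Grid = List[List[int]]


def is_valid_solution(grid: Grid, box: int) -> bool:
    """Check a filled grid of side box*box (box=2 is 4x4, box=3 is 9x9)."""
    size = box * box
    if len(grid) != size or any(len(row) != size for row in grid):
        return False
    expected = set(range(1, size + 1))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(size)) for c in range(size)]
    boxes = [
        set(grid[r][c]
            for r in range(br, br + box)
            for c in range(bc, bc + box))
        for br in range(0, size, box)
        for bc in range(0, size, box)
    ]
    # Every row, column and box must contain each digit exactly once.
    return all(group == expected for group in rows + cols + boxes)


def pattern_grid(box: int) -> Grid:
    """Build one valid solved grid by shifting a base row (test data only)."""
    size = box * box
    return [[(box * (r % box) + r // box + c) % size + 1
             for c in range(size)]
            for r in range(size)]


print(is_valid_solution(pattern_grid(2), box=2))
print(is_valid_solution(pattern_grid(3), box=3))
```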
Oh, well… At least I have fun and keep my mind occupied! 🙂