Hi guys, I watched a video on YouTube called "next generation neural networks" by Geoffrey Hinton and I've got some questions about the presentation.
@ 8:30, we are presented with a formula that uses Gibbs sampling. He says that the first term in the brackets, <v_i h_j>^0, represents "statistics measured with the data", but I'm not quite sure what is meant by that. Also, is it a dot product between the two vectors, or is it some other operation?
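To make the question concrete, here is my current reading as a small sketch; please correct me if it is wrong. I am assuming that <v_i h_j>^0 is not a single dot product but one number per (visible i, hidden j) pair: the product of the two binary states, averaged over the training cases. The function name and the toy batches are my own, not from the talk.

```python
def pairwise_statistics(visible_batch, hidden_batch):
    """Average of v_i * h_j over a batch: one number per (i, j) pair."""
    n_cases = len(visible_batch)
    n_visible = len(visible_batch[0])
    n_hidden = len(hidden_batch[0])
    stats = [[0.0] * n_hidden for _ in range(n_visible)]
    for v, h in zip(visible_batch, hidden_batch):
        for i in range(n_visible):
            for j in range(n_hidden):
                # Each case contributes its outer product v h^T, scaled by 1/N.
                stats[i][j] += v[i] * h[j] / n_cases
    return stats


# Two toy training cases: 3 visible units, 2 hidden units (made-up states).
visible_batch = [[1, 0, 1], [1, 1, 0]]
hidden_batch = [[1, 0], [1, 1]]
stats = pairwise_statistics(visible_batch, hidden_batch)
```

If this reading is right, the result has the same shape as the weight matrix (visible x hidden), so each weight gets its own statistic.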
Also, what is meant by the term on the left of the brackets? He mentions that the weights determine the energies linearly, that the probabilities are exponential functions of the energies, and that the log probabilities are therefore linear functions of the weights. I just can't draw a picture in my head to figure out what all this means. (Does it have anything to do with multivariate distributions?)
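For reference, this is how I would write out what I think the slide is saying, so that someone can point at the part I have wrong. I am assuming the bias terms are left out of the energy for simplicity, and Z is my name for the normalising sum over all configurations.

```latex
E(v, h) = -\sum_{i,j} v_i \, h_j \, w_{ij}

p(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v', h'} e^{-E(v', h')}

\frac{\partial \log p(v)}{\partial w_{ij}}
  = \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{\infty}
```

Read this way, the term on the left of the brackets would be the derivative of the log probability of a training vector with respect to one weight w_ij, and the energy is linear in the weights because each w_ij only appears multiplied by v_i h_j.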
He explains the concept behind the formula well, but I just don't understand exactly what the individual parts of the formula mean.
I'm also somewhat confused by neural networks in general. In particular, I don't quite understand exactly how they work. I've been trying to understand them for quite some time now and have read a bunch of articles on backprop, perceptrons, and Hopfield nets. Although I understand the general gist of what is going on, I don't understand it to a level where I can start playing around with these various techniques. Within the scope of the presentation on digit recognition, I would like to clarify a few things that will hopefully fill in a few of my missing puzzle pieces.
The first thing we need to do is train the network. This was done using the RBM. My question is this. In MATLAB, I got the MNIST files and converted them into a format that MATLAB recognises (all the code is given on his site). As each digit is 28x28 pixels, we are essentially creating an array (which represents the image in binary form) with one row of 784 values for each of the digits in the training data. This I understand, more or less. The next step is where I get confused...
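Just to show the part I think I do understand, here is a minimal sketch of the flattening step in Python rather than MATLAB. The function name, the 0.5 threshold, and the assumption that pixel intensities are already scaled to [0, 1] are mine; the code on his site may well do this differently.

```python
def flatten_and_binarize(image, threshold=0.5):
    """Turn a 2-D grid of intensities in [0, 1] into one flat row of 0/1 values."""
    # Rows are concatenated in order, so a 28x28 grid becomes 784 values.
    return [1 if pixel > threshold else 0 for row in image for pixel in row]


# Toy 28x28 "image": a bright vertical stripe in column 14 (not real MNIST data).
image = [[1.0 if col == 14 else 0.0 for col in range(28)] for _ in range(28)]
row_vector = flatten_and_binarize(image)
```

Stacking one such row per digit would give the training array I described above.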
After we take the first digit in the training set, how do we input it into the RBM so that it actually starts to learn? The RBM that was shown @ 6:50 has 2 layers, 1 hidden and 1 visible. Does that mean that we have 784 visible neurons, each with the activation function outlined @ 5:50? (This somehow doesn't make sense, as there's really nothing to activate for each neuron if it just gets either a 0 or a 1 from the array in question.) Or does each neuron simply represent x_i, where i is the ith binary element in the 784-dimensional array? i.e.
() () () Hidden Units
(0) (1) Visible units (there will be 784 of these)
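If the second reading is the right one, then I imagine the "up" step looks roughly like the sketch below: the visible units are simply set to the pixel values, and only the hidden units compute anything. I am assuming a logistic (sigmoid) activation with one bias per hidden unit; the names and the toy sizes are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_biases):
    """p(h_j = 1 | v) for each hidden unit j, with the visible states fixed to the data."""
    return [
        sigmoid(hidden_biases[j]
                + sum(visible[i] * weights[i][j] for i in range(len(visible))))
        for j in range(len(hidden_biases))
    ]


visible = [0, 1, 1]                             # toy stand-in for the 784 pixel values
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 visible x 2 hidden, all zero
hidden_biases = [0.0, 0.0]
probs = hidden_probabilities(visible, weights, hidden_biases)
```

With all-zero weights every hidden unit comes out at probability 0.5, which I take to mean the network knows nothing yet.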
If this is indeed the case, how do we determine the number of hidden units needed? @ 18:43, he only uses 500 hidden neurons for a 784-dimensional array.
I just can't picture how the information flows from our 28x28 image into a binary state and then on to a further binary state.
Considering that the formula @ 9:30 was obviously too slow, the formula @ 10:39 was proposed. If we use that, does that mean that we take the 784x1 vector and do the two steps (i.e. up to the hidden units, back down to the visible units, and back up to the hidden units), then move on to the second example in the training data, and simply repeat until we have gone through all the training data? I assume that the weights are not reset after each training example is used? (In which case, does that mean that we need to train the network for each class of data separately, i.e. 0's first, then save the results, do 1's next, etc.? That seems to defeat the purpose and goes against the idea of trying to find p(image).)
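Here is how I currently picture that loop, as a rough sketch under my own assumptions, so that someone can tell me where it goes wrong. I am assuming no bias terms, one training vector at a time rather than batches, hidden probabilities (not sampled states) in the two statistics, and small random initial weights. All names, the learning rate, and the toy data are mine.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up(v, W, rng):
    """Hidden probabilities and sampled binary states given a visible vector."""
    probs = [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(W[0]))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def down(h, W, rng):
    """Visible probabilities and sampled binary states given a hidden vector."""
    probs = [sigmoid(sum(h[j] * W[i][j] for j in range(len(h))))
             for i in range(len(W))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, learning_rate, rng):
    """One up-down-up step for a single training vector; W is changed in place."""
    h0_probs, h0 = up(v0, W, rng)   # up: the data drives the hidden units
    _, v1 = down(h0, W, rng)        # down: the "reconstruction"
    h1_probs, _ = up(v1, W, rng)    # up again
    for i in range(len(W)):
        for j in range(len(W[0])):
            # <v_i h_j>^0 minus <v_i h_j>^1, estimated from this one case
            W[i][j] += learning_rate * (v0[i] * h0_probs[j] - v1[i] * h1_probs[j])


def train(data, n_hidden, epochs=5, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(len(data[0]))]
    for _ in range(epochs):
        for v0 in data:  # all classes mixed together; no labels are used
            cd1_update(v0, W, learning_rate, rng)  # W carries over, never reset
    return W


toy_data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]  # 4 "pixels" per case
W = train(toy_data, n_hidden=2)
```

In this picture the same weight matrix keeps being nudged by every training example regardless of which digit it is, which is why I doubt the per-class training idea, but I would like that confirmed.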
Well, I think that's enough for now. I know I kind of jumped the gun with the first question, and it really should have been asked after the clarifications about the workings of neural networks in general. Anyway, I wrote way too much; hopefully one of you will be kind enough to go through my uneducated drivel and actually come up with a response.