Firiuza · May 30, 2018


Graph for “toy” network (see article for more details).

Hi Ram Nathaniel! Thanks for the feedback!

1. To understand the use of the variable d, it helps to write out the partial derivatives for every variable that takes part in the operation whose gradient we want to override, as sketched below. As the graph shows, we override the flow of backpropagation for e = c + k, so we need the partial derivatives with respect to c and k. But c also takes part in the calculation of d (see the graph above). That is why we have to pass the variable d as well: it is needed only in the backpropagation phase, to compute the partial derivative with respect to c. In the forward phase we don't need d at all.
Partial derivatives.
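As a rough sketch of what that figure shows (here L is a placeholder for the loss, and g stands in for however d is computed from c in the article):

```latex
\frac{\partial e}{\partial c} = 1, \qquad
\frac{\partial e}{\partial k} = 1, \qquad
\frac{\partial L}{\partial c}
  = \frac{\partial L}{\partial e}\,\frac{\partial e}{\partial c}
  + \frac{\partial L}{\partial d}\,\frac{\partial d}{\partial c},
\qquad d = g(c)
```

The last term is the reason d must be available in the backprop phase: the overridden gradient for c cannot be computed from c and k alone.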

In TensorFlow we have to pass every variable that is used in either the forward or the backprop phase, even if the forward computation itself never touches it.
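For example, here is a minimal TF 1.x sketch of the pattern (not the article's actual code; the gradient body and the names MyCustomGrad / _my_custom_grad are illustrative). The forward function ignores d, but because d is in the input list, the registered gradient can read it from op.inputs:

```python
import tensorflow as tf  # TF 1.x

# Hypothetical gradient: uses d even though the forward pass ignores it.
@tf.RegisterGradient("MyCustomGrad")
def _my_custom_grad(op, grad):
    c, k, d = op.inputs        # d is here only because we passed it forward
    grad_c = grad * d          # illustrative: dL/dc needs the value of d
    grad_k = grad              # dL/dk = dL/de * 1
    grad_d = tf.zeros_like(d)  # no gradient flows back into d itself
    return grad_c, grad_k, grad_d

c = tf.constant(2.0)
k = tf.constant(3.0)
d = tf.constant(4.0)

g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": "MyCustomGrad"}):
    # Forward pass computes e = c + k and never touches d.
    e = tf.py_func(lambda c_, k_, d_: c_ + k_, [c, k, d], tf.float32)

grads = tf.gradients(e, [c, k])
with tf.Session() as sess:
    print(sess.run(grads))  # [4.0, 1.0] with the values above
```

If d were left out of the py_func input list, op.inputs would not contain it and the gradient function would have no way to get its value.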

2. Unfortunately, I can't share the source code I wrote for the visual attention model where I used this hack, because it's code from a work project. But I really did follow all the steps I described; if you follow them too, you should get the same result.

3. I think it can be encapsulated, but I haven't tried it. I'd be grateful if you find a clean way to do it :)
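One possible direction (untested, under the same TF 1.x assumptions as above; py_func_with_grad is a hypothetical helper name): hide the gradient registration and the override map behind a single wrapper, so the call site looks like an ordinary op.

```python
import uuid
import tensorflow as tf  # TF 1.x

def py_func_with_grad(func, inputs, Tout, grad_fn, name=None):
    """Run func as a py_func whose gradient is grad_fn.

    grad_fn is registered under a unique name so that several
    wrapped ops with different gradients can coexist in one graph.
    """
    rnd_name = "PyFuncGrad" + uuid.uuid4().hex
    tf.RegisterGradient(rnd_name)(grad_fn)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name,
                                  "PyFuncStateless": rnd_name}):
        return tf.py_func(func, inputs, Tout, name=name)

# Usage with the toy example from point 1:
# e = py_func_with_grad(lambda c_, k_, d_: c_ + k_,
#                       [c, k, d], tf.float32, _my_custom_grad)
```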
