According to the Nature paper, the program chooses the most-visited winning route for its next move. This becomes a problem when there is no single 'most', that is, when several suggested winning routes have the same visit count. I suspect the program randomly chooses one of them in that case, because they are equal.
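To make the tie case concrete, here is a minimal Python sketch of what I suspect happens. The function name, the move labels and the visit counts are all made up for illustration. The random tie-break is only my assumption, not something I can confirm from the paper.

```python
import random


def pick_most_visited(visit_counts, rng=random):
    """Return a move with the highest visit count.

    visit_counts maps a move label to its visit count (hypothetical data).
    If several moves share the highest count, one is chosen at random,
    which is an assumed tie-break rule, not a documented one.
    """
    if not visit_counts:
        raise ValueError("no candidate moves")
    best = max(visit_counts.values())
    # Collect every move that shares the top visit count.
    tied = [move for move, n in visit_counts.items() if n == best]
    return rng.choice(tied)


# Two moves are tied on 120 visits, so either may be returned.
counts = {"D4": 120, "Q16": 120, "C3": 45}
print(pick_most_visited(counts))
```

With a single clear leader the function always returns that move; the randomness only matters when the top count is shared.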
I am really curious about how the program balances the chance of winning against the net value of winning or losing.