Ticket #102 (new defect)

Opened 3 years ago

Last modified 22 months ago

reproducability. suspected cache problem in 3.7.x

Reported by: alain Owned by: gnugo
Priority: normal Milestone: 3.8
Component: source Version:
Severity: major Keywords:
Cc: patch: no

Description (last modified by alain) (diff)

The generated move is not the same when play/replay a whole game (output of --replay white option), when generating just one move (with command line or with regress.pike individualy), when testing the whole test file (with regess.pike or eval.sh).

I triple checked this, even tried another compiler, i always have the same behavior.

Here are the comparison between the generated moves:

with gnugo 3.6:

everything is identical, whatever the method for generating the move. (mv 174 is bad, but reproducible)

./regress.pike -v -e gnugo-3.6 -o "--mode gtp" lost-won-game.tst
lost-won-game:10 PASS C16 [D16|C16|C15|B15|F18]
lost-won-game:20 PASS N12 [N12]
lost-won-game:30 fail M16 [O11|O10|O9|P10]
Total uncertainty: 0.00

with gnugo 3.7.1 or gnugo 3.7.8:

mv 94 is not the same when whole replay and individual genmove.

mv 174 is not the same when running whole testfile.

- Move 94 (white): GNU Go plays S7 (6.75) - Game move K9 (2.27) 
- ./regress.pike -v -e gnugo-3.7.1 -o "--mode gtp" lost-won-game.tst:10
  lost-won-game:10 PASS C16 [D16|C16|C15|B15|F18]

- Move 162 (white): GNU Go plays N12 (18.53) - Game move F19 
- ./regress.pike -v -e gnugo-3.7.1 -o "--mode gtp" lost-won-game.tst:20
  lost-won-game:20 PASS N12 [N12]

- Move 174 (white): GNU Go plays M16 (48.55) - Game move M12 (45.45) 
- ./regress.pike -v -e gnugo-3.7.1 -o "--mode gtp" lost-won-game.tst:30
  lost-won-game:30 fail M16 [O11|O10|O9|P10]

Running the whole testfile:
./regress.pike -v -e gnugo-3.7.1 -o "--mode gtp" lost-won-game.tst
lost-won-game:10 PASS C16 [D16|C16|C15|B15|F18]
lost-won-game:20 PASS N12 [N12]
lost-won-game:30 PASS O11 [O11|O10|O9|P10]
Total uncertainty: 0.00

Attachments

lost-won-game.tst (0.5 kB) - added by alain 3 years ago.
very-bad-94-162-174.sgf (9.1 kB) - added by alain 3 years ago.
378-comparison.html (20.4 kB) - added by alain 3 years ago.
comparison between 3.7.8 --replay, individual genmove, and gg3.6 --replay
very-bad-with-cache.tst (2.1 kB) - added by alain 3 years ago.
very-bad-with-clear-cache.tst (2.1 kB) - added by alain 3 years ago.
owl_with-clear-cache_no-clear-cache.2.diff (28.9 kB) - added by alain 3 years ago.
diff of the owl test results. First with clear cache / second with cache.

Regression Results

Attachment Rev. PASS FAIL Nodes Status
owl_with-clear-cache_no-clear-cache.2.diff 2381 patch does not apply details

Change History

Changed 3 years ago by alain

Changed 3 years ago by alain

Changed 3 years ago by gunnar

The most probable cause is persistent cache problems, like in owl1:383,384.

A technique to debug this further is to run the replay with a fair amount of debug output, e.g. "-t -w -d0x101840", then generate the separate moves with the same options. Save all outputs and cut out relevant parts of the replay output for diffing. If this turns up a difference in dragon status, then it's very likely a persistent cache problem. In that case, try to find an earlier move such that

loadsgf ... M
owl_attack ...
loadsgf ... N
owl_attack ...

gives a different result than just

loadsgf ... N
owl_attack ...

When this is found, "-d0x200000" (DEBUG_PERSISTENT_CACHE) can give a hint about why the persistent cache failed.

Changed 3 years ago by alain

In attachement a comparison between "gg378 --replay" ,individualy generated moves by gg378, and for usefull hint, gg3.6 --replay.

move 50 is different (and solution of the problem) then 22 consecutive moves (54-100) are generated differently (except mv 68 on an other part of the board).

The individual genmove _always_ want the right move C16, which was neither played nor replayed.

The gg3.6 --replay seems a good approximation of individual genmove with gg3.7.8 (this is normal), and show the same "obsession" for C16 during 20 moves.

it seems things are diverging from move 50 (large scale attack against top left black), 52 (surround severely Black top group) to 102 D16, when we see that B is not so dead !

Changed 3 years ago by alain

comparison between 3.7.8 --replay, individual genmove, and gg3.6 --replay

Changed 3 years ago by alain

  • description modified (diff)
  • summary changed from reproducability. suspected cache problem in 3.7.x and/or gtp interference to reproducability. suspected cache problem in 3.7.x

Arend said:

If someone wants to improve persistent caching (that would be a really good thing!!), there is an easy measure by which to measure success: replay a game (1), then replay it with a clear_cache between any two moves (2). A good persistent cache can be measured by a good node reduction for (1), while giving a high percentage of identical moves between (1) and (2).

Changed 3 years ago by alain

Attached of 2 test files, one with "cache_clear" everywhere. I noticed that move 50 D16 has no semeai move reason.

When beginning the test at move 50, everything is OK: It is 30% faster with the cache.

./regress.pike -e gnugo-3.7.8-ref -o "--mode gtp" \
                                 very-bad-with-cache:50-102
very-bad-with-cache                    236.94  55031595   66664   514667
Total nodes: 55031595 66664 514667
Total time: 236.94 (240.51)
Total uncertainty: 0.00

./regress.pike -e gnugo-3.7.8-ref -o "--mode gtp" \
                                 very-bad-with-clear-cache:50-102
very-bad-with-clear-cache              350.05  80835063  111246   790557
Total nodes: 80835063 111246 790557
Total time: 350.05 (352.90)
Total uncertainty: 0.00

When beginning at move 48, everything goes wrong : It is 50% faster, but 23 failures / 26 moves.

very-bad-with-cache                    183.78  52181715   42273   560579
Total nodes: 52181715 42273 560579
Total time: 183.78 (184.35)
Total uncertainty: 0.00
23 FAIL

very-bad-with-clear-cache              355.81  84483950  116039   822133
Total nodes: 84483950 116039 822133
Total time: 355.81 (356.80)
Total uncertainty: 0.00

Changed 3 years ago by alain

Changed 3 years ago by alain

Changed 3 years ago by gunnar

So when you start at move 48, there's a difference already in move 50? That's a good localization. Now run these two only for moves 48 and 50 with the -w flag. Diffing the outputs shows

@@ -1171,7 +1175,9 @@
 - attack threat at C2, attack threat code = 5
 - attack threat at E2, attack threat code = 5

-C18 : W dragon size 2 (7.00), genus 1111, escape factor 6, crude status unknown, status alive, moyo size 8, moyo territory value 5.53, safet
y alive, weakness pre owl 0.23, weakness 0.41, owl status alive
+C18 : W dragon size 2 (7.00), genus 1111, escape factor 6, crude status unknown, status critical, moyo size 0, moyo territory value 0.00, sa
fety critical, weakness pre owl 0.23, weakness 1.00, owl status critical
+... owl attackable at C16, code 5
+... owl defendable at B14, code 5
 ... neighbors D18 D15
 D18 : B dragon size 5 (14.17), genus 0001, escape factor 12, crude status unknown, status alive, moyo size 1, moyo territory value 1.00, saf
ety alive, weakness pre owl 0.29, weakness 0.29, owl status alive
 ... neighbors C18 G18 D15 C11 L9 C9

so something has happened with the owl reading of C18. Now we can try to reduce the problem further,

loadsgf very-bad-94-162-174.sgf 48
owl_attack C18
clear_cache
loadsgf very-bad-94-162-174.sgf 50
owl_attack C18

It turns out that without clear_cache the second owl_attack gives a different result, which proves that the persistent cache makes a mistake. To find out why it fails, now run the version without clear_cache with the flag -d0x200000 and see if you can detect it.

Changed 3 years ago by alain

diff of the owl test results. First with clear cache / second with cache.

Changed 2 years ago by alain

Following #124: ./regress.pike -v -e gnugo-3.7.9-ref -o"--mode gtp" --valgrind very-bad-with-cache.tst
gives nothing from valgind point of view. (it is awfully slow 146 min vs 3min)

 ==4739==
==4739== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 16 from 1)
==4739== malloc/free: in use at exit: 12617248 bytes in 129 blocks.
==4739== malloc/free: 111786 allocs, 111657 frees, 293069523 bytes allocated.
==4739== For counts of detected errors, rerun with: -v
==4739== searching for pointers to 129 not-freed blocks.
Total nodes: 51710765 39522 560586
Total time: 8168.18 (8781.60)
Total uncertainty: 0.00
23 FAIL

Changed 2 years ago by alain

The 2 gunnar's patch from #124 change nothing here.
gnugo-3.7.9 fails the test like shown above.

Changed 22 months ago by alain

patch from #143 change nothing (as expected).

Note: See TracTickets for help on using tickets.