Ticket #30 (closed defect: fixed)
regress.pike timing issue (locking problem?)
| Reported by: | arend | Owned by: | gnugo |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.7.7 |
| Component: | regressions | Version: | |
| Severity: | minor | Keywords: | |
| Cc: | | patch: | yes |
Description (last modified by arend)
Running ./regress.pike reading:220 atari_atari:29 nngs1:42 almost always stalls after atari_atari:29. Adding '--options "--mode gtp --level 1"' makes the stall almost certain.
Apparently Gunnar cannot reproduce this.
A second failure is the spurious "test result missing" error messages, which are most easily triggered by './regress.pike --check-unoccupied'.
Attachments
Regression Results
| Attachment | Rev. | PASS | FAIL | Nodes | Status |
|---|---|---|---|---|---|
| arend_7_7.7-regress.pike_locking.diff | | | | | never tested |
Change History
Changed 6 years ago by arend
- attachment arend_7_7.7-regress.pike_locking.diff added
Revise synchronization between threads; localize write_queue.
comment:2 Changed 6 years ago by arend
The second problem is easily explained: it happens when the program_reader thread "overtakes" the program_writer thread, i.e. when it obtains the test result before the program_writer has analyzed the next line in the .tst file that contains the correct test result.
The patch in the first attachment solves this somewhat drastically by having the reader wait until the complete .tst file has been processed by the program_writer; this is implemented via a Thread.Queue() object.
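To illustrate the idea of making the reader wait for the writer, here is a minimal Python sketch. It is not the Pike code from the patch: `run_suite`, the `expected` table, the `done` queue and the two-field line format are all made up for this example, and Python's `queue.Queue` merely stands in for Pike's `Thread.Queue`.

```python
import queue
import threading

def run_suite(tst_lines, engine_answers):
    """Toy model: the writer parses expected results, the reader compares them."""
    expected = {}          # test id -> expected result, filled in by the writer
    done = queue.Queue()   # the writer posts one token once the file is parsed
    report = []

    def program_writer():
        for line in tst_lines:
            test_id, want = line.split()   # hypothetical "id result" line format
            expected[test_id] = want
        done.put(True)                     # the whole .tst file has been processed

    def program_reader():
        done.get()                         # block until the writer has finished
        for test_id, got in engine_answers:
            if test_id not in expected:
                report.append((test_id, "test result missing"))
            elif got == expected[test_id]:
                report.append((test_id, "PASS"))
            else:
                report.append((test_id, "FAIL"))

    # Start the reader first on purpose: it still cannot overtake the writer.
    threads = [threading.Thread(target=program_reader),
               threading.Thread(target=program_writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return report
```

Because the reader blocks on `done.get()`, it can never look up a test whose expected result has not been parsed yet, at the cost of serializing the two threads for the duration of the parse.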
The first problem happens, I believe, when the program_reader sends the cond->signal() before the program_writer reaches the cond->wait(condmutexkey) point, so that the signal gets lost.
The patch solves this by using a queue instead and by localizing the write_queue variable. I am not sure the latter is necessary; I have to think about that again.
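The suspected lost-signal mechanism, and why a queue is immune to it, can be sketched in Python as follows. This is only an analogue of the Pike primitives: `threading.Condition` stands in for the condition variable and `queue.Queue` for `Thread.Queue`, and both function names and the timeout are invented for the example.

```python
import queue
import threading

def signal_before_wait_with_condition(timeout=0.2):
    """Notify first, wait afterwards: the notification is not remembered."""
    cond = threading.Condition()
    with cond:
        cond.notify()              # nobody is waiting yet, so this is dropped
    with cond:
        return cond.wait(timeout)  # times out and returns False

def signal_before_wait_with_queue(timeout=0.2):
    """Put first, get afterwards: the token is stored until it is collected."""
    q = queue.Queue()
    q.put("go")
    try:
        return q.get(timeout=timeout) == "go"
    except queue.Empty:
        return False
```

A condition variable carries no state of its own, so a signal sent before the other thread starts waiting has no effect, whereas a queue keeps the token regardless of which side arrives first.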
comment:3 Changed 6 years ago by arend
My analysis of the first problem is wrong, as adding a condmutex->lock() before sending the signal doesn't solve it.
It seems that some commands get sent to the wrong instance of GNU Go when one test suite finishes and the next one starts. I am at a loss to explain why, but I think that is why localizing the write_queue variable, as in the attached patch, solves the problem.
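One way a shared queue could produce this symptom is sketched below in Python. This is a guess at a possible mechanism, not a confirmed diagnosis and not the Pike code: `run_suites`, the command strings and the `consumed` counts are all hypothetical, chosen only so that one command is still pending when the first suite is considered finished.

```python
import queue

def run_suites(suites, shared_queue):
    """Return the (owner, command) pairs each engine instance receives.

    suites: list of (name, commands, consumed), where `consumed` is how many
    commands the engine reads before the suite is considered finished.
    """
    global_q = queue.Queue()
    received = {}
    for name, commands, consumed in suites:
        # Localized variant: every suite gets a fresh queue of its own.
        q = global_q if shared_queue else queue.Queue()
        for cmd in commands:
            q.put((name, cmd))
        got = []
        for _ in range(consumed):
            try:
                got.append(q.get_nowait())
            except queue.Empty:
                break
        received[name] = got
    return received

# Hypothetical suites: the first one leaves "quit" unread when it finishes.
SUITES = [
    ("atari_atari", ["loadsgf a.sgf", "test 29", "quit"], 2),
    ("nngs1", ["loadsgf n.sgf", "test 42"], 2),
]
```

With `shared_queue=True` the leftover `quit` from the first suite is the first thing the second engine receives; with `shared_queue=False` each engine only ever sees its own commands, which matches the observation that localizing the queue makes the stall go away.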
