The Gravitational Wave S6 GC search v1.01 (SSE2) WUs are giving me these messages on 4 different boxes now. It doesn't seem to affect the WUs and they finish, but for some reason it keeps me from monitoring the remote boxes with BOINCView. It keeps flooding the Message Center over and over with the same message ...
The Gravitational Wave S6 GC search v1.01 WUs don't seem to send the message, though ...
10/23/2011 3:50:47 PM | Albert@Home | [error] p2030.20090421.G34.65-01.29.S.b3s0g0.00000_552_2: negative FLOPs left -1.#IND00
10/23/2011 3:50:47 PM | Albert@Home | [error] p2030.20090421.G34.65-01.29.S.b3s0g0.00000_544_2: negative FLOPs left -1.#IND00
10/23/2011 3:50:47 PM | Albert@Home | [error] h1_0051.30_S6GC1__49_S6BucketA_0: negative FLOPs left -1.#IND00
10/23/2011 3:50:47 PM | Albert@Home | [error] h1_0051.20_S6GC1__49_S6BucketA_2: negative FLOPs left -1.#IND00
10/23/2011 3:50:47 PM | Albert@Home | [error] h1_0051.20_S6GC1__48_S6BucketA_0: negative FLOPs left -1.#IND00
10/23/2011 3:50:47 PM | Albert@Home | [error] h1_0051.20_S6GC1__47_S6BucketA_0: negative FLOPs left -1.#IND00
... (the same messages repeat over and over, trimmed) ...
STE\/E
Gravitational Wave S6 GC search v1.01 (SSE2) Error Messages
Thanks for reporting.
This must be an unintended side effect of the new server code, which estimates the 'total FLOPs'. Apparently this estimate is wrong. It _should_ adjust itself over time, but I don't know how long that will take. One purpose of testing this code here is to find out ...
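For what it's worth, "-1.#IND00" is how Microsoft's C runtime prints an indeterminate NaN, so the "FLOPs left" value is most likely NaN rather than merely negative. That is consistent with a bad server-side estimate: if the estimated total FLOPs is NaN, everything derived from it is too. A toy sketch of the arithmetic (hypothetical names, not the actual BOINC code):

```python
import math

def flops_left(est_flops_total, flops_done):
    """Remaining FLOPs for a task; a NaN estimate poisons the result."""
    return est_flops_total - flops_done

# A sane estimate behaves normally:
assert flops_left(1e12, 4e11) == 6e11

# But if the server-supplied estimate is NaN (printed as "-1.#IND00"
# by the MSVC runtime), the remainder is NaN too, and any test such
# as "FLOPs left < 0" misbehaves:
bad = flops_left(float("nan"), 4e11)
assert math.isnan(bad)
assert not (bad < 0) and not (bad >= 0)  # NaN fails every comparison
```

So the message is probably the client noticing a nonsensical value it was handed, not a problem with the task itself.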
BM
Well, after about 2 days the same condition still persists, with the constant "[error] p2030.20090421.G34.65-01.29.S.b3s0g0.00000_552_2: negative FLOPs left -1.#IND00" message.
Another side effect is that your GPU cache drains dry and only allows you 1 WU at a time per GPU. The WU has to finish before you can get another one. I presume that's because of the "negative FLOPs left -1.#IND00" being reported.
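A drained cache would fit the NaN theory: the client's work-fetch decision sums the estimated remaining runtime of cached tasks (derived from FLOPs left) and compares it to the cache target, and a single NaN poisons that sum so every "is the buffer low?" comparison comes out false. This is a guess at the mechanism, not the actual client code; a toy sketch:

```python
import math

def seconds_of_buffered_work(flops_left_per_task, flops_per_sec):
    # Sum estimated remaining runtime over all cached tasks.
    return sum(f / flops_per_sec for f in flops_left_per_task)

target = 4 * 86400.0  # say, a 4-day cache setting

# One task with NaN "FLOPs left" poisons the whole estimate:
buffered = seconds_of_buffered_work([1e12, float("nan")], 1e9)
assert math.isnan(buffered)

# "buffered < target" is False for NaN, so the client would conclude
# the cache is full and request no new work until the bad task is gone:
assert not (buffered < target)
```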
STE\/E
I have to admit that I don't understand the reason for this message. I asked D.A. about it, but haven't gotten an answer yet.
BM
Maybe I'm the only one getting the message, since nobody else is reporting it ... ?
But that probably has more to do with the project not giving me any more CPU WUs, even though I have all applications selected and the Albert@Home server status page showed 9800+ WUs available when I last checked ...
I tried resetting the project on 1 box: still no work. Detached and re-attached the box and did get new work ...
STE\/E
When the Computers request new work I get this:
Albert@Home 10/28/2011 4:17:19 AM Message from server: see scheduler log messages on http://albert.phys.uwm.edu//host_sched_logs/1/1112
Which leads to this when reading the scheduler log:
2011-10-28 08:37:28.0435 [PID=9044] Request: [USER#xxxxx] [HOST#1112] [IP xxx.xxx.xxx.16] client 6.12.34
2011-10-28 08:37:28.6702 [PID=9044 ] [send] Not using matchmaker scheduling; Not using EDF sim
2011-10-28 08:37:28.6702 [PID=9044 ] [send] CPU: req 961765.99 sec, 2.00 instances; est delay 0.00
2011-10-28 08:37:28.6702 [PID=9044 ] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2011-10-28 08:37:28.6702 [PID=9044 ] [send] work_req_seconds: 961765.99 secs
2011-10-28 08:37:28.6703 [PID=9044 ] [send] available disk 13.06 GB, work_buf_min 0
2011-10-28 08:37:28.6703 [PID=9044 ] [send] active_frac 0.999749 on_frac 0.999661
2011-10-28 08:37:28.6709 [PID=9044 ] [send] [AV#364] not reliable; cons valid 4 < 10
2011-10-28 08:37:28.6709 [PID=9044 ] [send] set_trust: cons valid 4 < 10, don't use single replication
2011-10-28 08:37:28.6709 [PID=9044 ] [send] [AV#432] not reliable; cons valid 2 < 10
2011-10-28 08:37:28.6709 [PID=9044 ] [send] set_trust: cons valid 2 < 10, don't use single replication
2011-10-28 08:37:28.6754 [PID=9044 ] [version] looking for version of einsteinbinary_BRP4
2011-10-28 08:37:28.6754 [PID=9044 ] [version] Checking plan class 'BRP3SSE'
2011-10-28 08:37:28.6755 [PID=9044 ] [version] reading plan classes from file '../plan_class_spec.xml'
2011-10-28 08:37:28.6760 [PID=9044 ] [version] parsed project prefs setting 'also_run_cpu' : true : 0.000000
2011-10-28 08:37:28.6760 [PID=9044 ] [version] Checking plan class 'ATIOpenCL'
2011-10-28 08:37:28.6761 [PID=9044 ] [version] No ATI devices found
2011-10-28 08:37:28.6761 [PID=9044 ] [version] [AV#443] app_plan() returned false
2011-10-28 08:37:28.6761 [PID=9044 ] [version] Checking plan class 'NVOpenCL'
2011-10-28 08:37:28.6761 [PID=9044 ] [version] driver version required min: 28013, supplied: 26658
2011-10-28 08:37:28.6761 [PID=9044 ] [version] [AV#440] app_plan() returned false
2011-10-28 08:37:28.6761 [PID=9044 ] [version] [AV#432] (BRP3SSE) using unscaled projected flops: 2.69G
2011-10-28 08:37:28.6761 [PID=9044 ] [version] Best version of app einsteinbinary_BRP4 is [AV#432] (2.69 GFLOPS)
2011-10-28 08:37:28.6761 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6765 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9294]
2011-10-28 08:37:28.6765 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6765 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6766 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#8873]
2011-10-28 08:37:28.6767 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6767 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6768 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9242]
2011-10-28 08:37:28.6768 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6768 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6769 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9278]
2011-10-28 08:37:28.6770 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6770 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6771 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9104]
2011-10-28 08:37:28.6771 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6771 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6772 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9244]
2011-10-28 08:37:28.6772 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6773 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6774 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#8888]
2011-10-28 08:37:28.6774 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6774 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6776 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#8899]
2011-10-28 08:37:28.6776 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6776 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6777 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#8900]
2011-10-28 08:37:28.6777 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6777 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6778 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9284]
2011-10-28 08:37:28.6779 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6779 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6780 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9291]
2011-10-28 08:37:28.6780 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6780 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6781 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9180]
2011-10-28 08:37:28.6783 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6784 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6785 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9216]
2011-10-28 08:37:28.6785 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6785 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6786 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9274]
2011-10-28 08:37:28.6787 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6787 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6788 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9302]
2011-10-28 08:37:28.6788 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6788 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6789 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9304]
2011-10-28 08:37:28.6789 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6790 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6791 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9313]
2011-10-28 08:37:28.6791 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6791 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6792 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9303]
2011-10-28 08:37:28.6792 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6793 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6794 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9304]
2011-10-28 08:37:28.6794 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6794 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6795 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9301]
2011-10-28 08:37:28.6797 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6797 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6798 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9314]
2011-10-28 08:37:28.6807 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6807 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6810 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9254]
2011-10-28 08:37:28.6811 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6811 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6812 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9255]
2011-10-28 08:37:28.6812 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6812 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6814 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9293]
2011-10-28 08:37:28.6814 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6814 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6815 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9283]
2011-10-28 08:37:28.6815 [PID=9044 ] [version] returning cached version: [AV#432]
2011-10-28 08:37:28.6815 [PID=9044 ] [send] est delay 0, skipping deadline check
2011-10-28 08:37:28.6817 [PID=9044 ] [send] [USER#3373] already has 1 result(s) for [WU#9299]
2011-10-28 08:37:31.3887 [PID=9044 ] [user_messages] [HOST#1112] MSG(low) No tasks sent
2011-10-28 08:37:31.3888 [PID=9044 ] [user_messages] [HOST#1112] MSG(notice) see scheduler log messages on http://albert.phys.uwm.edu//host_sched_logs/1/1112
2011-10-28 08:37:31.3888 [PID=9044 ] Sending reply to [HOST#1112]: 0 results, delay req 60.00
2011-10-28 08:37:31.3891 [PID=9044 ] Scheduler ran 3.351 seconds
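The long run of "already has 1 result(s) for [WU#...]" lines looks like the scheduler's one-result-per-user rule (assuming the project has it enabled): a workunit one of your hosts already holds can't be sent to another host on the same account, and with only a few thousand distinct workunits, everything in the feeder's slice can be work you already have. A simplified sketch, not the actual scheduler code:

```python
def sendable(candidate_wus, wus_held_by_user):
    """Drop workunits the user already has a result for."""
    held = set(wus_held_by_user)
    return [wu for wu in candidate_wus if wu not in held]

# Every candidate in the feeder's slice is already on one of the
# user's hosts, so nothing gets sent despite work being "available":
assert sendable([9294, 8873, 9242], {9294, 8873, 9242, 9278}) == []

# A genuinely new workunit would still go out:
assert sendable([9294, 9999], {9294}) == [9999]
```

That would explain why the request asks for 961765 seconds of work yet gets 0 tasks back.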
STE\/E
I sometimes see strange behaviour when trying to get work.
28/10/11 11:27:02|Albert@Home|Sending scheduler request: To fetch work. Requesting 3503 seconds of work, reporting 0 completed tasks
28/10/11 11:27:08|Albert@Home|Scheduler request succeeded: got 0 new tasks
28/10/11 11:27:08|Albert@Home|Message from server: No tasks sent
28/10/11 11:27:08|Albert@Home|Message from server: No tasks are available for Gravitational Wave S6 GC search
28/10/11 11:27:08|Albert@Home|Message from server: see scheduler log messages on http://albert.phys.uwm.edu//host_sched_logs/1/1020
28/10/11 11:27:08|Albert@Home|Message from server: No tasks are available for the applications you have selected.
This is just one host; another 7 are doing the same even when asking for many more seconds of work. According to the server status page there is plenty of work available, but I don't see the number of unsent WUs going down.
The log referred to says (trimmed):
...
2011-10-28 10:27:03.7620 [PID=28680] [send] work_req_seconds: 3503.08 secs
2011-10-28 10:27:03.7620 [PID=28680] [send] available disk 15.64 GB, work_buf_min 86400
2011-10-28 10:27:03.7620 [PID=28680] [send] active_frac 0.999925 on_frac 0.999697
2011-10-28 10:27:03.7625 [PID=28680] [send] [AV#364] not reliable; cons valid 1 < 10
2011-10-28 10:27:03.7625 [PID=28680] [send] set_trust: cons valid 1 < 10, don't use single replication
2011-10-28 10:27:03.7625 [PID=28680] [send] [AV#413] not reliable; cons valid 1 < 10
2011-10-28 10:27:03.7625 [PID=28680] [send] set_trust: cons valid 1 < 10, don't use single replication
2011-10-28 10:27:03.7625 [PID=28680] [send] [AV#432] not reliable; cons valid 0 < 10
2011-10-28 10:27:03.7625 [PID=28680] [send] set_trust: cons valid 0 < 10, don't use single replication
2011-10-28 10:27:03.7670 [PID=28680] [version] looking for version of einsteinbinary_BRP4
2011-10-28 10:27:03.7671 [PID=28680] [version] Checking plan class 'BRP3SSE'
2011-10-28 10:27:03.7671 [PID=28680] [version] reading plan classes from file '../plan_class_spec.xml'
2011-10-28 10:27:03.7676 [PID=28680] [version] parsed project prefs setting 'also_run_cpu' : true : 0.000000
2011-10-28 10:27:03.7677 [PID=28680] [version] Checking plan class 'ATIOpenCL'
2011-10-28 10:27:03.7677 [PID=28680] [version] No ATI devices found
2011-10-28 10:27:03.7677 [PID=28680] [version] [AV#443] app_plan() returned false
2011-10-28 10:27:03.7677 [PID=28680] [version] Checking plan class 'NVOpenCL'
2011-10-28 10:27:03.7677 [PID=28680] [version] No NVidia devices found
2011-10-28 10:27:03.7677 [PID=28680] [version] [AV#440] app_plan() returned false
2011-10-28 10:27:03.7677 [PID=28680] [version] [AV#432] (BRP3SSE) using unscaled projected flops: 1.49G
2011-10-28 10:27:03.7677 [PID=28680] [version] Best version of app einsteinbinary_BRP4 is [AV#432] (1.49 GFLOPS)
2011-10-28 10:27:03.7678 [PID=28680] [version] returning cached version: [AV#432]
...
2011-10-28 10:27:03.7700 [PID=28680] [version] returning cached version: [AV#432]
2011-10-28 10:27:03.7720 [PID=28680] [user_messages] [HOST#1020] MSG(low) No tasks sent
2011-10-28 10:27:03.7720 [PID=28680] [user_messages] [HOST#1020] MSG(low) No tasks are available for Gravitational Wave S6 GC search
2011-10-28 10:27:03.7720 [PID=28680] [user_messages] [HOST#1020] MSG(notice) see scheduler log messages on http://albert.phys.uwm.edu//host_sched_logs/1/1020
2011-10-28 10:27:03.7720 [PID=28680] [user_messages] [HOST#1020] MSG(low) No tasks are available for the applications you have selected.
2011-10-28 10:27:03.7721 [PID=28680] Sending reply to [HOST#1020]: 0 results, delay req 60.00
2011-10-28 10:27:03.7723 [PID=28680] Scheduler ran 0.022 seconds
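The "not reliable; cons valid N < 10" lines are a separate gate worth noting: an app version apparently only earns "reliable" status (and eligibility for single replication) after some number of consecutive valid results, here 10, so until then every task needs a second copy for validation. A sketch of that threshold check, assuming that reading of the log is right:

```python
RELIABLE_THRESHOLD = 10  # consecutive valid results required

def is_reliable(consecutive_valid):
    """App version is trusted only after enough consecutive valid results."""
    return consecutive_valid >= RELIABLE_THRESHOLD

# The app versions in the logs above are all below the bar,
# e.g. "cons valid 4 < 10" and "cons valid 0 < 10":
assert not is_reliable(4)
assert not is_reliable(0)
assert is_reliable(10)
```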
Same thing happened last week, but it fixed itself before I had a chance to report it.
Al.
Yeah, I seem to have gotten a little more work on the dry boxes since I posted that they couldn't get any new work since yesterday ...
STE\/E
Still no new work here :(
Al.