Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

Got my first invalid, where

Message 80276 in response to message 80275

Got my first invalid, where my task was matched against two Intel GPUs running OpenCL 1.1:

Workunit 603716

Claggy

Jacob Klein
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2938967
RAC: 0

Let's keep this thread on the

Let's keep this thread on the topic of its subject, please.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: Let's keep this thread

Message 80278 in response to message 80277

Quote:
Let's keep this thread on the topic of its subject, please.


Your last contribution to the subject, five days ago, was:

Quote:
Just wanted to post to mention that I will likely no-longer be testing these problematic work units.


Have you come back to testing, and if so, what have you discovered in the meantime about the cause of the problems?

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Jacob Klein
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2938967
RAC: 0

I have not resumed testing on

I have not resumed testing on this issue, and do not anticipate doing so. I replaced my GTS 240 with a second GTX 660 Ti, and am focusing on GPUGrid and Poem.

Regarding the issue, it seemed to be bad estimations (based on the existing GTX 660 Ti, which had a local exclude_gpu option set on this project) for tasks that ran on the uber weak GTS 240. As I said, rsc_fpops_bound was busted, and the problem was server-side. I don't think it's fixed yet, though I don't know for sure.
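To put rough numbers on the mechanism as I understand it (everything below is an illustrative assumption, not a value from my actual workunits), the client effectively derives an elapsed-time limit from rsc_fpops_bound divided by the flops estimate attached to the task, so an estimate inherited from a fast card makes a slow card overrun the limit:

# Minimal sketch of the EXIT_TIME_LIMIT_EXCEEDED (-197) mechanism.
# All numbers are illustrative assumptions, not values from these workunits.

def elapsed_time_limit(rsc_fpops_bound, est_flops):
    # Rough rule: the client aborts a task once its elapsed time passes
    # rsc_fpops_bound divided by the flops estimate attached to the task.
    return rsc_fpops_bound / est_flops

rsc_fpops_bound = 1.0e16   # hypothetical bound on floating-point operations
est_flops_fast = 100.0e9   # estimate skewed by the fast GTX 660 Ti (assumed)
est_flops_slow = 5.0e9     # what the weak GTS 240 actually delivers (assumed)

limit_hours = elapsed_time_limit(rsc_fpops_bound, est_flops_fast) / 3600
needed_hours = (rsc_fpops_bound / est_flops_slow) / 3600
print(f"abort limit ~{limit_hours:.0f} h, real need ~{needed_hours:.0f} h")
# With these assumptions the task is killed after ~28 h even though the
# slow GPU would need ~556 h, which surfaces as error 197.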

Regards,
Jacob

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: I have not resumed

Message 80280 in response to message 80279

Quote:

I have not resumed testing on this issue, and do not anticipate doing so. I replaced my GTS 240 with a second GTX 660 Ti, and am focusing on GPUGrid and Poem.

Regarding the issue, it seemed to be bad estimations (based on the existing GTX 660 Ti, which had a local exclude_gpu option set on this project) for tasks that ran on the uber weak GTS 240. As I said, rsc_fpops_bound was busted, and the problem was server-side.

Regards,
Jacob


Exactly. Specifically, during what we are calling "stage 2 of the onramp": after 100 global validations for the app_version across the project as a whole, but before 11 local validations for the individual host - the phase during which flops determined by "PFC avg" can be seen in the server logs.
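If it helps to see the stages side by side, here is a toy sketch of how I read the onramp; the 100 and 11 thresholds are the ones discussed above, but the function and field names are entirely made up:

# Toy sketch of the estimate "onramp" as I read it. Only the 100 / 11
# validation thresholds come from the discussion above; the function and
# variable names are invented for illustration.

def pick_flops_estimate(global_validations, host_validations,
                        conservative, pfc_avg_flops, host_avg_flops):
    if global_validations < 100:
        # Stage 1: conservative default until the app_version has 100
        # validated results across the whole project.
        return conservative
    if host_validations < 11:
        # Stage 2: project-wide "PFC avg" figure until this host has 11
        # validated results of its own - the phase where a weak GPU can
        # inherit an estimate dominated by much faster hosts.
        return pfc_avg_flops
    # Stage 3: the host's own average takes over.
    return host_avg_flops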

If you're not testing any more, and we understand that, why do you wish to prevent us from discussing other matters of mutual interest in this thread?


Jacob Klein
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2938967
RAC: 0

This thread is for the

Message 80281 in response to message 80280

This thread is for the "EXIT_TIME_LIMIT_EXCEEDED" error a user might get running these newer apps. A forum search on that error will find this thread. I am actually still monitoring this thread for an answer.

Any other problem, such as bad OpenCL versions generating bad results and bad validations, deserves its own thread.

Thanks.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: Exactly. Specifically

Message 80282 in response to message 80280

Quote:
Exactly. Specifically, during what we are calling "stage 2 of the onramp": after 100 global validations for the app_version across the project as a whole, but before 11 local validations for the individual host - the phase during which flops determined by "PFC avg" can be seen in the server logs.


And to get to those 100 global validations and 11 local validations, tasks need to validate; having masses of hosts throwing inconclusives into the works slows down the process of recovering from the -197 errors, at least for the Gamma-ray pulsar search #3.

Claggy

Jacob Klein
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2938967
RAC: 0

I hardly know anything

I hardly know anything about how the server does its calculations. I had a problem, I reported it, and at some point I was hoping to receive an answer.

In the meantime, I was expecting the thread to stay on topic, to make the answer easier to find in the future. Maybe I'm old-fashioned.

If you (the Albert team) need me to do additional testing on the "197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED" problem my computer was hitting, I'd have to swap hardware to do it, but I could. Let me know if you'd like to request that.

Thanks,
Jacob

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

I think I found the problem.

I think I found the problem. FGRP has been updated to version 1.12 as a result - the apps are identical.

If anybody runs into further -197 time limit exceeded errors, please report them here ASAP. Please always include your host ID - we can glean most variables from database dumps now, but if you can also state your peak_flops (from the BOINC startup messages), that would be very helpful.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Unfortunately, I think Eyrie

Unfortunately, I think Eyrie has jumped the gun on this one.

Speaking specifically about FGRP (Gamma-ray pulsar search #3) only:

My NVidia 420M laptop (host 11359) has just been allocated new work from the v1.12 run. It was sent out with the 'conservative' (first stage onramp) speed estimate of 27.76 GFlops: that's very close to the 23.59 GFlops the same host achieves on BRP4G-cuda32-nv301.

BUT: FGRP is a beta app, which makes very little use of the GPU as yet. It runs much, much slower than BRP4G-cuda32-nv301 on my hardware. The tasks would have got error 197 if I hadn't taken precautions. I can't say whether the problem is Einstein's programming, or NVidia's OpenCL implementation, but at this initial stage for the new app_version, we can't blame BOINC.

But we're back to square one with the validation count. Could testers please run more of these tasks (with an edited <rsc_fpops_bound>, so they can complete)? We still need to test how BOINC handles the transitions at 100 validations for the app_version across the project as a whole, and 11 validations for each individual host.
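If anyone wants a starting point for that edit, something along these lines should do it; stop the BOINC client first, back up client_state.xml, and treat the path, the task-name filter and the multiplier as assumptions to adapt to your own setup:

# Hypothetical helper: raise <rsc_fpops_bound> on queued FGRP tasks in
# client_state.xml so they are not aborted with error 197. Stop the BOINC
# client and back up the file before running this; the path, the "FGRP"
# name filter and the factor are assumptions, not project-supplied values.
import xml.etree.ElementTree as ET

CLIENT_STATE = "/var/lib/boinc-client/client_state.xml"  # adjust for your install
FACTOR = 100.0                                            # generous safety margin

tree = ET.parse(CLIENT_STATE)
for wu in tree.getroot().iter("workunit"):
    name = wu.findtext("name", default="")
    bound = wu.find("rsc_fpops_bound")
    if bound is not None and "FGRP" in name:
        bound.text = "%e" % (float(bound.text) * FACTOR)
        print("raised bound for", name)
tree.write(CLIENT_STATE)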

