The project will be taken down in about an hour to perform an update of the BOINC server code. Ideally you shouldn't notice anything, but usually the world isn't ideal. See you again on the other side.
Copyright © 2024 Einstein@Home. All rights reserved.
Comments
Project server code update
Scheduler request failed: HTTP internal server error
is what I get
Trotador
RE: Scheduler request
The same here
Does the problem persist?
We are testing the behavior of "CreditNew" on this project and will try to fix it if necessary. Be prepared for the unexpected!
BM
Intel GPUs are now being shown by the project in the computer details pages:
Computer 9008
But 'Use Intel GPU' isn't being shown on the Albert project preferences page, in spite of there being Intel GPU apps available. Perhaps those apps need their settings adjusted?
Claggy
RE: Does the problem
04/06/2014 10:53:32 | Albert@Home | Sending scheduler request: Requested by user.
04/06/2014 10:53:32 | Albert@Home | Requesting new tasks for CPU and NVIDIA GPU
04/06/2014 10:53:36 | Albert@Home | Scheduler request failed: HTTP internal server error
The machine has been restarted.
Note: Local time UTC+2 (Prag).
If you are going to use Dave's random number generator, I'll leave the project. Some CPU projects have tamed it into a generator of numbers within an expected and acceptable range, but no GPU project has managed that. Good luck.
EDIT: Before leaving I'll try my favorite joke - using app_info to get a BRP4 CPU task crunched by intel_gpu. I expect credit of 0.5 instead of 62.5. It can be seen as WU 590960.
RE: If your are going to
We know all that. The purpose of this test is, very specifically, to test and try out some fixes to CreditNew that some volunteers have spent the last nine months developing.
It would be most helpful if you would remain attached to the project, to generate some baseline data from a good range of hosts.
Albert has been chosen for this task specifically because it's a test project where nothing is expected to work anyway!
I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.
http://www.boincsynergy.com/images/stats/comb-3475.jpg
RE: If your are going to
hold your horses please!
What we are specifically trying to do is to make something that actually WORKS out of David's RNG!
But for that we need to verify first that we see on Albert what we know from SETI main, before we can go on to stick a proper algorithm into it.
So, PLEASE, bear with us while we establish the system works as expected (i.e. is crap) and then apply the patches into the critical areas.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
RE: Does the problem
I'm getting work OK on 'CPU only' requests.
But I've attached some extra hosts, which are requesting NVidia work as part of project initialisation - that's returning 'internal server error' (see email). I'll have to find some way of blocking that initial request - seems to be fine after that's out of the way, using a 'CPU only' venue.
OK, if Albert is not only for testing applications but also for testing the credit system (like SetiBeta and ralph), I have no problem helping to generate a baseline. It is important to know that. In that case I have no problem with low and random credit.
RE: RE: Does the problem
only worked after moving the host to a _fresh_ venue that only has CPU ticked.
Found and fixed a bug in the scheduler.
Please try again.
BM
Seems to be OK.
My i7-2600K got ATI/AMD work, but when I suspend Seti (where it's crunching OpenCL Seti v7 work), nothing happens: the ATI/AMD WU isn't started.
https://albertathome.org/host/8143
Edit: Finally I managed to get it to error:
Activated exception handling...
[12:06:40][13760][INFO ] Starting data processing...
GPU type not found in init_data.xml
[12:06:40][13760][ERROR] Failed to get OpenCL platform/device info from BOINC (error: -161)!
[12:06:40][13760][ERROR] Demodulation failed (error: -161)!
12:06:40 (13760): called boinc_finish
https://albertathome.org/task/1453772
Claggy
My host 11361 got some CUDA work (like result 1448685), which failed with 'No suitable CUDA device available!' - although there's a fully functional "NVIDIA GeForce GTX 750 Ti (2047MB) driver: 335.28", which crunches CUDA at other projects.
Our plan class specs that were (semi-)automatically converted for the new server code were somewhat broken, causing probably all kinds of oddities for GPU tasks. Should be fixed now.
BM
BRP4G cuda task is running OK at 9600GT/XP 32bit, driver 335.28.
RE: Intel GPUs are now
+1 Still no setting in preferences to select intel GPU, like you can for AMD or nVidia. Other GPU projects have this, even Einstein.
Dublin, California
Team: SETI.USA
RE: Our plan class specs
All my ATI/AMD tasks are predicted to take six seconds; when they get to 2 minutes 6 seconds they error:
https://albertathome.org/task/1455248
Claggy
Holmis reported the same for BRP4G-cuda32-nv301 in the problems area, except he inoculated his against "Exit status 197 EXIT_TIME_LIMIT_EXCEEDED" with a big boost to rsc_fpops_bound.
I guess one of us (and that probably means me) should fire up a GPU fetch and compare the calculations in the server log with what actually ends up in client_state.xml.
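For anyone wanting to check the arithmetic, here is a minimal Python sketch of how a too-fast speed estimate could produce both symptoms. It assumes the usual BOINC relationships as I understand them: the estimated runtime is rsc_fpops_est divided by the speed (flops) assigned to the app version, and the client aborts with EXIT_TIME_LIMIT_EXCEEDED once elapsed time passes rsc_fpops_bound divided by that same speed. The workunit figures and both speeds are invented for illustration and are not taken from any real task.

```python
# Illustrative only: these workunit figures are assumptions, not real Albert values.
RSC_FPOPS_EST = 12e12     # estimated floating-point operations per task
RSC_FPOPS_BOUND = 252e12  # operation count after which the client gives up


def estimated_runtime(fpops_est: float, flops: float) -> float:
    """Seconds the task is predicted to take at the given speed."""
    return fpops_est / flops


def time_limit(fpops_bound: float, flops: float) -> float:
    """Elapsed seconds after which the task is aborted as over its time limit."""
    return fpops_bound / flops


if __name__ == "__main__":
    speeds = {
        "inflated initial estimate": 2e12,  # assumed, far above what the card can do
        "plausible measured speed": 1e11,   # assumed
    }
    for label, flops in speeds.items():
        est = estimated_runtime(RSC_FPOPS_EST, flops)
        limit = time_limit(RSC_FPOPS_BOUND, flops)
        print(f"{label}: estimate {est:.0f} s, abort after {limit:.0f} s")
```

With the inflated speed this gives a 6-second estimate and an abort at 126 seconds, the same pattern as reported above: a task that genuinely needs longer than the limit can never finish. Boosting rsc_fpops_bound moves the limit but leaves the bad speed estimate in place.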
RE: Holmis reported the
I saw his post after I posted mine. I'm letting them all error, as I want the fix to come from the project/the Boinc Devs rather than a workaround.
From my client_state.xml, BRP4G has an extra three digits compared to the BRP5 app (of which I haven't received any work yet, so shouldn't have been updated yet):
Claggy
And from my lappy:
Both server and client are estimating 131 seconds. But a laptop NV GT 420M with 192 GFLOPS peak, running at 2.1 TeraFlop? Well, we wanted to check the PFC averages.....
Edit: the real problem in client_state is
RE: I saw his post after I
Understood. I'm going to try and run mine, to establish a real APR to contrast with that stupid 'PFC avg' initial estimate. Hopefully that'll generate some more ammunition to throw at David. Thank goodness the 32 tasks per day limit worked properly.....
So far, every single one of the CasA tasks I've run since this test started has ended in 'validate error'. That's across several machines, but the worst example is host 9130.
I see the CasA WUs (which were very old, generated in January, and incompatible with the current validator) have now been cancelled.
I'll abort all unstarted examples: should we abort jobs in progress too?
Edit - hold that thought. There are newly generated tasks in the database too, don't abort those.
RE: I see the CasA WUs
Yes, please.
BM
All the suspect CasA (GW) tasks in the database have been cancelled and unconditionally aborted by the project. Any you still have running on your computers (after doing a project update) should run OK, as should any new ones you get issued.
RE: All the suspect CasA
I got some fresh CasA tasks on my last contact, but I also got a single BRP task, even though BRP is deselected on the work venue for that host
(I do have 'Run beta/test application versions?' and 'Run CPU versions of applications for which GPU versions are available' selected though):
https://albert.phys.uwm.edu/host_sched_logs/8/8143
https://albertathome.org/task/1431784
In progress tasks for computer 8143
Edit: added the log so we don't lose it:
Claggy
Thanks for reporting. This looks like a bug to me in current server (scheduler) code. May take a bit of time to investigate, though.
BM
This morning, my ATI BRP4G tasks report the same wacky speeds (and short estimated durations) as last night.
Edit: got them all physically removed from my client_state.xml so they can be resent when the scheduler is fixed.
https://albert.phys.uwm.edu/host_sched_logs/8/8143
Claggy
I tried asking for more tasks to my Nvidia GPU and got the following in Boinc's Event log:
[pre]05/06/2014 12:17:53 | Albert@Home | Requesting new tasks for NVIDIA
05/06/2014 12:17:53 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
05/06/2014 12:17:53 | Albert@Home | [sched_op] NVIDIA work request: 102560.41 seconds; 0.00 devices
05/06/2014 12:17:53 | Albert@Home | [sched_op] intel_gpu work request: 0.00 seconds; 0.00 devices
05/06/2014 12:17:55 | Albert@Home | Scheduler request completed: got 0 new tasks
05/06/2014 12:17:55 | Albert@Home | [sched_op] Server version 703
05/06/2014 12:17:55 | Albert@Home | Project requested delay of 60 seconds
05/06/2014 12:17:55 | Albert@Home | [sched_op] Deferring communication for 00:01:00
05/06/2014 12:17:55 | Albert@Home | [sched_op] Reason: requested by project[/pre]
As you can see there was no reason given for why I didn't receive any tasks.
Next step was checking the server contact log and I found this:
[pre]2014-06-05 10:17:54.8969 [PID=8307 ] [version] Checking plan class 'BRP4G-cuda32-nv301'
2014-06-05 10:17:54.8969 [PID=8307 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.500000
2014-06-05 10:17:54.8969 [PID=8307 ] [version] [AV#716] daily quota exceeded[/pre]
So the reason was that I've already had my fill for the day.
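A rough Python sketch of how one might pull that reason out of a saved scheduler log instead of reading it by eye. The two patterns are guesses based only on the three lines quoted above, so the real log format may well differ elsewhere, and quota_rejections is just a name made up for this example.

```python
import re

# The excerpt quoted above, used here as sample input.
LOG = """\
2014-06-05 10:17:54.8969 [PID=8307 ] [version] Checking plan class 'BRP4G-cuda32-nv301'
2014-06-05 10:17:54.8969 [PID=8307 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.500000
2014-06-05 10:17:54.8969 [PID=8307 ] [version] [AV#716] daily quota exceeded
"""

# Assumed patterns, inferred from the sample lines only.
PLAN_CLASS = re.compile(r"Checking plan class '([^']+)'")
QUOTA = re.compile(r"\[AV#(\d+)\] daily quota exceeded")


def quota_rejections(log_text):
    """Return (plan_class, app_version_id) for each 'daily quota exceeded' line."""
    current_plan_class = None
    hits = []
    for line in log_text.splitlines():
        plan_match = PLAN_CLASS.search(line)
        if plan_match:
            # Remember the plan class most recently being checked.
            current_plan_class = plan_match.group(1)
            continue
        quota_match = QUOTA.search(line)
        if quota_match:
            hits.append((current_plan_class, int(quota_match.group(1))))
    return hits


if __name__ == "__main__":
    for plan_class, app_version in quota_rejections(LOG):
        print(f"app version {app_version} ({plan_class}): daily quota exceeded")
```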
Checking the Application details for my host gives:
[pre]Binary Radio Pulsar Search (Arecibo, GPU) 1.33 windows_intelx86 (BRP4G-cuda32-nv301)
Number of tasks completed 13
Max tasks per day 45
Number of tasks today 54
Consecutive valid tasks 13
Average processing rate 56.59266205016
Average turnaround time 0.29 days[/pre]
So I'm over the daily quota, but why didn't the scheduler tell me so in the reply to Boinc?
I enabled another debug flag (debug_array) to possibly get a grip on the app selection issue.
This means that the scheduler log excerpts that you see published for your hosts will get even longer. Please don't post these here in all gory detail, these are kept for ~200d on the server for the devs & admins anyway.
BM
I got some of those tasks resent:
https://albert.phys.uwm.edu/host_sched_logs/8/8143
Claggy
RE: I enabled another debug
I just made a work request for CPU work and was granted 10 S6CasA tasks and one BRP4 task.
In my Einstein@home prefs the BRP4 search is not selected but Beta-apps are.
Unfortunately Boinc contacted the scheduler again before I could check the server log, so I missed it. I just wanted to point out that there should be 2 logs at around 15:46 today.
This is the first line from the second contact, the first contact that assigned the CPU tasks should have occurred a few minutes before this one.
2014-06-05 15:46:56.9050 [PID=16227] Request: [USER#xxxxx] [HOST#2267] [IP xxx.xxx.xxx.226] client 7.2.42
Got some of the tasks resent again and it's still the same: tasks are predicted to take 16 seconds. This host hasn't completed its 11 validations of that app_version yet, so it's using the initial estimate and not its app_version APR:
Claggy
RE: RE: If your are going
At first, I thought this test must be going on with other apps that I am not running, because my Binary Radio Pulsar Search (Arecibo, GPU) tasks were still getting a flat 1000 per. I guess it took a while to kick in. This morning, I can see all of the validated tasks with differing credits awarded. There are a couple with ~500. A couple ~300-400. All the rest range from 90-120 credits.
So, what do we need to do to get this CreditNew problem fixed?
RE: RE: RE: If your are
We're still generating the baseline - as you noticed, it took a few attempts to disable the previous fixed credits: now we can see and quantify the scale of the problem. There was another glitch with the CasA (GW) tasks this morning, so they still haven't properly started.
But rest assured, there are people editing away in the background even as I type.
RE: We're still generating
So a few questions about this test of the credit system:
Do we mere mortals need to do anything special, or do we just run tasks and let the wizards take care of things in the background?
Is there something I or any other regular user can do to help and/or speed things up?
Should I/we focus on a special search or run them all?
RE: RE: We're still
Well, I can only answer as the Sorcerer's Apprentice - I'm following what the wizards are doing, and trying to interpret their Delphic utterances.
The whole CreditNew structure - if you can dignify it as a structure - is basically built round CPU applications, with coprocessors tacked on as an afterthought. So it would perhaps be a good idea - most helpful - to fire through some extra CasA/GW tasks, so the baseline for those catches up after the slow start. But we're just going into a long (3-day) weekend in Germany, so there's no rush. Just keep taking the tablets as usual, and see how dirty the laundry gets. Can anybody beat Zombie for variability? I've seen him get from below 100, to above 10,000, for the par-1000 BRP4G tasks.
RE: So it would perhaps be
Roger that, will run some extra CasA tasks and then let the server decide.
Well, the server seems to think I've had too much and is now issuing credit of between 88.84 and 127.05 per BRP4G task. Wish I could get 10,000+ for a task, would be good for my RAC! =)
Having urged everyone to run GW/CasA over the weekend, I found I couldn't download any myself - and it looks like we're running low on BRP4 too.
But I managed to get a good run of BRP4 validated (and continuing), so I dug out the trusty old graphing tool.
http://i1148.photobucket.com/albums/o562/R_Haselgrove/AlbertCreditNewBRP4.png
That shows the effect and variability of CreditNew in what is - supposedly - the best-case scenario: single-threaded CPU applications.
I can only easily plot my own reporting time on the X-axis, which is far from ideal for spotting trends (many validations come much later) - but it gives a visual indication of what we're up against.
RE: Having urged everyone
Sorry, S6CasA was still throttled due to the validator issues. Has been fixed. The BRP4 WUG should get new data every 12h, currently there are >3000 BRP4 tasks unsent.
BM
I've been asked to dig out some statistics from that graph I posted earlier.
These are the same six hosts, with a few extra validations since this morning, and the 62.50 flat-rate credit results stripped out.
[pre]
Host:            5367       9130      11359      11360      11361      11362
CPU:          i7-3770    i5-4570       i5 M      Q6600      Q6600 Xeon E5320
Credit for BRP4
Maximum         91.08      54.74      54.43      52.32      57.67      52.09
Minimum         40.66      34.40      44.37      40.19      42.77      38.25
Average         50.80      47.13      47.74      46.63      47.38      45.50
Median          49.58      47.29      47.58      46.75      47.41      45.58
Std Dev          6.18       3.23       1.70       1.93       2.28       2.85
Completed         369        169        118         74        136         59
APR           3.55845    3.19964    1.61535    2.33970    2.05929    1.20705
[/pre]
I'm not sure I believe that Std Dev, but Excel is insistent.
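On the Std Dev doubt: Excel's STDEV is the sample standard deviation (it divides by n-1), so it comes out slightly larger than the population figure. A quick cross-check with Python's standard statistics module might look like the sketch below. The credit values are made up for illustration and are not from any of the hosts above.

```python
import statistics

# Made-up credit values for illustration only, not real results.
credits = [44.0, 46.0, 48.0, 50.0, 62.0]

rows = [
    ("Maximum", max(credits)),
    ("Minimum", min(credits)),
    ("Average", statistics.mean(credits)),
    ("Median", statistics.median(credits)),
    # stdev divides by n-1, like Excel's STDEV
    ("Std Dev (sample)", statistics.stdev(credits)),
    # pstdev divides by n, like Excel's STDEVP
    ("Std Dev (population)", statistics.pstdev(credits)),
]

for name, value in rows:
    print(f"{name:<22}{value:8.2f}")
```

If the two Std Dev rows differ noticeably, the sample is small; with hundreds of completed tasks they should be nearly identical.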
Following Richard's example I've put together plots of the credit awarded to host 2267 since the server upgrade.
Plot of credit for BRP4X64, APR=4.13
Plot of credit for BRP4G, APR=58.02
Plot of credit for S6CasA, APR=3.14
And finally, if anyone's interested, here's the Excel document with both the data and plots.
To summarise:
BRP4X64 is all over the place but "always" lower than the fixed credit before the upgrade.
BRP4G took a nose dive and is slowly recovering; at least it appears to be going in the right direction.
S6CasA only has 9 validated tasks so I can't really tell, but it seems to be like BRP4X64.
Hi All,
Thanks very much guys, that'll be very helpful. Even though we've characterised a lot of the instabilities (and isolated some sources), I'm still surprised, by the CPU applications in particular, to see 3 standard deviations covering from ~+/-10% in the tightest example (unstable but not all that bad) right up to ~+/-37% in the worst examples.
The engineering aspects are there in code, but it makes a big difference to see them in actual numbers.
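Those percentages can be reproduced from the BRP4 table posted earlier in the thread by expressing three standard deviations as a fraction of each host's average. A small Python sketch, using only the Average and Std Dev rows from that table (the function name is made up for this example):

```python
# (average, std dev) per host, copied from the BRP4 table earlier in the thread
brp4 = {
    5367: (50.80, 6.18),
    9130: (47.13, 3.23),
    11359: (47.74, 1.70),
    11360: (46.63, 1.93),
    11361: (47.38, 2.28),
    11362: (45.50, 2.85),
}


def three_sigma_percent(average: float, std_dev: float) -> float:
    """Three standard deviations expressed as a percentage of the average."""
    return 100.0 * 3.0 * std_dev / average


if __name__ == "__main__":
    for host, (average, std_dev) in brp4.items():
        spread = three_sigma_percent(average, std_dev)
        print(f"host {host}: +/-{spread:.1f}%")
```

Host 11359 comes out at roughly +/-10.7% and host 5367 at roughly +/-36.5%, which brackets the range quoted above.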
Thanks again,
Jason
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Surprised in which direction - wider spread than you expected, or narrower?
On the one hand, I think there's evidence in the graphs that the initial spread was wide, but quite quickly narrowed significantly. My guess is that would be repeated every time a project released a new app_version.
On the other hand, I certainly ran my hosts quite intensively on BRP4 over the weekend, and I run them 24/7 with a stable mix of tasks. My results probably insert relatively little instability into the database, and here on a test project with few active participants, I suspect the same applies to other users too. In the rough-and-tumble of a production project, would the variability be greater?
I've switched to concentrate on GW/CasA today, and I'll grab some data from the users who have started on GPU work already.
RE: Surprised in which
Surprised in how well the (lackadaisical) response of the system matches predictions of established engineering control systems theory... and that anyone would sample stochastic ('natural world' noisy) data, hook it up to hard logic, and expect a stable system to result.... apparently without consulting a single control systems engineer.
As of now *to me*, the purposes of credit and RAC are for quantifying work, and for comparing against other hosts. In those contexts you might see convergence for your individual machine... but for comparative purposes those numbers, say against my host, are not meaningful on a human level ... They are quite useful, however, in estimating the entropy resulting from bad design, and so what will be required to fix it.
( even though we can probably redefine the terms entropy and chaos to include mashing noise together to make some guesses, I suspect my interpretations of the purpose of credit as a measure of work are long obliterated by the current system, and require extensive correction)
I didn't understand a word of that :P (well the words, maybe, but not the sentences)
Preliminary stats now I've got the GW framework set up (very preliminary - only 26 results validated):
[pre]
Host:            5367       9130      11359      11360      11361      11362
CPU:          i7-3770    i5-4570       i5 M      Q6600      Q6600 Xeon E5320
Credit for GW-CasA
Maximum        262.35     290.46       0.00     294.04     223.03     330.82
Minimum        214.65     200.24       0.00     259.35     196.81     255.91
Average        237.06     241.14    #DIV/0!     277.81     209.92     295.37
Median         238.77     228.51      #NUM!     280.04     209.92     299.38
Std Dev         15.45      30.16    #DIV/0!      17.45      18.54      37.62
[/pre]
RE: RE: Surprised in
I'll add, after private communications, that apparently there was some level of consultation during design and/or implementation of the system... but apparently the point that credit and work estimates would be intimately connected was not raised.
I accept that my interpretation, that both credit and time estimates are abstract 'human tools' separate from the computation, is probably a relatively novel contribution to that overall picture, and so stability in the perceived usefulness of those figures may not have been considered a big issue until well after the chaos had arisen (on other projects).
RE: I didn't understand a
Yep, there's that +/- 30% ( std dev x 3 )... let's see how she settles :)
i.e. Garbage in -> Garbage out.
http://en.wikipedia.org/wiki/Garbage_in,_garbage_out
[Edit:] choice new sig:
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Oh, you're going to love this one
[pre]
                   Jason        Holmis        Claggy        Zombie  Zombie (Mac)
Host:              11363          2267          9008          6490          6109
GPU:             GTX 780       GTX 660       GT 650M         TITAN     GTX 680MX
Credit for BRP4G (GPU)
Maximum          1170.48       1036.86       10239.0       1654.85      11847.50
Minimum           115.82         88.84        153.90         25.79         94.88
Average           548.33        463.98       3875.88        874.96       2256.70
Median            468.80        390.21       2977.38        865.33       1591.80
Std Dev           431.90        268.52       2873.26        362.30       2395.61
[/pre]
I'll upload a graph after lunch, when my monitor has cooled down and I've stopped laughing.
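To put numbers on the spread, a short Python sketch using the Maximum, Minimum, Average and Std Dev rows of the table above. It prints the max/min ratio and the coefficient of variation (Std Dev divided by Average) for each host. Nothing here is new data, and the helper names are made up for the example.

```python
# (maximum, minimum, average, std dev) per host, copied from the BRP4G table above
brp4g = {
    "Jason 11363": (1170.48, 115.82, 548.33, 431.90),
    "Holmis 2267": (1036.86, 88.84, 463.98, 268.52),
    "Claggy 9008": (10239.0, 153.90, 3875.88, 2873.26),
    "Zombie 6490": (1654.85, 25.79, 874.96, 362.30),
    "Zombie (Mac) 6109": (11847.50, 94.88, 2256.70, 2395.61),
}


def max_min_ratio(maximum: float, minimum: float) -> float:
    """How many times larger the best award is than the worst."""
    return maximum / minimum


def coefficient_of_variation(average: float, std_dev: float) -> float:
    """Standard deviation relative to the average (dimensionless)."""
    return std_dev / average


if __name__ == "__main__":
    for host, (maximum, minimum, average, std_dev) in brp4g.items():
        ratio = max_min_ratio(maximum, minimum)
        cv = coefficient_of_variation(average, std_dev)
        print(f"{host:<18} max/min {ratio:7.1f}x   std dev/average {cv:.2f}")
```

For comparison, the same Std Dev / Average figure for the CPU BRP4 table earlier in the thread stays below 0.13 on every host.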