The project will be taken down in about an hour to perform an update of the BOINC server code. Ideally you shouldn't notice anything, but usually the world isn't ideal. See you again on the other side.
Copyright © 2024 Einstein@Home. All rights reserved.
Comments
Ok. I will start one host at
Ok. I will start one host at a time to see what happens; that's going to take some days since the caches are already loaded.
July 3, 29, 2014 04:00 UTC
July 3, 29, 2014 04:00 UTC (switched to BRP5)
https://albertathome.org/host/9649
[pre]BRP5 2x using 1 cpu thread each (app_config), GPU utilization = 92%
running an additional 4x Skynet POGs cpu WUs
GPU 7950 mem=1325, gpu=1150, pcie v2 x16
OS Win7 x64 Home Premium
CPU 980X running at 3.41 GHz with HT off
MEM Triple channel 1600 (7.7.7.20.2)[/pre]
RE: Well, here's the first
Almost the same 15 cr for 10k to 20k secs of running time with a 690. That's what I'd call "credit deflation".
https://albertathome.org/host/10352/tasks&offset=0&show_names=0&state=4&appid=27
Yeah, looks a lot like the
Yeah, looks a lot like the sort of discrepancies I see in simulations.
Will definitely be worth putting a 1.4 app onramp into the spreadsheets, to see how well the models reflect reality.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Most of the BRP5 'Perseus
Most of the BRP5 'Perseus Arm' tasks I've seen so far have old WUs which have been lying around in the database for some time, with multiple failures. I'm not sure whether anybody has looked to see if that affects the credit-granting process, even if only by the averages shifting between initial creation and final validation. (I don't think so, because I don't think anything about the prevailing averages is stored in the task record when it's created from the WU - but I haven't looked at the database schema or the code.)
But I've just validated the first 'clean', two replications only case:
WU 625789
For 12.62 credits.
I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.
http://www.boincsynergy.com/images/stats/comb-3475.jpg
Richard The WU you talk
Richard
The WU you talk about was validated against one of my hosts with a 670 too.
Something catches my attention: the crunching times. Yours takes about 12k secs, mine 7.5k secs. I run 1 WU at a time, and my 670 (EVGA FTW) is powered by a slow i5 vs your powerful i7. Can you tell me why the time difference, since both GPUs are relatively similar?
BTW, the 12.62 credits received are really amazing. :)
RE: Richard The WU you
That seems simple - I'm running two at a time, so effective throughput would be one task every 6k seconds (on your figures - I haven't looked at the data for BRP5 in any detail yet). The efficiency gain from running two together is probably more significant than the i5/i7 difference.
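The throughput arithmetic behind that can be sketched in a few lines (a hypothetical helper, using the rough figures quoted in this exchange):

```python
# Sketch of the throughput argument: running N tasks concurrently on one GPU
# means each task's wall time is what you observe, but effective throughput
# is one completed task every (elapsed / N) seconds.

def effective_seconds_per_task(elapsed_secs: float, tasks_in_parallel: int) -> float:
    """Effective wall time per completed task when running tasks concurrently."""
    return elapsed_secs / tasks_in_parallel

# ~12k secs each, two at a time -> one task every ~6k secs.
two_up = effective_seconds_per_task(12_000, 2)
# ~7.5k secs, one at a time -> one task every 7.5k secs.
one_up = effective_seconds_per_task(7_500, 1)
print(two_up, one_up)  # 6000.0 7500.0
```

So the two-up host actually has the higher throughput despite the longer per-task elapsed time.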
Thanks, yes that easily
Thanks, yes, that easily explains the crunching time differences. Seems like I misunderstood something again. I had the idea that, for the test period, we were asked to run 1 WU at a time, to avoid any noise carrying over from one task to the other.
RE: Thanks, yes that easily
Sorry about that. We've all been pretty much making it up as we go along. I think I made that choice some time before somebody else posted the "one at a time" suggestion: I decided it was better to keep "steady as she goes" - there would be more noise in the results if you keep changing the utilisation factor.
Most of the time while running Arecibo tasks I got an incredibly stable run time: that counts for more in extended tests, where it's the measured APR that counts, and little (if any) weight is given to the theoretical "peak GFLOPS" the card is capable of.
Got my 4th validation for the
Got my 4th validation for the v1.40 BRP5 app in earlier today, and credits are on the rise: the first two got 12.62, the 3rd got 12.73 and the 4th a whopping 15.41!
The 12.73 one was against Richard, both running v1.40, and the last one was an older WU against Snow Crash on v1.39.
The server seems to have
The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:
[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G
According to WU 619924, the figures for v1.39 were rather different.
RE: The server seems to
Yeah, I see it with 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not] to create a min_avg_pfc.
[Edit:]
Ugh, a lot more than 3 - make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because the nvers thresholds for scaling will be engaged.
WU 618702 looks perkier -
WU 618702 looks perkier - v1.39/v1.40 cross-validation.
RE: WU 618702 looks perkier
That's certainly more like the credits I expected from the models. I suspect that the cross-app normalisation / averaging business may be quite valid/needed for credit purposes. It just royally screws with the time estimates before a new host/app version engages host scaling (which we've been calling onramp periods).
Rectifying that will probably need all our walkthrough efforts compared in detail to fill any knowledge gaps, but basically seeing something resembling expected behaviour is a good start. Having no incorrectly scaled CPU app to contend with in the mix means the credit part should be around the right region, even if quite noisy & prone to destabilisation.
... double post
... double post
RE: RE: The server seems
Oh F***
To be fair, we did ask for details to be inherited by new versions, to limit the onramp damage. Probably does the opposite of what would be clever.
edit: app_version doesn't get scaled until it has 100 samples, but it may be picking up scaling in other parts.
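That sample-count gate can be sketched roughly like this (names and structure here are illustrative, not the actual BOINC scheduler code):

```python
# Hypothetical sketch of the 100-sample gate described above: an app_version's
# own PFC scale is only applied once it has accumulated enough validated
# samples. Until then, a neutral scale is used.

MIN_SAMPLES = 100  # threshold discussed in this thread

def projected_scale(pfc_n: int, pfc_scale: float) -> float:
    """Return the scale to apply: neutral (1.0) until enough samples exist."""
    if pfc_n < MIN_SAMPLES:
        return 1.0        # below threshold: no per-version scaling yet
    return pfc_scale      # threshold reached: version-specific scaling engages

print(projected_scale(75, 0.5))   # 1.0 - still unscaled
print(projected_scale(101, 0.5))  # 0.5 - scaling engaged
```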
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
Yeah, cross check of
Yeah, a cross-check of walkthroughs should help. The big problem is at least 16 possible general starting states, multiplied across wingmen for many combinations. I'm going to resist the temptation to model all 256 base combinations, and instead think in terms of reducing that number of states - for example, correct the system in places so that CPU & GPU are considered the same much earlier in the sequence, remove the need for onramps, and perhaps even consider whether stock & anon are really different enough to warrant the completely separate codepaths they have in places.
We've just got a fresh
We've just got a fresh release of FGRP to version 1.12. Apps are identical to 1.11. This _should_ solve the time limit exceeded problem, but more bugs may be lurking.
edit: you may have to opt in for the app.
edit2: To be more precise, you may have to allow both beta apps and FGRP.
If anybody runs into further -197 time limit exceeded errors with FGRP [or any other app], please report ASAP. Please always include the host ID - we can glean most variables from database dumps now, but if you can also state your peak_flops (from the BOINC startup messages) that would be very helpful.
We have more or less finished analysis and are contemplating how we can best address the issues that we established as problem areas from the live run. You can only do so much from the theory [i.e. code reading]; you always need the actual data too, to get a complete picture.
Please cross-refer to thread
Please cross-refer to thread 'Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED' in the 'Problems and bug reports' area before carrying out the tests that Eyrie requested.
Been a while since we had a
Been a while since we had a statistical report on the new server code.
1) The Arecibo GPU apps seem to have settled down. Just a few validations trickling in from the hosts I've been monitoring, and all (except Claggy's laptop) seem to be +/- 2,000 credits - about double what Bernd thought the tasks were worth before we started.
[pre] Jason Holmis Claggy Juan Juan Juan RH
Host: 11363 2267 9008 10352 10512 10351 5367
GTX 780 GTX 660 GT 650M GTX 690 GTX 690 GTX 780 GTX 670
Credit for BRP4G, GPU
Maximum 2708.58 2313.45 10952.0 7209.47 6889.8 6652.9 4137.85
Minimum 115.82 88.84 153.90 1667.23 1244.41 1546.02 1355.49
Average 1408.03 1549.45 3256.04 2472.29 2026.98 2205.24 1980.88
Median 1586.65 1831.19 2244.85 2123.89 1910.04 1997.84 1916.41
Std Dev 626.98 633.96 2258.33 948.11 592.78 637.63 267.22
nSamples 87 171 116 161 151 189 670[/pre]
I've also plotted the same hosts' results for BRP5 (Perseus Arm). The logarithmic plot looks similar to the lower half of the 'trumpet' graph that emerged from Arecibo. Remember that we saw ridiculously low numbers to start with: we still haven't reached Bernd's previous assessment of value.
The linear graph shows more clearly that we haven't reached a steady state yet: I'll switch my GTX 670 back to this application once we have our 100 validations for its version of the Gamma search (which should happen this evening).
RE: 1) The Arecibo GPU apps
Having quite a bit more understanding of the nature of the beast now, the major challenges making predictions with the current mechanism implementation are twofold.
First, in the GPU only sense, we see a discrepancy between the chosen normalisation (for credit purposes) efficiency point of 10%, and the 'actual' efficiency of somewhere in the region of ~5% for single task per GPU operation. This amounts to an effective increase of the former application's award.
Second, and a little more insidious: understanding the limitations of average-based numerical control over noisy populations quickly reveals that the uncertainty in any specific numbers, as partly reflected in the standard deviations, guarantees that many of the numbers intended for comparison of hosts, applications, credits, and cheat detection/prevention are arbitrary relative to the user and project expectations for the usefulness & meaning of those numbers.
Tools (algorithms etc) exist to improve these situations, namely those of making useful estimations, handling various kinds of 'noise' such as host change, real measurement error and an unlimited range of usage variation conditions, to or beyond end-user expectation.
Refining these mechanisms, using such design tools, will ultimately reduce the development and maintenance overhead constantly dogging the Boinc codebase, while simultaneously making the system more resilient/adaptive to future change. There is also the angle that high-quality available numbers can potentially be more useful in global scientific contexts than just for Credit/RAC & individual needs, having applications in distributed computing, computer science, and engineering fields, probably among others.
@All:
In those lights, I'd like to thank everyone here for helping out. I'm progressing to a detailed simulation and design phase, that will take some time to get right. Please keep collecting, observing, commenting etc, and we're on the right road.
Jason
Looks like the next bout of
Looks like the next bout of inflation has set in on the Perseus Arm - I think we're above Bernd's parity value now.
Meanwhile, the Gamma search - after a brief flirtation with the ~2,000 level - has dropped back down to the low hundreds. May be correlated with a scaling adjustment when a second app_version (Win64/intel_gpu) reached the 100 threshold around 16:00 UTC Tuesday.
Edit - or it might have been CPU normalisation kicking in. We have Win32/SSE above threshold now as well, and Win32/plain will reach it any time now (99 valid at 08:00 UTC).
RE: Edit - or it might have
With the last server data dated the 16th, there looks to be a bit of an interesting illustration going on there. The lowest pfc average with n > 100 is indeed FGRPSSE, with a value of ~10.6. OpenCL NV seems to be ~144.
Now nv-OpenCL's is expected to be about 2x what it should be due to the mechanism normalising to 10% efficiency instead of the more realistic 5%... so picture the nv one as 'corrected' ~144/2 -> 72 (rough is good enough here)
CPU SSE has an approximate underclaim of 1.5^2 = 2.25x, so we take 10.6 * 2.25 -> 23.85 'corrected' for the CPU case (again, rough is better than uncorrected inputs).
So now we know the relative efficiencies of the implementations, a much tighter ~3x spread than the original uncorrected (noisy) numbers suggest. Right now credit is awarded based on the minimum pfc app, so about a third of what the GPU one would be 'asking'.
Intuitive eyeballs say the GPU population is going to be larger, by sheer throughput. The 'right' credit is in between the corrected CPU and GPU figures, weighted a fair bit toward the GPU case. There are tools for determining that too, better than averages.
Net effect of the simplified/corrected/improper-assumption-removed mechanism would be an even higher quality (more trustworthy) number in between the CPU & GPU case, with a weighting bonus encouraging optimisation, and inherently rejecting likely fraudulent claims (another possible source of noise disturbances). So likely in the region of ~2x what win CPU SSE only validations would award now.
I'm surprised how well that correlates with the seti@home astropulse case, and it points the bone directly at the seti@home multibeam case for AVX enabled app underclaims with no peak flops correction.
Wow, we nailed this to the wall good and proper.
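Spelled out as a quick sanity check (constants are from the server dump quoted above; the /2 and 1.5² 'corrections' are the working assumptions in this post, not official BOINC factors):

```python
# The correction arithmetic from the post, written out. Raw pfc averages
# (n > 100) come from the server dump of the 16th.

nv_opencl_pfc = 144.0   # raw pfc average for OpenCL NV
cpu_sse_pfc = 10.6      # lowest raw pfc average (FGRPSSE)

# GPU: normalised to 10% efficiency but actually ~5%, so claims run ~2x high.
nv_corrected = nv_opencl_pfc / 2        # -> 72.0
# CPU: SIMD (SSE) underclaim of roughly 1.5^2 = 2.25x vs scalar Whetstone.
cpu_corrected = cpu_sse_pfc * 1.5**2    # -> ~23.85

spread = nv_corrected / cpu_corrected   # ~3x, vs ~13.6x uncorrected
print(round(nv_corrected, 2), round(cpu_corrected, 2), round(spread, 1))
```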
Time for another inflation
Time for another inflation update.
There's a very clear discontinuity at midnight on 16 July - which is exactly when the second app_version (opencl-ati for Windows/64) reached a pfc_n of 101. Unfortunately, we don't have a third app on the cards for a while yet - cuda32-nv270 for Linux has been stuck at 75 for two days now.
Because I can only contribute NV for Windows (946 and counting), I've switched back to BRP4G Arecibo GPU, to check that nothing untoward has been happening while I've been concentrating on Perseus (it hasn't).
So, here's a question to ponder on, while we go into the Drupal migration next week, and then possibly some new apps to test:
Why has CreditNew picked something ~4,000 credits to stabilise on for Perseus tasks, and something ~2,000 credits for Arecibo tasks? That's a ratio of - in very rough trend terms - 2::1, when the runtimes are closer to 3::1 - close and steady in my own case, and similar on all the other hosts I've spot-checked (including other OSs and GPU platforms).
Is this perhaps more evidence that the ultimate credit rates are very largely determined by rsc_fpops_est, where there are no complications from CPU apps to contend with? The figures for the two apps I'm comparing here are:
Arecibo 280,000,000,000,000
Perseus 450,000,000,000,000
ratio 1.6::1
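For the record, a quick sketch of those ratio comparisons (figures as quoted above, rounded trend values only):

```python
# rsc_fpops_est for the two searches, against the rough credit and runtime
# ratios quoted in this post.

arecibo_fpops = 280_000_000_000_000
perseus_fpops = 450_000_000_000_000

fpops_ratio = perseus_fpops / arecibo_fpops   # ~1.6
credit_ratio = 4_000 / 2_000                  # ~2  (observed stabilisation levels)
runtime_ratio = 3                             # ~3  (observed runtimes)

print(round(fpops_ratio, 2), credit_ratio, runtime_ratio)  # 1.61 2.0 3
```

The credit ratio (~2::1) sits between the fpops ratio (~1.6::1) and the runtime ratio (~3::1), which is the puzzle posed above.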
RE: Just FYI, I had to take
This thread is too big to read through it all. I am back with my two GPUs. Is that still relevant? Or can I mess with my settings as I see fit?
Dublin, California
Team: SETI.USA
My Ubuntu C2D T8100 Laptop
My Ubuntu C2D T8100 laptop has been crunching both Astropulse_v7 and Gamma-ray pulsar search #3 v1.12 (FGRPSSE) tasks at the same time.
The Astropulse tasks from the four app_versions were initially each estimated at something like one hundred and fifty hours; once their 100 validations were in, their estimates dropped to a value below reality.
All tasks for computer 68093
Application details for host 68093
With Gamma-ray pulsar search #3 v1.12 (FGRPSSE) the same has happened: the task durations are also underestimated, meaning Boinc over-fetches and can't complete the tasks in time (I think it underestimated from the start, though).
Now that its validations have passed 11, Boinc has a better grasp of how long these tasks take, hasn't fetched so many, and is slowly catching up again.
Its cache setting is about one to one and a half days (it's remote from me at the moment).
All tasks for computer 10230
Application details for host 10230
Shouldn't the post-100-validation overall estimate, and the pre-11 host app validation estimate, still be a bit conservative, and not cause over-fetch?
Claggy
Progress here seems to have
Progress here seems to have ground to a halt, or is there still progress going on behind the scenes?
Claggy
I'm wondering the same
I'm wondering the same thing.
Maybe it's time to restart with a new thread in the "Problems and Bug Reports" forum, where it will be more visible than on the second page of "News & Blogs" with over 320 comments.
Claggy wrote:Progress here
Sorry for the big delays getting back here - mostly work and Cuda7 related at the moment.
As per the Seti GPU Users Group subforum responses, no 'quick fix' bandaids, and the volume of background work is rather large and in desperate need of collation (despite being on hiatus with other responsibilities, I've periodically cast eyes around for unexpected phenomena). I'll likely be back in more frequent communication with the parties involved once things have me spread less thin. After that time arrives, solution progress is a matter of communicating design options effectively, more than any kind of 'debugging'.
and,
Noticed today, looks as if
Noticed today: looks as if BRP credit on all my ARM hosts doubled on tasks validated after around the 4th Nov.
And on my Linux C2D T8100 host, its 2 WUs completed with the Gravitational Wave S6 Follow-up #1 v1.02 app both got exactly 4,000 credits - that hints that they are at the top limit for that app.
Claggy
Claggy wrote:Noticed today,
Which begs the question, "Why?"
I wondered if perhaps some new app_version had been deployed for testing, and added a different PFC_avg into the mix - that seemed to be the point at which credit jumped, back in July. But no:
But then I looked at PFC_scale:
It looks as if the project deprecated the mainstream (Linux/Mac/Windows) apps on 4 November, leaving only ARM and Android active. I've only sampled at 12-hour intervals, but there were only 7 tasks completed that day (according to pfc_n), so I'm inclined to say that was an abrupt recalculation of the scale, with no smoothing applied at all.
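To illustrate why the lack of smoothing matters, here's a minimal sketch contrasting an abrupt recalculation with a hypothetical exponentially smoothed update (the helper and constants are illustrative, not BOINC code):

```python
# If the reference scale is recomputed directly from whichever app_versions
# remain active, deprecating the x86 apps moves it in one step. A smoothed
# update would instead converge gradually over many samples.

def smoothed(old: float, target: float, alpha: float = 0.1) -> float:
    """One exponential-smoothing step toward the new target value."""
    return old + alpha * (target - old)

old_scale, new_scale = 1.0, 2.5   # illustrative ~2.5x jump, as observed

abrupt = new_scale                          # what seems to have happened
one_step = smoothed(old_scale, new_scale)   # 1.15: a gradual move instead
print(abrupt, one_step)
```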
That appears to fit the
That appears to fit the models, with Android/NEON becoming the new credit normalisation reference. Since the Android client has a SIMD Whetstone, while x86 32/64 with SSEx doesn't, the ~2.5x jump is expected, as it reverses the effective underclaim that had become the reference (1). The mechanism isn't right for sure (e.g. no 'smoothing', as you mentioned), although credits for those results should now commence chaotically orbiting the COBBLESTONE_SCALE*wu_est 'ideal' value.
Interestingly enough, this removal of x86 SIMD apps appears to have already demonstrated that we have a grip on what's going on, by emulating the CPU SIMD flops correction step, the first part of the above plan. Cool.