Project server code update

The project will be taken down in about an hour to perform an update of the BOINC server code. Ideally you shouldn't notice anything, but usually the world isn't ideal. See you again on the other side.

Comments

juan BFB
juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

Ok. I will start one host at

OK. I will start one host at a time to see what's happening; that's going to take some days since the caches are already loaded.

Snow Crash
Snow Crash
Joined: 11 Aug 13
Posts: 10
Credit: 5011603
RAC: 0

July 3, 29, 2014 04:00 UTC

July 3, 29, 2014 04:00 UTC (switched to BRP5)
https://albertathome.org/host/9649

[pre]BRP5 2x using 1 cpu thread each (app_config), GPU utilization = 92%
running an additional 4x Skynet POGs cpu WUs
GPU 7950 mem=1325, gpu=1150, pcie v2 x16
OS Win7 x64 Home Premium
CPU 980X running at 3.41 GHz with HT off
MEM Triple channel 1600 (7.7.7.20.2)[/pre]

juan BFB
juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

RE: Well, here's the first

Message 80200 in response to message 80196

Quote:

Well, here's the first conundrum:

All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 5367

After 200 minutes of solid GTX 670 work on Perseus, I earn the princely sum of ... 15 credits!


Almost the same 15 cr for 10k to 20k secs of running time with a 690. That's what I'd call "credit deflation".

https://albertathome.org/host/10352/tasks&offset=0&show_names=0&state=4&appid=27

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Yeah, looks a lot like the

Yeah, looks a lot like the sort of discrepancies I see in simulations.

Will definitely be worth putting a 1.4 app onramp into the spreadsheets, to see how well the models reflect reality.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Most of the BRP5 'Perseus

Message 80202 in response to message 80201

Most of the BRP5 'Perseus Arm' tasks I've seen so far have been old WUs which had been lying around in the database for some time, with multiple failures. I'm not sure whether anybody has looked to see if that affects the credit granting process - even if only through the averages shifting between initial creation and final validation. (I don't think so, because I don't think anything about the prevailing averages is stored in the task record when it's created from the WU - but I haven't looked at the database schema or the code.)

But I've just validated the first 'clean', two replications only case:

WU 625789

For 12.62 credits.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

juan BFB
juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

Richard The WU you talk

Richard

The WU you talk about was validated against one of my hosts with a 670 too.

Something catches my attention: the crunching times. Yours takes about 12k secs, mine 7.5k secs. I run 1 WU at a time, and my 670 (EVGA FTW) is powered by a slow i5 vs your powerful i7. Can you tell me why the time difference, since both GPUs are relatively similar?

BTW, the 12.62 credits received are really amazing. :)

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: Richard The WU you

Message 80204 in response to message 80203

Quote:

Richard

The WU you talk about was validated against one of my hosts with a 670 too.

Something catches my attention: the crunching times. Yours takes about 12k secs, mine 7.5k secs. I run 1 WU at a time, and my 670 (EVGA FTW) is powered by a slow i5 vs your powerful i7. Can you tell me why the time difference, since both GPUs are relatively similar?

BTW, the 12.62 credits received are really amazing. :)


That seems simple - I'm running two at a time, so effective throughput would be one task every 6k seconds (on your figures - I haven't looked at the data for BRP5 in any detail yet). The efficiency gain from running two together is probably more significant than the i5/i7 difference.
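The arithmetic can be sketched quickly (illustrative only, using the figures quoted above):

```python
# Effective per-task throughput when running several tasks concurrently
# on one GPU (simple arithmetic sketch, not project code).
def effective_seconds_per_task(wall_seconds, tasks_in_flight):
    """Wall-clock time for a batch divided by the tasks it completes."""
    return wall_seconds / tasks_in_flight

# Two BRP5 tasks together in ~12k s -> one task every ~6k s effective,
# versus ~7.5k s for a single task run on its own.
print(effective_seconds_per_task(12_000, 2))  # 6000.0
print(effective_seconds_per_task(7_500, 1))   # 7500.0
```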


juan BFB
juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

Thanks, yes thas easely

Thanks, yes, that easily explains the crunching time differences. Seems like I misunderstood something again. I had the idea that for the test period we were asked to run 1 WU at a time, to avoid any noise from one task transferring to the other.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: Thanks, yes thas easely

Message 80206 in response to message 80205

Quote:
Thanks, yes, that easily explains the crunching time differences. Seems like I misunderstood something again. I had the idea that for the test period we were asked to run 1 WU at a time, to avoid any noise from one task transferring to the other.


Sorry about that. We've all been pretty much making it up as we go along. I think I made that choice some time before somebody else posted the "one at a time" suggestion: I decided it was better to keep "steady as she goes" - there would be more noise in the results if you keep changing the utilisation factor.

Most of the time while running Arecibo tasks I got an incredibly stable run time: that counts for more in extended tests, where it's the measured APR that counts, and little (if any) weight is given to the theoretical "peak GFLOPS" the card is capable of.


Holmis
Holmis
Joined: 4 Jan 05
Posts: 89
Credit: 2104736
RAC: 0

Got my 4th validation for the

Got my 4th validation for the v1.40 BRP5 app earlier today and credits are on the rise: the first two got 12.62, the 3rd got 12.73 and the 4th a whopping 15.41!
The 12.73 one was against Richard, both running v1.40, and the last one was an older WU against Snow Crash on v1.39.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

The server seems to have

Message 80208 in response to message 80207

The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:

[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G

According to WU 619924, the figures for v1.39 were rather different.


jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: The server seems to

Message 80209 in response to message 80208

Quote:

The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:

[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G

According to WU 619924, the figures for v1.39 were rather different.

Yeah, I see it with 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not] to create a min_avg_pfc.

[Edit:]
Ugh, a lot more than 3, make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because the nvers thresholds for scaling will be engaged.
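As a rough sketch of the behaviour being described - an app-wide minimum taken over the per-app_version PFC averages, ignoring versions without enough samples (field names and the threshold here are illustrative, not the actual BOINC schema):

```python
SAMPLE_THRESHOLD = 100  # illustrative: versions below this don't feed scaling

def min_avg_pfc(app_versions):
    """Smallest pfc_avg among one app's versions with enough samples."""
    trusted = [av["pfc_avg"] for av in app_versions
               if av["pfc_n"] >= SAMPLE_THRESHOLD]
    return min(trusted) if trusted else None

versions = [
    {"name": "gpu-old",   "pfc_avg": 144.0, "pfc_n": 350},
    {"name": "cpu-sse",   "pfc_avg": 10.6,  "pfc_n": 5000},
    {"name": "gpu-v1.40", "pfc_avg": 19.8,  "pfc_n": 12},  # below threshold
]
print(min_avg_pfc(versions))  # 10.6
```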


Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

WU 618702 looks perkier -

Message 80210 in response to message 80209

WU 618702 looks perkier - v1.39/v1.40 cross-validation.


jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: WU 618702 looks perkier

Message 80211 in response to message 80210

Quote:
WU 618702 looks perkier - v1.39/v1.40 cross-validation.

That's certainly more like the credits I expected from the models. I suspect that the cross-app normalisation / averaging business may be quite valid/needed for credit purposes. It just royally screws with the time estimates before a new host/app version engages host scaling (which we've been calling onramp periods).

Rectifying that will probably need all our walkthrough efforts compared in detail to fill any knowledge gaps, but basically seeing something resembling expected behaviour is a good start. Having no incorrectly scaled CPU app to contend with in the mix means the credit part should be around the right region, even if quite noisy & prone to destabilisation.


jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

... double post

Message 80212 in response to message 80210

... double post


Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: The server seems

Message 80213 in response to message 80209

Quote:
Quote:

The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:

[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G

According to WU 619924, the figures for v1.39 were rather different.

Yeah, I see it with 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not] to create a min_avg_pfc.

[Edit:]
Ugh, a lot more than 3, make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because the nvers thresholds for scaling will be engaged.


Oh F***

To be fair, we did ask for details to be inherited by new versions, to limit the onramp damage. It probably does the opposite of what would be clever.

edit: app_version doesn't get scaled until it has 100 samples, but it may be picking up scaling in other parts.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Yeah, cross check of

Message 80214 in response to message 80213

Yeah, a cross check of walkthroughs should help. The big problem is at least 16 possible general starting states, multiplied across wingmen for many combinations. I'm going to resist the temptation to model all 256 base combinations, and think instead in terms of reducing the number of states... for example, correct the system in places so that CPU & GPU are considered the same much earlier in the sequence, remove the need for onramps, and perhaps even consider whether stock & anon are really different enough to warrant the completely separate codepaths they have in places.


Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

We've just got a fresh

We've just got a fresh release of FGRP to version 1.12. Apps are identical to 1.11. This _should_ solve the time limit exceeded problem, but more bugs may be lurking.

edit: you may have to opt in for the app.
edit2: To be more precise, you may have to allow both beta apps and FGRP.

If anybody runs into further -197 time limit exceeded errors with FGRP [or any other app], please report ASAP. Please always include the host ID - we can glean most variables from database dumps now, but if you can also state your peak_flops (from the BOINC startup messages) that would be very helpful.

We have more or less finished the analysis and are contemplating how best to address the issues we established as problem areas from the live run. You can only do so much from theory [i.e. code reading]; you always need the actual data too, to get a complete picture.


Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Please cross-refer to thread

Message 80216 in response to message 80215

Please cross-refer to thread 'Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED' in the 'Problems and bug reports' area before carrying out the tests that Eyrie requested.


Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Been a while since we had a

Been a while since we had a statistical report on the new server code.

1) The Arecibo GPU apps seem to have settled down. Just a few validations trickling in from the hosts I've been monitoring, and all (except Claggy's laptop) seem to be +/- 2,000 credits - about double what Bernd thought the tasks were worth before we started.

[pre] Jason Holmis Claggy Juan Juan Juan RH
Host: 11363 2267 9008 10352 10512 10351 5367
GTX 780 GTX 660 GT 650M GTX 690 GTX 690 GTX 780 GTX 670

Credit for BRP4G, GPU
Maximum 2708.58 2313.45 10952.0 7209.47 6889.8 6652.9 4137.85
Minimum 115.82 88.84 153.90 1667.23 1244.41 1546.02 1355.49
Average 1408.03 1549.45 3256.04 2472.29 2026.98 2205.24 1980.88
Median 1586.65 1831.19 2244.85 2123.89 1910.04 1997.84 1916.41
Std Dev 626.98 633.96 2258.33 948.11 592.78 637.63 267.22

nSamples 87 171 116 161 151 189 670[/pre]
I've also plotted the same hosts' results for BRP5 (Perseus Arm). The logarithmic plot looks similar to the lower half of the 'trumpet' graph that emerged from Arecibo. Remember that we saw ridiculously low numbers to start with: we still haven't reached Bernd's previous assessment of value.

The linear graph shows more clearly that we haven't reached a steady state yet: I'll switch my GTX 670 back to this application once we have our 100 validations for its version of the Gamma search (which should happen this evening).


jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: 1) The Arecibo GPU apps

Message 80218 in response to message 80217

Quote:
1) The Arecibo GPU apps seem to have settled down. Just a few validations trickling in from the hosts I've been monitoring, and all (except Claggy's laptop) seem to be +/- 2,000 credits - about double what Bernd thought the tasks were worth before we started.
...
I've also plotted the same hosts' results for BRP5 (Perseus Arm). The logarithmic plot looks similar to the lower half of the 'trumpet' graph that emerged from Arecibo. Remember that we saw ridiculously low numbers to start with: we still haven't reached Bernd's previous assessment of value.
...

Having quite a bit more understanding of the nature of the beast now, the major challenges in making predictions with the current mechanism implementation are twofold.
First, in the GPU-only sense, we see a discrepancy between the chosen normalisation efficiency point of 10% (for credit purposes) and the 'actual' efficiency of somewhere in the region of ~5% for single-task-per-GPU operation. This amounts to an effective increase in the application's award.

Second, and a little more insidious: understanding the limitations of average-based numerical control with respect to noisy populations quickly reveals that the uncertainty in any specific numbers, as partly reflected in the standard deviations, guarantees that many of the numbers intended for comparing hosts, applications and credits, and for cheat detection/prevention, are arbitrary relative to user and project expectations of their usefulness and meaning.

Tools (algorithms etc.) exist to improve these situations - namely for making useful estimations and handling various kinds of 'noise', such as host change, real measurement error and an unlimited range of usage variation conditions - to or beyond end-user expectation.

Refining these mechanisms, using such design tools, will ultimately reduce the development and maintenance overhead constantly dogging the BOINC codebase, while simultaneously making the system more resilient/adaptive to future change. There is also the angle that high-quality available numbers can potentially be more useful in global scientific contexts than just for Credit/RAC & individual needs, with applications in distributed computing, computer science and engineering fields, probably among others.
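The first point can be put in numbers (a sketch using only the efficiencies quoted above; the 2x figure follows directly):

```python
# The mechanism normalises GPU claims at an assumed 10% efficiency point,
# while single-task-per-GPU running is closer to ~5% (figures from the
# discussion above; sketch only).
ASSUMED_EFFICIENCY = 0.10
ACTUAL_EFFICIENCY = 0.05

# Claims scale inversely with the assumed efficiency point, so the
# mismatch biases the comparison by a factor of:
bias = ASSUMED_EFFICIENCY / ACTUAL_EFFICIENCY
print(bias)  # 2.0
```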

@All:
In those lights, I'd like to thank everyone here for helping out. I'm progressing to a detailed simulation and design phase, that will take some time to get right. Please keep collecting, observing, commenting etc, and we're on the right road.

Jason


Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Looks like the next bout of

Looks like the next bout of inflation has set in on the Perseus Arm - I think we're above Bernd's parity value now.

Meanwhile, the Gamma search - after a brief flirtation with the ~2,000 level - has dropped back down to the low hundreds. This may be correlated with a scaling adjustment when a second app_version (Win64/intel_gpu) reached the 100 threshold around 16:00 UTC on Tuesday.

Edit - or it might have been CPU normalisation kicking in. We have Win32/SSE above the threshold now as well, and Win32/plain will reach it any time now (99 valid at 08:00 UTC).


jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Edit - or it might have

Message 80220 in response to message 80219

Quote:
Edit - or it might have been CPU normalisation kicking in. We have Win32/SSE above the threshold now as well, and Win32/plain will reach it any time now (99 valid at 08:00 UTC).

With the last server data dated the 16th, it looks like a bit of an interesting illustration going on there. The lowest pfc average with n > 100 is indeed FGRPSSE, with a value of ~10.6. OpenCL nv seems to be ~144.

Now, nv-OpenCL's is expected to be about 2x what it should be, due to the mechanism normalising to 10% efficiency instead of the more realistic 5%... so picture the nv one as 'corrected': ~144/2 -> 72 (rough is good enough here).

CPU SSE has an approximate underclaim of 1.5^2 = 2.25x, so we take 10.6*2.25 -> 23.85 'corrected' for the CPU case (again, rough is better than uncorrected inputs).

So now we know the relative efficiencies of the implementations: a much tighter ~3x spread than the original uncorrected (noisy) numbers suggest. Right now credit is awarded based on the minimum-pfc app, so about a third of what the GPU one would be 'asking'.

Intuitive eyeballs say the GPU population is going to be larger, by sheer throughput. The 'right' credit is in between the corrected CPU and GPU figures, weighted a fair bit toward the GPU case. There are tools for determining that too, better than averages.

The net effect of the simplified/corrected/improper-assumption-removed mechanism would be an even higher quality (more trustworthy) number in between the CPU & GPU cases, with a weighting bonus encouraging optimisation, and inherent rejection of likely fraudulent claims (another possible source of noise disturbance). So likely in the region of ~2x what Win CPU SSE-only validations would award now.
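The back-of-envelope corrections above can be replayed directly (rough, illustrative arithmetic only):

```python
# Raw PFC averages from the server data (n > 100), as quoted above.
nv_opencl_pfc = 144.0
cpu_sse_pfc = 10.6

# nv-OpenCL is ~2x inflated (10% vs ~5% efficiency point); CPU SSE
# underclaims by roughly 1.5^2 = 2.25x.
nv_corrected = nv_opencl_pfc / 2
cpu_corrected = cpu_sse_pfc * 1.5**2

print(nv_corrected)                            # 72.0
print(round(cpu_corrected, 2))                 # 23.85
print(round(nv_corrected / cpu_corrected, 2))  # 3.02 -- the ~3x spread
```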

I'm surprised how well that correlates with the seti@home astropulse case, and it points the bone directly at the seti@home multibeam case for AVX enabled app underclaims with no peak flops correction.

Wow, we nailed this to the wall good and proper.


Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Time for another inflation

Time for another inflation update.

There's a very clear discontinuity at midnight on 16 July - which is exactly when the second app_version (opencl-ati for Windows/64) reached a pfc_n of 101. Unfortunately, we don't have a third app on the cards for a while yet - cuda32-nv270 for Linux has been stuck at 75 for two days now.

Because I can only contribute NV for Windows (946 and counting), I've switched back to BRP4G Arecibo GPU, to check that nothing untoward has been happening while I've been concentrating on Perseus (it hasn't).

So, here's a question to ponder on, while we go into the Drupal migration next week, and then possibly some new apps to test:

Why has CreditNew picked something ~4,000 credits to stabilise on for Perseus tasks, and something ~2,000 credits for Arecibo tasks? That's a ratio of - in very rough trend terms - 2::1, when the runtimes are closer to 3::1 - close and steady in my own case, and similar on all the other hosts I've spot-checked (including other OSs and GPU platforms).

Is this perhaps more evidence that the ultimate credit rates are very largely determined by the workunits' rsc_fpops_est, where there are no complications from CPU apps to contend with? The figures for the two apps I'm comparing here are:

Arecibo 280,000,000,000,000
Perseus 450,000,000,000,000

ratio 1.6::1
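For comparison, the 'ideal' credit these fpops estimates would imply under the standard Cobblestone definition (200 credits per day of sustained 1 GFLOPS) can be sketched as follows; treat the absolute numbers as rough - the ratio is the point:

```python
# Cobblestone definition: 200 credits per GFLOP-day (standard BOINC unit).
COBBLESTONE_SCALE = 200 / (86400 * 1e9)  # credits per flop

arecibo_fpops = 280_000_000_000_000
perseus_fpops = 450_000_000_000_000

print(round(arecibo_fpops * COBBLESTONE_SCALE, 1))  # 648.1 credits
print(round(perseus_fpops * COBBLESTONE_SCALE, 1))  # 1041.7 credits
print(round(perseus_fpops / arecibo_fpops, 2))      # 1.61
```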


zombie67 [MM]
zombie67 [MM]
Joined: 10 Oct 06
Posts: 73
Credit: 30924459
RAC: 0

RE: Just FYI, I had to take

Message 80222 in response to message 79982

Quote:
Just FYI, I had to take my two CUDA machines off albert for a while. I need to help a team mate at another project. I will be back.

This thread is too big to read through it all. I am back with my two GPUs. Is that still relevant? Or can I mess with my settings as I see fit?

Dublin, California
Team: SETI.USA

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

My Ubuntu C2D T8100 Laptop

My Ubuntu C2D T8100 laptop has been crunching both Astropulse_v7 and Gamma-ray pulsar search #3 v1.12 (FGRPSSE) tasks at the same time.
The Astropulse tasks from the four app_versions were initially each estimated at something like one hundred and fifty hours; once their 100 validations were in, their estimates dropped to a value below reality.

All tasks for computer 68093

Application details for host 68093

With Gamma-ray pulsar search #3 v1.12 (FGRPSSE) the same has happened: the task durations are also underestimated, meaning BOINC over-fetches and can't complete the tasks in time (I think it underestimated from the start, though).
Now that its validations have passed 11, BOINC has a better grasp of how long these tasks take, hasn't fetched so many, and is slowly catching up again.
Its cache setting is about one day to one and a half days (it's remote from me at the moment).

All tasks for computer 10230

Application details for host 10230

Shouldn't the estimates - post the 100 overall validations but pre the 11 host app_version validations - still be a bit conservative, and not cause over-fetch?
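Claggy's over-fetch is consistent with how duration estimates feed work fetch. Very roughly, and with hypothetical numbers (this is a simplified sketch, not the actual client/server logic):

```python
# Simplified sketch: an inflated projected_flops shrinks the duration
# estimate, so the client queues more tasks than it can finish.
def estimated_seconds(rsc_fpops_est, projected_flops):
    return rsc_fpops_est / projected_flops

def tasks_fetched(cache_seconds, est_seconds):
    """Client fills its cache based on the estimate, not reality."""
    return cache_seconds // est_seconds

real_flops = 5e9        # what the host actually sustains (hypothetical)
inflated_flops = 20e9   # over-optimistic projected_flops (hypothetical)
fpops = 90e12           # per-task flops estimate (hypothetical)

est = estimated_seconds(fpops, inflated_flops)  # 4500 s claimed
real = estimated_seconds(fpops, real_flops)     # 18000 s actual
print(tasks_fetched(86400, est))   # 19 tasks fetched for a one-day cache
print(tasks_fetched(86400, real))  # only 4 would actually fit
```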

Claggy

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

Progress here seems to have

Progress here seems to have ground to a halt, or is there still progress going on behind the scenes?

Claggy

Holmis
Holmis
Joined: 4 Jan 05
Posts: 89
Credit: 2104736
RAC: 0

I'm wondering the same

I'm wondering the same thing.
Maybe it's time to restart with a new thread in the "Problems and Bug Reports" forum, where it will be more visible than on the second page of "News & Blogs" with over 320 comments.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Claggy wrote:Progress here

Message 80431 in response to message 80377

Claggy wrote:

Progress here seems to have ground to a halt, or is there still progress going on behind the scenes?

Claggy

Sorry for the big delays getting back here; mostly work and Cuda7 related at the moment.

As per the Seti GPU Users Group subforum responses: no 'quick fix' bandaids, and the volume of background work is rather large and in desperate need of collation (despite being on hiatus with other responsibilities, I've periodically cast an eye around for unexpected phenomena). I'll likely be back in more frequent communication with the parties involved once things have me spread less thin. After that time arrives, solution progress is a matter of communicating design options effectively, more than any kind of 'debugging'.

Quote:
Quote:
Jason,

What's the latest on the Albert at home NewCredit testing?

There haven't been any posts at Albert at home in the Project server code update thread since Message 80379 on 23rd August.

Claggy

The latest is that a general plan of attack was agreed with Oliver and Bernd, which recognises Bernd's assertion that most of the scheduler code needs outright replacement, while accommodating Oliver's desire to take things in small, manageable steps.

At present, I am relearning Matlab to construct simulations of the 'idealised' proposed fixes and to model the existing system behaviour in a way that matches observation sufficiently. This way I'll be able to compare existing, 'ideal' and practical alternatives in a way that can be communicated effectively to all involved.

Overall the plan is as below, which taken in its entirety is a major software engineering endeavour. That's where the long hiatuses for reflection and analysis come in, and where communication becomes reduced.

I'd greatly appreciate it if you could relay the above information. There is a lot more documentation in the wings while I sort Cuda7 testing out, but the general agreement is that half-baked bandaids will not be much (or possibly any) better than what exists now, so the large project scope is the right one if taken carefully.

Note that in the outline below, each of the sections x.y.1 is labelled 'Model', which represents simulations of the idealised, observed/current and practical alternatives for the given mechanism/subsystem.

and,

Quote:
Quote:
....If we manage to get the NewCredit fixes in, and Credit doubles, we might be bouncing on the top limit.

Claggy

Also, the idealised and practical models give a test for hard limits like that. Those would change if the system says they are supposed to, depending on application efficiencies. More stable Credits would probably be less boom-or-bust though, IMO, but hitting max credits would indicate either a system problem or an incorrect limit, either way.


Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

Noticed today, looks as if

Noticed today: it looks as if BRP credit on all my ARM hosts doubled on tasks validated after around the 4th Nov.

And on my Linux C2D T8100 host, its 2 WUs completed with the Gravitational Wave S6 Follow-up #1 v1.02 app both got exactly 4,000 credits,

which hints that they are at the top limit for that app.

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Claggy wrote:Noticed today,

Message 80453 in response to message 80448

Claggy wrote:

Noticed today: it looks as if BRP credit on all my ARM hosts doubled on tasks validated after around the 4th Nov.

Which begs the question, "Why?"

I wondered if perhaps some new app_version had been deployed for testing, and added a different PFC_avg into the mix - that seemed to be the point at which credit jumped, back in July. But no:

But then I looked at PFC_scale:

It looks as if the project deprecated the mainstream (Linux/Mac/Windows) apps on 4 November, leaving only ARM and Android active. I've only sampled at 12-hour intervals, but there were only 7 tasks completed that day (according to pfc_n), so I'm inclined to say that was an abrupt recalculation of the scale, with no smoothing applied at all.
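For illustration, a smoothed update would damp that kind of step change. A hypothetical exponential-moving-average sketch (this is not what the server code does):

```python
def smoothed_update(old_scale, new_estimate, alpha=0.1):
    """Blend each fresh estimate into the running scale instead of replacing it."""
    return (1 - alpha) * old_scale + alpha * new_estimate

# A step change in the underlying estimate (1.0 -> 2.5) fed through only
# 7 samples, as on the day described above, moves the scale part-way:
scale = 1.0
for _ in range(7):
    scale = smoothed_update(scale, 2.5)
print(round(scale, 3))  # 1.783 - far from an abrupt jump straight to 2.5
```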


jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

That appears to fit the

That appears to fit the models, with Android/NEON becoming the new credit normalisation reference. Since the Android client has a SIMD Whetstone, while x86 32/64 with SSEx doesn't, the ~2.5x jump is expected, as it reverses the effective underclaim that had become the reference. The mechanism isn't right for sure (e.g. no 'smoothing', as you mentioned), although credits for those results should now commence chaotically orbiting the COBBLESTONE_SCALE*wu_est 'ideal' value.

Interestingly enough, this removal of x86 SIMD apps appears to have already demonstrated that we have a grip on what's going on, by emulating the CPU SIMD flops correction step, the first part of the above plan. Cool.

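The underclaim-reversal argument can be sketched with hypothetical numbers (the flops figures below are made up for illustration; only the ~2.5x ratio comes from the post above):

```python
# If the x86 Whetstone benchmark misses SIMD, the CPU app underclaims
# flops; while that app is the normalisation reference, everyone's credit
# is scaled down with it. Removing it reverses the underclaim.
whetstone_no_simd = 3.0e9   # hypothetical benchmark flops without SIMD
actual_sse_flops = 7.5e9    # hypothetical sustained flops with SSEx

underclaim = actual_sse_flops / whetstone_no_simd
print(underclaim)  # 2.5 - the size of the jump when the reference changes
```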