Project server code update

The project will be taken down in about an hour to perform an update of the BOINC server code. Ideally you shouldn't notice anything, but usually the world isn't ideal. See you again on the other side.

Comments

Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

A few points. a) I think you

A few points.

a) I think you corroborated my conclusion/gut feeling/code feeling that high APR/throughput leads to less credit (subject to noise)

b) The upwards drift is on the BRP4G app - we've not looked at CPU for some time. Don't know if Richard feels like more database wringing.

c) ... strike that - pour oil on the water not fan the flames.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: b) The upwards drift is

Message 80099 in response to message 80098

Quote:
b) The upwards drift is on the BRP4G app - we've not looked at CPU for some time. Don't know if Richard feels like more database wringing.


I'm keeping an eye on my CPU runs as the gaps slowly fill in, but not generating any new data. Haven't spotted any sign of a drift.

But I'm coming up to a run of resends on BRP4G, so new trend data will fill in quickly there.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

I've just validated 1,852.58

I've just validated 1,852.58 against a TITAN, and 2,315.85 against a GeForce GTX 680. On the basis of last night's discussion, I should be the junior partner in both cases.

It goes on going up. Graphs later.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: b) The upwards

Message 80101 in response to message 80099

Quote:
Quote:
b) The upwards drift is on the BRP4G app - we've not looked at CPU for some time. Don't know if Richard feels like more database wringing.

I'm keeping an eye on my CPU runs as the gaps slowly fill in, but not generating any new data. Haven't spotted any sign of a drift.

But I'm coming up to a run of resends on BRP4G, so new trend data will fill in quickly there.

With the CPU case, my feeling is that if there were any drift, it would be over a much longer period. The actual throughputs are closer to the estimated peaks. Since the estimated peak throughputs are incorrectly low figures, though, the actual throughputs make for a higher ratio, levelling the overall scaling downward. Once the latest/slowest comers roll in (highest pfc's... highest APR/peak_flops), if they haven't already, those should indeed be relatively stable (even if scaled to the wrong value).

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: I've just validated

Message 80102 in response to message 80100

Quote:

I've just validated 1,852.58 against a TITAN, and 2,315.85 against a GeForce GTX 680. On the basis of last night's discussion, I should be the junior partner in both cases.

It goes on going up. Graphs later.

Guessing from Zombie's TITAN APR, might start to show signs of levelling at around 5-10x expected.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: Graphs later.

Message 80103 in response to message 80102

Quote:
Quote:
Graphs later. (much later. I have to keep giving the monitor a rest)

Guessing from Zombie's TITAN APR, might start to show signs of levelling at around 5-10x expected.


In the meantime, try these two.

WU 620035
WU 606522

Same wingmate, validated within 80 minutes of each other. A 450-credit (20%) difference.

What's an Oland, anyway? Ah - it's slowed down. Maybe running 2-up now?

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: ... Maybe running 2-up

Message 80104 in response to message 80103

Quote:
... Maybe running 2-up now?

Or the machine's actually being used :-O

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Holmis
Joined: 4 Jan 05
Posts: 89
Credit: 2104736
RAC: 0

Here are some graphs from my

Here are some graphs from my validated tasks, and to keep the post shorter I'll post links to the pictures.

BRP4X64 - A bit unstable, varies around 50 credits/task ±5 credits, with a few outliers.

BRP4G - Clear upward trend with no sign of coming back down. A few of the high outliers are late validations.

S6CasA (CPU only) - Not as unstable as BRP4X64 but a bigger difference between high and low, 270 - 330 credits/task. Low number of completed tasks so far.

BRP5 iGPU - The start of an upward trend with only 10 valid tasks so far.

BRP5 Nvidia - Big difference between high and low (20.59 - 8091.06 credits/task). Few completed tasks, but it might also be starting an upward trend.

And as before here's a link to the Excel document with all the data and graphs.

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Thanks! The Credit graphs

Message 80106 in response to message 80105

Thanks! The Credit graphs have me giggling :)

What I might do a bit later is have a look at the data and see what happens with Credit vs Runtime.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

OK, here's another graph for

OK, here's another graph for the mix.

Just my own host, linear scale. The green markers are when the actual validation/credit award was made (or, to be pedantic, when the validating task was reported). That should be a more honest trendline - if you can see one.

For clarification - if I report first, you'll see first a red mark (to the left) for my reporting time - that's what we've been using up until now. But it'll be followed by a green mark, some time later, when the wingmate trails in.

If I report second, validation takes place near-enough instantly, and the green mark will superimpose over the red mark - you'll only see green.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Yes, Looking through

Message 80108 in response to message 80107

Yes,
Looking through Holmis' BRP4G data I see similar trends. Elapsed time per credit seems to be more or less levelling there (with a lot of noise). There appear to be 3 main populations early on, and two later as new hosts converge. New hosts & apps come along all the time, so understanding whether a credit award falls in the average, low or jackpot regime helped out.

The 3 populations being:
- a new host/app validating with anything (including when the host itself is new): the system assumes a high APR-to-peak_flops ratio (a high pfc - the 10% default). With both hosts sitting high, this tends toward lower credit - the opposite of the generous intent written into the code comments (depending on whether those comments meant generous credit, or actually generous room in the elapsed estimate; it isn't specified).
- validating against another low-pfc host, for example when both are running multiple tasks per GPU, yields 'middle' credit
- and a low-pfc host validating against a high-pfc host: the jackpot scenario.
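To make that averaging structure concrete, here is a tiny toy sketch (Python). It is *not* the real CreditNew code - the claim numbers are invented placeholders, and the only rule assumed is that both wingmen are granted the mean of their two per-result claims:

[pre]# Toy model of the three populations above -- NOT the real CreditNew code.
# The claim values are invented placeholders; the only assumed rule is that
# both wingmen receive the mean of their two per-result claims.

def granted(claim_a, claim_b):
    """Both wingmen get the same grant: the mean of the two claims."""
    return (claim_a + claim_b) / 2.0

LOW_CLAIM  = 500.0    # e.g. a host still on pessimistic default scaling
MID_CLAIM  = 1500.0   # e.g. a converged host running one task per GPU
HIGH_CLAIM = 8000.0   # e.g. a host whose scaling has drifted high

print(granted(LOW_CLAIM, LOW_CLAIM))    # 500.0  -> both low:  the low-credit population
print(granted(MID_CLAIM, MID_CLAIM))    # 1500.0 -> both mid:  the 'middle' population
print(granted(LOW_CLAIM, HIGH_CLAIM))   # 4250.0 -> mismatch:  the 'jackpot' population[/pre]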

Oh, and plotting elapsed/credit as a bubble graph in the supplied sequence gives a freaky fractal Boinc skeleton hand. Winning :D

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Probably time for a stats

Probably time for a stats show, then.

[pre] Jason Holmis Claggy Zombie ZombieM Jacob RH
Host: 11363 2267 9008 6490 6109 10320 5367
GTX 780 GTX 660 GT 650M TITAN 680MX FX3800M GTX 670

Credit for BRP4G, GPU
Maximum 1791.03 1982.00 10952.0 2794.94 11847.5 10952.0 4137.85
Minimum 115.82 88.84 153.90 91.50 94.88 508.73 1355.49
Average 1003.97 895.37 4105.99 1105.00 2009.54 2541.91 2040.89
Median 1168.97 981.15 3140.36 1144.04 1664.94 1570.34 1832.12
Std Dev 576.67 570.61 3015.77 424.29 1790.58 2873.96 495.61

nSamples 47 65 52 401 196 21 118

Runtime (seconds)
Maximum 5027.36 5088.99 11295.0 5383.71 23977.4 12149.5 4169.93
Minimum 3259.10 3294.83 8122.09 1908.26 1512.16 11515.5 4061.45
Average 3674.58 4558.79 8906.78 4191.69 4207.76 11652.6 4119.66
Median 3581.10 4818.30 8837.27 4224.43 4175.31 11617.4 4123.75
Std Dev 346.83 486.74 554.28 550.95 1947.62 131.33 20.85

Turnround (days)
Maximum 6.09 3.91 2.75 3.44 2.94 7.49 0.87
Minimum 0.15 0.07 0.14 0.05 1.52 0.18 0.15
Average 1.66 1.83 0.84 2.09 2.33 2.14 0.61
Median 1.46 1.85 0.78 2.00 2.45 1.02 0.65
Std Dev 1.64 1.01 0.63 0.61 0.32 2.27 0.15[/pre]

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

'bubble graph' what the heck

'bubble graph' - what the heck does a bubble graph graph?
Yes, I see differently sized bubbles. What does the bubble size stand for?

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: 'bubble graph' what the

Message 80111 in response to message 80110

Quote:
'bubble graph' - what the heck does a bubble graph graph?
Yes, I see differently sized bubbles. What does the bubble size stand for?


Seconds elapsed per credit, so smaller is better paying.

That one's less for technical value, more for the fractal art competition anyway ;)

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Probably time for a

Message 80112 in response to message 80109

Quote:
Probably time for a stats show, then.

What happens, to the std deviations in particular, if you filter out the on-ramping convergence-attempt period of, say, the first 10-20 results?

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: 'bubble graph'

Message 80113 in response to message 80111

Quote:
Quote:
'bubble graph' - what the heck does a bubble graph graph?
Yes, I see differently sized bubbles. What does the bubble size stand for?

seconds elapsed per credit, so smaller is better paying.

That one's less for technical value, more for the fractal art competition anyway ;)


Oh, you are plotting against n, not against N...

Sorry, my maths genes need a serious lie-down now.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: 'bubble

Message 80114 in response to message 80113

Quote:
Quote:
Quote:
'bubble graph' - what the heck does a bubble graph graph?
Yes, I see differently sized bubbles. What does the bubble size stand for?

seconds elapsed per credit, so smaller is better paying.

That one's less for technical value, more for the fractal art competition anyway ;)


Oh, you are plotting against n, not against N...

Sorry, my maths genes need a serious lie-down now.

*points Boinc bony fractal finger* everyone's an art critic :P

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: Probably time for

Message 80115 in response to message 80112

Quote:
Quote:
Probably time for a stats show, then.

What happens, to the std deviations in particular, if you filter out the on-ramping convergence-attempt period of, say, the first 10-20 results?


I think it would be difficult to define an on-ramp in this case. Zombie, in particular, has been crunching for ages - I'm not sure how come we started the whole new convergence with the server code upgrade (I'd have expected all the runtime averages to have been in play for a long time, just disguised by the project's fixed credit policy). As you'll have been seeing, there hasn't been much tweaking in this area of code since it was first deployed four years ago.

What I would like to say to David is that, with fixed-size workunits, and a runtime standard deviation of 20 seconds in 4,000, I expect credit with an SD of 5 in 1000.
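In other words (just making the arithmetic behind that expectation explicit, on the assumption that the credit noise should be no worse, proportionally, than the runtime noise):

\[
\frac{\sigma_{\text{credit}}}{\text{credit}} \approx \frac{\sigma_{\text{runtime}}}{\text{runtime}} = \frac{20}{4000} = 0.5\%, \qquad 0.5\% \times 1000 = 5\ \text{credits}
\]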

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

sample size is a bit low for

Sample size is a bit low for good statistics, but with THAT graph you can't expect a reasonable SD.
Better not try plotting credit against runtime - at least not if you expect the linear regression to have any sensible R^2 value...

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: Probably

Message 80117 in response to message 80115

Quote:
Quote:
Quote:
Probably time for a stats show, then.

What happens, to the std deviations in particular, if you filter out the on-ramping convergence-attempt period of, say, the first 10-20 results?

I think it would be difficult to define an on-ramp in this case. Zombie, in particular, has been crunching for ages - I'm not sure how come we started the whole new convergence with the server code upgrade (I'd have expected all the runtime averages to have been in play for a long time, just disguised by the project's fixed credit policy). As you'll have been seeing, there hasn't been much tweaking in this area of code since it was first deployed four years ago.

What I would like to say to David is that, with fixed-size workunits, and a runtime standard deviation of 20 seconds in 4,000, I expect credit with an SD of 5 in 1000.

Hah! Good idea, and there's the rub. Without damping those averages, it won't happen.

I felt the need for a reset of the numbers, to observe what any new application would do when installed - the point being that the number of platforms coming online is accelerating.

Take your own host (SD ~20 seconds -> variance 400 -> 10%), then the same with your identical wingman. When averaging, Murphy says he'll be 10% high and you'll be 10% low, so now it's 20%. Then both host scales factor in whatever percentage you use your hosts for anything else, plus background tasks. That's a natural-world input, unfiltered/unconditioned (which is a no-no).

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: sample size is a bit

Message 80118 in response to message 80116

Quote:
sample size is a bit low for good statistics but with THAT graph you can;t expect a reasonable SD.
better not try plotting credit against runtime - at least not when you expect the linear regression to have any sensible R^2 value...

Yeah, David's using 10 and 100 validations for his average sample sets (the host and app-version scales). That's why the Nyquist limit kicks in and creates artefacts when the scales are adjusted with each validation - the frequency of change is higher than the Nyquist limit. Ideal would be continuous damped averages (a controller). Still musing whether to do that as a separate pass 2, or to combine it with the CPU coarse-scale correction. Probably easier to monitor/analyse the effects if separate, so I'll keep going along those lines.
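As a rough illustration of what I mean by a continuous damped average - the update rule and the effective sample count below are just an example, not a proposed implementation:

[pre]# Exponentially-damped (first-order) average, updated once per validation,
# instead of short 10/100-sample windows.  n_eff is an arbitrary example.
import random

def damped_average(prev_avg, new_sample, n_eff=50.0):
    """Each new validation nudges the estimate by 1/n_eff of the error,
    which damps the sample-to-sample noise."""
    return prev_avg + (new_sample - prev_avg) / n_eff

est = 8000.0                          # deliberately bad starting estimate
for _ in range(400):                  # noisy runtimes scattered around 4100 s
    est = damped_average(est, random.gauss(4100.0, 200.0))
print(round(est))                     # settles to ~4100, give or take the
                                      # (much reduced) residual noise[/pre]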

Off for a break, then back to more documentation for the patch passes. Hopefully skeletons will be ready for inspection soon.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

nenym
Joined: 13 Jun 11
Posts: 15
Credit: 10001988
RAC: 0

Notes from an ordinary

Notes from an ordinary cruncher.

Some observations I haven't seen mentioned here (maybe bad observations on my part, or trivial for you experts):
- Run time of Intel_GPU apps depends on the type of CPU app being crunched on the CPU; AVX/FMA3 applications in particular have a strong effect (PG LLR AVX/AVX2/FMA3, Asteroids AVX on Haswell, Beal and MindModeling SSE2 too),
- CPU time of CPU apps depends on the type of Intel_GPU app running concurrently, e.g. Collatz mini has nearly no effect, while on the other hand Einstein BRP and Seti AP apps can double the CPU time,
- Run time of CUDA apps depends on the type of Intel_GPU app running concurrently (not sure whether CUDA OpenCL and ATI OpenCL are affected too) - the GPU load of the CUDA app stays the same,
- Run time of some types of GPU apps can be strongly shortened by manipulating the CPU process priority (if the priority of the BRP process is set to Realtime, run time is halved on the Intel_GPU).

A bit OT, but... it's my point of view:
What I can see for the time being:
- hard work and analysis,
- David's RNG seems like fixed credit compared to the credit granted here for GPU apps.
No offense, but are you sure you have any chance of taming a chaotic system like the Boinc space? It is a really big undertaking.
The simplest way I can see is a fixed credit scheme for the tasks of an application whose length varies within about ±15% on a "standard" machine. Your work is hard and great, but what is the goal? If it's a "fair" credit scheme for tasks of an application using different app_vers, platforms and plan_classes (SSEx, AVXx, FMA3, Intel GPU, CUDA, OpenCL) with a big variation in length... what is the fair credit? The same credit for the same reference WU independently of the crunching machine (close to a fixed credit scheme), or credit dependent on a "benchmark" (i.e. varying credit for the same reference WU), which is nonsense in the world of AVX/FMA3 and GPU hosts? What I see is an effort to reach the benchmark asymptote.
Despite my point of view, my machines will stay here and help to find the way.

It is very interesting for me to look over your work and analysis, as my job involves chaotic systems too (to be clear: partially predictable via Poisson/binomial distributions, i.e. "memoryless").

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Thanks nenym, Yes some of

Message 80120 in response to message 80119

Thanks nenym,

Yes, some of the objectives will not be as clear here straight away, mostly because of the consistency of the tasks, and because until now it's been fixed credit. It's those same features that make this a great sandbox for putting the system under the microscope.

More detailed objectives & experimental procedure are being drafted, but a very short overview is this:

Time estimates and credit as an estimate of 'work':
- On projects with multiple applications, like Seti, there is a mismatch between applications which can lead to hoarding and mass aborting in some cases, in order to juggle work & cherry-pick. The intent written into the system basically says that should not happen as much as it does (about a 2x discrepancy)
- A spread of +/- 30% or more makes RAC or credit pretty useless (IMO, and in the opinion of some others) for its purpose (to users) of comparing host to host, hardware to hardware, application to application, etc.
- The randomness can and does upset bringing new hosts online, in particular when the estimates start out really bad, such as happened here at the start, when the GPUs hit time-exceeded errors. That's bad juju for retaining new users, and possibly for application development
- It also appears that the current system penalises optimisation.

So yes, on one hand it's easy to regard the credit system as academic and not critical for the science, but on the other it becomes critical for time estimation, which is key to scheduling from the server side right through to the client.

That covers why I feel it has to be addressed.

As for why have a scaling credit system at all? Looked at from another direction, fixed estimates and credit make work for project developers (often with little funding) every time there is a major new application or platform. Something that dials itself in automatically saves effort and money in the long run (like cruise control).

As for whether such a chaotic system can be corrected? Yes, if you use control theory. Here is an example from CPU 'shorties' on a host at Seti Beta. Those tasks are all the same length, so they should really get similar credit. The smoother curve is one that used a simple 'PID controller' to replace the credit calculation; it was started deliberately off target so as to see how it settled. Note also that the CPU credit system there has a scale error, so for easy comparison the new smooth & correct line is divided by 3 (it would be up over 90 credits if not scaled down).

Here is the not scaled down version:

and here is a picture of a PID controller in mechanical form:
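For the code-minded, here's a minimal toy sketch of the kind of PID correction described above. The gains, the 90-credit target and the 'nudge the award each validation' framing are all illustrative assumptions, not the actual patch that was tested:

[pre]# Toy PID loop -- illustrative only, not the patch that was tested.
class PID:
    def __init__(self, kp=0.4, ki=0.02, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

target, credit = 90.0, 30.0               # start deliberately off target
ctrl = PID()
for validation in range(100):
    credit += ctrl.step(target, credit)   # each validation nudges the award
print(round(credit, 1))                   # climbs, overshoots slightly, then
                                          # settles within ~0.1 of the target[/pre]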

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

nenym
Joined: 13 Jun 11
Posts: 15
Credit: 10001988
RAC: 0

Thanks jason_gee for

Message 80121 in response to message 80120

Thanks jason_gee for the explanation.
I agree with you: a fixed credit scheme (and a "measured work done" scheme - Rosetta, AQUA, Seti MB years ago) means a lot of complications for developers; on the other hand it is popular regardless of the target RAC.

Notes (theory):
Maybe the D part of the regulation (feedback) is too strong, or the R part too weak, because I see the same waves as on an oscilloscope measurement of a badly designed regulator with a tendency to oscillate (at some frequencies) and miss the asymptote. In that case the I part has a chance to win. A good test for stability is a repeated Dirac impulse; theory e.g. here.
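For illustration, a toy version of that repeated-impulse probe (the loop model and the gains are made up, purely to show the difference between a damped response and a ringing one):

[pre]# Kick a toy feedback loop with a disturbance every 50 steps and count
# sign changes in the error -- a well-damped loop shows none, an
# over-aggressive one rings.  Model and gains are invented for illustration.

def run_loop(gain, steps=200, impulse_every=50):
    value, target = 100.0, 100.0
    errors = []
    for n in range(steps):
        if n % impulse_every == 0:
            value += 20.0              # Dirac-like kick (disturbance)
        error = target - value
        value += gain * error          # simple proportional correction
        errors.append(error)
    return errors

def ring_count(errors):
    return sum(1 for a, b in zip(errors, errors[1:]) if a * b < 0)

print(ring_count(run_loop(gain=0.3)))  # gentle gain: decays, no ringing -> 0
print(ring_count(run_loop(gain=1.8)))  # too much gain: oscillates -> lots[/pre]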

I know that CreditNew works very well for some projects (WEP-M+2, Seti AP ...), but those have units with similar run times. On the other hand, on projects with very different run times it seems to be an RNG (LHC varies 1:3 for my i5-4570S):
[pre]       Run time    CPU time   Credit
WU 1  20,679.88   20,046.41   291.51
WU 2     346.51      336.59     1.84[/pre]

Notes (how to game/cheat CreditNew):
a) in a stable state (CPU time ~ run time), start a high CPU load with a third application (e.g. GPU crunching, AutoCAD 3D rendering...); credit per CPU time rises.
b) do a "notepad overclock", i.e. a false benchmark. It works for circa 10 tasks, then the credit normalises. If the tasks are long enough, the cheated credit is noticeable. After 10 tasks it is necessary to crunch another project and come back after enough time for 100 tasks, in which case the system "forgets".

Cherry-picking: where do you see the root cause for it, on the project's side or on the cruncher's side?

The current system penalises optimisation: that's what I really hate about CreditNew (and the benchmark-based CreditOld too). I lived in a "socialist" country for 34 years, and believe me, we don't all have the same stomachs. If I work hard and a lot (optimised AVX/FMA3 apps), I eat more and must visit the gym, the chiropractor etc. (pay more for electricity and CPU+MOBO and PSU and cooling), and I expect a higher salary (more credit). That is what David Lenin Anderson misunderstood.

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Yep, the same concepts of

Message 80122 in response to message 80121

Yep, the same concepts of feedback, slew rates, overshoot/undershoot, and damping/oscillation apply here too.

On the cherry-picking, I think it takes all sorts to make the world work: some would consider that cheating, and others a useful tool. I tend to think it's not that black and white, and that exploits usually point to deeper design flaws. As those flaws affect other things, they should be fixed for those reasons, and the cherry-pickers can move on to look for more flaws ;)

Yeah with proper scaling & normalisation, some things become obvious that are not so obvious when buried in noise. Some of that can be exploits (and so design or implementation flaws), and some can be legitimate special situations. The Jackpot situation found here was an unexpected weakness, and something that will have to be examined closely as the system is stabilised.

Thanks for the input, I appreciate bouncing the ideas back and forwards a lot.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Holmis
Joined: 4 Jan 05
Posts: 89
Credit: 2104736
RAC: 0

RE: Intel GPU: RE:

Message 80123 in response to message 79947

Quote:
Intel GPU:
Quote:
581007031069.074340
BRP5-opencl-intel_gpu

That's 581 GFlops! Boinc reports it @ 147 GFlops peak in the startup messages.


Another follow-up, although it's been extensively discussed already.

11 BRP5 Intel GPU tasks have now been validated and the APR has been calculated at 10.78 GFlops, running one task at a time. The initial estimate had the iGPU 53.9 times faster than it actually is. The peak value reported by Boinc in the startup messages is 13.6 times faster, and if that had been used the tasks would have finished without me having to increase the rsc_fpops_bound value to avoid Boinc aborting them with "maximum time limit exceeded".
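For the record, those ratios follow directly from the figures above:

\[
\frac{581\ \text{GFlops}}{10.78\ \text{GFlops}} \approx 53.9, \qquad \frac{147\ \text{GFlops}}{10.78\ \text{GFlops}} \approx 13.6
\]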

nenym
nenym
Joined: 13 Jun 11
Posts: 15
Credit: 10001988
RAC: 0

Additional

Message 80124 in response to message 80122

Additional notes.

Cherry-picking.
It is very difficult to prevent it selectively, as it can be done not only by aborting, but also by killing the process via the task manager. And an abort could really be cherry-picking, or:
- missed deadlines (could be sorted out),
- unexpected reasons (a HW fault on the host...),
- the end of a challenge (Pentathlon, PG challenge series - PG explicitly asks for unneeded tasks to be aborted),
- overestimated fpops and aborting to prevent the consequent panic mode...
Lowering the daily quota (according to the current formula N = N - n_of_errored_tasks) is not enough to prevent it, because it can simply be bypassed by finishing some work from time to time.
It is mission impossible, from my POV.

Regulation process.
Reaching the asymptote could be accelerated by using granted-credit bounds (independently of rsc_fpops_est) on the validator side for a sort/batch of WUs, if that makes sense. Yes, it is additional work for developers and administrators, though it could be partially automated using a "reference" machine. On the other hand, it could theoretically be the wrong way to go, because there would be two regulation parameters for the same quantity.
I suspect you did not bring credit bounds into play so that you could see the design or implementation flaws in the raw algorithm.

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Additional

Message 80125 in response to message 80124

Quote:

Additional notes.

Cherry-picking.
It is very difficult to prevent it selectively, as it can be done not only by aborting, but also by killing the process via the task manager. And an abort could really be cherry-picking, or:
- missed deadlines (could be sorted out),
- unexpected reasons (a HW fault on the host...),
- the end of a challenge (Pentathlon, PG challenge series - PG explicitly asks for unneeded tasks to be aborted),
- overestimated fpops and aborting to prevent the consequent panic mode...
Lowering the daily quota (according to the current formula N = N - n_of_errored_tasks) is not enough to prevent it, because it can simply be bypassed by finishing some work from time to time.
It is mission impossible, from my POV.

Regulation process.
Reaching the asymptote could be accelerated by using granted-credit bounds (independently of rsc_fpops_est) on the validator side for a sort/batch of WUs, if that makes sense. Yes, it is additional work for developers and administrators, though it could be partially automated using a "reference" machine. On the other hand, it could theoretically be the wrong way to go, because there would be two regulation parameters for the same quantity.
I suspect you did not bring credit bounds into play so that you could see the design or implementation flaws in the raw algorithm.

Yep, there are fixed bounds in play, and the 'safeties' are being tripped as new apps come online, actually introducing more problems (like a car airbag that goes off at the slightest bump, and then has a spike mounted on the steering wheel behind it).

Part of the cause of that is the gross scaling error during the onramp period. Further looks at where the bounds should really be set, and decisions on fixes, are needed after the gross scaling errors for CPU and GPU are improved. It appears they may be too tight even with good initial scaling, because of the diversity in applications and in how people use their machines... Basically the system seems to assume dedicated crunching to some degree, which is quite a false assumption. Then you have the multiple-tasks-per-GPU situation, which is not factored in anywhere.

Part of the reason some of these weaknesses are known is that I experienced none of the client-side failsafes, because I use a modified client in which I widened them. I could see the scales hard-lock at their max limits and still give estimates that were too short. That means the initial server-side scale being applied was way out of whack compared to the estimate of GPU speed.

It turns out one assumption, that GPUs achieve 10% of their peak flops, is commented in the server code as being 'generous'. In fact, even for the common case of one GPU task per GPU this is not generous, but results in estimates that are too short, and when combined with the initial app scalings it results in estimates divided by ~300-1000x (a common scaling case: 0.05 x 0.05 = 0.0025 -> 1/400), before even considering the multiple-tasks-per-GPU case. Double application of the scales that overlap there is also a problem, and easily remedied.
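Spelling out that common case, the two ~5% scale factors compound multiplicatively:

\[
0.05 \times 0.05 = 0.0025 = \frac{1}{400}
\]

which is where the ~1/400 figure comes from; presumably other combinations of similar-sized factors give the wider ~300-1000x range quoted above.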

So that's how the initial scalings, stability, and possibly overtight failsafe margins all interact; addressing the first two issues should give a clear indication of where the third should be adjusted to safe limits.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

tullio
Joined: 22 Jan 05
Posts: 53
Credit: 137342
RAC: 0

A gamma-ray unit ended with

A gamma-ray unit ended with "Elapsed time exceeded". But the estimated time to completion was only 2 hours, compared to 76 hours at Einstein@Home.
Tullio

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

I've been away over the

I've been away over the weekend, and I won't have time to retrieve the data for the 'validation time' graph until much later this evening.

But here's the logarithmic-scale graph, updated and extended. I think we can see that the 'upper mode' has a distinct, converging trendline too - but there's still a 300-400 credit difference between the upper and lower modes.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Has the median for your

Message 80128 in response to message 80127

Has the median for your results there, restricted to the second half of the month, settled to around 2340? If so, then the global (app_version) pfc_scale for the GPU app has settled roughly where expected. That indicates the GPU portion is not normalised against a CPU version, and that the 300-400 variation is likely the remaining host_scale and averaging instabilities.

Without the CPU app's ~2.25x (down)scaling attractor, I feel the level is reasonable/correct, though (IMO) a month is far too long to dial in a new app version from too wide a start, and the noise is unnecessary & induced.

Extra notes & predictions before going forward:
In this context (CPU-app free, and critically *AFTER* the app_version scale has converged), the *correct* claim will be the lower of the two, as David had implemented, with the averaging between the two claims adding noise. With damping of the scales the awards would become a relatively smooth curve. That final wingman average (which IIRC Eric added) is critically important elsewhere, to mitigate the underclaim induced by the CPU downscaling error. With that, the upward spread is overclaim by the high-pfc hosts.

Summarising, we should be able to confidently address the coarse scaling errors on both CPU and initial GPU, which will speed new application and new host convergence. We should be able to remove the bulk of the noise and remove the sensitivity to the initial project estimate, which combined will address all of the main concerns people have with the current implementation from user and project perspectives. So once the weather settles a bit here in Oz (and I've cleaned up the royal mess in the yard), it'll be time to roll up the sleeves & get patching.

At this point I doubt any new major observations will jump out, though I'd like to keep an eye on things while code-digging in the background. Unstable is as unstable does, and things can jump out and surprise us, but since the data appears to characterise all the known issues well, IMO we have a good baseline to improve on.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: Has the median for your

Message 80129 in response to message 80128

Quote:
Has the median for your results there, restricted to the second half of the month, settled to around 2340? If so, then the global (app_version) pfc_scale for the GPU app has settled roughly where expected. That indicates the GPU portion is not normalised against a CPU version, and that the 300-400 variation is likely the remaining host_scale and averaging instabilities.


The median for the whole population for host 5367 (nSamples = 250) is 1880.63 (mean 2030.77): if you'd like to give me your formula for "second half of the month" - by reporting or validation date? - I can pull that out, but it doesn't look as if it would be as high as 2340.

Here's the current linear graph by validation time.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

No that's OK Thanks. When

Message 80130 in response to message 80129

No, that's OK, thanks. When you're dealing with control systems it's the intuition that counts, as control is a subjective, experience-based thing. With what we have we CAN already say there is apparent convergence (after a long time), which is good enough for computers, but not all that crash hot for human perception/intuition.

It's quite acceptable for the mechanism to be applying some small offset either way, to cover some little-understood phenomena... that's why it's a control system and not a fixed knob.

As we're dealing with human concepts, the keys will be improvements in convergence and noise, which are the things failing projects and users in this GPU-only app example.

[Edit:] if we get that 'right', plus the CPU coarse scaling issues, then the projects that cross-normalise should also see stabilisation (at an intuitive level).

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Updating both graphs, to show

Updating both graphs, to show a new effect.

This morning, I was asked to change the running configuration on my host 5367, for an unrelated reason. As a result, the maximum runtime for these tasks went up from 4137.85 seconds to 4591.35 - nearly 11%.

The first task back after that - before APR had a chance to respond, obviously - is the high outlier at 2474.34

I think that's further evidence of the kind of instability we need to cure.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: I think that's further

Message 80132 in response to message 80131

Quote:

I think that's further evidence of the kind of instability we need to cure.

Yes, local estimates need to be responsive to running conditions. It's unfortunate that the existing mechanism for that was disabled instead of completed/fixed.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: RE: I think that's

Message 80133 in response to message 80132

Quote:
Quote:

I think that's further evidence of the kind of instability we need to cure.

Yes, local estimates need to be responsive to running conditions. It's unfortunate that the existing mechanism for that was disabled instead of completed/fixed.


Seti Beta deployed the Blunkit-based Optimised AP v7 yesterday; there the estimates are the other way round.
My Ubuntu 12.04 C2D T8100 took ~12 hours on its first WU - shame the estimates start at ~228 hours.

Claggy

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: I think

Message 80134 in response to message 80133

Quote:
Quote:
Quote:

I think that's further evidence of the kind of instability we need to cure.

Yes, local estimates need to be responsive to running conditions. It's unfortunate that the existing mechanism for that was disabled instead of completed/fixed.


Seti Beta deployed the Blunkit-based Optimised AP v7 yesterday; there the estimates are the other way round.
My Ubuntu 12.04 C2D T8100 took ~12 hours on its first WU - shame the estimates start at ~228 hours.

Claggy

LoL, do the results mix with traditional non-blankit versions? That's going to mess with cross-app normalisation big time.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: RE: RE: RE: I

Message 80135 in response to message 80134

Quote:
Quote:
Quote:
Quote:

I think that's further evidence of the kind of instability we need to cure.

Yes, local estimates need to be responsive to running conditions. It's unfortunate that the existing mechanism for that was disabled instead of completed/fixed.


Seti Beta deployed the Blunkit-based Optimised AP v7 yesterday; there the estimates are the other way round.
My Ubuntu 12.04 C2D T8100 took ~12 hours on its first WU - shame the estimates start at ~228 hours.

Claggy

LoL, do the results mix with traditional non-blankit versions ? That's going to mess with cross app normalisation bigtime.


AP v6 should only mix with AP v6, and AP v7 should only mix with AP v7.

Even better, the SSE2 app is Optimised and the SSE app is non-Optimised; the difference in runtimes is going to be huge. I see carnage ahead.

Claggy

Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: RE: RE: Quote

Message 80136 in response to message 80135

Quote:
Quote:
Quote:
Quote:
Quote:

I think that's further evidence of the kind of instability we need to cure.

Yes, local estimates need to be responsive to running conditions. It's unfortunate that the existing mechanism for that was disabled instead of completed/fixed.


Seti Beta deployed the Blunkit-based Optimised AP v7 yesterday; there the estimates are the other way round.
My Ubuntu 12.04 C2D T8100 took ~12 hours on its first WU - shame the estimates start at ~228 hours.

Claggy

LoL, do the results mix with traditional non-blankit versions ? That's going to mess with cross app normalisation bigtime.


AP v6 should only mix with AP v6, and AP v7 should only mix with AP v7.

Even better, the SSE2 app is Optimised and the SSE app is non-Optimised; the difference in runtimes is going to be huge. I see carnage ahead.

Claggy


IOW credit for AP should drop. That's one way to get a bit more cross-app balance ;D

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

A lot of my Gamma-ray pulsar

A lot of my Gamma-ray pulsar search #3 v1.11 results are coming out as inconclusive. In each case they are matched with an Intel GPU, and in each case that Intel GPU is running OpenCL 1.1 drivers - shouldn't that app be restricted to Intel GPUs with OpenCL 1.2 drivers?

Validation inconclusive Gamma-ray pulsar search #3 tasks for computer 8143

Claggy

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

OK, the effect of my

OK, the effect of my configuration change continues and is even clearer. I simply changed the nature (but not the number) of the tasks running on the CPU while this BRP4G test was running on the GPU.

Here are the runtime stats of the two runs:

[pre]           (before)   (after)
Maximum     4191.43   5034.97
Minimum     4061.45   4417.27
Average     4128.11   4707.30
Median      4127.66   4668.20
Std Dev       20.45    181.84
nSamples        339        43[/pre]

and the corresponding graph

I'm told some new hosts are coming online, so that we can watch and examine the "new host / stable (!) project" scenario in detail. I'll add them to the graphs - probably replacing the old hosts on the log graph, since none of them are returning much data now - as soon as I see successful BRP4G tasks coming back in.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

Once you're happy with that,

Message 80139 in response to message 80138

Once you're happy with that, there are other ways to simulate 'perfectly normal running conditions' that may induce similar divergent behaviour (or worse). One would be to downclock the GPU while Boinc's running (simulating a lower power state, a driver timeout/failsafe, a deliberate underclock, extended use of the GPU without suspending Boinc, etc.). I think a key takeaway is that the mechanism isn't really adaptive to reasonably normal, variable running conditions.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

Starting to crunch with my

Starting to crunch with my hosts. Comparing the first crunched WUs against those already validated by jason's 780SC, my 780FTW host is apparently crunching the BRP4G WUs almost 20% faster, but it's receiving 2-3x more credit. I'm running 1 WU at a time on each GPU only. Theoretically I'd expect similar credit, or am I missing something?

https://albertathome.org/task/1514929
https://albertathome.org/task/1515731

jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Starting to crunch with

Message 80141 in response to message 80140

Quote:

Starting to crunch with my hosts. Comparing the first crunched WUs against those already validated by jason's 780SC, my 780FTW host is apparently crunching the BRP4G WUs almost 20% faster, but it's receiving 2-3x more credit. I'm running 1 WU at a time on each GPU only. Theoretically I'd expect similar credit, or am I missing something?

https://albertathome.org/task/1514929
https://albertathome.org/task/1515731

Yes, existing CreditNew (no mods yet) with new app+host in all its glory. One of the big parts we're studying, because of its importance to keeping new users and applications or devices coming on-line.

That's the onramp period, as the system tries to establish how fast you're crunching. It doesn't do it very well, but at least you're getting high credit and giving me some too - thanks! :P

You will be crunching faster than me because I've been doing lots of stuff with my machine lately and haven't tweaked anything... also I only have an old Core2Duo driving it.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

Right, so Richard's were sort

Right, so Richard's were sort of converging but are all over the place now.

For Juan, I prefer to wait until Richard has done all the hard work and produced a graph :) [Thanks Richard, really appreciated]

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: Right, so Richard's

Message 80143 in response to message 80142

Quote:

Right, so Richard's were sort of converging but are all over the place now.

For Juan, I prefer to wait until Richard has done all the hard work and produced a graph :) [Thanks Richard, really appreciated]


Two more for your viewing pleasure.

I've started to take out the older hosts, which are returning very few tasks these days, but they served their purpose. Red is now Juan's 10351 (the one he linked two tasks from) - classic view for a new host.

And this is mine, still showing scatter from the new configuration. We'll have to wait a few days before Juan will fit on the same scale (although he validated a couple of my oldies overnight - thank you). I'll keep the configuration stable until Sunday night/Monday morning, but I'll have to flip back then - I have some held tasks with deadlines.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

Nice graph Richard, maybe you

Nice graph Richard; maybe you could consider adding one of my 2x690 hosts, 10512 or 10352, since there is no 690 on the graph and they produce a lot of WUs.

Now I understand what you are all talking about with new hosts: their RAC oscillates a lot and then converges to a relatively stable range (1.5-2.5K) no matter which GPU or host is used.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: Nice graph Richard,

Message 80145 in response to message 80144

Quote:

Nice graph Richard; maybe you could consider adding one of my 2x690 hosts, 10512 or 10352, since there is no 690 on the graph and they produce a lot of WUs.

Now I understand what you are all talking about with new hosts: their RAC oscillates a lot and then converges to a relatively stable range (1.5-2.5K) no matter which GPU or host is used.


Yes, I'm planning to refresh the graph with new hosts, and they might be suitable.

What is most helpful is finding hosts with a nice, steady, continuous flow of data, and as little variation as possible in the running conditions (so that any noise in the credit granted can be attributed to external causes).

The sheer number of tasks pushed through isn't particularly important, but the consistency is. It didn't help that Zombie took the two hosts I'd picked off to another project (he's still running other hosts - they crop up in my wingmate lists from time to time), and Mikey leaving the project because it isn't exporting public stats would rule him out.

It's quite time-consuming to switch things over, so bear with me - for the time being at least, old results aren't being deleted here, so there's no rush.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

juan BFB
Joined: 10 Dec 12
Posts: 8
Credit: 1674320
RAC: 0

RE: What is most helpful is

Message 80146 in response to message 80145

Quote:
What is most helpful is finding hosts with a nice, steady, continuous flow of data, and as little variation as possible in the running conditions (so that any noise in the credit granted can be attributed to external causes).


If you have some time, choose any one of my hosts (or more than one if you wish) and tell me; I will leave the host continuously crunching only Albert for a week or more if needed, and since they run 24/7 with almost no other apps running, they could give you some of the continuous flow of data you are looking for. I wish to help all I can to finally fix the creditscrew problem.

Snow Crash
Joined: 11 Aug 13
Posts: 10
Credit: 5011603
RAC: 0

RH - Please let me know if it

Message 80147 in response to message 80146

RH - Please let me know if it would be more helpful to simply switch my 7950 from BRP5 to BRP4 or to "remove project" / "add project" (presumably that would create a new host and therefore start credit calcs fresh). Also, is it easier for you if I only run 1 WU at a time?