New App S6LV1

Along with the ongoing test related to OpenCL we will soon begin to test the setup for the next Einstein@Home Gravitational Wave search "S6LV1" (S6 data with "LineVeto", run #1). For now this will be a pure CPU App.

Comments

robertmiles
Joined: 16 Nov 11
Posts: 31
Credit: 4,468,368
RAC: 0

New App S6LV1

So far, both of the workunits I've had gave a computation error near the end of the predicted runtime.

Server status suggests that no one else has completed one of those workunits successfully, either.

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Thanks. Apparently the checkpointing is broken, i.e. the "toplist" structure is broken after the App resumes from a checkpoint, and the next "insert" then crashes the App.

Until this is fixed, I have suspended sending out more S6LV1 work.

If you want to finish the tasks already out there, avoid an App restart (set "leave App in Memory while suspended" to "yes", and avoid quitting BOINC or shutting down your computer).
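
For readers unfamiliar with this failure mode, here is a minimal sketch of a bounded "toplist" that survives a checkpoint round trip. It is not the S6LV1 code: the class name, the JSON file format and the field names are all hypothetical, and a min-heap is only one plausible way to keep the best candidates. The point it illustrates is that the restore step rebuilds the structure's invariant instead of trusting whatever order was written to disk, so the next insert after a resume operates on a valid structure.

```python
import heapq
import json
import os


class Toplist:
    """Keeps the `capacity` highest-scoring candidates in a min-heap (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # entries are [score, label]; heap[0] is the weakest kept entry

    def insert(self, score, label):
        entry = [score, label]
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, entry)
        elif entry > self.heap[0]:
            # New candidate beats the weakest kept entry: swap it in.
            heapq.heapreplace(self.heap, entry)

    def save(self, path):
        # Write to a temporary file, then rename, so an interrupted write
        # never leaves a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"capacity": self.capacity, "entries": self.heap}, f)
        os.replace(tmp, path)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            state = json.load(f)
        top = cls(state["capacity"])
        top.heap = state["entries"]
        if len(top.heap) > top.capacity:
            raise ValueError("checkpoint holds more entries than the capacity")
        # Rebuild the heap invariant rather than trusting the stored order.
        heapq.heapify(top.heap)
        return top
```

A resume test in this sketch amounts to calling save(), then load(), then insert() on the restored object, and checking that the kept entries are the ones expected.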

BM

robertmiles
Joined: 16 Nov 11
Posts: 31
Credit: 4,468,368
RAC: 0

I'll have to disable S6LV1 on one of my computers, then. A certain part of its backup software needs to run every night with BOINC not running.

Upgrading to BOINC 7.0.2 might have fixed the problem requiring frequent reboots on a second computer. It will probably take a few more days to tell. For now, I'll let it keep trying to get S6LV1 workunits.

Gaurav Khanna
Joined: 8 Nov 04
Posts: 9
Credit: 2,818,895
RAC: 0

Hmm. All the S6 LineVeto work units are crashing immediately for me:

http://albertathome.org/task/65949
http://albertathome.org/task/65791

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78903 in response to message 78902

Hi Gaurav!

Could you stop BOINC and send me an init_data.xml file from a slot directory (e.g. by email, as a plain file; don't just copy and paste the text)?
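
If it helps to locate the file, the following is a small sketch that lists every init_data.xml found in the slot directories. It assumes the usual BOINC layout of numbered subdirectories under "slots" in the BOINC data directory; the helper name is hypothetical, and the location of the data directory differs between installations, so it is passed in as an argument.

```python
from pathlib import Path


def find_init_data(boinc_data_dir):
    """Return every slots/<n>/init_data.xml below the given BOINC data directory."""
    slots = Path(boinc_data_dir) / "slots"
    # One level of subdirectories (slots/0, slots/1, ...) is searched.
    return sorted(slots.glob("*/init_data.xml"))
```

Calling find_init_data() with the path of your own BOINC data directory returns a list of paths; any one of those files, attached unmodified, is what is being asked for here.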

BM

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78904 in response to message 78900

Checkpointing should work in App version 1.01, published minutes ago. Testing of S6LV1 has resumed.

BM

robertmiles
Joined: 16 Nov 11
Posts: 31
Credit: 4,468,368
RAC: 0

Message 78905 in response to message 78901

Quote:

I'll have to disable S6LV1 on one of my computers, then. A certain part of its backup software needs to run every night with BOINC not running.

Upgrading to BOINC 7.0.2 might have fixed the problem requiring frequent reboots on a second computer. It will probably take a few more days to tell. For now, I'll let it keep trying to get S6LV1 workunits.

I've now found that BOINC 7.0.2 has made the problem requiring frequent reboots on the second computer less frequent, but has not fully eliminated it. S6LV1 is still enabled there.

I've found a way to reduce the problem on the first computer to a few minutes every 24 hours without BOINC running and without it in memory, without rebooting Windows, but it requires staying up until 1 AM to start the backups for that computer manually. I'll check if that is good enough for the newest S6LV1.

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78906 in response to message 78905

Right now it's more important for us to learn whether the App now checkpoints and resumes correctly than to get completed results. You don't need to stay awake until 1 AM to find this out. Just stop BOINC (after it has been running for >5 min) and start it again. If the App resumes without crashing, it will continue to do so even after a reboot or whatever else may happen. If it crashes again when resuming, it's better to know that early than to waste more computing time.

BM

robertmiles
Joined: 16 Nov 11
Posts: 31
Credit: 4,468,368
RAC: 0

OK, I'll try that when I get another S6LV1 workunit if there's no other workunit from another BOINC project with an especially long CPU time since the last checkpoint.

Currently, the XXL workunits from RNA World are about the worst for long times between checkpoints.

robertmiles
Joined: 16 Nov 11
Posts: 31
Credit: 4,468,368
RAC: 0

Gravitational Wave S6 LineVeto serch 1.01 (SSE2)
h1_0052.00_S6GC1__50_S6LV1A

Appeared to resume from checkpoint properly after I shut down BOINC for a minute (not left in memory).

Still running now.

Looks like time for a check of whether this had an effect on getting the right answers.

You might want to check the spelling of "serch", though.

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

FWIW we just got our first pair of results for WU #22235. Both matched and were found valid. Neither of these tasks was interrupted and resumed from a checkpoint, though.

BM

pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 153
Credit: 70,000
RAC: 0

May I point out a small typo, both in the apps page and the name of the application when showing in BOINC Manager?

It says "Gravitational Wave S6 LineVeto serch" and "Gravitational Wave S6 LineVeto serch 1.01 (SSE2)". "Search" is misspelled.

Am now running two of these beasts. Hopefully their estimated runtime will be adjusted at some point, as they sure don't run for the 6 hours and 45 minutes that they're estimated at. It's been almost 2 hours and progress is only at 7%.
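
As a rough illustration of how far off the estimate is, the sketch below extrapolates the total runtime from the figures in this post. It assumes progress stays linear, which is only an approximation, and the function name is hypothetical. With 2 hours elapsed at 7%, the projection is roughly 28.6 hours against an estimate of 6.75 hours.

```python
def projected_total_hours(elapsed_hours, progress_percent):
    """Linear extrapolation: total runtime if progress keeps its current pace."""
    if not 0 < progress_percent <= 100:
        raise ValueError("progress_percent must be in (0, 100]")
    return elapsed_hours * 100.0 / progress_percent


# "Almost 2 hours" at 7% progress, as reported in this post.
total = projected_total_hours(2.0, 7.0)
estimate = 6 + 45 / 60  # the 6 h 45 min shown by BOINC
print(f"projected total: {total:.1f} h, original estimate: {estimate:.2f} h")
```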

tullio
Joined: 22 Jan 05
Posts: 53
Credit: 137,342
RAC: 0

After 1 hour it is at 3.550%. But it does not run at high priority, contrary to the binary pulsar search.
Tullio

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78912 in response to message 78910

Quote:

May I point out a small typo, both in the apps page and the name of the application when showing in BOINC Manager?

It says "Gravitational Wave S6 LineVeto serch" and "Gravitational Wave S6 LineVeto serch 1.01 (SSE2)". "Search" is misspelled.

Thanks. Fixed in the DB; I don't know whether or when this propagates to the Client and then the Manager.

Quote:
Hopefully their estimated runtime will be adjusted at some point, as they sure don't run for the 6 hours and 45 minutes that they're estimated at. It's been almost 2 hours and progress is only at 7%.

Actually we hope to get the App to live up to the speed / runtime we designed the workunits for. An important optimization that is in the S6Bucket App still doesn't work with code changes we had to make for S6LV1. We're working on that. The new server & client code should be able to adjust the runtime estimates with time, though.

BM

pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 153
Credit: 70,000
RAC: 0

Message 78913 in response to message 78912

Quote:
Actually we hope to get the App to live up to the speed / runtime we designed the workunits for. An important optimization that is in the S6Bucket App still doesn't work with code changes we had to make for S6LV1. We're working on that. The new server & client code should be able to adjust the runtime estimates with time, though.


OK, that's fair.

In the meantime, it sped up a little: 20.562% for the one at 5h 20m 35s and 16.863% for the other at 4h 41m 38s. Hopefully they survive the trip, as they have been suspended and resumed multiple times now.

tullio
Joined: 22 Jan 05
Posts: 53
Credit: 137,342
RAC: 0

Mine is now at 40.532% after 12:30:42 hours and running OK.
Tullio

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78915 in response to message 78913

Quote:
Hopefully they survive the trip as they have been suspended and resumed multiple times now.

The previous error would make the app crash soon after resuming from a checkpoint. If this task was successfully resumed multiple times, there is nothing to worry about.

BM

pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 153
Credit: 70,000
RAC: 0

The first one ended in 77K seconds run time, 70K CPU time.
http://albertathome.org/task/72454

Neil Polson
Joined: 17 Dec 05
Posts: 2
Credit: 1,011
RAC: 0

Is there any reason why all the S6 tasks have been cancelled?

EDIT: Just noticed you've released a new app! Problem with validating?

modesti
Joined: 4 Sep 07
Posts: 1
Credit: 4,133,345
RAC: 0

Hello, just discovered this thread while searching for an answer.

Have sent back 3 WUs of this kind, but all 3 mention "completed, can't validate" for both my wingman and me.
Here are the WUs:
27982: http://albertathome.org/workunit/27982
27981: http://albertathome.org/workunit/27981
27936: http://albertathome.org/workunit/27936

Could it be, as Neil Polson already mentioned, that you have a problem with the validation?

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78919 in response to message 78918

Under "errors" for the WUs you listed, you'll see that these WUs have been cancelled; that's why the results say "can't validate".

For the latest App version we needed to change the workunits (command-line), too, so we cancelled the old ones as the new app couldn't run these properly.

The validation problems of S6LV1 appear to be the result of the previous checkpoint problem having been fixed incompletely. We're working on this.

BM

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

Message 78920 in response to message 78919

Quote:
The validation problems of S6LV1 appear to be the result of the previous checkpoint problem having been fixed incompletely. We're working on this.

App versions 1.07 released yesterday are meant to fix this.

BM

Christoph
Joined: 25 Aug 05
Posts: 30
Credit: 208,211
RAC: 0

Hi,

I don't know if a cross-validation problem from 1.03 to 1.07 is of interest.
My first CPU invalid here: http://albertathome.org/task/115586

Christoph

Gary Roberts
Joined: 9 Feb 05
Posts: 17
Credit: 85,000
RAC: 0

Message 78922 in response to message 78921

Quote:
I don't know if a cross-validation problem from 1.03 to 1.07 is of interest.
My first CPU invalid here: http://albertathome.org/task/115586


If you look at the whole quorum rather than just your result, it doesn't appear to be just a cross version validation problem since it was initially a disagreement between two 1.03 tasks. Eventually, just one of the 1.03 tasks did validate with a 1.07 task.

Cheers,
Gary.

Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6,218,130
RAC: 0

We have identified a few problems with validation in the S6LV1 App, but so far none of these concern cross-validation between App versions.

App versions 1.03 (and earlier) have a problem with checkpointing that only shows up in validation. If a task of such an App was interrupted and restarted from a checkpoint the result will ultimately be found invalid.

There is a numerical instability in one of the functions used in the new App code that needs to be fixed there, and there are a few oddities in the behavior of the current validator. Both issues are actively being worked on.
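
To illustrate why a numerical instability matters for validation, here is a minimal sketch of comparing two hosts' results within a relative tolerance. It is not the project's validator: the function name, the flat list of values and the tolerance are all assumptions chosen for illustration. Two results that differ only by floating-point noise pass such a check, while a value pushed outside the tolerance by an unstable function makes the pair disagree.

```python
import math


def results_agree(a, b, rel_tol=1e-5):
    """Compare two equally long lists of candidate values within a relative tolerance."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))
```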

BM

Christoph
Joined: 25 Aug 05
Posts: 30
Credit: 208,211
RAC: 0

Message 78924 in response to message 78923

OK, then that was what hit me. I'm running other projects too.

Christoph

midomidi2013
Joined: 15 May 14
Posts: 3
Credit: 0
RAC: 0

very nice reading that