Revision as of 13:15, 22 April 2009
Condor: Making best use of the computers in the teaching labs
Introduction
The Condor Project enables us to run batch jobs on desktop computers around the department that would otherwise be standing idle. Condor is particularly useful for 'high-throughput' computing, such as an ensemble of independent model simulations used to explore parameter space.
Basic commands
Our 'submission host' is condor.ggy.bris.ac.uk. If you log in to that machine, you can review the status of all the machines in the 'condor pool' using the command condor_status:
<pre>
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@GEOG-B224.gg WINNT51    INTEL  Owner     Idle     0.000  1661  0+01:30:04
slot2@GEOG-B224.gg WINNT51    INTEL  Owner     Idle     0.010  1661  0+01:30:05
slot1@geog-a105.gg WINNT51    INTEL  Claimed   Busy     1.130  1661  0+01:02:57
slot2@geog-a105.gg WINNT51    INTEL  Claimed   Busy     1.130  1661  0+01:02:58
slot1@geog-c200.gg WINNT51    INTEL  Unclaimed Idle     0.000  1662  0+00:00:04
slot2@geog-c200.gg WINNT51    INTEL  Unclaimed Idle     0.040  1662  0+00:00:00
...
...
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

       INTEL/WINNT51   191    78     109         4       0          0        0

               Total   191    78     109         4       0          0        0
</pre>
Typically you will get several screenfuls of output, so I've cut out the middle of the listing, leaving just the first few entries and the final summary.
From the listing, you can see that:
- The PC called GEOG-B224 has someone logged into it, indicated by the keyword Owner, but it is not working hard, as it is Idle.
- In contrast, geog-a105 is marked as Claimed, indicating that it has been grabbed by Condor, and it is working hard, as it is Busy.
- The third possible state, Unclaimed, is exemplified by geog-c200, which is neither claimed by Condor nor in interactive use.
The final summary tells us that, at the time of writing, the pool contains 191 slots (each PC shown in the listing provides two, slot1 and slot2), of which 78 are in the Owner state because a user is logged in, 109 are claimed by Condor and 4 remain unclaimed.
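If you save the condor_status output to a file, the same tally can be reproduced with a few lines of script. The following is a minimal Python sketch, not part of Condor itself: the function name count_states and the SAMPLE text are illustrative, and it assumes the eight-column layout shown in the listing above, with the state in the fourth column.

```python
from collections import Counter

# State names taken from the column headings of the summary table above.
KNOWN_STATES = {"Owner", "Claimed", "Unclaimed", "Matched", "Preempting", "Backfill"}

# Sample slot lines copied from the condor_status listing above.
SAMPLE = """\
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
slot1@GEOG-B224.gg WINNT51    INTEL  Owner     Idle     0.000  1661  0+01:30:04
slot2@GEOG-B224.gg WINNT51    INTEL  Owner     Idle     0.010  1661  0+01:30:05
slot1@geog-a105.gg WINNT51    INTEL  Claimed   Busy     1.130  1661  0+01:02:57
slot2@geog-a105.gg WINNT51    INTEL  Claimed   Busy     1.130  1661  0+01:02:58
slot1@geog-c200.gg WINNT51    INTEL  Unclaimed Idle     0.000  1662  0+00:00:04
slot2@geog-c200.gg WINNT51    INTEL  Unclaimed Idle     0.040  1662  0+00:00:00
"""


def count_states(listing):
    """Tally the State column (fourth field) over all slot lines.

    Header and summary lines are skipped because their fourth field
    is not one of the known state names.
    """
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[3] in KNOWN_STATES:
            counts[fields[3]] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```

Run over the full listing rather than the six sample lines, the counts should agree with the Owner, Claimed and Unclaimed columns of the summary.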
Another view of the state-of-play is given by condor_q:
<pre>
 ID      OWNER     SUBMITTED     RUN_TIME   ST PRI SIZE CMD
 51.0    ggdagw    4/22 13:21   0+01:30:07  R  0   3.7  test.bat
...
...
 51.15   ggdagw    4/22 13:21   0+01:33:46  I  0   3.4  test.bat
...
...
150 jobs; 44 idle, 106 running, 0 held
</pre>
Here we see that job 51.0 was submitted at 13:21 on the 22nd of April, has been running for just over an hour and a half (status R in the ST column), and that the job executable was called 'test.bat'. We can also see that job 51.15 is idle (status I) rather than running. In total, 106 of the 150 jobs submitted to Condor are running, and the remaining 44 are idle, still waiting to run.
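The summary line is handy if you want to check on progress from a script. As a rough sketch, the Python below pulls the four numbers out of a line in exactly the format shown above; the function name parse_summary is illustrative, and other versions of Condor may print the summary differently, in which case the pattern would need adjusting.

```python
import re

# Matches the summary format shown above, e.g.
# "150 jobs; 44 idle, 106 running, 0 held"
SUMMARY_RE = re.compile(r"(\d+) jobs; (\d+) idle, (\d+) running, (\d+) held")


def parse_summary(line):
    """Return the job counts from a condor_q summary line as a dict."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a recognised condor_q summary line: %r" % line)
    total, idle, running, held = (int(n) for n in match.groups())
    return {"total": total, "idle": idle, "running": running, "held": held}


if __name__ == "__main__":
    summary = parse_summary("150 jobs; 44 idle, 106 running, 0 held")
    print("%d of %d jobs still waiting" % (summary["idle"], summary["total"]))
```

For the line above, the idle, running and held counts add up to the total of 150, which is a quick sanity check that the line was read correctly.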