Description
This was first observed during activity on a Decision Engine (DE) instance that talks to an ITB factory (798, running GlideinWMS 3.10.5-1). Glideins were still being requested even though the job submitted by the DE had completed. It was verified that the requests were not caused by jobs sitting in the job queues of either the DE client in question or any other client. Each request from the client carries two numbers that are relevant to understanding the behavior: ReqMaxGlideins and ReqIdleGlideins. Further investigation of the glideclient and glidefactoryclient classads showed the following:
- When jobs were submitted and present in the DE queue (condor_q shows jobs in running and idle state; 5 processes were submitted, each sleeping for 10 minutes):
# From the glideclient classad:
[root@factoryhost ~]# condor_status -any 950589_ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service@declient.de_test -l | grep Req
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
GlideinParamGLIDECLIENT_ReqNode = "factoryhost.fnal.gov"
ReqEncIdentity = "4a098979b299b9c2cacf843ac6d3e4a4a34384b1f24b0189e4d2e7ec6d52d5b2dee40e62c3942db795ea32f53be8d9c4"
ReqEncKeyCode = "64e12146066e3efbe3480fe726ecc4cd0fd196cd712640c7c1cf96a87b56dbdb8ec6c35b6f28c68f551ea639b2642532240422397730765e95bebfb8e0e843bfe0b964aac909c31f5365e586f6d2aef3ea93bebe9d2f9abba786c0bef344484c6e128c06b881a1d31e31a3c01bf782780aaf52afa7c02238c379fb32b7f8a35dd11ae7b534a03f7b689bdf795d5be339457a77555fb75998d838524d0203268e0400d861b1a00bcffd3881fe76ddba3a9e864b37618957ef87f052bac6aeda07ff445bc7af791ed921a237c9859120125c69b7613e9c0fa462c7f4649757e05f7e8cdd2508bc06059aace4642ae3bb1aa40db54d34ae2496104383f26d5ac124"
ReqGlidein = "ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service"
ReqIdleGlideins = 1
ReqMaxGlideins = 6
# From the glidefactoryclient classad:
[root@factoryhost ~]# condor_status -any ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service@declient.de_test -l | grep Req
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
GlideinMonitorRequestedIdle = 1
GlideinMonitorRequestedIdleCores = 1
GlideinMonitorRequestedMaxCores = 6
GlideinMonitorRequestedMaxGlideins = 6
GlideinMonitorTotalRequestedIdle = 1
GlideinMonitorTotalRequestedIdleCores = 1
GlideinMonitorTotalRequestedMaxCores = 6
GlideinMonitorTotalRequestedMaxGlideins = 6
- When there were 2 completed jobs and 3 were in running state in the DE:
# From the glideclient classad:
[root@factoryhost ~]# condor_status -any 950589_ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service@declient.de_test -l | grep Req
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
GlideinParamGLIDECLIENT_ReqNode = "factoryhost.fnal.gov"
ReqEncIdentity = "f9ecce3a2fde6d39f57da0198e4dac73b57affcf92d0671b21a34959c1b005bc6c92d2300ff4998558b027365e18dbe0"
ReqEncKeyCode = "8d86570f2f6b737b03798f7fcb053df7f3d8a755c91620e87073cbba80013ed2e244c870ff16ac3482bcc1f3f625119e8f16d9679317a52f98108d9e7987ce3b75b428603117e215c463f128206011110ab109ef1e6edab90dec833eb3cb9e10b8618e547eadb50d3381f49860b04acb912c3ba574ed4b4e160f103c30dee8e9d31c1a5c6e5f07e88e856c905519a574cf169ef0bdf9e2359088f3361562c04259e77064c8b5516813c793b69c06531e78dd9f79f26f36fac2acb1fa1b4d3386be96c42594aae20d9168822d2111d9ac023e9deffab625139289f546b881f1f56519c87966f95fe64436ad5f20c8eaf5a687e0b38cd1806f53cd76283cf1a1eb"
ReqGlidein = "ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service"
ReqIdleGlideins = 1
ReqMaxGlideins = 2
# From the glidefactoryclient classad:
[root@factoryhost ~]# condor_status -any ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service@declient.de_test -l | grep Req
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
GlideinMonitorRequestedIdle = 1
GlideinMonitorRequestedIdleCores = 1
GlideinMonitorRequestedMaxCores = 2
GlideinMonitorRequestedMaxGlideins = 2
GlideinMonitorTotalRequestedIdle = 1
GlideinMonitorTotalRequestedIdleCores = 1
GlideinMonitorTotalRequestedMaxCores = 2
GlideinMonitorTotalRequestedMaxGlideins = 2
- When all submitted jobs in the DE had completed (condor_q showed an empty DE queue):
# From the glideclient classad:
[root@factoryhost ~]# condor_status -any 950589_ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service@declient.de_test -l | grep Req
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
GlideinParamGLIDECLIENT_ReqNode = "factoryhost.fnal.gov"
ReqEncIdentity = "f9ecce3a2fde6d39f57da0198e4dac73b57affcf92d0671b21a34959c1b005bc6c92d2300ff4998558b027365e18dbe0"
ReqEncKeyCode = "8d86570f2f6b737b03798f7fcb053df7f3d8a755c91620e87073cbba80013ed2e244c870ff16ac3482bcc1f3f625119e8f16d9679317a52f98108d9e7987ce3b75b428603117e215c463f128206011110ab109ef1e6edab90dec833eb3cb9e10b8618e547eadb50d3381f49860b04acb912c3ba574ed4b4e160f103c30dee8e9d31c1a5c6e5f07e88e856c905519a574cf169ef0bdf9e2359088f3361562c04259e77064c8b5516813c793b69c06531e78dd9f79f26f36fac2acb1fa1b4d3386be96c42594aae20d9168822d2111d9ac023e9deffab625139289f546b881f1f56519c87966f95fe64436ad5f20c8eaf5a687e0b38cd1806f53cd76283cf1a1eb"
ReqGlidein = "ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service"
ReqIdleGlideins = 1
ReqMaxGlideins = 2
# From the glidefactoryclient classad:
[root@factoryhost ~]# condor_status -any ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service@declient.de_test -l | grep Req
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
GlideinMonitorRequestedIdle = 1
GlideinMonitorRequestedIdleCores = 1
GlideinMonitorRequestedMaxCores = 2
GlideinMonitorRequestedMaxGlideins = 2
GlideinMonitorTotalRequestedIdle = 1
GlideinMonitorTotalRequestedIdleCores = 1
GlideinMonitorTotalRequestedMaxCores = 2
GlideinMonitorTotalRequestedMaxGlideins = 2
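To compare snapshots like the three above without the long encrypted values and the GSI warning getting in the way, the `grep Req` output can be reduced to a small dictionary. The following is a minimal sketch, not part of GlideinWMS or the DE: the function name `parse_classad_lines` and the `SAMPLE` text (an abbreviated copy of the third snapshot) are assumptions made for illustration.

```python
import re

# Matches classad attribute lines such as `ReqIdleGlideins = 1`
# or `ReqGlidein = "name"`. Warning lines and shell prompts do not match.
ATTR_LINE = re.compile(r'^(\w+)\s*=\s*(.*)$')


def parse_classad_lines(text):
    """Hypothetical helper: turn `condor_status -l | grep Req` output into a dict."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line.strip())
        if not match:
            continue  # skip WARNING lines, prompts, and comments
        name, raw = match.group(1), match.group(2).strip()
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            attrs[name] = raw[1:-1]  # quoted string value
        else:
            try:
                attrs[name] = int(raw)  # integer value
            except ValueError:
                attrs[name] = raw  # leave anything else untouched
    return attrs


# Abbreviated copy of the glideclient output taken when the DE queue was empty.
SAMPLE = """\
WARNING: GSI authentication is enabled by your security configuration! GSI is no longer supported.
GlideinParamGLIDECLIENT_ReqNode = "factoryhost.fnal.gov"
ReqGlidein = "ITB_CE_EL9_SciToken@gfactory_instance@gfactory_service"
ReqIdleGlideins = 1
ReqMaxGlideins = 2
"""

request = parse_classad_lines(SAMPLE)
print(request["ReqIdleGlideins"], request["ReqMaxGlideins"])
```

Only the two integer attributes matter for this report; the rest are kept so the same helper can be pointed at the glidefactoryclient output as well.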
After a period of excessive glidein requests, the glideclient classad eventually disappears from the factory, and from that point on no more glideins are requested in the factory. This is consistent: the classad vanishes when it expires, and once it is no longer present there is nothing left to request glideins.
The same behavior, with glideins being requested even though the DE job queue is empty, was also observed in a couple of other instances:
- On a new setup of my DE development instance, glideins were excessively requested and run in the factory, while all sources except source1 went offline. Only test_channel was steady; resource_request went offline. This further confirmed that the cause is on the DE side.
- I reached out to Vito to ask whether he had seen something similar, since he was running tests on the DE on a daily basis. He confirmed that he has been seeing exactly this behavior: new glideins show up even after his jobs have completed. This was reported at the weekly DE meeting just before winter break to find out whether it is the expected behavior. The outcome of the discussion was that it is not acceptable: if there are no further jobs to run, the glideins queue up and go to waste because they are never used.
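The condition reported here can be written down as a simple check over the three snapshots from the description. This is only a sketch of the symptom, under the assumption that an empty DE queue should come with no idle glideins being requested; the `Snapshot` class and `is_stale_request` function are hypothetical names, and the numbers are copied from the classad output in this issue.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    label: str
    jobs_in_queue: int   # idle + running jobs reported by condor_q on the DE
    req_idle: int        # ReqIdleGlideins from the glideclient classad
    req_max: int         # ReqMaxGlideins from the glideclient classad


def is_stale_request(snapshot):
    """Flag a request that still asks for idle glideins with nothing queued."""
    return snapshot.jobs_in_queue == 0 and snapshot.req_idle > 0


# Values taken from the three observations in this issue.
SNAPSHOTS = [
    Snapshot("5 jobs idle/running", 5, 1, 6),
    Snapshot("2 completed, 3 running", 3, 1, 2),
    Snapshot("all completed, queue empty", 0, 1, 2),
]

flagged = [s.label for s in SNAPSHOTS if is_stale_request(s)]
print(flagged)
```

Only the last snapshot is flagged: the queue is empty, yet the request is unchanged from the previous snapshot (ReqIdleGlideins = 1, ReqMaxGlideins = 2).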