admin_slurm

Differences

This shows you the differences between two versions of the page.

admin_slurm [2025/06/27 16:44] – created bbruzzo
admin_slurm [2025/07/08 17:14] (current) bbruzzo
Line 60:
 **NoDecay** If set, this QOS will not have its GrpTRESMins, GrpWall and UsageRaw decayed by the slurm.conf PriorityDecayHalfLife or PriorityUsageResetPeriod settings. This allows a QOS to provide aggregate limits that, once consumed, will not be replenished automatically. Such a QOS will act as a time-limited quota of resources for an association that has access to it. Account/user usage will still be decayed for associations using the QOS. The QOS GrpTRESMins and GrpWall limits can be increased or the QOS RawUsage value reset to 0 (zero) to again allow jobs submitted with this QOS to run (if pending with QOSGrp{TRES}MinutesLimit or QOSGrpWallLimit reasons, where {TRES} is some type of trackable resource).
  
 **GrpTRESMins** The total number of TRES minutes that can possibly be used by past, present and future jobs running from an association and its children. If any limit is reached, all running jobs with that TRES in this group will be killed, and no new jobs will be allowed to run. This usage is decayed (at a rate of PriorityDecayHalfLife). It can also be reset (according to PriorityUsageResetPeriod) in order to allow jobs to run against the association tree. This limit only applies when using the Priority Multifactor plugin.
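
The decay mentioned above can be illustrated with a small calculation. This is only a sketch: it assumes a plain exponential half-life decay and that no new usage accrues in the meantime. The function name ''decayed_usage'' and the sample numbers are hypothetical, not part of Slurm.

```shell
# Hypothetical helper (not a Slurm command): estimate how much recorded usage
# remains after some time, assuming exponential half-life decay and no new usage.
# usage: decayed_usage <usage> <elapsed> <half_life>
# <elapsed> and <half_life> must be in the same unit (for example, days).
decayed_usage() {
  awk -v u="$1" -v t="$2" -v h="$3" \
    'BEGIN { printf "%.2f\n", u * exp(log(0.5) * t / h) }'
}

# Made-up example: 800 TRES minutes, 14 days elapsed, 7-day half-life.
# Two half-lives have passed, so one quarter of the usage remains.
decayed_usage 800 14 7
```

With a QOS flagged NoDecay, this reduction would not apply to the QOS limits, so the recorded usage would stay as it is until it is reset.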
 + 
 +====== Usage monitoring ====== 
 + 
 +To see how much each account has used, run the following sacct command: 
 + 
 +<code> 
 +sacct -A <account> -X --starttime=2025-01-01 --noheader --parsable2 --format=AllocCPUs,ElapsedRaw | awk -F"|" '{sum+=$2*$1} END {print sum/3600, "CPU Hours"}' 
 +</code>  
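
The arithmetic in that pipeline can be checked without a cluster. The sketch below wraps the same awk sum in a function, here called ''cpu_hours'' (a hypothetical name), and feeds it made-up rows in the ''AllocCPUs|ElapsedRaw'' layout that ''--parsable2'' would produce. It assumes ElapsedRaw is in seconds, which is why the total is divided by 3600.

```shell
# Hypothetical helper: reads "AllocCPUs|ElapsedRaw" rows on stdin and prints
# the total CPU hours (CPUs times elapsed seconds, summed, divided by 3600).
cpu_hours() {
  awk -F'|' '{ sum += $1 * $2 } END { printf "%.2f CPU hours\n", sum / 3600 }'
}

# Made-up sample rows standing in for sacct output:
#   4 CPUs for 1800 s  ->  7200 CPU seconds
#   8 CPUs for 3600 s  -> 28800 CPU seconds
printf '%s\n' '4|1800' '8|3600' | cpu_hours

# On a real cluster the sacct command shown earlier would be piped into
# cpu_hours in place of the printf line.
```

Keeping the sum in a function makes it easier to reuse across accounts or start dates.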
 + 
 +==== Viewing partial QOS consumption ==== 
 +<code>scontrol show assoc_mgr | grep "QOS=qosprueba(32)" -A 21</code> 
 + 
 +Look for the line ''GrpTRESMins=cpu=780(815)''  
 + 
 +Here 780 is the limit of CPU minutes the QOS may use, and 815 is the time already used. The QOS is over its limit: before the last job was launched there were still minutes available, but not enough to cover that job.
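
Reading the limit and the usage out of that line can be scripted. The sketch below assumes the field looks exactly like ''GrpTRESMins=cpu=LIMIT(USED)'' with numeric values, as in the example above. The function name ''qos_cpu_minutes'' is hypothetical. A QOS with no numeric CPU limit set would not match the pattern and would produce no output.

```shell
# Hypothetical helper: extracts LIMIT and USED from "GrpTRESMins=cpu=LIMIT(USED)"
# on stdin and prints the remaining CPU minutes (negative when over the limit).
qos_cpu_minutes() {
  sed -n 's/.*GrpTRESMins=cpu=\([0-9][0-9]*\)(\([0-9][0-9]*\)).*/\1 \2/p' |
    awk '{ printf "limit=%d used=%d remaining=%d\n", $1, $2, $1 - $2 }'
}

# The line from the example above; on a real cluster the input would come from
# the scontrol/grep pipeline instead of echo.
echo 'GrpTRESMins=cpu=780(815)' | qos_cpu_minutes
```

A negative remaining value corresponds to the exceeded QOS described above.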
admin_slurm.1751042661.txt.gz · Last modified: by bbruzzo