Accounting Troubleshooting
After the Lenovo update, our mechanism for reporting hour consumption to users was affected, since it relied on the scontrol show assoc_mgr command. Example output for the QOS qos_pisca_145:
[bbruzzo@snmgt01 ~]$ scontrol show assoc_mgr | grep -A 7 "QOS=qos_pisca_145("
QOS=qos_pisca_145(122)
UsageRaw=0.000000
GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(0.00)
GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0),gres/gpu:v100=N(0),gres/gpumem=N(0),gres/gpuutil=N(0)
GrpTRESMins=cpu=6000000(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=60000(0),gres/gpu:v100=N(0),gres/gpumem=N(0),gres/gpuutil=N(0)
GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0),gres/gpu:v100=N(0),gres/gpumem=N(0),gres/gpuutil=N(0)
MaxWallPJ=
MaxTRESPJ=
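In assoc_mgr output, each TRES entry has the form name=limit(used), where N means no limit is set; for GrpTRESMins both numbers are in minutes. A minimal sketch of a parser for such a line, assuming the reporting mechanism consumed lines like the GrpTRESMins one above. parse_tres_field is a hypothetical helper, not part of Slurm.

```python
import re

# Matches one "name=limit(used)" entry, e.g. "cpu=6000000(0)" or "gres/gpu:v100=N(0)"
TRES_ENTRY = re.compile(r'([^=,]+)=([^(,]+)\(([^)]*)\)')

def parse_tres_field(line):
    """Hypothetical helper: parse 'GrpTRESMins=cpu=6000000(0),mem=N(0),...'
    into {tres_name: (limit, used)}; a limit of 'N' (no limit) becomes None."""
    _, _, body = line.strip().partition('=')  # drop the leading 'GrpTRESMins='
    result = {}
    for name, limit, used in TRES_ENTRY.findall(body):
        result[name] = (None if limit == 'N' else float(limit), float(used))
    return result

line = ('GrpTRESMins=cpu=6000000(0),mem=N(0),energy=N(0),'
        'gres/gpu=60000(0),gres/gpu:v100=N(0)')
usage = parse_tres_field(line)
print(usage['cpu'])       # (6000000.0, 0.0)
print(usage['gres/gpu'])  # (60000.0, 0.0)
```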
Database review
To dump the database, run the following from mmgt02:
sudo mysqldump --single-transaction --databases slurm_acct_db > backup.sql
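A minimal sketch for keeping several dumps side by side instead of overwriting backup.sql, assuming the same command is run on mmgt02. The dated file name and the helper names (dump_command, backup_filename, run_dump) are hypothetical, not part of the original procedure.

```python
import datetime
import subprocess

def dump_command(database='slurm_acct_db'):
    # Same command as above: --single-transaction takes a consistent snapshot
    # of InnoDB tables without locking them for the duration of the dump
    return ['sudo', 'mysqldump', '--single-transaction', '--databases', database]

def backup_filename(day, database='slurm_acct_db'):
    # Hypothetical naming scheme: <database>-YYYY-MM-DD.sql
    return f'{database}-{day.isoformat()}.sql'

def run_dump(database='slurm_acct_db'):
    # Equivalent of "... > backup.sql", but writing to a dated file
    path = backup_filename(datetime.date.today(), database)
    with open(path, 'w') as out:
        subprocess.run(dump_command(database), stdout=out, check=True)
    return path
```

Calling run_dump() writes the dump to the current directory and returns the file name; sudo may prompt for a password.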
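Parsing from sacct
The following command computes the CPU-hours consumed by account pisca_73 since 2025-01-01: for every job allocation (-X) of every user (-a), it multiplies the elapsed time in seconds (ElapsedRaw) by the number of allocated CPUs (NCPUS), sums the products, and divides the total by 3600.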
sacct -X -a -A pisca_73 --starttime=2025-01-01 --parsable2 --noheader --format=elapsedraw,ncpus | awk -F'|' '{sum+=$1*$2} END {print sum/3600}'
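The same aggregation can be done in Python, which removes the awk dependency and makes the arithmetic explicit. A minimal sketch; cpu_hours is a hypothetical helper that takes the text printed by the sacct command above (--parsable2, fields elapsedraw,ncpus).

```python
def cpu_hours(sacct_output):
    """Hypothetical helper: sum ElapsedRaw (seconds) * NCPUS over every line
    of 'sacct --parsable2 --noheader --format=elapsedraw,ncpus' output and
    return CPU-hours, mirroring the awk one-liner."""
    total_seconds = 0
    for line in sacct_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        elapsed, ncpus = line.split('|')
        # Empty fields count as 0, as they do in awk arithmetic
        total_seconds += int(elapsed or 0) * int(ncpus or 0)
    return total_seconds / 3600

print(cpu_hours("3600|4\n1800|2\n"))  # 5.0
```

To feed it real data, pass the stdout of subprocess.run() on the sacct command with capture_output=True and text=True.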
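Python script: hours report
The script below applies the same sacct calculation to every account whose name starts with pad, pci or pisca, and prints each account name followed by its CPU-hours.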
#!/usr/bin/env python3.10
import subprocess

def get_accounts():
    # List every Slurm account; --parsable2 prevents sacctmgr from
    # truncating long account names to the default column width
    command = ['sacctmgr', '--noheader', '--parsable2', 'list', 'account', 'format=account']
    output = subprocess.run(command, capture_output=True, encoding='utf-8', check=True)
    return output.stdout.split()

def get_hours(account):
    # ElapsedRaw (seconds) and NCPUS of every job allocation in the account since 2025-01-01
    command = ['sacct', '-X', '-a', '-A', str(account), '--starttime=2025-01-01',
               '--parsable2', '--noheader', '--format=elapsedraw,ncpus']
    # Sum seconds * CPUs and convert the total to CPU-hours
    pipe_command = ['awk', '-F|', '{sum+=$1*$2} END {print sum/3600}']
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    pipe_proc = subprocess.Popen(pipe_command, stdin=proc.stdout,
                                 stdout=subprocess.PIPE, encoding='utf-8')
    proc.stdout.close()  # only awk reads the pipe now; sacct gets SIGPIPE if awk exits early
    stdout, _ = pipe_proc.communicate()
    proc.wait()
    print(account)
    print(stdout)

if __name__ == '__main__':
    for account in get_accounts():
        # Report only accounts whose names start with pad, pci or pisca
        if account.startswith(('pad', 'pci', 'pisca')):
            get_hours(account)
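If the report should also show how much of the QOS budget remains, the sacct total (CPU-hours) has to be compared against the GrpTRESMins cpu limit, which is expressed in minutes: the cpu=6000000 limit shown earlier corresponds to 6000000 / 60 = 100000 CPU-hours. A minimal sketch, assuming the limit is obtained separately (for example from the assoc_mgr output); remaining_cpu_hours and format_report are hypothetical helpers, and the usage figure in the example is illustrative only.

```python
def remaining_cpu_hours(limit_minutes, used_hours):
    """Hypothetical helper: convert a GrpTRESMins cpu limit (minutes) to hours
    and subtract the CPU-hours reported by the script; None means no limit ('N')."""
    if limit_minutes is None:
        return None
    return limit_minutes / 60 - used_hours

def format_report(account, used_hours, limit_minutes=None):
    # One line per account, suitable for sending to users
    remaining = remaining_cpu_hours(limit_minutes, used_hours)
    if remaining is None:
        return f'{account}: {used_hours:.1f} CPU-hours used (no limit)'
    return (f'{account}: {used_hours:.1f} of {limit_minutes / 60:.1f} '
            f'CPU-hours used, {remaining:.1f} remaining')

# Illustrative values: 1250.5 CPU-hours used against the cpu=6000000 limit
print(format_report('pisca_example', 1250.5, 6000000))
# pisca_example: 1250.5 of 100000.0 CPU-hours used, 98749.5 remaining
```

Note that the sacct total is a plain sum of elapsed time times CPUs, while Slurm may decay the assoc_mgr usage counters (PriorityDecayHalfLife), so the two figures are not guaranteed to agree.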
Last modified by bbruzzo.
