gpfs
Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| gpfs [2025/06/26 19:49] – created joaquintorres | gpfs [2025/09/23 16:39] (current) – joaquintorres | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== GPFS ====== | ====== GPFS ====== | ||
| + | |||
| + | |||
| + | ===== Documentation (GPFS v5.1.8) ===== | ||
| + | {{ : | ||
| + | {{ : | ||
| + | {{ : | ||
| + | |||
| + | ===== GPFS cluster structure ===== | ||
| + | A GPFS cluster consists of several elements: | ||
| + | |||
| + | * Cluster manager: 172.27.253.31 (sdmgt01-ib0). Monitors disk leases, detects failures, elects the file system manager node, determines service continuity, manages UIDs, etc. | ||
| + | * Quorum nodes: sdmgt01-ib0 and sdmgt02-ib0. Nodes eligible to become cluster manager. | ||
| + | * File system manager. Handles repairs, | ||
| + | * Metanode: one per file/ | ||
| + | * AFM gateway node: Each "cache fileset" | ||
| + | |||
| + | Cluster manager information can be obtained by running: | ||
| + | < | ||
| + | $ mmlsmgr | ||
| + | file system | ||
| + | ---------------- ------------------ | ||
| + | data_fs | ||
| + | home_fs | ||
| + | |||
| + | Cluster manager node: 172.27.253.31 (sdmgt01-ib0) | ||
| + | </ | ||
| + | |||
| + | Data is stored directly in the inode if there are no levels of indirection, | ||
| + | |||
| + | |||
| + | ===== Shared directories ===== | ||
| + | The GPFS file systems are /home and /data: | ||
| + | /home: ~46 TB of scratch, fast storage | ||
| + | |||
| + | /data: ~1 PB of cold storage | ||
| + | |||
| + | The quotas are: | ||
| + | < | ||
| + | mmlsquota | ||
| + | Block Limits | ||
| + | Filesystem type | ||
| + | data_fs | ||
| + | home_fs | ||
| + | mmlsfs all -Q | ||
| + | |||
| + | File system attributes for / | ||
| + | ======================================== | ||
| + | flag value description | ||
| + | ------------------- ------------------------ ----------------------------------- | ||
| + | | ||
| + | user; | ||
| + | none | ||
| + | |||
| + | File system attributes for / | ||
| + | ======================================== | ||
| + | flag value description | ||
| + | ------------------- ------------------------ ----------------------------------- | ||
| + | | ||
| + | user; | ||
| + | none | ||
| + | </ | ||
| + | |||
| + | Assuming the df integration is correct, /data has 128 million inodes and /home has 47 million: | ||
| + | |||
| + | < | ||
| + | data_fs | ||
| + | home_fs | ||
| + | </ | ||
| + | |||
| + | / | ||
| + | |||
| + | / | ||
| + | |||
| + | / | ||
| + | |||
| + | ===== Health monitoring ===== | ||
| + | A snapshot of the GPFS cluster health can be viewed with: | ||
| + | |||
| + | < | ||
| + | $ sudo / | ||
| + | |||
| + | Component | ||
| + | -------------------------------------------------------------------------------------- | ||
| + | NODE 87 1 1 | ||
| + | GPFS 87 0 1 | ||
| + | NETWORK | ||
| + | FILESYSTEM | ||
| + | DISK | ||
| + | FILESYSMGR | ||
| + | NATIVE_RAID | ||
| + | |||
| + | </ | ||
| + | |||
| + | On each node, a more specific command can be run: | ||
| + | < | ||
| + | # / | ||
| + | |||
| + | Node name: cn013-ib0 | ||
| + | Node status: | ||
| + | Status Change: | ||
| + | |||
| + | Component | ||
| + | ------------------------------------------------------------------------------- | ||
| + | GPFS | ||
| + | NETWORK | ||
| + | FILESYSTEM | ||
| + | </ | ||
| + | |||
| + | For all nodes, that would be mmhealth node show -N all | ||
| + | |||
| + | An event history can be obtained with: | ||
| + | |||
| + | < | ||
| + | # / | ||
| + | Node name: | ||
| + | Timestamp | ||
| + | 2024-10-21 14: | ||
| + | 2024-10-21 14: | ||
| + | 2024-10-21 14: | ||
| + | </ | ||
| + | |||
| + | The events listed **can act as a trigger for a script in case of failure**. This would be worth implementing. | ||
| + | |||
| + | ==== Logs ==== | ||
| + | The logs are available on the nodes, in / | ||
| + | |||
| + | ===== Rebuilding the kernel modules ===== | ||
| + | Whenever a new kernel is installed, mmbuildgpl must be run: | ||
| + | |||
| + | < | ||
| + | [root@mmgt02 ~]# mmbuildgpl | ||
| + | -------------------------------------------------------- | ||
| + | mmbuildgpl: Building GPL (5.1.8.2) module begins at Thu Jun 26 15:17:13 -03 2025. | ||
| + | -------------------------------------------------------- | ||
| + | Verifying Kernel Header... | ||
| + | kernel version = 41800477 (418000477075001, | ||
| + | module include dir = / | ||
| + | module build dir = / | ||
| + | kernel source dir = / | ||
| + | Found valid kernel header file under / | ||
| + | Getting Kernel Cipher mode... | ||
| + | Will use skcipher routines | ||
| + | Verifying Compiler... | ||
| + | make is present at /bin/make | ||
| + | cpp is present at /bin/cpp | ||
| + | gcc is present at /bin/gcc | ||
| + | g++ is present at /bin/g++ | ||
| + | ld is present at /bin/ld | ||
| + | Verifying libelf devel package... | ||
| + | Verifying | ||
| + | Command: /bin/rpm -q elfutils-libelf-devel | ||
| + | The required package | ||
| + | Verifying Additional System Headers... | ||
| + | Verifying kernel-headers is installed ... | ||
| + | Command: /bin/rpm -q kernel-headers | ||
| + | The required package kernel-headers is installed | ||
| + | make World ... | ||
| + | make InstallImages ... | ||
| + | -------------------------------------------------------- | ||
| + | mmbuildgpl: Building GPL module completed successfully at Thu Jun 26 15:17:56 -03 2025. | ||
| + | -------------------------------------------------------- | ||
| + | |||
| + | </ | ||
| + | |||
| + | ===== Binaries ===== | ||
| + | |||
| + | |||
| The GPFS binaries are located in / | The GPFS binaries are located in / | ||
| Line 73: | Line 240: | ||
| mmcesminfuncs | mmcesminfuncs | ||
| </ | </ | ||
| + | ===== Restart ===== | ||
| - | ===== Rebuilding the kernel modules ===== | + | If the node is hung because of a problem with the file systems, and we are sure that this is the cause, the following can be run: |
| - | Whenever a new kernel is installed, mmbuildgpl must be run: | + | |
| < | < | ||
| - | [root@mmgt02 ~]# mmbuildgpl | + | mmshutdown; mmstartup |
| - | -------------------------------------------------------- | + | </code> |
| - | mmbuildgpl: Building GPL (5.1.8.2) module begins at Thu Jun 26 15:17:13 -03 2025. | + | |
| - | -------------------------------------------------------- | + | |
| - | Verifying Kernel Header... | + | |
| - | kernel version = 41800477 (418000477075001, | + | |
| - | module include dir = /lib/ | + | |
| - | module build dir = / | + | |
| - | kernel source dir = / | + | |
| - | Found valid kernel header file under / | + | |
| - | Getting Kernel Cipher mode... | + | |
| - | Will use skcipher routines | + | |
| - | Verifying Compiler... | + | |
| - | make is present at /bin/make | + | |
| - | cpp is present at /bin/cpp | + | |
| - | gcc is present at /bin/gcc | + | |
| - | g++ is present at /bin/g++ | + | |
| - | ld is present at /bin/ld | + | |
| - | Verifying libelf devel package... | + | |
| - | Verifying | + | |
| - | Command: /bin/rpm -q elfutils-libelf-devel | + | |
| - | The required package | + | |
| - | Verifying Additional System Headers... | + | |
| - | Verifying kernel-headers is installed ... | + | |
| - | Command: /bin/rpm -q kernel-headers | + | |
| - | The required package kernel-headers is installed | + | |
| - | make World ... | + | |
| - | make InstallImages ... | + | |
| - | -------------------------------------------------------- | + | |
| - | mmbuildgpl: Building GPL module completed successfully at Thu Jun 26 15:17:56 -03 2025. | + | |
| - | -------------------------------------------------------- | + | |
| + | This attempts to tear down the GPFS stack (file systems, driver, modules) and load it again. It also remounts the file systems, but it interrupts every process that depends on them. | ||
| + | |||
| + | To restart the entire cluster from scratch, run: | ||
| + | |||
| + | < | ||
| + | mmstartup -a # start GPFS on all nodes | ||
| + | mmgetstate -a # check the GPFS state on all nodes | ||
| + | mmlsfs all # list the file systems | ||
| + | mmlsmount all -L # check what is mounted | ||
| </ | </ | ||
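
As a possible complement to the full-cluster restart sequence above, here is a minimal sketch (not part of the original page) of a helper that counts how many nodes are not yet in the `active` state, reading `mmgetstate -a` output from standard input. The function name `count_inactive` is hypothetical. The sketch assumes each node row has the form "node number, node name, state" after the header and separator lines, which should be checked against the installed GPFS version.

```shell
#!/usr/bin/env bash
# Sketch only: count nodes whose GPFS state is not "active".
# ASSUMPTION: each node row of `mmgetstate -a` looks like
#   <node number> <node name> <state>
# and header/separator rows do not start with a number.
count_inactive() {
    # Keep rows whose first field is numeric; the last field is the state.
    awk '$1 ~ /^[0-9]+$/ && $NF != "active" { n++ } END { print n + 0 }'
}

# Hypothetical usage on a node with the GPFS commands in PATH:
#   mmstartup -a
#   inactive=$(mmgetstate -a | count_inactive)
#   [ "$inactive" -eq 0 ] && mmlsmount all -L
```

Reading from standard input keeps the helper independent of where the GPFS binaries are installed, so the parsing can be tried on saved output before it is used on the cluster.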
gpfs.1750967355.txt.gz · Last modified: by joaquintorres
