slurm_script_generator.squeue

Functions

main()

Entry point for the slurm-queue command-line tool.

Classes

SAcct([user, days, partition])

Interface to SLURM job accounting via sacct.

SAcctJob(job_id, user, name, state, ...)

A single job record from SLURM accounting (sacct).

SQueue([user, partition])

Interface to the SLURM job queue via squeue.

SQueueJob(job_id, user, name, state, ...)

A single job entry from the SLURM queue.

class slurm_script_generator.squeue.SAcct(user: str | None = None, days: int = 7, partition: str | None = None)

Bases: object

Interface to SLURM job accounting via sacct.

Parameters:
  • user (str, optional) – If given, fetch only jobs for this user.

  • days (int) – Number of days of history to look back (default: 7).

  • partition (str, optional) – If given, filter to this partition.

Examples

>>> a = SAcct(user='alice', days=30)
>>> a.summary()
{'total': 42, 'completed': 30, 'failed': 5, ...}

jobs(user: str | None = None, state: str | None = None, partition: str | None = None) -> List[SAcctJob]

Return accounting records matching the given criteria.

jobs_by_partition() -> Dict[str, List[SAcctJob]]

Return a mapping of partition -> list of jobs in that partition.

jobs_by_state() -> Dict[str, List[SAcctJob]]

Return a mapping of state -> list of jobs in that state.

jobs_by_user() -> Dict[str, List[SAcctJob]]

Return a mapping of username -> list of their historical jobs.
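The three jobs_by_* methods above all describe the same grouping operation applied to a different field. A minimal sketch of that idea, assuming nothing about the library's internals: JobRecord and group_by are hypothetical names introduced here for illustration and are not part of slurm_script_generator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    """Hypothetical stand-in for SAcctJob, reduced to the fields grouped on."""
    job_id: int
    user: str
    state: str
    partition: str


def group_by(jobs: List[JobRecord], field: str) -> Dict[str, List[JobRecord]]:
    """Group records by the value of one attribute, preserving input order."""
    groups: Dict[str, List[JobRecord]] = defaultdict(list)
    for job in jobs:
        groups[getattr(job, field)].append(job)
    return dict(groups)


# Made-up records, for illustration only.
records = [
    JobRecord(1, "alice", "COMPLETED", "gpu"),
    JobRecord(2, "bob", "FAILED", "cpu"),
    JobRecord(3, "alice", "FAILED", "gpu"),
]
by_user = group_by(records, "user")
by_state = group_by(records, "state")
by_partition = group_by(records, "partition")
```

Each result maps a key (username, state, or partition) to the list of records sharing that key, which matches the return shapes documented above.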

refresh() -> SAcct

Re-run sacct and update the cached job list.

summary() -> dict

Return a summary dict of job counts and CPU usage.

Returns:

Keys: total, completed, failed, cancelled, timeout, cpu_hours, by_state, users.

Return type:

dict
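As an illustration of how a dict with the keys listed above might be assembled, here is a hedged sketch. It works on plain (state, user, cpu_seconds) tuples rather than real SAcctJob objects. The state strings, the shape of the users entry (user -> job count), and the CPU-seconds unit are assumptions, and summarize is a hypothetical helper, not the library's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each record is (state, user, cpu_seconds); this layout is an assumption
# made for the sketch, not the library's data model.
Record = Tuple[str, str, int]


def summarize(records: List[Record]) -> Dict[str, object]:
    """Count jobs per state and per user, and total the CPU time in hours."""
    by_state = Counter(state for state, _, _ in records)
    users = Counter(user for _, user, _ in records)
    cpu_seconds = sum(seconds for _, _, seconds in records)
    return {
        "total": len(records),
        "completed": by_state.get("COMPLETED", 0),
        "failed": by_state.get("FAILED", 0),
        "cancelled": by_state.get("CANCELLED", 0),
        "timeout": by_state.get("TIMEOUT", 0),
        "cpu_hours": cpu_seconds / 3600,
        "by_state": dict(by_state),
        "users": dict(users),
    }


# Made-up records, for illustration only.
example = summarize([
    ("COMPLETED", "alice", 7200),
    ("FAILED", "bob", 1800),
    ("COMPLETED", "alice", 3600),
])
```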

class slurm_script_generator.squeue.SAcctJob(job_id: int, user: str, name: str, state: str, partition: str, num_nodes: int, num_cpus: int, elapsed: str, cpu_time_raw: int, exit_code: str)

Bases: object

A single job record from SLURM accounting (sacct).

property cpu_hours: float
cpu_time_raw: int
elapsed: str
exit_code: str
property is_cancelled: bool
property is_completed: bool
property is_failed: bool
property is_timeout: bool
job_id: int
name: str
num_cpus: int
num_nodes: int
partition: str
state: str
user: str
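The derived properties listed above are not described further on this page. The sketch below shows one plausible way such properties could be computed from the stored fields. It assumes that cpu_time_raw holds CPU-seconds (as sacct's CPUTimeRAW field does) and that states are sacct's long state names. AcctRecord is a hypothetical, reduced stand-in and not the real SAcctJob class.

```python
from dataclasses import dataclass


@dataclass
class AcctRecord:
    """Hypothetical, reduced stand-in for SAcctJob."""
    state: str
    cpu_time_raw: int  # assumed to be CPU-seconds

    @property
    def cpu_hours(self) -> float:
        # Convert CPU-seconds to CPU-hours.
        return self.cpu_time_raw / 3600.0

    @property
    def is_completed(self) -> bool:
        return self.state == "COMPLETED"

    @property
    def is_failed(self) -> bool:
        return self.state == "FAILED"

    @property
    def is_timeout(self) -> bool:
        return self.state == "TIMEOUT"

    @property
    def is_cancelled(self) -> bool:
        # sacct can report states such as "CANCELLED by 1000",
        # so this sketch matches on the prefix only.
        return self.state.startswith("CANCELLED")


job = AcctRecord(state="CANCELLED by 1000", cpu_time_raw=5400)
```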

class slurm_script_generator.squeue.SQueue(user: str | None = None, partition: str | None = None)

Bases: object

Interface to the SLURM job queue via squeue.

Parameters:
  • user (str, optional) – If given, only fetch jobs belonging to this user by default.

  • partition (str, optional) – If given, only fetch jobs in this partition by default.

Examples

>>> q = SQueue()
>>> q.summary()
{'total_jobs': 42, 'running': 30, 'pending': 12, 'users': {...}, 'by_state': {...}}
>>> q.wait_until_done(job_name='training_*')
>>> q.wait_until_done(job_id=12345)
>>> q.wait_until_done(user='alice')

jobs(job_name: str | None = None, job_id: int | str | None = None, user: str | None = None, state: str | None = None, partition: str | None = None) -> List[SQueueJob]

Return jobs matching the given criteria.

Parameters:
  • job_name (str, optional) – Job name or glob pattern (e.g. 'train_*').

  • job_id (int or str, optional) – Exact job ID.

  • user (str, optional) – Username to filter by.

  • state (str, optional) – SLURM state code, e.g. 'R' or 'PD'.

  • partition (str, optional) – Partition name to filter by.

Return type:

list of SQueueJob
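To make the filter semantics above concrete, the following sketch applies the same five criteria to an in-memory list. It is an illustration under assumptions: QueueEntry and filter_jobs are hypothetical names, glob matching is done with the standard library's fnmatch module, and all given criteria are assumed to combine with AND.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Union


@dataclass
class QueueEntry:
    """Hypothetical stand-in for SQueueJob, reduced to the filtered fields."""
    job_id: int
    user: str
    name: str
    state: str
    partition: str


def filter_jobs(
    jobs: List[QueueEntry],
    job_name: Optional[str] = None,
    job_id: Optional[Union[int, str]] = None,
    user: Optional[str] = None,
    state: Optional[str] = None,
    partition: Optional[str] = None,
) -> List[QueueEntry]:
    """Keep only the entries that satisfy every criterion that was given."""
    selected = []
    for job in jobs:
        # job_name may be a literal name or a glob such as 'train_*'.
        if job_name is not None and not fnmatchcase(job.name, job_name):
            continue
        # Compare IDs as strings so that 12345 and '12345' both match.
        if job_id is not None and str(job.job_id) != str(job_id):
            continue
        if user is not None and job.user != user:
            continue
        if state is not None and job.state != state:
            continue
        if partition is not None and job.partition != partition:
            continue
        selected.append(job)
    return selected


# Made-up queue contents, for illustration only.
queue = [
    QueueEntry(1, "alice", "train_a", "R", "gpu"),
    QueueEntry(2, "bob", "train_b", "PD", "gpu"),
    QueueEntry(3, "alice", "eval", "R", "cpu"),
]
training = filter_jobs(queue, job_name="train_*")
alice_running = filter_jobs(queue, user="alice", state="R")
```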

jobs_by_partition() -> Dict[str, List[SQueueJob]]

Return a mapping of partition name -> list of jobs in that partition.

jobs_by_state() -> Dict[str, List[SQueueJob]]

Return a mapping of state code -> list of jobs in that state.

jobs_by_user() -> Dict[str, List[SQueueJob]]

Return a mapping of username -> list of their jobs.

pending_jobs() -> List[SQueueJob]

Return all jobs currently in the PD (Pending) state.

refresh() -> SQueue

Re-run squeue and update the cached job list.

Returns:

self, for chaining.

Return type:

SQueue

running_jobs() -> List[SQueueJob]

Return all jobs currently in the R (Running) state.

summary() -> dict

Return a summary dict with total counts, per-user counts, and per-state counts.

Returns:

Keys: total_jobs, running, pending, users (dict of user -> job count), by_state (dict of state code -> job count).

Return type:

dict

users() -> List[str]

Return a sorted list of unique users with jobs in the queue.

wait_until_done(job_name: str | None = None, job_id: int | str | None = None, user: str | None = None, poll_interval: float = 30.0, timeout: float | None = None, verbose: bool = True) -> None

Block until all matching jobs leave the active queue.

Supports glob patterns in job_name (* and ? wildcards). At least one filter argument must be provided.

Parameters:
  • job_name (str, optional) – Job name or glob pattern, e.g. 'train_*'.

  • job_id (int or str, optional) – A specific job ID to wait for.

  • user (str, optional) – Wait for all jobs belonging to this user to finish.

  • poll_interval (float) – Seconds between queue polls. Defaults to 30.

  • timeout (float, optional) – Maximum seconds to wait before raising TimeoutError.

  • verbose (bool) – Print progress messages. Defaults to True.

Raises:
  • ValueError – If no filter is specified.

  • TimeoutError – If timeout is exceeded before all jobs finish.
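The blocking behaviour described above amounts to a poll loop with an optional deadline. The sketch below shows that pattern in isolation. It is not the library's code: count_active is a hypothetical callable standing in for "re-run squeue and count matching jobs", and the exact messages are invented.

```python
import time
from typing import Callable, Optional, Union


def wait_until_done(
    count_active: Callable[..., int],
    *,
    job_name: Optional[str] = None,
    job_id: Optional[Union[int, str]] = None,
    user: Optional[str] = None,
    poll_interval: float = 30.0,
    timeout: Optional[float] = None,
    verbose: bool = True,
) -> None:
    """Poll until count_active(...) reports zero matching jobs."""
    if job_name is None and job_id is None and user is None:
        raise ValueError("at least one of job_name, job_id or user is required")

    # A monotonic clock is unaffected by wall-clock adjustments.
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        remaining = count_active(job_name=job_name, job_id=job_id, user=user)
        if remaining == 0:
            return
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"{remaining} job(s) still active after {timeout} s")
        if verbose:
            print(f"{remaining} job(s) still active; polling again in {poll_interval} s")
        time.sleep(poll_interval)
```

Passing the queue lookup in as a callable keeps the loop testable without a SLURM cluster; a fake counter that returns 2, 1, 0 on successive calls exercises the whole path.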

class slurm_script_generator.squeue.SQueueJob(job_id: int, user: str, name: str, state: str, partition: str, num_nodes: int, num_cpus: int, time_used: str, time_limit: str, reason: str, priority: int)

Bases: object

A single job entry from the SLURM queue.

property is_active: bool
property is_pending: bool
property is_running: bool
job_id: int
name: str
num_cpus: int
num_nodes: int
partition: str
priority: int
reason: str
state: str
property state_name: str
time_limit: str
time_used: str
user: str

wait_until_done(poll_interval: float = 30.0, timeout: float | None = None, verbose: bool = True) -> None

Block until this specific job leaves the active queue.

Parameters:
  • poll_interval (float) – Seconds between queue polls. Defaults to 30.

  • timeout (float, optional) – Maximum seconds to wait before raising TimeoutError.

  • verbose (bool) – Print progress messages. Defaults to True.
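The state, state_name, and is_active members of SQueueJob are not documented in detail here. As a rough sketch of how a compact state code could map to a readable name and an "active" flag: the codes below are standard SLURM compact state codes, but the set treated as active is an assumption, and STATE_NAMES, ACTIVE_STATES, state_name, and is_active are hypothetical module-level names rather than the library's own.

```python
# Standard SLURM compact state codes and their long names (not exhaustive).
STATE_NAMES = {
    "PD": "PENDING",
    "R": "RUNNING",
    "CF": "CONFIGURING",
    "CG": "COMPLETING",
    "S": "SUSPENDED",
    "CD": "COMPLETED",
    "F": "FAILED",
    "CA": "CANCELLED",
    "TO": "TIMEOUT",
}

# Assumption: a job counts as active while it is waiting, starting,
# running, suspended, or finishing up.
ACTIVE_STATES = {"PD", "R", "CF", "CG", "S"}


def state_name(code: str) -> str:
    """Return the long name for a state code, or the code itself if unknown."""
    return STATE_NAMES.get(code, code)


def is_active(code: str) -> bool:
    """Return True if the code is in the assumed set of active states."""
    return code in ACTIVE_STATES
```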

slurm_script_generator.squeue.main() -> None

Entry point for the slurm-queue command-line tool.

Sub-commands

show (default)

Print a per-user queue summary table.

list

Print individual jobs, optionally filtered and sorted.

stats

Print partition and state breakdown statistics.

wait

Block until matching jobs leave the active queue.
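For readers who want to see how a four-sub-command layout with a default of show could be wired up, here is a sketch using argparse. Only the program name slurm-queue and the sub-command names come from this page. The --user and --job-name options, and the helpers build_parser and parse, are hypothetical and may not match the real tool's flags.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the four documented sub-commands."""
    parser = argparse.ArgumentParser(prog="slurm-queue")
    sub = parser.add_subparsers(dest="command")

    sub.add_parser("show", help="per-user queue summary table")

    list_cmd = sub.add_parser("list", help="individual jobs")
    list_cmd.add_argument("--user")  # hypothetical option

    sub.add_parser("stats", help="partition and state breakdown")

    wait_cmd = sub.add_parser("wait", help="block until matching jobs finish")
    wait_cmd.add_argument("--job-name")  # hypothetical option
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    """Parse argv, falling back to 'show' when no sub-command is given."""
    args = build_parser().parse_args(argv)
    if args.command is None:
        args.command = "show"
    return args
```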