When jobs are submitted, they wait in the queue until a device with a matching type, tags, and permissions becomes available. For a variety of reasons, this can take an indefinite amount of time: there may not be enough devices to process queued jobs faster than they are submitted, devices may go offline for extended periods, and so on.
When this happens, a lab administrator typically has to resolve the situation by cancelling jobs that will never run or that are now out of date. One way to avoid this would be to set a timeout in the job definition that automatically cancels a job if it does not get scheduled in time. After all, a job failing to get scheduled in a timely fashion is similar to a job failing to complete any other stage in time.
So here's a proposal to deal with this kind of scenario:
```yaml
timeouts:
  queue:
    hours: 24
  job:
    minutes: 90
  action:
    minutes: 15
  connection:
    minutes: 2
```
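To make the intent concrete, here is a minimal sketch of how a scheduler could enforce such a queue timeout by periodically sweeping the queue. The `Job` class, its field names, and the status strings are assumptions made up for this example, not an existing scheduler API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class Job:
    """Hypothetical queued job; field names are assumptions for this sketch."""
    job_id: int
    submitted_at: datetime
    queue_timeout: timedelta   # parsed from the job definition's timeouts.queue
    status: str = "Queued"

def expire_queued_jobs(jobs: List[Job], now: Optional[datetime] = None) -> List[Job]:
    """Cancel queued jobs whose queue timeout has elapsed.

    Intended to run periodically from the scheduler loop; returns the
    cancelled jobs so the caller can notify their submitters.
    """
    now = now or datetime.now(timezone.utc)
    expired = [
        j for j in jobs
        if j.status == "Queued" and now - j.submitted_at >= j.queue_timeout
    ]
    for job in expired:
        job.status = "Cancelled"
    return expired

# A job submitted 25 hours ago with a 24-hour queue timeout gets cancelled:
stale = Job(
    job_id=1234,
    submitted_at=datetime.now(timezone.utc) - timedelta(hours=25),
    queue_timeout=timedelta(hours=24),
)
print(expire_queued_jobs([stale]))
```

One design note: sweeping the queue periodically, rather than arming a timer per job, keeps the implementation simple; cancellation is at most one sweep interval late, which is acceptable for timeouts measured in hours.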