Issue with lava-wait, lava-wait-all and lava-sync in a multinode setup for long-duration tests
- When a test runs for more than 3 hours, the devices do not exit gracefully.
- The device socket no longer accepts any messages from the other device.
How to reproduce the issue:
- Use the attached lava-multinode.yaml.
Please find the logs and all related YAML files below.
Check listener.log: the sync message was never received, and the job had to be cancelled manually.
Git Repo: https://github.com/veeraready/lava-demo
I'm guessing that enabling TCP keepalive on the socket, e.g. `sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)`, would help here.
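A minimal sketch of what the keepalive suggestion could look like in Python, assuming the multinode coordinator/listener uses plain `socket` objects. The helper name `enable_tcp_keepalive` and the timing values are hypothetical, not taken from LAVA's code. Note that on Linux the default idle time before the first keepalive probe is 7200 seconds (`net.ipv4.tcp_keepalive_time`), so setting `SO_KEEPALIVE` alone may not be enough; the sketch also lowers the per-socket timers where the platform exposes them.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket,
                         idle: int = 60,
                         interval: int = 10,
                         count: int = 5) -> None:
    """Turn on TCP keepalive probes for a socket (hypothetical helper).

    idle, interval and count are illustrative values, not LAVA defaults.
    """
    # Portable part: ask the kernel to send keepalive probes on idle connections.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Linux (and recent Windows): seconds of idle time before the first probe.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # macOS exposes the same idle timer as TCP_KEEPALIVE instead.
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)

    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Number of unanswered probes before the connection is considered dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_tcp_keepalive(s)
        # SO_KEEPALIVE reads back as non-zero once enabled.
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    finally:
        s.close()
```

With these example values a dead peer would be detected after roughly 60 + 5 × 10 seconds of silence instead of staying half-open, which may be what leaves lava-sync waiting indefinitely; whether that is the actual root cause here is still a guess.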