Issue #587

Closed
Open
Created Feb 04, 2023 by Veera Reddy (@veeraready)

Issue with multinode setup: lava-wait, lava-wait-all and lava-sync in long-duration tests

  • When a test runs for more than 3 hours, the devices do not exit gracefully.
  • The device socket does not accept any messages from the other device.

How to reproduce the issue:

  1. Use the attached lava-multinode.yaml to reproduce.

The logs and all related YAML files are attached below.

See listener.log: the sync message was never received, so I cancelled the job manually.

controller.log

listener.log

talker.log

controller.yaml

lava-multinode.yaml

listener.yaml

talker.yaml

Git Repo: https://github.com/veeraready/lava-demo

I'm guessing that enabling TCP keepalive on the socket, e.g. sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1), would help here.

https://github.com/Linaro/lava/blob/e149c8075407a443574817195d3f08ed05832820/lava_dispatcher/protocols/multinode.py#L123
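
To make the keepalive guess concrete, here is a minimal sketch of what enabling it on a Python socket could look like. The helper name enable_keepalive and the timing values are hypothetical and not taken from LAVA. I have not verified that this fits how multinode.py creates its socket, nor that keepalive is the actual cause of the hang. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are platform-specific (available on Linux), so the sketch only sets them when the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    Hypothetical helper for illustration; the default timings are assumptions.
    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is considered dead
    """
    # Portable switch that enables keepalive on the socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-grained timers are not available on every platform,
    # so set each one only if the constant exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Without the timer options, Linux waits for its system default idle time (typically two hours) before sending the first probe, so SO_KEEPALIVE alone may not be enough for a connection that stalls within a 3-hour test.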

Edited Feb 05, 2023 by Veera Reddy