Created Jun 21, 2021 by Carlos Cañero Fernández (@ccanero)

Multinode Protocol not working (possible LAVA bug)

CONTEXT

Hello everyone. I'm working on a system that uses LAVA to automate test execution on several devices/targets. After several weeks of work and testing, and after reading all of the LAVA documentation on the Multinode API and the Multinode Protocol (how to write tests, etc.), I still couldn't find a way to synchronize a computer acting as the host with a device acting as the guest, which must wait for the host to be up before starting.
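To make the ordering I need concrete, here is a minimal sketch in plain Python. It is not LAVA code: `MessageBoard`, `send`, `wait` and the `host_ready` message ID are hypothetical names used only to illustrate that the guest should block until the host announces it is up.

```python
import threading


class MessageBoard:
    """Toy stand-in for a coordinator. Hypothetical, NOT LAVA's implementation."""

    def __init__(self):
        self._cond = threading.Condition()
        self._seen = set()

    def send(self, message_id):
        # Record the message ID and wake up every waiter.
        with self._cond:
            self._seen.add(message_id)
            self._cond.notify_all()

    def wait(self, message_id, timeout=5.0):
        # Block until the message ID has been sent; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: message_id in self._seen, timeout=timeout
            )


def run_demo():
    board = MessageBoard()
    events = []

    def guest():
        # The guest must not start before the host is up.
        if board.wait("host_ready"):
            events.append("guest boots")

    guest_thread = threading.Thread(target=guest)
    guest_thread.start()
    events.append("host up")
    board.send("host_ready")
    guest_thread.join()
    return events


events = run_demo()
```

In this sketch the guest can only record its boot after the host has sent `host_ready`, which is the behaviour I expect from the lava-wait / lava-send pair.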

ISSUE

After 2-3 weeks of testing (brute-forcing dozens of use cases by varying parameters and configurations), I finally started studying LAVA's implementation of the Multinode Protocol in order to locate the error. I discovered that the error is raised by the function:

def collate(self, reply, params)

in the protocols/multinode.py file (line 451).

I'm not an expert in Python, but I studied the execution flow of the code, the parameters, and the results returned by each function, and I think there is a bug in the code, specifically in the function collate(). According to the documentation, the lava-send command in the Multinode Protocol can be used with only the messageID and no message. However, reviewing the collate() function, it seems that if you don't specify a message in the job configuration, the job fails regardless. It also fails when the message is specified.

In addition, the Multinode Protocol calls this function when the lava-wait command is used on the target, and it fails when the lava-send command is used on the host. I don't know whether this function is meant to be used by every command of the Multinode Protocol.
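For reference, this is how I understand the documented behaviour: a send with only a messageID should be valid and simply carry an empty message. The sketch below is my own illustration, not the collate() code from protocols/multinode.py; `parse_send` is a hypothetical helper, and the `key=value` argument form is an assumption on my side.

```python
def parse_send(args):
    """Split hypothetical lava-send-style arguments into (message_id, payload).

    Illustration only: this is NOT how LAVA parses or collates messages.
    """
    if not args:
        raise ValueError("a message ID is required")
    message_id, *pairs = args
    payload = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        payload[key] = value
    # An ID-only send is valid and yields an empty payload.
    return message_id, payload


id_only = parse_send(["host_ready"])
with_data = parse_send(["host_ready", "ip=10.0.0.2"])
```

Here `id_only` has an empty payload and `with_data` carries one key. In my jobs both forms fail, which is why I suspect the problem is in how the reply is collated rather than in the job definition.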

ATTACHMENTS

I'm attaching the configuration of the device types and devices used in LAVA, as well as the configuration of some jobs that I submitted and their execution traces with the errors. I had to remove some data for confidentiality and security reasons, but it is not related to the problem. I'm also attaching a simple sequence diagram showing the main functions used during Multinode Protocol execution (according to what I learned from studying the code). I hope this diagram helps when debugging the problem.

  • Host device type configuration -> testbench.jinja2

  • Host device configuration -> testbench-2.jinja2

  • Guest device type configuration -> t2080rdb.jinja2

  • Guest device configuration -> t2080rdb-1.jinja2

  • Job configuration -> job_configuration.yaml

  • Host results trace -> testbench.log

  • Guest results trace -> t2080rdb.log

  • Sequence diagram -> LAVA_Multinode_Protocol_sequence_diagram


I'm trying to patch this bug myself, but, as I said, I'm not an expert in Python and I'm not sure how all the functions involved work. I would appreciate any help you can provide and, if this is truly a bug in LAVA, I think fixing it will help us all.

Please comment with any questions, ideas, or updates you have on this. Thank you very much in advance.
