lava / lava / Issues / #583

Closed
Open
Created Jan 10, 2023 by Larry Shen (@atline), Contributor

Invalid job definition: expected a dictionary

This is an odd, intermittent issue: the error shown below appears with high probability, yet the same job sometimes works fine. It only affects multinode jobs; normal (single-node) jobs are fine.

[image: issue1]

When this issue happens, the current job run is OK, but the job cannot be resubmitted because the resubmit text area is completely empty.

I checked the database and found that every affected multinode job has an empty value for sub_id and multinode_definition. However, I added some logging to the code below, and at that point both sub_id and multinode_definition do have values.

    job.sub_id = "%d.%d" % (
        parent,
        node_data["protocols"]["lava-multinode"]["sub_id"],
    )
    logger.fatal(job.sub_id)
    logger.fatal("???assign subid")
    job.multinode_definition = (
        yaml_data  # store complete submission, inc. comments
    )
    logger.fatal(job.multinode_definition)
    logger.fatal("???assign multinode definition")
    try:
        job.save()
    except Exception as e:
        logger.fatal("error now")
        logger.fatal(e)
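
One pattern that would match these symptoms (the values are present in memory when save() runs, yet the row is empty later) is a lost update: a second, stale copy of the same row is saved afterwards and writes its empty fields back. This is only a hypothesis, not a confirmed diagnosis of this issue. The sketch below reproduces the effect with the standard-library sqlite3 module; the table layout, the helper names and the two "writers" are all hypothetical and are not taken from the LAVA code. It is relevant because Django's Model.save() writes every field unless update_fields is passed.

```python
import sqlite3


def make_db():
    # Hypothetical one-row table standing in for the job table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, sub_id TEXT, status TEXT)"
    )
    conn.execute("INSERT INTO job (id, sub_id, status) VALUES (1, '', 'submitted')")
    return conn


def load(conn, job_id):
    # Read the row into a plain dict, like an ORM instance held in memory.
    row = conn.execute(
        "SELECT id, sub_id, status FROM job WHERE id = ?", (job_id,)
    ).fetchone()
    return {"id": row[0], "sub_id": row[1], "status": row[2]}


def save_all_fields(conn, job):
    # Writes every column, including ones this writer never changed.
    conn.execute(
        "UPDATE job SET sub_id = ?, status = ? WHERE id = ?",
        (job["sub_id"], job["status"], job["id"]),
    )


def save_fields(conn, job, fields):
    # Writes only the named columns (the idea behind update_fields).
    assignments = ", ".join(f"{name} = ?" for name in fields)
    values = [job[name] for name in fields] + [job["id"]]
    conn.execute(f"UPDATE job SET {assignments} WHERE id = ?", values)


def simulate(narrow_second_save):
    conn = make_db()
    writer_a = load(conn, 1)
    writer_b = load(conn, 1)  # stale copy: loaded before writer_a saves

    writer_a["sub_id"] = "100.0"
    save_all_fields(conn, writer_a)  # sub_id is now stored

    writer_b["status"] = "running"
    if narrow_second_save:
        save_fields(conn, writer_b, ["status"])
    else:
        save_all_fields(conn, writer_b)  # overwrites sub_id with stale ''

    return load(conn, 1)


if __name__ == "__main__":
    print(simulate(narrow_second_save=False))
    print(simulate(narrow_second_save=True))
```

If something like this is happening, re-reading the row from the database immediately after job.save() and again shortly afterwards would show whether the values are stored first and wiped later, rather than never stored at all.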

I could not find the cause. Could you give me some hints on how to debug this issue, or do you know a possible reason? Thanks.

It happens randomly regardless of the particular multinode job, but I have attached one sample anyway:

job.yaml

Edited Jan 10, 2023 by Larry Shen