Stage ordering issue with qemu-from-docker feature
When using qemu-from-docker feature:
- boot: method: qemu docker: image: kevintownsend/lite-qemu5:v1 binary: /usr/bin/qemu-system-arm
I have following output for "boot" stage:
start: 2 boot-image-retry (timeout 00:01:40) [common] start: 2.1 boot-qemu-image (timeout 00:01:40) [common] start: 2.1.1 execute-qemu (timeout 00:01:40) [common] Boot command: docker run --rm --interactive --tty --mount=type=bind,source=/var/lib/lava/dispatcher/tmp,destination=/var/lib/lava/dispatcher/tmp kevintownsend/lite-qemu5:v1 /usr/bin/qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -vga none -serial mon:stdio -nographic -net nic,model=stellaris,macaddr=DE:AD:BE:EF:06:01 -net tap -m 1024 -monitor none -kernel /var/lib/lava/dispatcher/tmp/4/deployimages-_h4nkqm4/zephyr/zephyr.bin started a shell command ... end: 2 boot-image-retry (duration 00:00:01) [common]
I.e., boot stage doesn't seem to do any "real" work, just constructs (and apparently stores) the command line.
Then I have:
start: 3 lava-test-interactive-retry (timeout 00:03:19) [common] start: 3.1 lava-test-interactive (timeout 00:03:19) [common] Sending nothing, waiting Waiting for '>>>' Unable to find image 'kevintownsend/lite-qemu5:v1' locally v1: Pulling from kevintownsend/lite-qemu5 [1A[2Kbbd74bee8c69: Already exists [1B [1A[2Kdbbac58adb4f: Pulling fs layer [1B [1A[2K86df81d00951: Pulling fs layer [1B case: 0_repl definition: lava duration: 199.00 result: fail lava-test-interactive timed out after 199 seconds
I.e. actual docker image pulling happens in the test action, which of course uses up timeout quote for the test action. I need to artificially inflate this timeout, I already went from the original 10s (very reasonable for test) to 20s, then 100s, then 200s. And with my current network connection it still times out. So, I need to set it to very high value for just one-time action of docker image pulling, which will have possible adverse effect on all the future tests runs (e.g. if connections hangs, instead of it being detected quickly, it will hang for a long time before detection).
The solution is simple - "boot" action should actually issue "docker pull" for the requested image, so at the time of test, it's already available locally. Then any timeout increase will go into the "boot" action, which is much more selective than just bumping "test" timeout. Btw, I believe some other docker-related action (perhaps, docker virtual device) already does this pre-pull.
And this may seem as cosmetic issue, but I'd say it noticeably affects developer comfort and workarounds applied then may affect test runner utilization and throughput (prolonged hangs).