Occasional UnicodeDecodeError in pdu-reboot
I have a strange error from time to time when booting my device in LAVA:
    start: 188.8.131.52 pdu-reboot (timeout 00:08:00) [common]
    Calling: 'nice' 'udo' '192.168.20.7' 'reset'
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 243, in run_actions
        new_connection = action.run(connection, action_max_end_time)
      File "/usr/lib/python3/dist-packages/lava_dispatcher/power.py", line 114, in run
        self.run_cmd(cmd, error_msg="Unable to reboot: '%s' failed" % cmd)
      File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 621, in run_cmd
        line = proc.stdout.read()
      File "/usr/lib/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
This happens when all of my devices start running health checks at the same time, which is the case when I switch my worker from "Maintenance" to "Active". Between 4 and 7 of my 8 devices then report this error; the others pass the health check.
All of the devices use the same reset command:
    udo <ip-address-of-the-power-controller> reset
When I trigger all of the health checks by hand, one after another, all of the health checks pass.
I don't know if this is a bug in LAVA, but I have no idea why this happens, so perhaps someone here might be able to help me.
The relevant lines are in lava_dispatcher/action.py:
    # Start the subprocess
    self.logger.debug("Calling: '%s'", "' '".join(command_list))
    start = time.time()
    # TODO: when python >= 3.6 use encoding and errors
    # see https://docs.python.org/3.6/library/subprocess.html#subprocess.Popen
    proc = subprocess.Popen(  # nosec - managed
        command_list,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        bufsize=1,  # line buffered
        universal_newlines=True,  # text stream
    )

    # Poll stdout and stderr until the process terminate
    poller = select.epoll()
    poller.register(proc.stdout, select.EPOLLIN)
    poller.register(proc.stderr, select.EPOLLIN)
    while proc.poll() is None:
        for fd, event in poller.poll():
            # When the process terminate, we might get an EPOLLHUP
            if event is not select.EPOLLIN:
                continue
            # Print stdout or stderr
            # We can't use readlines as it will block.
            if fd == proc.stdout.fileno():
                line = proc.stdout.readline()
                self.logger.debug(">> %s", line)
            elif fd == proc.stderr.fileno():
                line = proc.stderr.readline()
                self.logger.error(">> %s", line)
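For illustration, here is a minimal sketch of what the TODO in that snippet seems to point at: on Python 3.6 or later, subprocess.Popen accepts encoding and errors arguments, so undecodable bytes can be replaced instead of raising. This is not LAVA's code. The read_tolerant helper and the CHILD command are hypothetical stand-ins, and the child process only simulates a command that emits a stray 0xff byte.

```python
import subprocess
import sys

# Hypothetical stand-in for the reset command: a child process that
# writes one non-UTF-8 byte (0xff) followed by a newline to stdout.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\xff\\n')",
]


def read_tolerant(command_list):
    """Run a command and return its stdout, replacing undecodable bytes."""
    proc = subprocess.Popen(
        command_list,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",   # explicit, instead of the locale default
        errors="replace",   # undecodable bytes become U+FFFD
    )
    out, _ = proc.communicate()
    return out


output = read_tolerant(CHILD)
print(repr(output))
```

With errors="replace" the stray byte shows up in the log as a replacement character rather than aborting the action, which would at least make it visible what the command actually printed.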
What I have found out so far: the call proc.stdout.readline(), which reads the stdout of my command, tries to decode the output as UTF-8. This happens because of the universal_newlines=True argument to subprocess.Popen.
The funny thing: my udo <ip> reset command never outputs anything. It just sends one byte to a TCP port via nc and then quits.
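The decoding failure itself can be reproduced without LAVA or the power controller. The sketch below assumes the pipe is decoded as UTF-8 (subprocess actually uses the locale's preferred encoding, which is UTF-8 on my worker according to the traceback). The decode_like_text_pipe helper is a hypothetical name; it wraps raw bytes in io.TextIOWrapper, which is what a text-mode pipe does in CPython.

```python
import io


def decode_like_text_pipe(raw, errors="strict"):
    """Decode bytes the way a text-mode pipe would, via io.TextIOWrapper."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return stream.read()


# A single 0xff byte on stdout is enough to trigger the error.
try:
    decode_like_text_pipe(b"\xff")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# An empty stream decodes fine, so a truly silent command cannot fail here.
print(repr(decode_like_text_pipe(b"")))
```

So if this reasoning holds, at least one byte with value 0xff must sometimes arrive on the stdout (or stderr) of the command, even though I never see any output when running it by hand.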
Does anybody have an idea why this error appears?