
Remote task monitoring #6308

Open
pevogam wants to merge 5 commits into avocado-framework:master from pevogam:remote-task-monitoring
pevogam (Contributor) commented May 29, 2026

Improve the integration between the avocado task state machine and the remote process spawner in order to properly monitor tasks and respect their configured or default timeouts.

pevogam added 2 commits May 22, 2026 23:13
Make better use of the avocado task state machine by detaching
from the task runner. This keeps the respective coroutine from
spending all of its time at the task spawning stage instead of
monitoring the spawned task. While not fatal, the previous behavior
also led to "task ended too fast" warnings at the end of the long
spawning wait, issued for tasks that had indeed ended but certainly
not too fast.

While beneficial, this change needs some supporting changes to make
the monitoring resilient enough; these follow in the next commits.

Signed-off-by: Plamen Dimitrov <plamen.dimitrov@intra2net.com>
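A minimal asyncio sketch of the spawn/monitor split described above, assuming a toy task object rather than avocado's real spawner API: `spawn_task` only launches the work and returns, while `wait_task` monitors it and enforces a timeout. `FakeRemoteTask` and the durations are hypothetical, and the two coroutines only loosely mirror the spawner's method names.

```python
import asyncio


class FakeRemoteTask:
    """Hypothetical stand-in for a remote task that runs for `duration` seconds."""

    def __init__(self, duration):
        self.duration = duration
        self.done = asyncio.Event()
        self.runner = None

    async def _run(self):
        await asyncio.sleep(self.duration)
        self.done.set()


async def spawn_task(task):
    # Detach from the runner: start the work in the background and return
    # right away instead of awaiting its completion during spawning.
    task.runner = asyncio.create_task(task._run())
    return True


async def wait_task(task, timeout):
    # Monitoring stage: wait for completion but honor the task timeout.
    try:
        await asyncio.wait_for(task.done.wait(), timeout)
        return "finished"
    except asyncio.TimeoutError:
        task.runner.cancel()
        return "timed out"


async def main():
    fast, slow = FakeRemoteTask(0.01), FakeRemoteTask(10)
    await spawn_task(fast)
    await spawn_task(slow)
    return await asyncio.gather(wait_task(fast, 1), wait_task(slow, 0.05))


results = asyncio.run(main())
print(results)  # ['finished', 'timed out']
```

Because spawning returns immediately, the slow task's timeout is enforced by the monitor rather than being lost in a long spawning wait.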
A potential async yield due to a slightly longer, nonzero IO wait of
a forked command could result in the task only being logged as
"successfully spawned" after it has already completed, if the other
coroutines take too long before control returns to this one. This
in turn would again result in a "task ended too early" warning,
merely because the task was revisited much later. Worse yet, it
would also skip the monitoring stage, so the task result might be
awaited indefinitely and any task timeout ignored.

Also make remote command execution fully synchronous so that a
yield at the call, though highly unlikely (there are no IO waits),
is now fully prevented.

Signed-off-by: Plamen Dimitrov <plamen.dimitrov@intra2net.com>
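The ordering problem can be reproduced in a toy form, assuming that a single `await` stands in for the IO wait of the forked command: whenever the spawner yields, another ready coroutine runs first and the spawn is recorded late. The coroutine names are hypothetical; this is not the spawner's actual code.

```python
import asyncio

log = []


async def busy_neighbor():
    # Another coroutine that gets to run whenever the spawner yields.
    log.append("neighbor ran")


async def spawn_with_yield(name):
    # Simulates an awaited remote command: the await is a yield point,
    # so other ready coroutines may run before the spawn is recorded.
    await asyncio.sleep(0)
    log.append(f"{name} spawned")


async def spawn_in_sync(name):
    # Simulates a fully synchronous remote command call: no await, so
    # the spawn is recorded before control returns to the event loop.
    log.append(f"{name} spawned")


async def scenario(spawner):
    log.clear()
    neighbor = asyncio.create_task(busy_neighbor())
    await spawner("task")
    await neighbor
    return list(log)


print(asyncio.run(scenario(spawn_with_yield)))  # ['neighbor ran', 'task spawned']
print(asyncio.run(scenario(spawn_in_sync)))     # ['task spawned', 'neighbor ran']
```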
mr-avocado (bot) moved this to Review Requested in Default project May 29, 2026
pevogam marked this pull request as draft May 29, 2026 11:35

gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request replaces the asynchronous remote command execution with a synchronous implementation and introduces a synchronous retry loop with time.sleep in is_task_alive. The review feedback highlights that these synchronous calls will block the main asyncio event loop, freezing the application and preventing other concurrent tasks from progressing. It is highly recommended to keep these operations asynchronous.
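One way to follow this suggestion, sketched under assumptions: `blocking_check` is a hypothetical stand-in for the remote pgrep call, offloaded with `asyncio.to_thread` (Python 3.9+), and the retry delay uses `asyncio.sleep`, which yields to other coroutines instead of freezing the loop as `time.sleep` would. The `ticks` counter shows another coroutine making progress while the check runs.

```python
import asyncio
import time


def blocking_check():
    # Hypothetical stand-in for a blocking remote call (e.g. pgrep over a shell).
    time.sleep(0.05)
    return True


async def is_task_alive_async(check, retries=3, delay=0.01):
    for _ in range(retries):
        # Run the blocking call in a worker thread so the event loop stays free.
        if await asyncio.to_thread(check):
            return True
        # asyncio.sleep yields to other coroutines, unlike time.sleep.
        await asyncio.sleep(delay)
    return False


async def main():
    ticks = 0
    stop = asyncio.Event()

    async def ticker():
        # Another coroutine that should keep running during the check.
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            await asyncio.sleep(0.005)

    ticker_task = asyncio.create_task(ticker())
    alive = await is_task_alive_async(blocking_check)
    stop.set()
    await ticker_task
    return alive, ticks


alive, ticks = asyncio.run(main())
print(alive, ticks > 1)  # True True
```

Note that awaiting the offloaded call reintroduces a yield point at the call site, which is exactly what the synchronous change in this PR tries to avoid, so the two approaches trade one concern for the other.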

Comment thread optional_plugins/spawner_remote/avocado_spawner_remote/__init__.py
Comment thread optional_plugins/spawner_remote/avocado_spawner_remote/__init__.py
codecov (bot) commented May 29, 2026

Codecov Report

❌ Patch coverage is 14.28571% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.04%. Comparing base (99bfac9) to head (d8c32f6).
⚠️ Report is 70 commits behind head on master.

Files with missing lines | Patch % | Lines
.../spawner_remote/avocado_spawner_remote/__init__.py | 14.28% | 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6308      +/-   ##
==========================================
- Coverage   73.60%   72.04%   -1.57%     
==========================================
  Files         206      206              
  Lines       22505    23356     +851     
==========================================
+ Hits        16565    16826     +261     
- Misses       5940     6530     +590     


pevogam added 2 commits June 3, 2026 22:13
There are observable cases where it takes a short while for the
process to appear and thus for the task to be considered alive, so
make sure the overall check makes at least a few tries within a
ten-second window.

Signed-off-by: Plamen Dimitrov <plamen.dimitrov@intra2net.com>
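A sketch of a deadline-bounded retry as described in this commit, assuming a hypothetical `retry_within` helper; the actual number of tries and the interval used by the PR are not stated here. The `sleep` and `clock` parameters are injectable so the logic can be exercised without waiting ten seconds.

```python
import time


def retry_within(check, window=10.0, attempts=5, sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: call check() up to `attempts` times, spread
    across at most `window` seconds, returning True on the first success."""
    interval = window / attempts
    deadline = clock() + window
    for attempt in range(attempts):
        if check():
            return True
        remaining = deadline - clock()
        # Stop after the last try or once the window has been used up.
        if attempt == attempts - 1 or remaining <= 0:
            break
        sleep(min(interval, remaining))
    return False


# The process "appears" on the third check; sleeps are recorded, not performed.
replies = iter([False, False, True])
pauses = []
print(retry_within(lambda: next(replies), sleep=pauses.append), pauses)  # True [2.0, 2.0]
```

With the defaults, a task that never shows up costs five checks and four 2-second pauses before being reported as not alive.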
Prevent errors where we could not retrieve the command status, such as:

  File "/usr/lib/python3.13/site-packages/avocado_spawner_remote/__init__.py", line 218, in wait_task
    if not RemoteSpawner.is_task_alive(runtime_task):
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^

  File "/usr/lib/python3.13/site-packages/avocado_spawner_remote/__init__.py", line 142, in is_task_alive
    status, output = session.cmd_status_output(
                     ~~~~~~~~~~~~~~~~~~~~~~~~~^
        f"pgrep -r R,S -f {runtime_task.task.identifier}"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^

  File "/usr/lib/python3.13/site-packages/aexpect/client.py", line 1491, in cmd_status_output
    raise ShellStatusError(cmd, out) from error

Use the "safe" flag to handle cases where the shell prompt might be
polluted with "[Done] some-background-process" output from the
detached avocado task process, and also handle any further unexpected
status retrieval errors. There are still rare cases that the safe
flag misses, but it does filter out most cases that would otherwise
require catching status errors (which is also worse in terms of
performance than a regular boolean check).

Signed-off-by: Plamen Dimitrov <plamen.dimitrov@intra2net.com>
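A sketch of treating an unparseable status as an inconclusive probe rather than a crash. `ShellStatusError` here is a local stand-in for aexpect's exception and `FlakySession` is a hypothetical fake session; the `safe` keyword mirrors the flag named in the commit message, but aexpect's documentation is the authority on the exact `cmd_status_output` signature.

```python
import shlex


class ShellStatusError(Exception):
    """Local stand-in for aexpect's ShellStatusError (assumption)."""


class FlakySession:
    """Hypothetical fake session replaying canned (status, output) pairs or
    raising canned exceptions, e.g. when a "[Done] ..." job notice
    polluted the prompt."""

    def __init__(self, replies):
        self.replies = list(replies)

    def cmd_status_output(self, cmd, safe=False):
        reply = self.replies.pop(0)
        if isinstance(reply, Exception):
            raise reply
        return reply


def probe_task(session, identifier):
    """Return True/False if the process state is known, or None if the
    status could not be retrieved and the caller should retry."""
    cmd = f"pgrep -r R,S -f {shlex.quote(identifier)}"
    try:
        status, _output = session.cmd_status_output(cmd, safe=True)
    except ShellStatusError:
        return None  # inconclusive: let a retry loop try again
    # pgrep exits with 0 when at least one process matched, 1 otherwise.
    return status == 0


session = FlakySession([ShellStatusError("pgrep"), (0, "2732988\n"), (1, "")])
print([probe_task(session, "test-name") for _ in range(3)])  # [None, True, False]
```

Returning `None` instead of `False` keeps a polluted prompt from being mistaken for a finished task; a retry loop can simply try again.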
pevogam force-pushed the remote-task-monitoring branch from 1592cc5 to 678478b June 5, 2026 09:40
pevogam marked this pull request as ready for review June 5, 2026 09:41
pevogam (Contributor, Author) commented Jun 5, 2026

I assume the failures are due to rebasing the original branch on top of 113.0. I was hoping I could just push from a tag and have enough stability, but if it is really needed I will rebase on top of the most recent master and push again. Let me know.

The is_task_alive checks may end up with a false positive outcome
if the test process has spawned its own subprocess which contains
its name as an argument. This can happen, e.g., when checking a VT
test task with a Windows 11 VM that has spawned a TPM 2.0
emulator with a socket server like

2732988 pts/8    S+     0:00      \_ /usr/bin/swtpm socket --ctrl type=unixio,path=/root/avocado/data/avocado-vt/swtpm/mw111_tpm0_swtpm.sock,mode=0600 --tpmstate dir=/root/avocado/data/avocado-vt/swtpm/mw111_tpm0_state,mode=0600 --terminate --tpm2 --log file=/mnt/local/results/job-name/test-name/vtpm_mw111_tpm0_swtpm.log

Let's prevent this by also filtering for the task-run prefix to
identify the exact parent process for the test task.
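The idea of anchoring the match on the runner's `task-run` subcommand can be illustrated with Python's `re` module. The runner command line (including its `-i` option) and the identifier `test-name` below are hypothetical; `pgrep -f` uses POSIX extended regular expressions, so escaping for the real pattern would have to follow those rules rather than `re.escape`.

```python
import re


def task_run_pattern(identifier):
    # Hypothetical: require the runner's "task-run" subcommand to appear
    # before the identifier on the same command line.
    return r"task-run\s.*" + re.escape(identifier)


def matching(cmdlines, pattern):
    # Mimics "pgrep -f": search the pattern anywhere in the full command line.
    return [line for line in cmdlines if re.search(pattern, line)]


cmdlines = [
    # Hypothetical runner command line for the test task.
    "/usr/bin/python3 /usr/bin/avocado-runner-avocado-vt task-run -i test-name -k avocado-vt",
    # Child process that merely mentions the test name in a log path.
    "/usr/bin/swtpm socket --terminate --tpm2 "
    "--log file=/mnt/local/results/job-name/test-name/vtpm_mw111_tpm0_swtpm.log",
]

print(len(matching(cmdlines, re.escape("test-name"))))         # 2
print(len(matching(cmdlines, task_run_pattern("test-name"))))  # 1
```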