latency-test, latency-histogram: add realtime status bar#4107

Open
grandixximo wants to merge 4 commits into
LinuxCNC:master from
grandixximo:fix/latency-setuid-warning-4044

Conversation

@grandixximo

@grandixximo grandixximo commented Jun 2, 2026

Contributor

What

latency-test and latency-histogram now show a status bar along the bottom
with three fields: the running realtime type (or no realtime), the CPU
frequency governor, and isolcpus. A field is coloured red only when it flags a
condition that makes the measured latency unrepresentative, so normal states
stay in the theme colour. On a real RT machine it reads e.g.
RT-PREEMPT_RT / performance / isolcpus=2,3.
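
A minimal sketch of the colouring rule described above, assuming the bar is built from three plain strings. The `Field` type and `status_fields` helper are hypothetical names, not the PR's code, and the sketch flags only the two conditions the commit message names (no realtime, a non-performance governor). Treating isolcpus as purely informational is an assumption.

```python
from typing import List, NamedTuple, Optional


class Field(NamedTuple):
    text: str
    red: bool  # True only when the state makes latency numbers unrepresentative


def status_fields(rt_type: Optional[str], governor: str,
                  isolcpus: Optional[str]) -> List[Field]:
    """Build the three status-bar fields (hypothetical helper, not the PR's code)."""
    rt = Field(f"RT-{rt_type}", False) if rt_type else Field("no realtime", True)
    gov = Field(governor, governor != "performance")
    # Assumption: isolcpus is informational in this sketch and never red.
    iso = Field(f"isolcpus={isolcpus}" if isolcpus else "isolcpus: none", False)
    return [rt, gov, iso]


fields = status_fields("PREEMPT_RT", "performance", "2,3")
print(" / ".join(f.text for f in fields))
```

A GUI would then map `red` to a foreground or background colour and leave the other fields in the theme colour.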

Why

In #4044 a run-in-place build was used without sudo make setuid. rtapi_app
ran unprivileged, the latency numbers blew up, and it was mistaken for a code
regression. A visible realtime indicator surfaces that immediately. The
governor and isolcpus fields cover the other two common reasons a latency
measurement is misleading.

How

The state is read via realtime verify (from #4132) for the authoritative
realtime yes/no, with the type label derived from the same kernel signals rtapi
uses; the governor from /sys and isolcpus from /proc/cmdline. latency-test
renders the bar as a pyvcp footer; latency-histogram as a Tk status bar at the
bottom. The histogram's earlier modal no realtime popup is replaced by this
persistent bar (the console warning is kept for non-X runs).
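
As a rough illustration of the /sys and /proc/cmdline reads mentioned above, here is a sketch in Python. The helper names are hypothetical, and reading cpu0's `scaling_governor` is an assumption; the PR may inspect other CPUs or use different code entirely.

```python
import re
from pathlib import Path
from typing import Optional

# Standard Linux cpufreq location; which CPU the PR inspects is an assumption.
GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def read_governor(path: Path = GOVERNOR_PATH) -> Optional[str]:
    """Return the governor name, or None when cpufreq is not exposed."""
    try:
        return path.read_text().strip() or None
    except OSError:
        return None


def parse_isolcpus(cmdline: str) -> Optional[str]:
    """Extract the value of isolcpus= from a kernel command line string."""
    match = re.search(r"(?:^|\s)isolcpus=(\S+)", cmdline)
    return match.group(1) if match else None


def read_isolcpus(path: Path = Path("/proc/cmdline")) -> Optional[str]:
    """Read the running kernel's command line and return its isolcpus value."""
    try:
        return parse_isolcpus(path.read_text())
    except OSError:
        return None


print(parse_isolcpus("BOOT_IMAGE=/vmlinuz root=/dev/sda1 isolcpus=2,3 quiet"))
```

Keeping the parsing separate from the file read makes the logic testable on machines without cpufreq or isolated CPUs.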

Docs: install/latency-test.adoc gains a status-bar section explaining each
field and linking the fixes (realtime kernel, setuid/setcap, isolcpus), plus a
CPU frequency governor tuning note.

Follow-ups (separate PRs)

Depends on #4132 (merged). Closes #4044.

@grandixximo grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from 77c1e3e to 8274d2d Compare June 2, 2026 13:19
@BsAtHome

BsAtHome commented Jun 2, 2026

Contributor

These tests are moot when you run on a non-RT kernel (like I do in dev). I'm not sure the noise is really necessary in that case.

@grandixximo

Contributor Author

silenced on non-rt

@rodw-au

rodw-au commented Jun 2, 2026

Contributor

Great. Every little bit helps. Reduces user frustration and more importantly less developer time wasted on spurious issues.

@hdiethelm

Contributor

Hmm, in the screenshots in #4044 you can clearly see this in the background:
[screenshot]

So the new warning will not help that much. If desired, you can add it where Note: Using POSIX non-realtime is printed, so that it is at least always shown, not only in these test tools.

@hdiethelm

Contributor

Connected to this:
#4118

A general way for GUIs to show "You don't have realtime" warnings that you cannot overlook would help the most. When milling, I start linuxcnc via the link, so there is no console. If I accidentally start the wrong kernel, bad luck.

@BsAtHome

BsAtHome commented Jun 4, 2026

Contributor

On a production system, you may want to warn all the time when this is amiss. Maybe something with a background color turning from gray into gray-red tinted and do something similar consistently in all GUIs.

For dev builds you don't really want this because you want to see what the operator sees while you are working on stuff. I'd go for an opt-in choice by setting a value in the INI file. Maybe something like a boolean [DISPLAY]VISUAL_WARN_NONRT, defaulting to false. Then add the entry commented out in our configs and add a choice in pnconf and friends.

@hdiethelm

Contributor

I was just checking the code. @grandixximo You already added a better warning in the C code in your nonroot patch, so this was probably not even applied in the screenshot. Now there is a double warning in the console: the note from C++ and the warning from this PR:
[screenshot]

With latency-histogram there is also a pop-up. With latency-test, this doesn't work. And the most important app, linuxcnc, also shows nothing if you don't start it in a console.

Does anyone have a good idea for checking realtime capability easily and globally?

There is already a function rtapi_is_realtime(). This is not 100% reliable, if harden_rt() fails, it will return true, even if it runs in SCHED_OTHER. But this can be fixed.

rtapi_is_realtime() is also linked to userspace apps but there it will not work, it checks if the userspace app has realtime... ;-)

I could add a halcmd that checks for realtime. Or a pin that is true when all is ok, false otherwise.

This could then be used in all GUIs for an opt-in or opt-out warning. But I am not that deep into all these various GUIs and how they communicate with the RT part.

@hdiethelm

Contributor

On a production system, you may want to warn all the time when this is amiss. Maybe something with a background color turning from gray into gray-red tinted and do something similar consistently in all GUIs.

For dev builds you don't really want this because you want to see what the operator sees while you are working on stuff. I'd go for an opt-in choice by setting a value in the INI file. Maybe something like a boolean [DISPLAY]VISUAL_WARN_NONRT, defaulting to false. Then add the entry commented out in our configs and add a choice in pnconf and friends.

It might be an option that defaults to on when deployed and to off in RIP mode? But this is annoying to test.

Otherwise, I would lean towards default on: devs will manage to switch it off more easily than users will cope with not knowing that they don't have realtime enabled. Instead of the INI file, it might be an environment variable, LINUXCNC_NO_RT_WARN. Devs can set it in .profile on their dev machine if they are annoyed, and it works for all test configs.

BTW, just brainstorming options.

@hdiethelm

hdiethelm commented Jun 4, 2026

Contributor

Just a POC, if you think a halcmd getrt (or better name) would help I can create a PR. Was easy.
You can use that everywhere and it will return 1 if failed / 0 if good.

../bin/halcmd getrt ; echo Return value $?
<commandline>:0: exit value: 1
<commandline>:0: No realtime available
Return value 1

make setuid

../bin/halcmd getrt ; echo Return value $?
Realtime available
Return value 0

You find it on my fork:
https://github.com/hdiethelm/linuxcnc-fork/tree/halcmd_getrt
hdiethelm/linuxcnc-fork@master...hdiethelm:linuxcnc-fork:halcmd_getrt
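
A script could consume that exit status roughly as follows. This is only a sketch against the POC behaviour shown above (exit 0 when realtime is available, 1 otherwise); `realtime_available` is a hypothetical helper, and an older halcmd without getrt may also exit non-zero, which this sketch does not distinguish.

```python
import subprocess
import sys
from typing import Optional, Sequence


def realtime_available(cmd: Sequence[str] = ("halcmd", "getrt")) -> Optional[bool]:
    """Run the probe; True on exit 0, False on non-zero, None if it cannot run."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None  # halcmd missing or hung: no verdict either way
    return result.returncode == 0


if __name__ == "__main__":
    verdict = realtime_available()
    print({True: "realtime available", False: "no realtime", None: "unknown"}[verdict])
    sys.exit(0 if verdict else 1)
```

Returning `None` for "could not run" keeps a missing halcmd from being reported as a realtime failure.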

@hdiethelm

Contributor

Meanwhile, I also found something that looks like it is exposed to the Python code:

PyModule_AddIntConstant(m, "is_rt", rtapi_is_realtime());

But this is broken: #4129

@grandixximo

Contributor Author

Thanks both, this is more useful than my original per-tool heuristic.

I have pivoted the PR to use @hdiethelm's halcmd getrt as the single source of truth instead of hand-rolling a setuid-bit / getcap probe in bash and tcl. Both scripts now just run halcmd getrt and warn only when it reports No realtime available. An rtai/non-uspace build, an older halcmd without getrt, or a working realtime setup all stay silent, so the check rides on the authoritative rtapi_is_realtime() path rather than guessing from file permissions. This also drops the weaker logic @hdiethelm rightly flagged.
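
The "warn only on an explicit negative" rule can be sketched like this. The marker string is taken from the POC output earlier in this thread; `should_warn` is a hypothetical name, and the "unknown command" text in the example is made up for illustration.

```python
NO_RT_MARKER = "No realtime available"  # message text from the POC output in this thread


def should_warn(output: str) -> bool:
    """Warn only when the probe explicitly reports that realtime is missing.

    Anything else (a working setup, an older halcmd without getrt, an RTAI
    build printing something different) stays silent.
    """
    return NO_RT_MARKER in output


print(should_warn("<commandline>:0: No realtime available"))
print(should_warn("Realtime available"))
print(should_warn("Unknown command 'getrt'"))  # hypothetical error text
```

Matching on the explicit message rather than on the exit status is what keeps older or non-uspace builds quiet.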

A few things I would like your input on, since they touch the broader direction in #4118:

  1. Console double-warning. rtapi already prints Note: Using POSIX non-realtime at the source (uspace_posix.cc). For a console tool like latency-test that note is arguably enough, and a second line from the script is the duplication you saw. I am inclined to keep the script warning only for its actionable hint (the make setuid / make setcap pointer) and let the GUI popup be the real value-add in latency-histogram. Happy to drop the latency-test console line entirely if you would rather the C note be the only console source.

  2. getrt invocation/cleanup. Since getrt goes through hal_systemv and a HAL init, calling it standalone before the test seems to bring up an rtapi instance. @hdiethelm, where do you intend callers to invoke it, and does it need a halrun -U afterward so it does not collide with the session the tool then starts? I did not want to bake in a cleanup that could disturb a running setup.

  3. Dev opt-out policy. I wired a LINUXCNC_NO_RT_WARN env opt-out per @hdiethelm's suggestion, which keeps @BsAtHome's dev boxes quiet without a per-kernel heuristic. If the consensus in Feature: Properly warn if no realtime kernel is active #4118 lands on an INI key like [DISPLAY]VISUAL_WARN_NONRT instead, I will switch to that. The env var is easy to set once for all test configs, which is why I started there.

This now depends on the getrt command landing. @hdiethelm, if you open that as its own PR I will rebase on top and reference it.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

Instantiating a HAL memory segment on invocation may be problematic. It will surely confuse because you have to remember to call halrun -U afterwards. That is not a good design.

Opt-out policies are generally designed to force you to do a thing, even if you do not want to. That is why they should be avoided.

@hdiethelm

Contributor

Instantiating a HAL memory segment on invocation may be problematic. It will surely confuse because you have to remember to call halrun -U afterwards. That is not a good design.

The way halcmd getrt runs this in the background for uspace is by executing rtapi_app getrt. Now there are two possibilities:

  • rtapi_app is already running (for example, you started latency-test in another terminal): The command is executed and the result returned.
  • rtapi_app is not yet running: master starts, runs the command and exits again because instance_count==0. So nothing stays behind and no halrun -U is needed. However, this is a low-likelihood race condition: if you manage to break realtime somehow between halcmd getrt and halrun lat.hal, then no error is reported.

You see that in the following test where I added a message when rtapi_app exits:

halcmd loadrt and2
#Note: Using POSIX realtime
halcmd getrt
#Realtime available
halcmd getrt
#Realtime available
pgrep rtapi_app
#3799
halrun -U
#exit master

vs

halcmd getrt
#Note: Using POSIX realtime
#exit master
#Realtime available
halcmd getrt
#Note: Using POSIX realtime
#exit master
#Realtime available
pgrep rtapi_app
#no process running

I don't see any big downside in doing it like that. But I also do not 100% like starting up rtapi_app just to exit right away. Running it with or after halrun in the script would be better. Or maybe an approach using a signal / parameter.

Of course, RTAI, the docs and so on also need to be checked / updated before I will call that ready.

Better ideas are welcome. But I prefer a check that always executes the same code path for the realtime check over the previous variant, where you most likely end up with diverging realtime checks spread across all possible apps.

@grandixximo grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from 228e6ef to a3df81c Compare June 5, 2026 07:19
@grandixximo

Contributor Author

@hdiethelm I have restructured to avoid the standalone HAL instantiation @BsAtHome flagged:

  • latency-histogram now calls halcmd getrt from inside its own running session (right after hal start), so it attaches to the realtime already up rather than spinning up a segment that needs a separate halrun -U.
  • latency-test drops its script-side check entirely and relies on the existing Note: Using POSIX non-realtime from rtapi, which already lands on the same console. That also removes the double-warning you saw.
  • Dropped the LINUXCNC_NO_RT_WARN opt-out per @BsAtHome; the dev-suppression policy can be decided in Feature: Properly warn if no realtime kernel is active #4118 rather than baked in here.

That keeps the scripts honest, but @BsAtHome's deeper point lands on getrt itself: do_getrt_cmd goes through hal_systemv + a HAL init, so any standalone caller instantiates a segment. Is it worth making getrt probe rtapi_is_realtime() without a full hal_init, so a GUI can ask "is realtime available" cheaply before starting anything? That would let every GUI (including linuxcnc started from a launcher, the #4118 case) query it without session side effects. If you think that is the right shape, this PR can depend on that and I will rebase on top once your getrt lands as its own PR.

@hdiethelm

hdiethelm commented Jun 5, 2026

Contributor

@grandixximo Nice. I have to test it.

I created a PR, feel free to rebase:
https://github.com/LinuxCNC/linuxcnc/pull/4132/changes

@BsAtHome
Yes, rtapi_app has the annoying behavior that it always creates this memory segment and initializes RT, even if you exit right away afterwards. However, it looks like this doesn't hurt anything: you can start any app after rtapi_app getrt, or after other commands that do not increase the instance counter or do anything other than initialize this segment and exit afterwards.

Any hints on what should be done in this case? Before my rtapi_app rework PR, even rtapi_app exit initialized a memory segment when it was not running. ;-)

@grandixximo

Contributor Author

Crossed posts, @hdiethelm. Good, your "run it with/after halrun in the script" is exactly what I did for latency-histogram (getrt after hal start, inside the session), so no stray rtapi_app and it also avoids the break-in-between race you noted. For latency-test I leaned on the existing rtapi note rather than a second getrt call; happy to switch it to getrt inside its HAL flow if you would rather every tool go through the one path. I will rebase on your getrt PR once it is up with the RTAI/doc bits.

@hdiethelm

hdiethelm commented Jun 5, 2026

Contributor

Hmm, just an idea:
Something like this for the test scripts? I find these huge popups a bit annoying.
[screenshot]

@BsAtHome
What do you think about this for all GUI apps? How to inhibit it is TBD, but there will be a way.
4803bb1
Somehow gmoccapy doesn't show this error. I guess there is a bug where startup errors are not shown?
[screenshot]

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

A (forced) popup is the equivalent of slapping someone in the face.

The error message added to the GUI is actually not an error. Not running RT on a production system may be considered an error.

If you look closely at AXIS' status bar you see "Kein Werkzeug" ("No Tool"). That is also the place where you want to warn the user. Add a status bar field that is obvious (light red background) yet not invasive.

@hdiethelm

hdiethelm commented Jun 5, 2026

Contributor

You have a point. I also get annoyed by all these popups when using the good old microslop... :-D

Now on the status bar: Good idea. How to do that? I am already somewhat deep in the C code, so I can add any needed support there, but for the GUIs someone else has to take over.

@grandixximo Can you do this based on whatever from the hal? halcmd / parameter / pin is easy to add for me.

@grandixximo

Contributor Author

@hdiethelm thanks, I will rebase this onto #4132 once it settles. @BsAtHome agreed the popup is too much; I will drop the tk_messageBox and warn non-invasively in the test tools instead.

On the cross-GUI status bar, that is the right shape and I am happy to do the AXIS side (a status-bar field with a light-red background, like the existing tool slot) once @hdiethelm's "realtime ok" signal from #4132 exists. The question is where it lives.

My preference is to keep #4107 narrow: rebased on getrt, popup gone, scoped to the latency tools. It can ship as soon as #4132 lands. The GUI-wide warning cannot be written until the HAL pin exists and touches AXIS, gmoccapy and qtvcp, so I would do it as a separate PR tracking #4118 rather than make this small change wait on the slowest part.

That said, if you would rather have one PR own the whole intent, I am fine rescoping #4107 to the GUI-wide warning and retitling it; it just becomes larger and slower. Either way works for me. Which do you prefer?

@grandixximo

Contributor Author

@hdiethelm to answer your "halcmd / parameter / pin" question directly: a bool pin is best for the GUIs. AXIS, gmoccapy and qtvcp already monitor HAL pins, so they can reflect realtime state live in the status bar without polling a command. I would steer away from a param given those are heading for deprecation, and a halcmd is the least convenient since a GUI would have to shell out to poll it.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

I think we first have to agree on the proper conceptual design of how to detect in the different scenarios and what to do with it.

@grandixximo

Contributor Author

@BsAtHome agreed, let me put a concrete proposal on the table.

Detection: one source of truth. @hdiethelm's rtapi_is_realtime() path, exposed as a single bool HAL pin ("realtime ok"). Every app reads the same signal, so there is no divergent per-app logic, which was the original concern.

What to do with it: per UI, not one mechanism. The right surface differs by app, so each owns its own rendering rather than forcing a single widget everywhere:

  • Console tools (latency-test): the existing Note: Using POSIX non-realtime already covers them.
  • GUIs (AXIS, gmoccapy, qtvcp): an in-window, non-invasive indicator. Obvious but not a slap, e.g. a light-red status-bar field, no forced popups.

One thing to rule out: coloring the window title bar / decoration is not reliable. Plenty of setups have no title bar at all (fullscreen/kiosk panels, some Wayland/WM configs), so the signal has to live inside the app window, not in the chrome.

Still open (defer): production-vs-dev suppression policy. That can ride with #4118 once the pin and the per-UI rendering exist.

Does that match how you see the scenarios?

@hdiethelm

Contributor

Sounds like a plan. I will create a new PR with a signal. Then we can test how this feels and continue from there.

I can do that tomorrow, right now I have other things to do.

About the title bar: The idea was to only use this for the two test apps. If this is cumbersome, maybe just modify the text that is already displayed in them.

@grandixximo Can you mark this PR as a draft until we are done?

Sorry about the back and forth. If I don't have a good solution yet, this is often my way of brainstorming: try things and discard them until it is good. Hope this is ok for you.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

Detection: one source of truth. @hdiethelm's rtapi_is_realtime() path, exposed as a single bool HAL pin ("realtime ok"). Every app reads the same signal, so there is no divergent per-app logic, which was the original concern.

That is only partly satisfactory because for this realtime needs to be running.

You want to know in advance whether your system will be capable of running RT without starting any of it. Then, when you are running, you want to know from various applications what the actual status is by using generic API call or/and HAL pin.

@grandixximo grandixximo marked this pull request as draft June 5, 2026 09:09
@grandixximo

Contributor Author

Hope this is ok for you.

No problem, we brainstorm it and come up with something that sticks.

@hdiethelm

Contributor

Ok, so I stick with the following and drop the pin idea.

  1. when nothing is started to see if the system is actually capable of running RT
    • realtime verify
  2. at startup
    • not implemented except console output.
  3. when running
    • realtime verify (For scripts, while running)
    • hal_data->realtime_status + hal_realtime_status() + the hal.realtime_status() Python binding

Since realtime_status and realtime_type are redundant, I suggest the following:

realtime_status = -1 Uninitialized (this is the case while rtapi_app is not running; it starts at the first call of loadrt)
realtime_status = 0 NonRT
realtime_status = 1 ... n Different variants active

So you can use:

  • hal_realtime_status() < 0: Bug, you called it too early
  • hal_realtime_status() == 0: Non-RT
  • hal_realtime_status() > 0: RT
  • hal_realtime_status() == RT_STATUS_XENOMAI3 (and all implemented variants)
    So you can add specific checks for xenomai / rtai / posix if this is needed somewhere.
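
A caller could branch on that value along these lines. This is a sketch of the proposed convention only; `describe_realtime_status` is a hypothetical helper, and the numeric values of variant constants such as RT_STATUS_XENOMAI3 are not given in this thread, so they are not mapped here.

```python
def describe_realtime_status(status: int) -> str:
    """Interpret a realtime_status value per the convention proposed above."""
    if status < 0:
        # Proposed meaning: rtapi_app has not started yet (called too early).
        return "uninitialized"
    if status == 0:
        return "non-RT"
    # Any positive value means some realtime variant is active.
    return f"RT (variant {status})"


for value in (-1, 0, 1):
    print(value, describe_realtime_status(value))
```

A GUI would typically treat "uninitialized" as "not known yet" rather than as a failure, since it only says that nothing has been loaded so far.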

The pin idea sounded nice until I started implementing it. And now I also don't like it any more.

Two nits: hal_realtime_status() should be (void) in hal.h, and realtime_status could go at the end of hal_data_t to limit offset churn. The realtime-type enum is a good addition and fits the same shm-field pattern, no pin needed.

  • void: Sure, thanks. POC phase, I don't really try to have good code quality.
  • limit offset churn: Can you elaborate? There is also exact_base_period which should be moved always in this case I think.

I will start cleaning everything up.

@BsAtHome

BsAtHome commented Jun 6, 2026

Contributor

The code in #4132 is complex. The HAL interface is both ugly and unnecessary when you simply call rtapi_is_realtime() because that is available in the hal module anyway as a constant property. The only real fix would then be to ensure that rtapi_is_realtime() returns the correct value (and could make it in halmodule a live property).

Seen in that light and that we want hal to be "clean" at start and stop... we should drop the fixed created component and may add a separately loadable component if anybody wants or needs it. That is a much cleaner solution. And, FWIW, a components also has access to rtapi_is_realtime().

@hdiethelm

Contributor

Probably our comments crossed.

rtapi_is_realtime() only works in the realtime context. User components / Python run in a different process, so even a static value in the code holding the status will not work there. Shared memory (HAL) or some other way is needed to cross the process boundary.

See #4129

Are there any other standard ways to communicate with the real time process except HAL pin / shm?

@hdiethelm

Contributor

An option would be a rewrite / fix of rtapi_is_realtime()
At the moment, it uses can_set_sched_fifo() / has_setuid_root() which only work when called in the rtapi_app process.

Instead, what we could do is check the rtapi_app executable externally. More or less:

geteuid() == 0 -> stat(EMC2_BIN_DIR "/rtapi_app", &st)

cap_get_proc() -> cap_get_file(EMC2_BIN_DIR "/rtapi_app")

rtapi_is_realtime() is already exposed in python: PyModule_AddIntConstant(m, "is_rt", rtapi_is_realtime());
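
To make the idea concrete, the setuid half of such an external check might look like the sketch below. `looks_setuid_root` is a hypothetical helper, and the sketch covers only the stat() part; the file-capability half would need libcap's cap_get_file, which the Python standard library does not wrap. As the next reply explains, an earlier stat-based check of this kind was removed from rtapi_is_realtime(), so this is an illustration of the suggestion, not a recommendation.

```python
import os
import stat


def looks_setuid_root(path: str) -> bool:
    """True if the file at path is owned by root and has the setuid bit set."""
    try:
        st = os.stat(path)
    except OSError:
        return False  # missing or unreadable binary: cannot claim it is setuid
    return st.st_uid == 0 and bool(st.st_mode & stat.S_ISUID)


if __name__ == "__main__":
    # /usr/bin/rtapi_app is only an example location, not necessarily EMC2_BIN_DIR.
    print(looks_setuid_root("/usr/bin/rtapi_app"))
```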

@grandixximo

Contributor Author

@hdiethelm the stat(rtapi_app) path was already tried and removed in commit 49523cddb7 (the rtapi_is_realtime() rework from #3928). uspace_common.h:450-453 documents the three reasons: it only checked the setuid bit (not file caps), it stat-ed EMC2_BIN_DIR/rtapi_app rather than the running binary (breaking wrapper installs like NixOS /run/wrappers), and it ran before LINUXCNC_FORCE_REALTIME was checked. The current can_set_sched_fifo() probe was the deliberate replacement.

So the shm field in #4132 is the right path: it is written by rtapi_app into shared memory and readable from any process without re-probing. It also directly fixes #4129: hal.is_rt/is_sim would derive from hal_data->realtime_status rather than the in-process probe, making them correct for unprivileged callers. The socket path (realtime verify) works too but spawns a process, fine for scripts, wasteful to poll from a GUI.

@hdiethelm

Contributor

Agreed.
The shm field / checking it internally is the bulletproof way.
Even though we can check caps / LINUXCNC_FORCE_REALTIME externally, checking it externally will always have some strange edge case resulting in:

  • externally, all looks good
  • internally, rt fails

For example, LINUXCNC_FORCE_REALTIME is set in the context where the check is performed but not in the context where rtapi_app is started, just to name one.

@BsAtHome

BsAtHome commented Jun 6, 2026

Contributor

Checking the shm field (function wrapped) will advise both RT and non-RT what the status is of the running system (as long as HAL is originally instantiated and initialized by the right process).
This is also the value that halmodule should return. Is there a path where halmodule starts before RT is running? I.e., can halmodule instantiate the HAL shm? If so, then there might be a problem.

@hdiethelm

hdiethelm commented Jun 6, 2026

Contributor

Checking the shm field (function wrapped) will advise both RT and non-RT what the status is of the running system (as long as HAL is originally instantiated and initialized by the right process). This is also the value that halmodule should return. Is there a path where halmodule starts before RT is running? I.e., can halmodule instantiate the HAL shm? If so, then there might be a problem.

Yes, there is. If you do:

import hal 

h = hal.component("passthrough")
print("HAL realtime_status " + str(hal.realtime_status()))

the result is:
HAL realtime_status -1

Described in #4107 (comment)

If you do halcmd loadrt and2, realtime is started and you will get:
HAL realtime_status 1

This can be mitigated by starting rtapi_app at the first instantiation of the hal and only exiting at exit. However, this might have other side effects. So I would like to do this later.

@hdiethelm

hdiethelm commented Jun 6, 2026

Contributor

All this rtapi_app start / exit is a bit fuzzy. It is started at the first loadrt command and exits when no RT module is loaded any more.

This sometimes results in funny behavior: halcmd debug only has an effect when it comes after the first loadrt command. If it comes before, rtapi_app starts, runs the debug command and exits immediately. So the status is lost and the command has no effect.

Better would be one of those:

  • start at start command / exit at exit command.
  • rtapi_app_deamon / rtapi_app_cmd (one is the previous master / one the previous client)

But this opens a new can of worms I would like to avoid right now.

@hdiethelm

hdiethelm commented Jun 6, 2026

Contributor

So, I continued:
ed8ef46: Removes the persistent component again
f80af10: Moves the detection to uspace_rtapi_main.cc
97bb377: Reworks the whole detection and realtime_status() is now a status exactly reflecting the effective status.
All still WIP and a bit rough, but the concept should be visible.

@grandixximo grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from a3df81c to 752a271 Compare June 8, 2026 03:15
@grandixximo

grandixximo commented Jun 8, 2026

Contributor Author

I've rebased this onto realtime verify from #4132 (latency-histogram now asks the realtime layer instead of the removed halcmd getrt).

Correction to myself: latency-test is not console-only, it brings up a pyvcp panel, so a user launching it from a menu would not see the Note: Using POSIX non-realtime on the console either. So the question of how to surface an RT warning in a GUI applies to latency-test as much as to latency-histogram, and that ties into the still-open GUI-warning discussion (#4118) rather than something to settle here. I'll hold on changing latency-test until we agree on how in-UI warnings should look.

@grandixximo grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from 752a271 to 21060e3 Compare June 10, 2026 01:27
@hdiethelm

hdiethelm commented Jun 24, 2026

Contributor

Back to the discussion of how to show it, options:

  • Popup
  • Sneak in a title bar change in the low-level UI functions:
diff --git a/src/hal/user_comps/pyvcp.py b/src/hal/user_comps/pyvcp.py
index f8b9bb2216..06f2178eec 100755
--- a/src/hal/user_comps/pyvcp.py
+++ b/src/hal/user_comps/pyvcp.py
@@ -40,6 +40,7 @@ sys.path.insert(0, os.path.join(BASE, "lib", "python"))
 
 import vcpparse
 import hal
+import lcnc_realtime
 
 import tkinter as Tkinter
 from tkinter import Tk
@@ -87,6 +88,10 @@ def main():
    
     vcpparse.filename=filename
     pycomp=vcpparse.create_vcp(compname=component_name, master=pyvcp0)
+
+    if not lcnc_realtime.verify():
+        pyvcp0.title(pyvcp0.title() + " NO REALTIME")
+
     pycomp.ready()
 
     try:
Screenshot from 2026-06-24 16-49-58

and similar for other UI classes.

  • UI specific
    • latency-test / latency histogram: Add red text where the OS information is
    • axis: Somewhere in the footer next to "No Tool"
    • ...

For me, a popup that I can disable would be fine.
Either with an environment variable like LINUXCNC_DISABLE_RT_WARN for example or in a config like ~/.linuxcncrc. But it should be global, not per machine configuration.
This popup should show a way how to disable it. But there are other opinions.
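
A global opt-out of that kind could be as small as the sketch below. The variable name comes from the comment above; `rt_warning_enabled` is a hypothetical helper, and the value semantics (unset, empty or "0" keeps the warning, anything else disables it) are an assumption made for illustration.

```python
import os
from typing import Mapping


def rt_warning_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True unless LINUXCNC_DISABLE_RT_WARN is set to a truthy value.

    Assumption: unset, empty or "0" keeps the warning on; any other value
    switches it off for every configuration started from this environment.
    """
    value = environ.get("LINUXCNC_DISABLE_RT_WARN", "")
    return value in ("", "0")


print(rt_warning_enabled({}))
print(rt_warning_enabled({"LINUXCNC_DISABLE_RT_WARN": "1"}))
```

Because the setting lives in the environment rather than in an INI file, it applies to all machine configurations at once, which is the property asked for above.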

Without 'sudo make setuid' (or 'sudo make setcap') rtapi_app runs
unprivileged: no SCHED_FIFO, no locked memory, so latency readings are
wildly inflated and easy to mistake for a code regression. Warn, for a
non-root user, when rtapi_app is neither setuid root nor carries the
cap_sys_nice capability.

Closes LinuxCNC#4044
Only warn under PREEMPT_RT or RTAI; on a non-RT kernel the privileges
do not matter, so the check would be noise.
…istic

Query realtime status with 'realtime verify' (from LinuxCNC#4132) rather than
probing the setuid bit. latency-histogram asks the realtime layer
directly; latency-test relies on the existing "POSIX non-realtime" note.
@grandixximo

grandixximo commented Jun 25, 2026

Contributor Author

I like the footer bar better even for latency test and hist

image image

can pack more info, and it's more visible than the title bar IMO
I stared at your picture for a couple of minutes before realizing NO REALTIME was in the title bar.
Me like what I did better, but that's called bias, thoughts?

@rodw-au

rodw-au commented Jun 25, 2026

Contributor

Seeing you are working on this, what would be a useful feature to add would be a command line switch that ran for a specific --time --quiet(ly) and print the latency results to the console so scripts could report latency. I've wanted to do this recently and had a lot of trouble finding a way to do this. cyclictest allows you to do this but it's still not easy and not within the LinuxCNC ecosystem.

@hdiethelm

hdiethelm commented Jun 25, 2026

Contributor

Looks nice in the footer! Visible but not a slap in the face.

I did not know how to do this, which is mainly why I used the title bar to float some ideas.

Is this also possible in axis / gmoccapy?

latency-test and latency-histogram now show a status bar along the bottom
with three fields: the running realtime type (or "no realtime"), the CPU
frequency governor, and isolcpus. A field is coloured red only when it
flags a condition that makes the measured latency unrepresentative (no
realtime, a non-performance governor), so normal states stay in the theme
colour. The bar reads the state via the realtime script, /proc and /sys,
and on a real RT machine shows e.g. RT-PREEMPT_RT / performance / isolcpus=2,3.

The histogram's earlier modal "no realtime" popup is replaced by this
persistent bar (the console warning is kept for non-X runs).

Docs: describe the status bar and what each red field means in
install/latency-test.adoc, link the fixes to the existing realtime-kernel,
setuid/setcap and isolcpus material, and add a CPU frequency governor
tuning note (set to performance) which was previously undocumented.
@grandixximo grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from 21060e3 to 940cc00 Compare June 25, 2026 12:24
@grandixximo

grandixximo commented Jun 25, 2026

Contributor Author

Seeing you are working on this, what would be a useful feature to add would be a command line switch...

Good idea. It belongs as a --quiet/--seconds mode of latency-test; please open an issue, since it's a different PR.

Is this also possible in axis / gmoccapy?

Yes, but on the main GUIs I'd show just the realtime field (RT-<type> / no realtime); the governor and isolcpus stay in latency-test / latency-histogram.

Placement, reusing each GUI's existing status area:

  • Axis: the existing status footer (the "No Tool" / position row).
  • QtDragon: its status bar.
  • Gmoccapy: there's room near the clock. (I don't really see anywhere else, and I don't know if we are allowed to make it bigger with a footer, to discuss in Feature: Properly warn if no realtime kernel is active #4118?)
  • Touchy: it already has a status bar; it goes there.

Just the one field via lcnc_realtime.verify() plus the type for the label.

Keep it as its own PR, separate from this one, it's the cross-GUI convention #4118 should settle. This PR lands as the latency-tools footer.

@grandixximo grandixximo marked this pull request as ready for review June 25, 2026 12:33
@grandixximo grandixximo changed the title latency-test, latency-histogram: warn when rtapi_app lacks RT privileges latency-test, latency-histogram: add realtime status bar Jun 25, 2026
Comment thread scripts/latency-histogram
}
if {$rt_ok} {
set t SCHED_FIFO
if {[string match -nocase *-rtai* [exec uname -r]]} {

Contributor

That looks like a maintenance trap. One future change in rtapi_app and this breaks.

Comment thread scripts/latency-test
# the same kernel signals rtapi uses, for display only.
RED="#cc4444"
if realtime verify >/dev/null 2>&1; then
if uname -r | grep -qi -- '-rtai'; then RT_T="RTAI"

Contributor

Same here.

@hdiethelm

Contributor

It looks good with the footer.

However, I don't like that the realtime check code is now copy-pasted twice. This is guaranteed to break soon and become inconsistent. For example, for xenomai, rtapi_app still needs setuid, so it is already inconsistent when I use setcap on a xenomai kernel:
[screenshot]

I would rather not show the type at all than do it this way.

Right now, you can check the type that is actually running from Python (type = hal.get_realtime_type()), but this is a bit cumbersome: it works only while rtapi_app is running, so you would have to do that after halrun lat.hal, in a separate thread.

Alternatives:

  • realtime verify: I could print the realtime type on stdout. So I can capture it in python and forward it.
  • Using non-standard return values: Probably a bad idea. I would have to use 1...9 for the types and 0 for error
  • The hard way: Properly separate rtapi_app in master / client and create a client library, so library calls can be used to communicate with the rtapi_app master.
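
A minimal Python sketch of the first alternative, to make the idea concrete. It assumes the proposed behaviour, not the current one: that realtime verify exits with status 0 when realtime is available and, in addition, prints the realtime type on stdout. The function name realtime_label and the fallback label "realtime" are made up for this sketch.

```python
import subprocess


def realtime_label(cmd=("realtime", "verify")):
    """Return the status-bar label for the realtime field.

    Exit status 0 means realtime is available (as the scripts already
    rely on). ASSUMPTION: on success the command also prints the
    realtime type on stdout; this is the proposal, not existing output.
    """
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or hung: treat as no realtime.
        return "no realtime"
    if proc.returncode != 0:
        return "no realtime"
    rt_type = proc.stdout.strip()
    # If nothing was printed, still report realtime, just without a type.
    return f"RT-{rt_type}" if rt_type else "realtime"
```

With this shape the type string has a single source, rtapi_app itself, so neither latency-test nor latency-histogram has to repeat the kernel detection.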

@hdiethelm

hdiethelm commented Jun 25, 2026

Contributor

I tried out the variant passing the text from rtapi_app through the socket up to realtime verify and Python:
https://github.com/hdiethelm/linuxcnc-fork/tree/fix/latency-setuid-warning-4044
hdiethelm@4d80249

It works, but it was just quickly coded together and needs some docs and tidying up. Do you think this will do the job?

I changed only latency-test to use this. TCL is unreadable for me, would take me hours to do it... ;-)

@andypugh

Collaborator

Happy to drop the latency-test console line entirely if you would rather the C note be the only console source.

I don't think this is worthwhile, as that line is also useful to tell you which realtime system you are using.
So I think it is good to say which RT you are on, and additionally warn that it is not RT.

  1. Dev opt-out policy. ... The env var is easy to set once for all test configs, which is why I started there.

I think I like the env var. The vast majority of users need to be warned. Those who habitually test-run the code on a non-RT system are unlikely to mind clicking away a warning.


Development

Successfully merging this pull request may close these issues.

Latency test discrepancy after ethercat feature adding

5 participants