0.18.4: (bug) segfault during cog_launcher_dispose (wayland platform) #744

Open
afk11 opened this issue Nov 26, 2024 · 0 comments
afk11 commented Nov 26, 2024

I realized I originally opened this issue in the wrong repository (Igalia/WPEBackend-fdo#197).

Bug description
I have a Buildroot system running cog, built from Buildroot. When cog is restarted or stopped via systemd, or is sent SIGTERM or interrupted with CTRL+C, it segfaults.

How To Reproduce

Ordered steps to reproduce the behavior:

First way: using systemd

Since the cog.service file brings up cog at startup, all we have to do is restart cog to trigger the fault

  • systemctl restart cog
    Here's our systemd file:
[Unit]
Description=WPE launcher and webapp container

After=weston.service
BindsTo=weston.service

StartLimitIntervalSec=0

[Service]
User=root

Environment=WAYLAND_DISPLAY=wayland-1
Environment=XDG_RUNTIME_DIR=/run/user/root

Type=simple
ExecStart=/bin/cog --platform=wl --set-permissions=all --enable-media=true --enable-media-capabilities=true --enable-media-stream=true --enable-mediasource=true --enable-write-console-messages-to-stdout=true --allow-file-access-from-file-urls=true --allow-universal-access-from-file-urls=true https://google.com

Restart=always

[Install]
WantedBy=multi-user.target

Second way: start cog via command line, send SIGTERM

  1. From terminal 1, run WAYLAND_DISPLAY=wayland-1 XDG_RUNTIME_DIR=/run/user/root /bin/cog --platform=wl --set-permissions=all --enable-media=true --enable-media-capabilities=true --enable-media-stream=true --enable-mediasource=true --enable-write-console-messages-to-stdout=true --allow-file-access-from-file-urls=true --allow-universal-access-from-file-urls=true https://google.com
  2. From a second terminal, run kill -s SIGTERM $(pidof cog)
  3. Observe Segmentation fault (core dumped)

Third way: start cog via command line, do CTRL+C

  1. From terminal 1, run WAYLAND_DISPLAY=wayland-1 XDG_RUNTIME_DIR=/run/user/root /bin/cog --platform=wl --set-permissions=all --enable-media=true --enable-media-capabilities=true --enable-media-stream=true --enable-mediasource=true --enable-write-console-messages-to-stdout=true --allow-file-access-from-file-urls=true --allow-universal-access-from-file-urls=true https://google.com
  2. From the same terminal, press CTRL+C
  3. Observe Segmentation fault (core dumped)

Relevant information and input files to reproduce the behavior:

  • Environment variables during execution (via strings /proc/$(pidof cog)/environ)
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
UPDATE_FLAG=install
WG0_IP=abcd # I trimmed out some unrelated vars
WG0_IP_CUT=abcd
WG0_PUBKEY=abcd
WG1_IP=abcd
WG1_IP_CUT=abcd
WG1_PUBKEY=abcd
USER=root
LOGNAME=root
HOME=/root
SHELL=/bin/sh
INVOCATION_ID=abcd
JOURNAL_STREAM=8:12780
SYSTEMD_EXEC_PID=1809
MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/system.slice/cog.service/memory.pressure
MEMORY_PRESSURE_WRITE=abcd
WAYLAND_DISPLAY=wayland-1
XDG_RUNTIME_DIR=/run/user/root
  • Base system: [e.g. Yocto (kirkstone), Buildroot, Linux distribution and
    version, local build, ...]
    buildroot, e82217622ea4778148de82a4b77972940b5e9a9e

  • Hardware target: [e.g. rpi4 64bits]
    amd64

  • Version of the relevant components:

buildroot e82217622ea4778148de82a4b77972940b5e9a9e
wpewebkit 2.44.4
weston 14.0.0
wpebackend-fdo 1.14.3
cog 0.18.4 (WPE WebKit 2.44.4)

  • In case of a runtime error
    • systemctl restart cog
    • The command exits successfully; however, monitoring the logs with journalctl -f shows that there was a segfault:
Nov 20 15:00:47 buildroot systemd[1]: Stopping WPE launcher and webapp container...
Nov 20 15:00:48 buildroot audit[1809]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1809 comm="cog" exe="/usr/bin/cog" sig=11 res=1
Nov 20 15:00:48 buildroot kernel: cog[1809]: segfault at 55d45800584c ip 00007f5abe94852f sp 00007ffd56360ce0 error 4 in libWPEBackend-fdo-1.0.so.1.9.5[a52f,7f5abe945000+9000] likely on CPU 0 (core 0, socket 0)
Nov 20 15:00:48 buildroot kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 90 48 85 f6 74 53 53 80 7e 10 00 48 89 f3 74 29 48 8b 07 c6 46 10 00 48 8b 76 18 48 85 f6 74 11 <48> 8b 78 10 5b e9 87 04 00 00 0f 1f 80 00 00 00 00 5b c3 66 0f 1f
Nov 20 15:00:48 buildroot kernel: audit: type=1701 audit(1732114848.024:199): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1809 comm="cog" exe="/usr/bin/cog" sig=11 res=1
Nov 20 15:00:48 buildroot systemd[1]: cog.service: Main process exited, code=dumped, status=11/SEGV
Nov 20 15:00:48 buildroot systemd[1]: cog.service: Failed with result 'core-dump'.
Nov 20 15:00:48 buildroot systemd[1]: Stopped WPE launcher and webapp container.
Nov 20 15:00:48 buildroot systemd[1]: cog.service: Consumed 9.205s CPU time, 180M memory peak, 88K memory swap peak.
Nov 20 15:00:48 buildroot systemd[1]: Started WPE launcher and webapp container.
Nov 20 15:00:49 buildroot cog[2008]: xkbcommon: ERROR: couldn't find a Compose file for locale "C" (mapped to "C")
Nov 20 15:00:49 buildroot cog[2008]: Could not determine the accessibility bus address
Nov 20 15:00:50 buildroot cog[2008]: <https://google.com/> Load started.
Nov 20 15:00:50 buildroot cog[2008]: <https://www.google.com/> Redirected.
Nov 20 15:00:50 buildroot cog[2008]: <https://www.google.com/> Loading...
  • A backtrace of the crash:
    • Instead of using the systemd-coredump utility, I am writing the coredump to /tmp. Here is the backtrace:
# gdb /usr/bin/cog core_cog.1809_1732114848
GNU gdb (GDB) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-buildroot-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/cog...

warning: Can't open file /memfd:WebKitSharedMemory (deleted) during file-backed mapping note processing

warning: Can't open file /memfd:wayland-cursor (deleted) during file-backed mapping note processing
[New LWP 1809]
[New LWP 1811]
[New LWP 1812]
[New LWP 1816]
[New LWP 1817]
[New LWP 1822]
[New LWP 2003]
[New LWP 1831]
[New LWP 1829]
[New LWP 1830]
[New LWP 1814]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/bin/cog --platform=wl --set-permissions=all --enable-media=true --enable-media'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  (anonymous namespace)::ClientBundleEGL::releaseImage (this=<optimized out>, image=0x55d1050cdf10) at ../src/view-backend-exportable-fdo-egl.cpp:255

warning: 255    ../src/view-backend-exportable-fdo-egl.cpp: No such file or directory
[Current thread is 1 (Thread 0x7f5abe045f00 (LWP 1809))]
(gdb) bt
#0  (anonymous namespace)::ClientBundleEGL::releaseImage (this=<optimized out>, image=0x55d1050cdf10) at ../src/view-backend-exportable-fdo-egl.cpp:255
#1  wpe_view_backend_exportable_fdo_egl_dispatch_release_exported_image (exportable=0x55d104ffc730, image=0x55d1050cdf10) at ../src/view-backend-exportable-fdo-egl.cpp:339
#2  0x00007f5a7002aa96 in cog_wl_platform_finalize (object=0x55d10500efc0) at ../platform/wayland/cog-platform-wl.c:2543
#3  0x00007f5ac1f9e5ac in g_object_unref (_object=0x55d10500efc0) at ../gobject/gobject.c:3938
#4  g_object_unref (_object=0x55d10500efc0) at ../gobject/gobject.c:3802
#5  0x000055d0ffddc03e in cog_launcher_dispose (object=0x55d104fec2f0) at ../launcher/cog-launcher.c:482
#6  0x00007f5ac1f9e4e0 in g_object_unref (_object=0x55d104fec2f0) at ../gobject/gobject.c:3891
#7  g_object_unref (_object=0x55d104fec2f0) at ../gobject/gobject.c:3802
#8  0x000055d0ffddac86 in glib_autoptr_clear_GApplication (_ptr=0x55d104fec2f0) at /home/PRIVATE/work/PRIVATE/PRIVATEfirmware-debug-ba/ba/output/per-package/cog/host/x86_64-buildroot-linux-gnu/sysroot/usr/include/glib-2.0/gio/gio-autocleanups.h:32
#9  glib_autoptr_cleanup_GApplication (_ptr=<synthetic pointer>) at /home/PRIVATE/work/PRIVATE/PRIVATEfirmware-debug-ba/ba/output/per-package/cog/host/x86_64-buildroot-linux-gnu/sysroot/usr/include/glib-2.0/gio/gio-autocleanups.h:32
#10 main (argc=11, argv=0x7ffd56360ee8) at ../launcher/cog.c:40

  • valgrind output of the crash:
# tail -f cog-valgrind.log 
==5440== Memcheck, a memory error detector
==5440== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==5440== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==5440== Command: /bin/cog --platform=wl --set-permissions=all --enable-media=true --enable-media-capabilities=true --enable-media-stream=true --enable-mediasource=true --enable-write-console-messages-to-stdout=true --allow-file-access-from-file-urls=true --allow-universal-access-from-file-urls=true https://google.com
==5440== Parent PID: 1
==5440== 
==5440== Warning: set address range perms: large range [0xf23c000, 0x4f23e000) (defined)
==5440== Invalid read of size 8
==5440==    at 0xE53151F: _M_ptr (unique_ptr.h:193)
==5440==    by 0xE53151F: get (unique_ptr.h:464)
==5440==    by 0xE53151F: wpe_view_backend_exportable_fdo_egl_dispatch_release_exported_image (view-backend-exportable-fdo-egl.cpp:339)
==5440==    by 0x57A54A95: cog_wl_platform_finalize (cog-platform-wl.c:2543)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3938)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3802)
==5440==    by 0x11003D: cog_launcher_dispose (cog-launcher.c:482)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3891)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3802)
==5440==    by 0x10EC85: glib_autoptr_clear_GApplication (gio-autocleanups.h:32)
==5440==    by 0x10EC85: glib_autoptr_cleanup_GApplication (gio-autocleanups.h:32)
==5440==    by 0x10EC85: main (cog.c:40)
==5440==  Address 0xf108470 is 0 bytes inside a block of size 16 free'd
==5440==    at 0x4848280: operator delete(void*) (vg_replace_malloc.c:1131)
==5440==    by 0x5583506: ~_WebKitWebViewBackend (WebKitWebViewBackend.cpp:51)
==5440==    by 0x5583506: webkitWebViewBackendUnref (WebKitWebViewBackend.cpp:73)
==5440==    by 0x5583506: webkitWebViewBackendUnref (WebKitWebViewBackend.cpp:69)
==5440==    by 0x5583506: void WTF::derefGPtr<_WebKitWebViewBackend>(_WebKitWebViewBackend*) (WebKitWebViewBackend.cpp:131)
==5440==    by 0x556C45C: webkit_web_view_finalize(_GObject*) (WebKitWebView.cpp:365)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3938)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3802)
==5440==    by 0x4864C38: cog_shell_dispose (cog-shell.c:316)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3891)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3802)
==5440==    by 0x10FFC6: cog_launcher_dispose (cog-launcher.c:463)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3891)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3802)
==5440==    by 0x10EC85: glib_autoptr_clear_GApplication (gio-autocleanups.h:32)
==5440==    by 0x10EC85: glib_autoptr_cleanup_GApplication (gio-autocleanups.h:32)
==5440==    by 0x10EC85: main (cog.c:40)
==5440==  Block was alloc'd at
==5440==    at 0x4844F3F: operator new(unsigned long) (vg_replace_malloc.c:487)
==5440==    by 0xE531415: wpe_view_backend_exportable_fdo_egl_create (view-backend-exportable-fdo-egl.cpp:325)
==5440==    by 0x57A518CD: cog_wl_platform_get_view_backend (cog-platform-wl.c:2579)
==5440==    by 0x1107AB: cog_launcher_create_view (cog-launcher.c:283)
==5440==    by 0xEA0A2A9: ffi_call_unix64 (unix64.S:104)
==5440==    by 0xEA0973F: ffi_call_int (ffi64.c:673)
==5440==    by 0xEA09E3C: ffi_call (ffi64.c:710)
==5440==    by 0xAEB4D3B: g_cclosure_marshal_generic (gclosure.c:1536)
==5440==    by 0xAEB44F7: g_closure_invoke (gclosure.c:832)
==5440==    by 0xAEC7295: signal_emit_unlocked_R.isra.0 (gsignal.c:3802)
==5440==    by 0xAECDC9C: g_signal_emit_valist (gsignal.c:3565)
==5440==    by 0xAECE73E: g_signal_emit (gsignal.c:3612)
==5440== 
==5440== 
==5440== HEAP SUMMARY:
==5440==     in use at exit: 530,205 bytes in 5,573 blocks
==5440==   total heap usage: 82,137 allocs, 76,564 frees, 12,594,299 bytes allocated
==5440== 

Expected behavior
The cog software should restart cleanly without a segfault.

Actual behavior
The software begins to restart, a segfault is detected, and systemd begins the systemd-coredump process to record the coredump.

Screenshots
n/a

Additional context
The file seems to be https://github.com/Igalia/WPEBackend-fdo/blob/master/src/view-backend-exportable-fdo-egl.cpp. I noticed some recent changes to it, so I checked: my file and the one at the link are identical.

I rebuilt with debug symbols to get the backtrace.

My rough take:

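The traces in question are about the wpe_view_backend_exportable_fdo* returned by wpe_view_backend_exportable_fdo_egl_create. This structure embeds the ClientBundleEGL and the wpe_view_backend*.

The "Block was alloc'd at" part of the valgrind output shows that the wpe_view_backend_exportable_fdo* is created in wpe_view_backend_exportable_fdo_egl_create, called from cog_wl_platform_get_view_backend.

The memory was freed during webkit_web_view_finalize, reached from cog_shell_dispose via cog_launcher_dispose (cog-launcher.c:463):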

==5440==  Address 0xf108470 is 0 bytes inside a block of size 16 free'd
==5440==    at 0x4848280: operator delete(void*) (vg_replace_malloc.c:1131)
==5440==    by 0x5583506: ~_WebKitWebViewBackend (WebKitWebViewBackend.cpp:51)
==5440==    by 0x5583506: webkitWebViewBackendUnref (WebKitWebViewBackend.cpp:73)
==5440==    by 0x5583506: webkitWebViewBackendUnref (WebKitWebViewBackend.cpp:69)
==5440==    by 0x5583506: void WTF::derefGPtr<_WebKitWebViewBackend>(_WebKitWebViewBackend*) (WebKitWebViewBackend.cpp:131)
==5440==    by 0x556C45C: webkit_web_view_finalize(_GObject*) (WebKitWebView.cpp:365)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3938)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3802)
==5440==    by 0x4864C38: cog_shell_dispose (cog-shell.c:316)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3891)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3802)
==5440==    by 0x10FFC6: cog_launcher_dispose (cog-launcher.c:463)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3891)
==5440==    by 0xAEB94DF: g_object_unref (gobject.c:3802)
==5440==    by 0x10EC85: glib_autoptr_clear_GApplication (gio-autocleanups.h:32)
==5440==    by 0x10EC85: glib_autoptr_cleanup_GApplication (gio-autocleanups.h:32)
==5440==    by 0x10EC85: main (cog.c:40)

Later in cog_launcher_dispose (cog-launcher.c:482), cog_wl_platform_finalize refers to the wpe_view_backend_exportable_fdo* once more. It attempts to read exportable->clientBundle, although the wpe_view_backend_exportable_fdo* memory has already been freed, so the clientBundle pointer is presumably invalid as well.

The invalid read happens when resolving the ClientBundleEGL, on which we then attempt to call releaseImage; that call dispatches to viewBackend, which we know has also been cleaned up:

==5440== Warning: set address range perms: large range [0xf23c000, 0x4f23e000) (defined)
==5440== Invalid read of size 8
==5440==    at 0xE53151F: _M_ptr (unique_ptr.h:193)
==5440==    by 0xE53151F: get (unique_ptr.h:464)
==5440==    by 0xE53151F: wpe_view_backend_exportable_fdo_egl_dispatch_release_exported_image (view-backend-exportable-fdo-egl.cpp:339)
==5440==    by 0x57A54A95: cog_wl_platform_finalize (cog-platform-wl.c:2543)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3938)
==5440==    by 0xAEB95AB: g_object_unref (gobject.c:3802)
==5440==    by 0x11003D: cog_launcher_dispose (cog-launcher.c:482)
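
The teardown order in the two traces above can be modeled in plain C. This is only a minimal sketch under stated assumptions, not cog or WPEBackend-fdo code: Exportable, Platform, exportable_create, exportable_destroy, platform_finalize and simulate_shutdown are all hypothetical stand-ins. It shows one possible way to avoid the stale pointer, namely having the object clear the borrower's pointer when it is destroyed, so that the later finalize step can skip the release instead of dereferencing freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: none of these are real cog or WPEBackend-fdo types. */
typedef struct Exportable {
    int releases_dispatched;        /* release calls that reached a live object */
    struct Exportable **owner_slot; /* borrower's pointer, cleared on destroy */
} Exportable;

typedef struct Platform {
    Exportable *exportable; /* borrowed pointer, not owned by the platform */
    int pending_image;      /* 1 while an exported image still awaits release */
    int skipped_releases;   /* releases skipped because the exportable was gone */
} Platform;

/* Models creation of the exportable on behalf of the platform. */
static Exportable *exportable_create(Platform *platform) {
    Exportable *e = calloc(1, sizeof *e);
    if (!e)
        return NULL;
    e->owner_slot = &platform->exportable;
    platform->exportable = e;
    return e;
}

/* Models the web view finalizer freeing the backend. The borrowed pointer
 * is cleared first so that no stale copy survives the free. */
static void exportable_destroy(Exportable *e) {
    if (!e)
        return;
    if (e->owner_slot)
        *e->owner_slot = NULL;
    free(e);
}

/* Models the platform finalizer: dispatch the release only when the
 * exportable is still alive, otherwise just drop the pending image. */
static void platform_finalize(Platform *p) {
    if (!p->pending_image)
        return;
    if (p->exportable)
        p->exportable->releases_dispatched++;
    else
        p->skipped_releases++;
    p->pending_image = 0;
}

/* Replays the order seen in the traces: the view (and its backend) goes
 * away first, then the platform is finalized. Returns the number of
 * releases that were skipped, or -1 if allocation failed. */
static int simulate_shutdown(void) {
    Platform platform = {0};
    platform.pending_image = 1;

    Exportable *e = exportable_create(&platform);
    if (!e)
        return -1;

    exportable_destroy(e);        /* like the free at cog-launcher.c:463 */
    platform_finalize(&platform); /* like the finalize at cog-launcher.c:482 */

    assert(platform.exportable == NULL);
    return platform.skipped_releases;
}
```

Calling simulate_shutdown() from a main function exercises the same order as the crash, but the release is skipped rather than dispatched to freed memory. Whether the real fix belongs in cog (reordering or guarding the finalize step) or elsewhere is not something this sketch can answer.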

I am not sure whether others have run into this; the issue may be specific to the wl platform. There seems to have been a fair bit of refactoring since 0.18.4, so I am wondering whether the authors have considered bumping the Buildroot cog version?

@afk11 afk11 changed the title 0.18.4: (bug) segfault during shutdown/restart from systemd 0.18.4: (bug) segfault during cog_launcher_dispose (wayland platform) Nov 26, 2024