Read Time:9 Minute, 21 Second
In the Linux kernel, the following vulnerability has been
resolved: tls: fix use-after-free on failed backlog decryption When the
decrypt request goes to the backlog and crypto_aead_decrypt returns -EBUSY,
tls_do_decryption will wait until all async decryptions have completed. If
one of them fails, tls_do_decryption will return -EBADMSG and
tls_decrypt_sg jumps to the error path, releasing all the pages. But the
pages have been passed to the async callback, and have already been
released by tls_decrypt_done. The only true async case is when
crypto_aead_decrypt returns -EINPROGRESS. With -EBUSY, we already waited so
we can tell tls_sw_recvmsg that the data is available for immediate copy,
but we need to notify tls_decrypt_sg (via the new ->async_done flag) that
the memory has already been released.)(CVE-2024-26800)
In the Linux kernel, the following vulnerability has been
resolved: inet: inet_defrag: prevent sk release while still in use
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released. This affects skb fragments
reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c,
when run as part of tx pipeline. Eric Dumazet made an initial analysis of
this bug. Quoting Eric: Calling ip_defrag() in output path is also implying
skb_orphan(), which is buggy because output path relies on sk not
disappearing. A relevant old patch about the issue was : 8282f27449bf
(‘inet: frag: Always orphan skbs inside ip_defrag()’) [..
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet
socket, not an arbitrary one. If we orphan the packet in ipvlan, then
downstream things like FQ packet scheduler will not work properly. We need
to change ip_defrag() to only use skb_orphan() when really needed, ie
whenever frag_list is going to be used. Eric suggested to stash sk in
fragment queue and made an initial patch. However there is a problem with
this: If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree. IOW,
we have no choice but to fix up sk_wmem accouting to reflect the fully
reassembled skb, else wmem will underflow. This change moves the orphan
down into the core, to last possible moment. As ip_defrag_offset is aliased
with sk_buff->sk member, we must move the offset into the FRAG_CB, else
skb->sk gets clobbered. This allows to delay the orphaning long enough to
learn if the skb has to be queued or if the skb is completing the reasm
queue. In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won’t continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to the
head skb, and fix up wmem accouting when inet_frag inflates truesize.)(CVE-2024-26921)
In the Linux kernel, the following vulnerability has been
resolved: mm: swap: fix race between free_swap_and_cache() and swapoff()
There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map. This is a
theoretical problem and I haven’t been able to provoke it from a test case.
But there has been agreement based on code review that this is possible
(see link below). Fix it by using get_swap_device()/put_swap_device(),
which will stall swapoff(). There was an extra check in _swap_info_get() to
confirm that the swap entry was not free. This isn’t present in
get_swap_device() because it doesn’t make sense in general due to the race
between getting the reference and swapoff. So I’ve added an equivalent
check directly in free_swap_and_cache(). Details of how to provoke one
possible issue (thanks to David Hildenbrand for deriving this): –8try_to_unuse() will stop as soon as soon as
si->inuse_pages==0. So the question is: could someone reclaim the folio and
turn si->inuse_pages==0, before we completed
swap_page_trans_huge_swapped(). Imagine the following: 2 MiB folio in the
swapcache. Only 2 subpages are still references by swap entries. Process 1
still references subpage 0 via swap entry. Process 2 still references
subpage 1 via swap entry. Process 1 quits. Calls free_swap_and_cache(). ->
count == SWAP_HAS_CACHE [then, preempted in the hypervisor etc.] Process 2
quits. Calls free_swap_and_cache(). -> count == SWAP_HAS_CACHE Process 2
goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()-> … WRITE_ONCE(si->inuse_pages,
si->inuse_pages – nr_entries); What stops swapoff to succeed after process
2 reclaimed the swap cache but before process1 finished its call to
swap_page_trans_huge_swapped()? –8 [ 95.890755] dump_stack_lvl+0x45/0x110 [
95.890755] print_address_description+0x78/0x390 [ 95.890755
print_report+0x11b/0x250 [ 95.890755] ? __virt_addr_valid+0xbe/0xf0 [
95.890755] ? sco_sock_timeout+0x5e/0x1c0 [ 95.890755
kasan_report+0x139/0x170 [ 95.890755] ? update_load_avg+0xe5/0x9f0 [
95.890755] ? sco_sock_timeout+0x5e/0x1c0 [ 95.890755
kasan_check_range+0x2c3/0x2e0 [ 95.890755] sco_sock_timeout+0x5e/0x1c0 [
95.890755] process_one_work+0x561/0xc50 [ 95.890755
worker_thread+0xab2/0x13c0 [ 95.890755] ? pr_cont_work+0x490/0x490 [
95.890755] kthread+0x279/0x300 [ 95.890755] ? pr_cont_work+0x490/0x490 [
95.890755] ? kthread_blkcg+0xa0/0xa0 [ 95.890755] ret_from_fork+0x34/0x60 [
95.890755] ? kthread_blkcg+0xa0/0xa0 [ 95.890755
ret_from_fork_asm+0x11/0x20 [ 95.890755] [ 95.890755] [ 95.890755
Allocated by task 506: [ 95.890755] kasan_save_track+0x3f/0x70 [ 95.890755
__kasan_kmalloc+0x86/0x90 [ 95.890755] __kmalloc+0x17f/0x360 [ 95.890755
sk_prot_alloc+0xe1/0x1a0 [ 95.890755] sk_alloc+0x31/0x4e0 [ 95.890755
bt_sock_alloc+0x2b/0x2a0 [ 95.890755] sco_sock_create+0xad/0x320 [
95.890755] bt_sock_create+0x145/0x320 [ 95.890755
__sock_create+0x2e1/0x650 [ 95.890755] __sys_socket+0xd0/0x280 [ 95.890755
__x64_sys_socket+0x75/0x80 [ 95.890755] do_syscall_64+0xc4/0x1b0 [
95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 95.890755] [
95.890755] Freed by task 506: [ 95.890755] kasan_save_track+0x3f/0x70 [
95.890755] kasan_save_free_info+0x40/0x50 [ 95.890755
poison_slab_object+0x118/0x180 [ 95.890755] __kasan_slab_free+0x12/0x30 [
95.890755] kfree+0xb2/0x240 [ 95.890755] __sk_destruct+0x317/0x410 [
95.890755] sco_sock_release+0x232/0x280 [ 95.890755] sock_close+0xb2/0x210
[ 95.890755] __fput+0x37f/0x770 [ 95.890755] task_work_run+0x1ae/0x210 [
95.890755] get_signal+0xe17/0xf70 [ 95.890755
arch_do_signal_or_restart+0x3f/0x520 [ 95.890755
syscall_exit_to_user_mode+0x55/0x120 [ 95.890755] do_syscall_64+0xd1/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 95.890755] [
95.890755] The buggy address belongs to the object at ffff88800c388000 [
95.890755] which belongs to the cache kmalloc-1k of size 1024 [ 95.890755
The buggy address is located 128 bytes inside of [ 95.890755] freed
1024-byte region [ffff88800c388000, ffff88800c388400) [ 95.890755] [
95.890755] The buggy address belongs to the physical page: [ 95.890755
page: refcount:1 mapcount:0 mapping:0000000000000000
index:0xffff88800c38a800 pfn:0xc388 [ 95.890755] head: order:3
entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 95.890755] ano
—truncated—)(CVE-2024-27398)
In the Linux kernel, the following vulnerability has been
resolved: watchdog: cpu5wdt.c: Fix use-after-free bug caused by
cpu5wdt_trigger When the cpu5wdt module is removing, the origin code uses
del_timer() to de-activate the timer. If the timer handler is running,
del_timer() could not stop it and will return directly. If the port region
is released by release_region() and then the timer handler
cpu5wdt_trigger() calls outb() to write into the region that is released,
the use-after-free bug will happen. Change del_timer() to
timer_shutdown_sync() in order that the timer handler could be finished
before the port region is released.)(CVE-2024-38630)
In the Linux kernel, the following vulnerability has been
resolved: exec: Fix ToCToU between perm check and set-uid/gid usage When
opening a file for exec via do_filp_open(), permission checking is done
against the file’s metadata at that moment, and on success, a file pointer
is passed back. Much later in the execve() code path, the file metadata
(specifically mode, uid, and gid) is used to determine if/how to set the
uid and gid. However, those values may have changed since the permissions
check, meaning the execution may gain unintended privileges. For example,
if a file could change permissions from executable and not set-id:
———x 1 root root 16048 Aug 7 13:16 target to set-id and non-
executable: —S—— 1 root root 16048 Aug 7 13:16 target it is possible
to gain root privileges when execution should have been disallowed. While
this race condition is rare in real-world scenarios, it has been observed
(and proven exploitable) when package managers are updating the setuid bits
of installed programs. Such files start with being world-executable but
then are adjusted to be group-exec with a set-uid bit. For example, ‘chmod
o-x,u+s target’ makes ‘target’ executable only by uid ‘root’ and gid
‘cdrom’, while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug
7 13:16 target becomes: -rwsr-xr– 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group ‘cdrom’ membership can get
the permission to execute ‘target’ just before the chmod, and when the
chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid
to root, violating the expressed authorization of ‘only cdrom group members
can setuid to root’. Re-check that we still have execute permissions in
case the metadata has changed. It would be better to keep a copy from the
perm-check time, but until we can do that refactoring, the least-bad option
is to do a full inode_permission() call (under inode lock). It is
understood that this is safe against dead-locks, but hardly optimal.)(CVE-2024-43882)
In the Linux kernel, the following vulnerability has been
resolved: vsock/virtio: Initialization of the dangling pointer occurring in
vsk->trans During loopback communication, a dangling pointer can be created
in vsk->trans, potentially leading to a Use-After-Free condition. This
issue is resolved by initializing vsk->trans to NULL.)(CVE-2024-50264)