Discussion:
afaef01c00 ("x86/entry: Add STACKLEAK erasing the kernel stack .."): double fault: 0000 [#1]
kernel test robot
2018-11-09 13:23:58 UTC
Permalink
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit afaef01c001537fa97a25092d7f54d764dc7d8c1
Author: Alexander Popov <***@linux.com>
AuthorDate: Fri Aug 17 01:16:58 2018 +0300
Commit: Kees Cook <***@chromium.org>
CommitDate: Tue Sep 4 10:35:47 2018 -0700

x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls

The STACKLEAK feature (initially developed by PaX Team) has the following
benefits:

1. Reduces the information that can be revealed through kernel stack leak
bugs. The idea of erasing the thread stack at the end of syscalls is
similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
crypto, which all comply with FDP_RIP.2 (Full Residual Information
Protection) of the Common Criteria standard.

2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
CVE-2010-2963). That kind of bugs should be killed by improving C
compilers in future, which might take a long time.

This commit introduces the code filling the used part of the kernel
stack with a poison value before returning to userspace. Full
STACKLEAK feature also contains the gcc plugin which comes in a
separate commit.

The STACKLEAK feature is ported from grsecurity/PaX. More information at:
https://grsecurity.net/
https://pax.grsecurity.net/

This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.

Performance impact:

Hardware: Intel Core i7-4770, 16 GB RAM

Test #1: building the Linux kernel on a single core
0.91% slowdown

Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
4.2% slowdown

So the STACKLEAK description in Kconfig includes: "The tradeoff is the
performance impact: on a single CPU system kernel compilation sees a 1%
slowdown, other systems and workloads may vary and you are advised to
test this feature on your expected workload before deploying it".

Signed-off-by: Alexander Popov <***@linux.com>
Acked-by: Thomas Gleixner <***@linutronix.de>
Reviewed-by: Dave Hansen <***@linux.intel.com>
Acked-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Kees Cook <***@chromium.org>

57361846b5 Linux 4.19-rc2
afaef01c00 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
24ccea7e10 Merge tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
442b8cea24 Add linux-next specific files for 20181109
+---------------------------------------------------------------+-----------+------------+------------+---------------+
| | v4.19-rc2 | afaef01c00 | 24ccea7e10 | next-20181109 |
+---------------------------------------------------------------+-----------+------------+------------+---------------+
| boot_successes | 498 | 171 | 165 | 26 |
| boot_failures | 0 | 2 | 8 | 1 |
| double_fault:#[##] | 0 | 2 | 8 | |
| RIP:ftrace_ops_test | 0 | 2 | 8 | 1 |
| WARNING:stack_recursion | 0 | 2 | 8 | 1 |
| WARNING:at(____ptrval____)for_ip_syscall_return_via_sysret/0x | 0 | 2 | 8 | 1 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 2 | 8 | 1 |
+---------------------------------------------------------------+-----------+------------+------------+---------------+

[main] Setsockopt(0 8 68b000 4) on fd 376 [2:1:0]
[main] Setsockopt(29 1a 68b000 2f) on fd 377 [10:2:0]
[main] Setsockopt(1 2c 68b000 4) on fd 379 [2:1:0]
[main] Setsockopt(0 13 68b000 1) on fd 380 [10:1:0]
[main] 375 sockets created based on info from socket cachefile.
[ 127.808225] double fault: 0000 [#1]
[ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
[ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
[ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
[ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[ 127.813991] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000800
[ 127.814802] RDX: 0000000000000000 RSI: ffffffff811c4560 RDI: ffff8800158c2d20
[ 127.815652] RBP: ffff8800158c2d20 R08: 0000000000000000 R09: 0000000000000000
[ 127.816494] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff811c4560
[ 127.817357] R13: 0000000000000000 R14: ffffffff82400160 R15: 0000000000000800
[ 127.818178] FS: 00007fac9f0de700(0000) GS:ffffffff83044000(0000) knlGS:0000000000000000
[ 127.819099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[ 127.820583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 127.821406] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 127.822234] Call Trace:
[ 127.822530] <ENTRY_TRAMPOLINE>
[ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
[ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
[ 127.823922] ftrace_call+0x5/0x34
[ 127.824318] ? stackleak_erase+0x5/0xf0
[ 127.824789] ? stackleak_erase+0x43/0xf0
[ 127.825260] stackleak_erase+0x5/0xf0
[ 127.825699] syscall_return_via_sysret+0x61/0x81
[ 127.826238] WARNING: stack recursion on stack type 4
[ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
[ 127.826246] </ENTRY_TRAMPOLINE>
[ 127.828342] ---[ end trace e9f96d3f45575499 ]---
[ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0

# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 651022382c7f8da46cb4872a545ee1da6d097d2a v4.19 --
git bisect good 685f7e4f161425b137056abe35ba8ef7b669d83d # 14:15 G 167 0 0 0 Merge tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 519f64bf15dccb4f64af34b74ed186c32363ab59 # 14:45 G 157 0 0 0 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad 63c6e188f639b5828bf744e675270bb5e2adc139 # 14:59 B 0 1 15 0 Merge tag 'riscv-for-linus-4.20-mw3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
git bisect good 82aa4671516a3203261c835e98c3eecab10c994d # 16:03 G 158 0 0 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 34c7685a177a7bc98066f7e5daa42eef621d0bdb # 16:31 B 27 1 0 0 Merge tag 'devicetree-fixes-for-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 2d6bb6adb714b133db92ccd4bfc9c20f75f71f3f # 17:08 B 32 1 0 0 Merge tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect good 6444ccfd699cda8db5edaac7fa469d6a29aa9a47 # 18:02 G 162 0 0 0 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
git bisect good 7c6c54b505b8aea1782ce6a6e8f3b8297d179937 # 18:32 G 161 0 0 0 Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect bad c8d126275a5fa59394fe17109bdb9812fed296b8 # 18:55 B 10 3 0 0 fs/proc: Show STACKLEAK metrics in the /proc file system
git bisect bad 10e9ae9fabaf96c8e5227c1cd4827d58b3aa406d # 19:15 B 13 3 0 0 gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
git bisect bad afaef01c001537fa97a25092d7f54d764dc7d8c1 # 19:33 B 15 1 0 0 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
# first bad commit: [afaef01c001537fa97a25092d7f54d764dc7d8c1] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
git bisect good 57361846b52bc686112da6ca5368d11210796804 # 20:29 G 475 0 0 0 Linux 4.19-rc2
# extra tests with debug options
git bisect bad afaef01c001537fa97a25092d7f54d764dc7d8c1 # 20:47 B 44 2 0 0 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
# extra tests on HEAD of linux-devel/devel-spot-201811090824
git bisect bad 8992397c6e1a0adf719e7263a0c965fce4629b15 # 20:47 B 59 2 0 0 0day head guard for 'devel-spot-201811090824'
# extra tests on tree/branch linus/master
git bisect bad 24ccea7e102de8cbc93ab3befb123bbd18532be9 # 21:06 B 10 1 0 0 Merge tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
# extra tests on tree/branch linux-next/master
git bisect bad 442b8cea2477fa95c22f28ca982addb5bc6b0845 # 21:22 B 22 1 0 0 Add linux-next specific files for 20181109

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation
Jann Horn
2018-11-09 20:06:27 UTC
Permalink
+Andy, Thomas, Ingo
Post by kernel test robot
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit afaef01c001537fa97a25092d7f54d764dc7d8c1
AuthorDate: Fri Aug 17 01:16:58 2018 +0300
CommitDate: Tue Sep 4 10:35:47 2018 -0700
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
[...]
Post by kernel test robot
[ 127.808225] double fault: 0000 [#1]
[ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
[ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
[ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
[ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[...]
Post by kernel test robot
[ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[...]
Post by kernel test robot
[ 127.822530] <ENTRY_TRAMPOLINE>
[ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
[ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
[ 127.823922] ftrace_call+0x5/0x34
[ 127.824318] ? stackleak_erase+0x5/0xf0
[ 127.824789] ? stackleak_erase+0x43/0xf0
[ 127.825260] stackleak_erase+0x5/0xf0
[ 127.825699] syscall_return_via_sysret+0x61/0x81
[ 127.826238] WARNING: stack recursion on stack type 4
[ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
[ 127.826246] </ENTRY_TRAMPOLINE>
[ 127.828342] ---[ end trace e9f96d3f45575499 ]---
[ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
page with the entry stack for cpu 0, and you overflowed from that into
the readonly gdt at fffffe0000001000, which doubles as a guard page
for the entry stack:

struct cpu_entry_area {
char gdt[PAGE_SIZE];

/*
* The GDT is just below entry_stack and thus serves (on x86_64) as
* a a read-only guard page.
*/
struct entry_stack_page entry_stack_page;
[...]
};

In other words: You're calling C code on the entry trampoline stack;
this C code can call into ftrace; and the entry trampoline stack isn't
big enough for ftrace shenanigans. I think you probably shouldn't be
calling C code on the entry stack, but maybe one of the X86 folks has
a different opinion?
Andy Lutomirski
2018-11-09 20:46:29 UTC
Permalink
Post by Jann Horn
+Andy, Thomas, Ingo
Post by kernel test robot
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit afaef01c001537fa97a25092d7f54d764dc7d8c1
AuthorDate: Fri Aug 17 01:16:58 2018 +0300
CommitDate: Tue Sep 4 10:35:47 2018 -0700
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
[...]
Post by kernel test robot
[ 127.808225] double fault: 0000 [#1]
[ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
[ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
[ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
[ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[...]
Post by kernel test robot
[ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[...]
Post by kernel test robot
[ 127.822530] <ENTRY_TRAMPOLINE>
[ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
[ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
[ 127.823922] ftrace_call+0x5/0x34
[ 127.824318] ? stackleak_erase+0x5/0xf0
[ 127.824789] ? stackleak_erase+0x43/0xf0
[ 127.825260] stackleak_erase+0x5/0xf0
[ 127.825699] syscall_return_via_sysret+0x61/0x81
[ 127.826238] WARNING: stack recursion on stack type 4
[ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
[ 127.826246] </ENTRY_TRAMPOLINE>
[ 127.828342] ---[ end trace e9f96d3f45575499 ]---
[ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
page with the entry stack for cpu 0, and you overflowed from that into
the readonly gdt at fffffe0000001000, which doubles as a guard page
struct cpu_entry_area {
char gdt[PAGE_SIZE];
/*
* The GDT is just below entry_stack and thus serves (on x86_64) as
* a a read-only guard page.
*/
struct entry_stack_page entry_stack_page;
[...]
};
In other words: You're calling C code on the entry trampoline stack;
this C code can call into ftrace; and the entry trampoline stack isn't
big enough for ftrace shenanigans. I think you probably shouldn't be
calling C code on the entry stack, but maybe one of the X86 folks has
a different opinion?
My opinion was that, on x86_32, the entry stack ought to be fairly large so that NMIs could execute on the entry stack. I don’t remember what the code actually does, though.

But stackleak_erase should probably not run on the entry stack. That seems like it’s just asking for trouble.
Alexander Popov
2018-11-09 23:09:09 UTC
Permalink
Post by Andy Lutomirski
Post by Jann Horn
+Andy, Thomas, Ingo
Post by kernel test robot
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit afaef01c001537fa97a25092d7f54d764dc7d8c1
AuthorDate: Fri Aug 17 01:16:58 2018 +0300
CommitDate: Tue Sep 4 10:35:47 2018 -0700
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
[...]
Post by kernel test robot
[ 127.808225] double fault: 0000 [#1]
[ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
[ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
[ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
[ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[...]
Post by kernel test robot
[ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[...]
Post by kernel test robot
[ 127.822530] <ENTRY_TRAMPOLINE>
[ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
[ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
[ 127.823922] ftrace_call+0x5/0x34
[ 127.824318] ? stackleak_erase+0x5/0xf0
[ 127.824789] ? stackleak_erase+0x43/0xf0
[ 127.825260] stackleak_erase+0x5/0xf0
[ 127.825699] syscall_return_via_sysret+0x61/0x81
[ 127.826238] WARNING: stack recursion on stack type 4
[ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
[ 127.826246] </ENTRY_TRAMPOLINE>
[ 127.828342] ---[ end trace e9f96d3f45575499 ]---
[ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
page with the entry stack for cpu 0, and you overflowed from that into
the readonly gdt at fffffe0000001000, which doubles as a guard page
struct cpu_entry_area {
char gdt[PAGE_SIZE];
/*
* The GDT is just below entry_stack and thus serves (on x86_64) as
* a a read-only guard page.
*/
struct entry_stack_page entry_stack_page;
[...]
};
In other words: You're calling C code on the entry trampoline stack;
this C code can call into ftrace; and the entry trampoline stack isn't
big enough for ftrace shenanigans. I think you probably shouldn't be
calling C code on the entry stack, but maybe one of the X86 folks has
a different opinion?
My opinion was that, on x86_32, the entry stack ought to be fairly large so
that NMIs could execute on the entry stack. I don’t remember what the code
actually does, though.
But stackleak_erase should probably not run on the entry stack. That seems
like it’s just asking for trouble.
Hello Jann and Andy,


The stackleak_erase() function is called on the trampoline stack at the end of
syscall, it erases the used part of the kernel thread stack after the syscall is
handled.


I've reproduced such a double fault with function tracing for stackleak_erase():

# mount -t tracefs nodev /sys/kernel/tracing
# echo 'p:myprobe stackleak_erase' > /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable


I think we should simply not allow function tracing for stackleak_*() functions:

diff --git a/kernel/Makefile b/kernel/Makefile
index 7343b3a..0906f6d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_MULTIUSER) += groups.o
ifdef CONFIG_FUNCTION_TRACER
# Do not trace internal ftrace files
CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_stackleak.o = $(CC_FLAGS_FTRACE)
endif


With this patch setting kprobe event for stackleak_erase() is not allowed. This
is the corresponding dmesg output:
[ 75.660478] trace_kprobe: Could not probe notrace function stackleak_erase


If you agree, I'll prepare the patch for LKML.

Best regards,
Alexander
Kees Cook
2018-11-09 23:21:56 UTC
Permalink
Post by Alexander Popov
Post by Andy Lutomirski
Post by Jann Horn
+Andy, Thomas, Ingo
Post by kernel test robot
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit afaef01c001537fa97a25092d7f54d764dc7d8c1
AuthorDate: Fri Aug 17 01:16:58 2018 +0300
CommitDate: Tue Sep 4 10:35:47 2018 -0700
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
[...]
Post by kernel test robot
[ 127.808225] double fault: 0000 [#1]
[ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
[ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
[ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
[ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[...]
Post by kernel test robot
[ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[...]
Post by kernel test robot
[ 127.822530] <ENTRY_TRAMPOLINE>
[ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
[ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
[ 127.823922] ftrace_call+0x5/0x34
[ 127.824318] ? stackleak_erase+0x5/0xf0
[ 127.824789] ? stackleak_erase+0x43/0xf0
[ 127.825260] stackleak_erase+0x5/0xf0
[ 127.825699] syscall_return_via_sysret+0x61/0x81
[ 127.826238] WARNING: stack recursion on stack type 4
[ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
[ 127.826246] </ENTRY_TRAMPOLINE>
[ 127.828342] ---[ end trace e9f96d3f45575499 ]---
[ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
page with the entry stack for cpu 0, and you overflowed from that into
the readonly gdt at fffffe0000001000, which doubles as a guard page
struct cpu_entry_area {
char gdt[PAGE_SIZE];
/*
* The GDT is just below entry_stack and thus serves (on x86_64) as
* a a read-only guard page.
*/
struct entry_stack_page entry_stack_page;
[...]
};
In other words: You're calling C code on the entry trampoline stack;
this C code can call into ftrace; and the entry trampoline stack isn't
big enough for ftrace shenanigans. I think you probably shouldn't be
calling C code on the entry stack, but maybe one of the X86 folks has
a different opinion?
My opinion was that, on x86_32, the entry stack ought to be fairly large so
that NMIs could execute on the entry stack. I don’t remember what the code
actually does, though.
But stackleak_erase should probably not run on the entry stack. That seems
like it’s just asking for trouble.
Hello Jann and Andy,
The stackleak_erase() function is called on the trampoline stack at the end of
syscall, it erases the used part of the kernel thread stack after the syscall is
handled.
# mount -t tracefs nodev /sys/kernel/tracing
# echo 'p:myprobe stackleak_erase' > /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
diff --git a/kernel/Makefile b/kernel/Makefile
index 7343b3a..0906f6d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_MULTIUSER) += groups.o
ifdef CONFIG_FUNCTION_TRACER
# Do not trace internal ftrace files
CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_stackleak.o = $(CC_FLAGS_FTRACE)
endif
Yeah, that's what I was suspecting on IRC. This looks like the right
fix. Can you send that to me as a "regular" patch with changelog, etc,
and I'll send it up to Linus.

Reviewed-by: Kees Cook <***@chromium.org>

Thanks for everyone's attention on this! I've been travelling this
week, so I've been a little slow. :)

-Kees
Post by Alexander Popov
With this patch setting kprobe event for stackleak_erase() is not allowed. This
[ 75.660478] trace_kprobe: Could not probe notrace function stackleak_erase
If you agree, I'll prepare the patch for LKML.
Best regards,
Alexander
--
Kees Cook
Loading...