1337289 Commits

Author SHA1 Message Date
Kent Overstreet
edaed8ee8c bcachefs: BCH_JSET_ENTRY_log_bkey
Add a journal entry type for logging - but logging a bkey, not a string;
to be used for data move path debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 18:25:12 -04:00
Kent Overstreet
2b47102b93 bcachefs: Reorder error messages that include journal debug
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 16:36:27 -04:00
Kent Overstreet
393a05a741 bcachefs: Don't use designated initializers for disk_accounting_pos
Not all compilers fully initialize these - they're not guaranteed to
because of the union shenanigans.

Fixes: https://github.com/koverstreet/bcachefs/issues/844
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 16:35:13 -04:00
Kent Overstreet
f548db4d31 bcachefs: Silence errors after emergency shutdown
We don't care about errors from asynchronous ops that were because we
did an emergency shutdown; silence them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 16:35:13 -04:00
Kent Overstreet
458e2ef882 bcachefs: fix units in rebalance_status
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 16:35:13 -04:00
Kent Overstreet
707549600c bcachefs: bch2_ioctl_subvolume_destroy() fixes
bch2_evict_subvolume_inodes() was getting stuck - due to incorrectly
pruning the dcache.

Also, fix missing permissions checks.

Reported-by: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 16:33:22 -04:00
Kent Overstreet
b3981564ca bcachefs: Clear fs_path_parent on subvolume unlink
This fixes recursive subvolume removal.

Subvolume deletion is asynchronous; fs_path_parent, and thus the entry
in the subvolume_children btree, need to be cleared when the subvolume
is unlinked from the fs heirarchy - else we'll spuriously think a
subvolume has children and deletion will fail.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29 20:16:49 -04:00
Kent Overstreet
63c3b8f616 bcachefs: Change btree_insert_node() assertion to error
Debug for https://github.com/koverstreet/bcachefs/issues/843

Print useful debug info and go emergency read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29 14:24:48 -04:00
Kent Overstreet
6d77ce4a27 bcachefs: Better printing of inconsistency errors
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29 13:26:13 -04:00
Kent Overstreet
7337f9f14e bcachefs: bch2_count_fsck_err()
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29 13:26:13 -04:00
Kent Overstreet
b00750c2e5 bcachefs: Better helpers for inconsistency errors
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.

Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 22:31:47 -04:00
Kent Overstreet
1ece53237e bcachefs: Consistent indentation of multiline fsck errors
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 22:31:47 -04:00
Kent Overstreet
a7cdf2276e bcachefs: Add an "ignore unknown" option to bch2_parse_mount_opts()
To be used by the mount helper in userspace, where we still have options
to be parsed by other layers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 22:31:47 -04:00
Kent Overstreet
daa771332e bcachefs: bch2_time_stats_init_no_pcpu()
Add a mode to disable automatic switching to percpu mode, useful when a
time_stats will only be used by one thread and we don't want to have to
flush the percpu buffers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 22:31:47 -04:00
Florian Albrechtskirchinger
7c4cb50e1a bcachefs: Fix bch2_fs_get_tree() error path
When a filesystem is mounted read-only, subsequent attempts to mount it
as read-write fail with EBUSY. Previously, the error path in
bch2_fs_get_tree() would unconditionally call __bch2_fs_stop(),
improperly freeing resources for a filesystem that was still actively
mounted. This change modifies the error path to only call
__bch2_fs_stop() if the superblock has no valid root dentry, ensuring
resources are not cleaned up prematurely when the filesystem is in use.

Signed-off-by: Florian Albrechtskirchinger <falbrechtskirchinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 13:51:09 -04:00
Kent Overstreet
6b1e0b9e18 bcachefs: fix logging in journal_entry_err_msg()
We want to log errors all at once, not spread across multiple printks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 12:36:32 -04:00
Kent Overstreet
ff4e0f7de6 bcachefs: add missing newline in bch2_trans_updates_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 12:36:32 -04:00
Kent Overstreet
35a11506a3 bcachefs: print_string_as_lines: fix extra newline
Don't print a newline on empty string; this was causing us to also print
an extra newline when we got to the end of th string.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 12:36:32 -04:00
Kent Overstreet
3c72d3eea9 bcachefs: Fix WARN() in bch2_bkey_pick_read_device()
syzbot discovered that this one is possible: we have pointers, but none
of them are to valid devices.

Reported-by: syzbot+336a6e6a2dbb7d4dba9a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 12:35:05 -04:00
Kent Overstreet
af3d4c276a bcachefs: Don't return 0 size holes from bch2_seek_hole()
The hole we find in the btree might be fully dirty in the page cache. If
so, keep searching.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 11:30:14 -04:00
Kent Overstreet
1f4bb8254c bcachefs: Fix bch2_seek_hole() locking
We can't call bch2_seek_pagecache_hole(), and block on page locks, with
btree locks held.

This is easily fixed because we're at the end of the transaction - we
can just unlock, we don't need a drop_locks_do().

Reported-by: https://github.com/nagalun
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 11:30:14 -04:00
Kent Overstreet
2dd202dbaf bcachefs: Recovery no longer holds state_lock
state_lock guards against devices coming or leaving, changing state, or
the filesystem changing between ro <-> rw.

But it's not necessary for running recovery passes, and holding it
blocks asynchronous events that would cause us to go RO or kick out
devices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 11:13:25 -04:00
Kent Overstreet
c6c6a39109 bcachefs: Fix permissions on version modparam
There's no reason for this not to be world readable - it provides the
currently supported on disk format version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 11:13:23 -04:00
Kent Overstreet
d0fb2a266a bcachefs: cond_resched() in journal_key_sort_cmp()
Fixes "task out to lunch" warnings during recovery on large machines
with lots of dirty data in the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 16:26:35 -04:00
Kent Overstreet
ef488bb5d0 bcachefs: Fix 'hung task' messages in btree node scan
btree node scan has to wait on kthread workers that scan each device,
potentially for awhile.

We would like this to be interruptible, but we may need a different
mechanism than signals for that - we've had bugs in the past where
mounts were failing due to checking for signals, and no explanation on
where they came from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 16:26:35 -04:00
Kent Overstreet
9314e2fb26 bcachefs: Fix btree iter flags in data move (2)
Data move -> move_get_io_opts -> bch2_get_update_rebalance_opts

requires a not_extents iterator; this fixes the path where we're walking
the extents btree and chase a reflink pointer into the reflink btree.

bch2_lookup_indirect_extent() requires working with an extents iterator
(due to peek_slot() semantics), so we implement
bch2_lookup_indirect_extent_for_move().

This is simplified because there's no need to report
indirect_extent_missing_errors here, that can be deferred until fsck or
when a user reads that data.

Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 16:26:35 -04:00
Kent Overstreet
19ff84b20d bcachefs: Don't unnecessarily decrypt data when moving
There's various checks for "are we going to compress this" - but we're
not going to compress if we know it's incompressible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 16:26:35 -04:00
Kent Overstreet
a44e4f8f00 bcachefs: Document disk accounting keys and conuters
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 16:26:35 -04:00
Kent Overstreet
9c893face2 bcachefs: Validate number of counters for accounting keys
We weren't checking that accounting keys have the expected number of
accounters. Originally we probably wanted to be flexible on this, but it
doesn't look like that will be required - accounting is extended by
adding new counter types, not more counters to an existing type.

This means we can drop a BUG_ON() that popped once in automated testing,
and the new validation will make that bug easier to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 16:26:35 -04:00
Kent Overstreet
e1e50a6330 bcachefs: Use print_string_as_lines() for journal stuck messages
They were being truncated, printk has a 1k limit per call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-25 11:49:46 -04:00
Kent Overstreet
a76db26a96 bcachefs: Fix duplicate checksum error messages in write path
Also, improve the message in prep_encoded_data() - it now prints
good/bad checksums, and checksum type.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-25 11:49:43 -04:00
Kent Overstreet
3ba0240a87 bcachefs: Fix silent short reads in data read retry path
__bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size
to "the size we can read from the current extent" with a swap, and
restores it to "the size for the total read" after the read_extent call
with another swap.

But we neglected to do the restore before the "if (ret) goto err;" -
which is a problem if we're retrying those errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-25 11:49:16 -04:00
Kent Overstreet
5af61dbd96 bcachefs: Fix nonce inconsistency in bch2_write_prep_encoded_data()
If we're moving an extent that was partially overwritten,
bch2_write_rechecksum() will trim it to the currenty live range.

If we then also want to compress it, it'll be decrypted - but the nonce
has been advanced for the overwritten start of the extent that we
dropped, and we were using the nonce we calculated before rechecksum().

Reported-by: Gabriel de Perthuis <g2p.code@gmail.com>
Fixes: 127d90d2823e ("bcachefs: bch2_write_prep_encoded_data() now returns errcode")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-25 11:40:35 -04:00
Kent Overstreet
d8bdc8daac bcachefs: Kill unnecessary bch2_dev_usage_read()
bch2_dev_usage_read() is fairly expensive, we should optimize this more.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
2adfa46734 bcachefs: btree node write errors now print btree node
It turned out a user was wondering why we were going read-only after a
write error, and he didn't realize he didn't have replication enabled -
this will make that more obvious, and we should be printing it anyways.

Link: https://www.reddit.com/r/bcachefs/comments/1jf9akl/large_data_transfers_switched_bcachefs_to_readonly/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
739200c573 bcachefs: Fix race in print_chain()
00636 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b0
00636 Mem abort info:
00636   ESR = 0x0000000096000005
00636   EC = 0x25: DABT (current EL), IL = 32 bits
00636   SET = 0, FnV = 0
00636   EA = 0, S1PTW = 0
00636   FSC = 0x05: level 1 translation fault
00636 Data abort info:
00636   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
00636   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00636   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00636 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000101b10000
00636 [00000000000000b0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00636 Internal error: Oops: 0000000096000005 [#1] SMP
00636 Modules linked in:
00636 CPU: 12 UID: 0 PID: 79369 Comm: cat Not tainted 6.14.0-rc6-ktest-g3783b8973ab7 #17757
00636 Hardware name: linux,dummy-virt (DT)
00636 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00636 pc : print_chain+0xb8/0x170
00636 lr : print_chain+0xa0/0x170
00636 sp : ffffff80d9c1bbb0
00636 x29: ffffff80d9c1bbb0 x28: 0000000000000002 x27: ffffff80c1be8250
00636 x26: ffffff80dd9b0000 x25: 0000000000000020 x24: 000000000000002d
00636 x23: 000000000000003c x22: ffffffc080a54518 x21: ffffff80da6e00d0
00636 x20: ffffff80da6e0170 x19: ffffff80c1a1d240 x18: 00000000ffffffff
00636 x17: 3535303937202d3c x16: 203139202d3c2035 x15: 00000000ffffffff
00636 x14: 0000000000000000 x13: ffffff80d71b63f1 x12: 0000000000000006
00636 x11: ffffffc080beb1c0 x10: 0000000000000020 x9 : 00000000000134cc
00636 x8 : 0000000000000020 x7 : 0000000000000004 x6 : 0000000000000020
00636 x5 : ffffff80d71b63f7 x4 : ffffffc080a5451b x3 : 0000000000000000
00636 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
00636 Call trace:
00636  print_chain+0xb8/0x170 (P)
00636  bch2_check_for_deadlock+0x444/0x5a0
00636  bch2_btree_deadlock_read+0xb4/0x1c8
00636  full_proxy_read+0x74/0xd8
00636  vfs_read+0x90/0x300

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
0b4fd56726 bcachefs: btree_trans_restart_foreign_task()
In debug mode, we save the call stack on transaction restart - but
there's no locking, so we can't touch it if we're issuing the restart
from another thread.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
f4a584f4bf bcachefs: bch2_disk_accounting_mod2()
We're hitting some issues with uninitialized struct padding, flagged by
kmsan.

They appear to be falso positives, otherwise bch2_accounting_validate()
would have flagged them as "junk at end". But for now, we'll need to
initialize disk_accounting_pos with memset().

This adds a new helper, bch2_disk_accounting_mod2(), that initializes a
disk_accounting_pos and does the accounting mod all at once - so overall
things actually get slightly more ergonomic.

BCH_DISK_ACCOUNTING_replicas keys are left for now; KMSAN isn't warning
about them and they're a bit special.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
5ae6f33053 bcachefs: zero init journal bios
fix a kmsan splat

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
9ea24b287b bcachefs: Eliminate padding in move_bucket_key
We appear to be tripping over a compiler/kmsan bug with padding fields -
this is an easy workaround.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
1f88c35674 bcachefs: Fix a KMSAN splat in btree_update_nodes_written()
We may sometimes read from uninitialized memory; we know, and that's ok.

We check if a btree node has been reused before waiting on any
outstanding IO.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
28aa859b6b bcachefs: kmsan asserts
Catching these early makes them a lot easier to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
53cf2a3daa bcachefs: Fix kmsan warnings in bch2_extent_crc_pack()
We store to all fields, so the kmsan warnings were spurious - but
initializing via stores to bitfields appear to have been giving the
compiler/kmsan trouble, and they're not necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
9c3a2c9b47 bcachefs: Disable asm memcpys when kmsan enabled
kmsan doesn't know about inline assembly, obviously; this will close a
ton of syzbot bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
962322475b bcachefs: Handle backpointers with unknown data types
New data types might be added later, so we don't want to disallow
unknown data types - that'll be a compatibility hassle later. Instead,
ignore them.

Reported-by: syzbot+3a290f5ff67ca3023834@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
6a9f681ef6 bcachefs: Count BCH_DATA_parity backpointers correctly
These are counted as stripe data in the corresponding alloc keys.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
04e90891be bcachefs: Run bch2_check_dirent_target() at lookup time
More on the "full online self healing" project:

We now run most of the dirent <-> inode consistency checks, with repair
code, at runtime - the exact same check and repair code that fsck runs.

This will allow us to repair the "dirent points to inode that does not
point back" inconsistencies that have been popping up at runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
9b0d00a369 bcachefs: Refactor bch2_check_dirent_target()
Prep work for calling bch2_check_dirent_target() from bch2_lookup().

- Add an inline wrapper, if the target and backpointer match we can skip
  the function call.

- We don't (yet?) want to remove the dirent we did the lookup from (when
  we find a directory or subvol with multiple valid dirents pointing to
  it), we can defer on that until later. For now, add an "are we in
  fsck?" parameter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
758ea4ff81 bcachefs: Move bch2_check_dirent_target() to namei.c
We're gradually running more and more fsck.c checks at runtime,
whereever applicable; when we do so they get moved out of fsck.c.

Next patch will call bch2_check_dirent_target() from bch2_lookup().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
4fcd4de0a6 bcachefs: fs-common.c -> namei.c
name <-> inode, code for managing the relationships between inodes and
dirents.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00