David Howells 4882ba7857 afs: Fix afs_server ref accounting
The current way that afs_server refs are accounted and cleaned up sometimes
causes rmmod to hang when it is waiting for cell records to be removed.  The
problem is that the cell cleanup might occasionally happen before the
server cleanup and then there's nothing that causes the cell to
garbage-collect the remaining servers as they become inactive.

Partially fix this by:

 (1) Give each afs_server record its own management timer rather than
     relying on the cell manager's central timer to drive each individual
     cell's maintenance work item to garbage collect servers.

     This timer is set when afs_unuse_server() reduces a server's activity
     count to zero and will schedule the server's destroyer work item upon
     firing.

 (2) Give each afs_server record its own destroyer work item that removes
     the record from the cell's database, shuts down the timer, cancels any
     pending work for itself and sends an RPC to the server to cancel
     outstanding callbacks.

     This change, in combination with the timer, obviates the need to
     coordinate so closely between the cell record and its many server
     records in order to tear everything down.  With this, the cell record
     is pinned until the server RCU is complete and namespace/module
     removal will wait until all the cell records are removed.

 (3) Now that incoming calls are mapped to servers (and thus cells) using
     data attached to an rxrpc_peer, the UUID-to-server mapping tree is
     moved from the namespace to the cell (cell->fs_servers).  This means
     there can no longer be duplicates therein - and that allows the
     mapping tree to be simpler as there doesn't need to be a chain of
     same-UUID servers that are in different cells.

 (4) The lock protecting the UUID mapping tree is switched to an
     rw_semaphore on the cell rather than a seqlock on the namespace as
     it's now only used during mounting in contexts in which we're allowed
     to sleep.

 (5) When it comes time for a cell that is being removed to purge its set
     of servers, it just needs to iterate over them and wake them up.  Once
     a server becomes inactive, its destroyer work item will observe the
     state of the cell and immediately remove that record.

 (6) When a server record is removed, it is marked AFS_SERVER_FL_EXPIRED to
     prevent reattempts at removal.  The record will be dispatched to RCU
     for destruction once its refcount reaches 0.

 (7) The AFS_SERVER_FL_UNCREATED/CREATING flags are used to synchronise
     simultaneous creation attempts.  If one attempt fails, it will abandon
     the attempt and allow another to try again.

     Note that the record can't just be abandoned when dead as it's bound
     into a server list attached to a volume and only subject to
     replacement if the server list obtained for the volume from the VLDB
     changes.
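
As a rough illustration of points (1), (2), (5) and (6), the lifecycle can
be modelled in userspace as below.  This is only a sketch: every toy_* name
is hypothetical and is not kernel API, the timer and RCU are reduced to
booleans, and it assumes that using a server takes both a ref and an
activity count.

```c
/* Userspace model only; all toy_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>

enum { TOY_FL_EXPIRED = 1u << 0 };	/* stands in for AFS_SERVER_FL_EXPIRED */

struct toy_cell {
	bool being_removed;	/* the cell is purging its servers */
	int nr_servers;		/* records still in the cell's database */
};

struct toy_server {
	struct toy_cell *cell;
	int ref;		/* keeps the record's memory alive */
	int active;		/* number of current users */
	unsigned int flags;
	bool timer_armed;	/* stands in for the per-server timer */
	bool freed;		/* stands in for dispatch to RCU */
};

/* Assumption: a user holds both a ref and an activity count. */
static void toy_use_server(struct toy_server *s)
{
	s->ref++;
	s->active++;
}

/* Drop a ref; the record is destroyed only when the last ref goes. */
static void toy_put_server(struct toy_server *s)
{
	assert(s->ref > 0);
	if (--s->ref == 0)
		s->freed = true;
}

/* Drop an activity count; arm the server's own timer on reaching zero. */
static void toy_unuse_server(struct toy_server *s)
{
	assert(s->active > 0);
	if (--s->active == 0)
		s->timer_armed = true;
	toy_put_server(s);
}

/*
 * Destroyer work item.  Removes an inactive record if its timer fired or
 * if the cell is being removed.  Returns true if the record was removed.
 */
static bool toy_server_destroyer(struct toy_server *s, bool timer_fired)
{
	if (s->flags & TOY_FL_EXPIRED)
		return false;	/* already removed; do not try again */
	if (s->active > 0)
		return false;	/* back in use */
	if (!timer_fired && !s->cell->being_removed)
		return false;	/* not yet due */

	s->timer_armed = false;
	s->flags |= TOY_FL_EXPIRED;
	s->cell->nr_servers--;
	toy_put_server(s);	/* drop the database's ref */
	return true;
}
```

In this model, a cell that is being removed only has to set being_removed
and run each server's destroyer; a server that is still active is left
alone and is removed once it has been unused.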

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-15-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-11-dhowells@redhat.com/ # v4
2025-03-10 09:47:15 +00:00