1350927 Commits

Author SHA1 Message Date
Ming Lei
57ed58c132 selftests: ublk: enable zero copy for stripe target
Use io_uring vectored fixed kernel buffer for handling stripe IO.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250325135155.935398-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:07:00 -06:00
Ming Lei
1045afae4b io_uring: support vectored kernel fixed buffer
io_uring has supported fixed kernel buffer via io_buffer_register_bvec()
and io_buffer_unregister_bvec().

The vectored fixed buffer has been ready, so it is natural to support
fixed kernel buffer, one use case is ublk.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250325135155.935398-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:06:59 -06:00
Ming Lei
149974fdb8 block: add for_each_mp_bvec()
Add helper of for_each_mp_bvec() for io_uring to import fixed kernel
buffer.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250325135155.935398-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:06:59 -06:00
Ming Lei
8622b20f23 io_uring: add validate_fixed_range() for validate fixed buffer
Add helper of validate_fixed_range() for validating fixed buffer
range.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250325135155.935398-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:06:43 -06:00
Michael Ellerman
892c4e465c docs: Fix references to IBM CAPI (cxl) removal version
The IBM CAPI (cxl) driver was removed in 6.15, not 6.14.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2025-04-02 23:09:52 +11:00
Takashi Iwai
02dc9b9617 ASoC: Fixes for v6.15
A relatively large set of fixes that came in since the release, mostly
 for Qualcomm platforms.  The biggest block of fixes is the set from
 Srini which fixes various quality and glitching issues on AudioReach
 systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmftJp4ACgkQJNaLcl1U
 h9AdNgf+IX/gRsFJ3KXTeId2Vqiac1DKBSMq5xc+LcPlgUisgE3fMj83yIUiLDTH
 SAyOL0g9vP8W/VpRgwIB4P09D5zXySKk+54HsW1MoDftsh7TveGQ9QrZ6ksMGbjn
 WfKGFESJUsh3s5OAPDGhQmWtlHQl//IE5kJBVbY/SwoLFl5ag5LvpTs/ky9AREo9
 esgfQKprgaKUV9d8MDNX+4ZNAENoPuk/WC1VqVPtOCnec7SatvBURh3G9qDqtSk7
 TmoeSEQ+9MD8AYIeY+ByG+1v3qSp0LcZWKMoOuK7QQG9gmteMA/okQpaz53FY2iP
 RQ52jzbDvvblOt3KU4uJ3cYqaiLcjA==
 =yiK+
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v6.15-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.15

A relatively large set of fixes that came in since the release, mostly
for Qualcomm platforms.  The biggest block of fixes is the set from
Srini which fixes various quality and glitching issues on AudioReach
systems.
2025-04-02 14:02:41 +02:00
Florian Fainelli
e19c1272c8
spi: bcm2835: Restore native CS probing when pinctrl-bcm2835 is absent
The lookup table forces the use of the "pinctrl-bcm2835" GPIO chip
provider and essentially assumes that there is going to be such a
provider, and if not, we will fail to set-up the SPI device.

While this is true on Raspberry Pi based systems (2835/36/37, 2711,
2712), this is not true on 7712/77122 Broadcom STB systems which use the
SPI driver, but not the GPIO driver.

There used to be an early check:

       chip = gpiochip_find("pinctrl-bcm2835", chip_match_name);
       if (!chip)
               return 0;

which would accomplish that nicely, bring something similar back by
checking for the compatible strings matched by the pinctrl-bcm2835.c
driver, if there is no Device Tree node matching those compatible
strings, then we won't find any GPIO provider registered by the
"pinctrl-bcm2835" driver.

Fixes: 21f252cd29f0 ("spi: bcm2835: reduce the abuse of the GPIO API")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250401233603.2938955-1-florian.fainelli@broadcom.com
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-02 12:55:32 +01:00
Takashi Iwai
8983dc1b66 ALSA: hda/realtek: Fix built-in mic on another ASUS VivoBook model
There is another VivoBook model which built-in mic got broken recently
by the fix of the pin sort.  Apply the correct quirk
ALC256_FIXUP_ASUS_MIC_NO_PRESENCE to this model for addressing the
regression, too.

Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort")
Closes: https://lore.kernel.org/Z95s5T6OXFPjRnKf@eldamar.lan
Link: https://patch.msgid.link/20250402074208.7347-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-04-02 09:43:23 +02:00
Kailang Yang
22c7f77247 ALSA: hda/realtek - Support mute led function for HP platform
This patch was integrated CS Amp and support mute led function for HP platform.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/2c960ab58b4d4090ad4ee075f8cfdffd@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-04-02 09:27:50 +02:00
Namjae Jeon
c8b5b7c5da ksmbd: fix null pointer dereference in alloc_preauth_hash()
The Client send malformed smb2 negotiate request. ksmbd return error
response. Subsequently, the client can send smb2 session setup even
thought conn->preauth_info is not allocated.
This patch add KSMBD_SESS_NEED_SETUP status of connection to ignore
session setup request if smb2 negotiate phase is not complete.

Cc: stable@vger.kernel.org
Tested-by: Steve French <stfrench@microsoft.com>
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-26505
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 23:02:20 -05:00
Linus Torvalds
acc4d5ff0b Rather tiny PR, mostly so that we can get into our trees your fix
to the x86 Makefile.
 
 Current release - regressions:
 
  - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc",
    error queue accounting was missed
 
 Current release - new code bugs:
 
  - 5 fixes for the netdevice instance locking work
 
 Previous releases - regressions:
 
  - usbnet: restore usb%d name exception for local mac addresses
 
 Previous releases - always broken:
 
  - rtnetlink: allocate vfinfo size for VF GUIDs when supported,
    avoid spurious GET_LINK failures
 
  - eth: mana: Switch to page pool for jumbo frames
 
  - phy: broadcom: Correct BCM5221 PHY model detection
 
 Misc:
 
  - selftests: drv-net: replace helpers for referring to other files
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfrLiQACgkQMUZtbf5S
 IrvDsg//VH5VmkMau/TUF1WpwA03Wb/X0UqtM72lufVCMGYCogqjHZzs9popfhuu
 CCi4ZiBofEJHfN9WYJoumUL8aOdux0Q875o7h5gvfqKCokkUbJDk3W32QiLj1RqZ
 L14TorNOyS9Tg9m60pLa/6H3WS0kduhUGLuLzgTmOtT/YVJrOGmBtk/tRXFAKclH
 f3CPucvZGS5Fr4HfUc3yWXiocjibDJnv+jE+xW6S6YHpghvsK7anUvqIGTqQV8Vp
 s6AuvFULGO00968hFHcO0N8f8MnaCCJr2bcHCOXyjssdEbdvDOqzhFN4KhxWEPbK
 kCl3rLkPdkXo+ekOC7gIxXXaKVz3IVm28pegtEws8fda/iLuqZTp+BKo7kdrf3Iy
 br0rP/iK3eFN0M1XpVUIbmEuJ6VamztCzK88uvDKdI+Ol3GLAfy9v5NmkbzJI+aE
 cw+SyE6NgbeDeHBxvOu2F7G2sWMBkTEGaHMNXCv7I/VAvQsbk48onTVnpA+GlFD5
 vFqMxiZHLBUfFOfUcHxmw8KAkZ44pc1xEkpw/4s8GZOfq+1oWz2LrSQ8M8Hjs2VN
 NTuE44OOsBDPOJ54iDAIOr5jyF+ZDeWEuPbvWbHGri80gII+2iF2788N2ToyAkyR
 R0t4VtJwXF+8D6IxTG41OIvAuq9Zc8AqM6O7VnOyFhZtKqezBTI=
 =BDoM
 -----END PGP SIGNATURE-----

Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Rather tiny pull request, mostly so that we can get into our trees
  your fix to the x86 Makefile.

  Current release - regressions:

   - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
     queue accounting was missed

  Current release - new code bugs:

   - 5 fixes for the netdevice instance locking work

  Previous releases - regressions:

   - usbnet: restore usb%d name exception for local mac addresses

  Previous releases - always broken:

   - rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
     spurious GET_LINK failures

   - eth: mana: Switch to page pool for jumbo frames

   - phy: broadcom: Correct BCM5221 PHY model detection

  Misc:

   - selftests: drv-net: replace helpers for referring to other files"

* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
  Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
  bnxt_en: bring back rtnl lock in bnxt_shutdown
  eth: gve: add missing netdev locks on reset and shutdown paths
  selftests: mptcp: ignore mptcp_diag binary
  selftests: mptcp: close fd_in before returning in main_loop
  selftests: mptcp: fix incorrect fd checks in main_loop
  mptcp: fix NULL pointer in can_accept_new_subflow
  octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
  octeontx2-af: Fix mbox INTR handler when num VFs > 64
  net: fix use-after-free in the netdev_nl_sock_priv_destroy()
  selftests: net: use Path helpers in ping
  selftests: net: use the dummy bpf from net/lib
  selftests: drv-net: replace the rpath helper with Path objects
  net: lapbether: use netdev_lockdep_set_classes() helper
  net: phy: broadcom: Correct BCM5221 PHY model detection
  net: usb: usbnet: restore usb%d name exception for local mac addresses
  net/mlx5e: SHAMPO, Make reserved size independent of page size
  net: mana: Switch to page pool for jumbo frames
  MAINTAINERS: Add dedicated entries for phy_link_topology
  net: move replay logic to tc_modify_qdisc
  ...
2025-04-01 20:00:51 -07:00
Linus Torvalds
3491aa0478 VFIO updates for v6.15-rc1
- Relax IGD support code to match display class device rather than
    specifically requiring a VGA device. (Tomita Moeko)
 
  - Accelerate DMA mapping of device MMIO by iterating at PMD and PUD
    levels to take advantage of huge pfnmap support added in v6.12.
    (Alex Williamson)
 
  - Extend virtio vfio-pci variant driver to include migration support
    for block devices where enabled by the PF. (Yishai Hadas)
 
  - Virtualize INTx PIN register for devices where the platform does
    not route legacy PCI interrupts for the device and the interrupt
    is reported as IRQ_NOTCONNECTED. (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmfq5nEbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi3KAP/2MQcQaKTZ6/+dG6YdKT
 ZFaY4+xJ14DnUN/z96UlIWLk8bWgSyDFxdoFMbtFGENKRslEWxZ7In9Caow7f6ux
 7/usBjSvJa5Yx9YWRGsblrx7IyYfSW6R1V+jH3xPd+K8Ir4K7SUvb1CJLVPdfEYh
 OWer8eRpZ5tw3R2X4o+QxZ+H4Fx1zVQourW35h4daqrjnn7kOQMJIzGYOwHSDlCy
 lW0X0yD3sGgw9w7qAmEDmw9UbKGf245AVylIl5T1a7c3RaO+eKdKPZfNa18g0J/Q
 5pRMK+2PvZ+S0OTYxotcF9GtEJ3iBxY8W4QnlLiyTs9XyZ7tLMzGvLEKmCDKA0U8
 yAtoJ5T00PVXjMxkZx1+oMGja9Hx+b7gABTYpbf5wRtab6EdNWln++I1HCLKgZZ+
 yStvQNsMYGbJsLfwiGouMwD24JT+xg3A+Dv2Cx+Ai4NVJebxTD8Lhc0lz2I6IpOh
 wFBpBzBIPpcG53oQ1Syb9GLESQ0Acb4LUMjsSxIg7QFSrWgAAlq/PiLXv852S3xJ
 pUEh7r/YByQytUsQajgE7ekKqyXw0gn99Z+UTk0LUIq/y7SxrIPeqzq2qRf490RV
 wnkOrMxrAWj84lkIv8hLCiLsXMmvV4rsMJgV+s8KQZ+hPv38mPbkFYGdHj4h5RPA
 5J5h32dDkHLK+X5u//gBY8xh
 =fJPO
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Relax IGD support code to match display class device rather than
   specifically requiring a VGA device (Tomita Moeko)

 - Accelerate DMA mapping of device MMIO by iterating at PMD and PUD
   levels to take advantage of huge pfnmap support added in v6.12
   (Alex Williamson)

 - Extend virtio vfio-pci variant driver to include migration support
   for block devices where enabled by the PF (Yishai Hadas)

 - Virtualize INTx PIN register for devices where the platform does not
   route legacy PCI interrupts for the device and the interrupt is
   reported as IRQ_NOTCONNECTED (Alex Williamson)

* tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Handle INTx IRQ_NOTCONNECTED
  vfio/virtio: Enable support for virtio-block live migration
  vfio/type1: Use mapping page mask for pfnmaps
  mm: Provide address mask in struct follow_pfnmap_args
  vfio/type1: Use consistent types for page counts
  vfio/type1: Use vfio_batch for vaddr_get_pfns()
  vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch
  vfio/type1: Catch zero from pin_user_pages_remote()
  vfio/pci: match IGD devices in display controller class
2025-04-01 19:35:19 -07:00
Linus Torvalds
4b98d5dcd1 virtio: features, fixes, cleanups
A small number of improvements all over the place:
 
 shutdown has been reworked to reset devices.
 virtio fs is now allowed in vduse.
 vhost-scsi memory use has been reduced.
 
 cleanups, fixes all over the place.
 
 A couple more fixes are being tested and will be merged after rc1.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmfqw70PHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpnHwIAL+3kc7Na6iFu995K7iVaqPf5l+xqflvLXDU
 bdm+oxz7SPlM39qgX3LwaRW8aILH4+bQ5zP8f0O16Ou+1Hraf+BZT0cah7zjPMdn
 WRE/zsHaNrXNKuPERI2lG5v26NhD9UIqpufutBjKu1OVQ55kt/20Rt/HlyWjCzU8
 /gROCs5nlUr47+PRLQraRMA6hKpLjVaIEBLLlWVnB1nr9TQZmtVwn742RFu5urr+
 wosaCSq0JCFdxtoUhHHGUhO1rq+3LO3kPpuKE5AeBJUqVo9IMwZVfcK7s+5zTECb
 FylXu3vNUNWG5Cr2lnzONhkjg2i0BXUTHDz/iCmcMQks9cdojUE=
 =DbWQ
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "A small number of improvements all over the place:

   - shutdown has been reworked to reset devices

   - virtio fs is now allowed in vduse

   - vhost-scsi memory use has been reduced

   - cleanups, fixes all over the place

  A couple more fixes are being tested and will be merged after rc1"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-scsi: Reduce response iov mem use
  vhost-scsi: Allocate iov_iter used for unaligned copies when needed
  vhost-scsi: Stop duplicating se_cmd fields
  vhost-scsi: Dynamically allocate scatterlists
  vhost-scsi: Return queue full for page alloc failures during copy
  vhost-scsi: Add better resource allocation failure handling
  vhost-scsi: Allocate T10 PI structs only when enabled
  vhost-scsi: Reduce mem use by moving upages to per queue
  vduse: add virtio_fs to allowed dev id
  sound/virtio: Fix cancel_sync warnings on uninitialized work_structs
  vdpa/mlx5: Fix oversized null mkey longer than 32bit
  vdpa/mlx5: Fix mlx5_vdpa_get_config() endianness on big-endian machines
  vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint
  tools: virtio/linux/module.h add MODULE_DESCRIPTION() define.
  tools: virtio/linux/compiler.h: Add data_race() define.
  tools/virtio: Add DMA_MAPPING_ERROR and sg_dma_len api define for virtio test
  virtio: break and reset virtio devices on device_shutdown()
2025-04-01 18:52:54 -07:00
Linus Torvalds
48552153cf iommufd 6.15 merge window pull
Two significant new items:
 
 - Allow reporting IOMMU HW events to userspace when the events are clearly
   linked to a device. This is linked to the VIOMMU object and is intended to
   be used by a VMM to forward HW events to the virtual machine as part of
   emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
   mechanism. Like the existing fault events the data is delivered through
   a simple FD returning event records on read().
 
 - PASID support in VFIO. "Process Address Space ID" is a PCI feature that
   allows the device to tag all PCI DMA operations with an ID. The IOMMU
   will then use the ID to select a unique translation for those DMAs. This
   is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to
   manage each PASID entry. The support is generic so any VFIO user could
   attach any translation to a PASID, and the support should work on ARM
   SMMUv3 as well. AMD requires additional driver work.
 
 Some minor updates, along with fixes:
 
 - Prevent using nested parents with fault's, no driver support today
 
 - Put a single "cookie_type" value in the iommu_domain to indicate what
   owns the various opaque owner fields
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ+q6NgAKCRCFwuHvBreF
 YZ3zAQDbl4/Z0O+CLN2AXq4Zeiyq1HTSoF94hzqmm7lQ17zTIwD8CCdyLXHvupaq
 tkBIv5IovpaxlrSk6M0kh2K8vPCk9Qk=
 =CIM3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Two significant new items:

   - Allow reporting IOMMU HW events to userspace when the events are
     clearly linked to a device.

     This is linked to the VIOMMU object and is intended to be used by a
     VMM to forward HW events to the virtual machine as part of
     emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
     mechanism. Like the existing fault events the data is delivered
     through a simple FD returning event records on read().

   - PASID support in VFIO.

     The "Process Address Space ID" is a PCI feature that allows the
     device to tag all PCI DMA operations with an ID. The IOMMU will
     then use the ID to select a unique translation for those DMAs. This
     is part of Intel's vIOMMU support as VT-D HW requires the
     hypervisor to manage each PASID entry.

     The support is generic so any VFIO user could attach any
     translation to a PASID, and the support should work on ARM SMMUv3
     as well. AMD requires additional driver work.

  Some minor updates, along with fixes:

   - Prevent using nested parents with fault's, no driver support today

   - Put a single "cookie_type" value in the iommu_domain to indicate
     what owns the various opaque owner fields"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
  iommufd: Test attach before detaching pasid
  iommufd: Fix iommu_vevent_header tables markup
  iommu: Convert unreachable() to BUG()
  iommufd: Balance veventq->num_events inc/dec
  iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  ida: Add ida_find_first_range()
  iommufd/selftest: Add coverage for iommufd pasid attach/detach
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd: Allow allocating PASID-compatible domain
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Enforce PASID-compatible domain for RID
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd/device: Add pasid_attach array to track per-PASID attach
  ...
2025-04-01 18:03:46 -07:00
Linus Torvalds
792b8307ec - A single fix making sure the EDAC subtree is included in the
documentation table of contents
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfsWe8ACgkQEsHwGGHe
 VUqBKxAAtsYfH5r5wCMl02uYbd2SpIpFv5kN7yByNUdnbFTElD93pAzHvqAhPr8a
 Jaj3rOLDnqbf1bxwkI7tlD+vWOwRxSCSg/8MhbSS2J8teJPCnPt9YF4/Pkjgl35W
 SbdIPdRV4UOyWpSHAjJD+ldOyP1U6oFkIqSGcG+wJrKA2QQE1yw+eNgITWLbZokc
 zjkcwMHt0z72G3+M3ySc+4DjWO5p00OLo5IS5QaoFUKL5nvV8wIU+0fgVWl8dnRS
 +qzJljFMV2EFv9gUss8WEJk6jsKgdIW7uNAPHbKp9jmDlANwiQhjFAB/7+ORCxy0
 ekzfL+6bODG0FUeEK7KtAjr6DDU1rvPWkpEE5qDj7Bo9+apisJb/+oRzGOc5sp5+
 PkpdBxvlIywfbT5Y9RDmNRfevlw5QaS0J5rGQqxUHQa9aGFmTspWh4nBgh1JfrhN
 tu0OSDTSRsk7VIjrZzrItJfe0Q3/oeNp3NWSD2YgvRflK45OdOMW6jgiVDNMRrz4
 970Qy0zXZf/+J/8RC5SL+QAsvNr2toI0e+CiTy+R6YvYgKKI3w9Gbqcllz9MGakw
 QXqj4FItfmtkfsJD6SChAJybQaLLFEPY+jaSjt4yLuSrFxgLO2jA8o3lhh7E2QM3
 7DMDoDsy5Ofq6q/1K0Wg4OcHalQVKQTeTNHj0OPkaetd7rtoD6M=
 =5eoC
 -----END PGP SIGNATURE-----

Merge tag 'edac_urgent_for_v6.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC documentation fix from Borislav Petkov:

 - A single fix making sure the EDAC subtree is included in the
   documentation table of contents

* tag 'edac_urgent_for_v6.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  Documentation/EDAC: Fix warning document isn't included in any toctree
2025-04-01 17:46:45 -07:00
Michael Ellerman
64f7efb0f5 Merge branch 'topic/cxl' into next
This merges in the removal of the IBM CAPI "cxl" driver.
2025-04-02 11:07:44 +11:00
Linus Torvalds
8868485d6b More thermal control updates for 6.15-rc1
- Use dev_err_probe() helpers to simplify the init code in the Qoriq
    thermal driver (Frank Li).
 
  - Power down the Qoriq's TMU at suspend time (Alice Guo).
 
  - Add ipq5332, ipq5424 compatible to the QCom's tsens thermal driver
    and TSENS enable / calibration support for V2 (Praveenkumar I).
 
  - Add missing rk3328 mapping entry (Trevor Woerner).
 
  - Remove duplicate struct declaration from the thermal core header
    file (Xueqin Luo).
 
  - Disable the monitoring mode during suspend in the LVTS Mediatek
    driver to prevent temperature acquisition glitches (Nícolas F. R. A.
    Prado).
 
  - Disable Stage 3 thermal threshold in the LVTS Mediatek driver
    because it disables the suspend ability and does not have an
    interrupt handler (Nícolas F. R. A. Prado).
 
  - Fix low temperature offset interrupt in the LVTS Mediatek driver
    to prevent multiple interrupts from triggering when the system is at
    its normal functionning temperature (Nícolas F. R. A. Prado).
 
  - Enable interrupts in the LVTS Mediatek driver only on sensors that
    are in use (Nícolas F. R. A. Prado).
 
  - Add the BCM74110 compatible DT binding and the corresponding code
    to support a chip based on a different process node than previous
    chips (Florian Fainelli).
 
  - Correct indentation and style in DTS example (Krzysztof Kozlowski).
 
  - Unify hexadecimal annotatation in the rcar_gen3 driver (Niklas
    Söderlund).
 
  - Factor out the code logic to read fuses on Gen3 and Gen4 in the
    rcar_gen3 thermal driver (Niklas Söderlund).
 
  - Drop unused driver data from the QCom's spmi temperature alarm
    driver (Johan Hovold).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfq7sQSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO15hAH/imWw7djjvVup1ZDxTHICUHbAjS+QwmT
 EZr5zy6SSHlnQ/h5eWwCziXBEYY04Ra3qXn/H8cqVKcgeSVdYJQ49adKTK7V0/5O
 eszOGQQDtog0xeIYavLDcXFmN3jVa7BF6Zq6XQrezw6HELyWH14EYVVZ4qvgqFvt
 jXiMdMzkzfQ/TYkso69lMK+asX0M9V/aRtf4Fzw4WH5qTmkqNkoSLOY7yPidwrYy
 t4zXhTWpDmzB0mRaljaZ+Z9rDbyTMe8cRL+K9+PLGLVhPhU2DpVTvAEGs5w4IaoB
 Nja+QixI8lOi/m1ErqqeduMtT4Rk4FC6T4uC58tMzLYLYY96snEg/EY=
 =5IMD
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more thermal control updates from Rafael Wysocki:
 "These are mostly assorted updates of thermal drivers used on ARM
  platforms:

   - Use dev_err_probe() helpers to simplify the init code in the Qoriq
     thermal driver (Frank Li)

   - Power down the Qoriq's TMU at suspend time (Alice Guo)

   - Add ipq5332, ipq5424 compatible to the QCom's tsens thermal driver
     and TSENS enable / calibration support for V2 (Praveenkumar I)

   - Add missing rk3328 mapping entry (Trevor Woerner)

   - Remove duplicate struct declaration from the thermal core header
     file (Xueqin Luo)

   - Disable the monitoring mode during suspend in the LVTS Mediatek
     driver to prevent temperature acquisition glitches (Nícolas F. R.
     A. Prado)

   - Disable Stage 3 thermal threshold in the LVTS Mediatek driver
     because it disables the suspend ability and does not have an
     interrupt handler (Nícolas F. R. A. Prado)

   - Fix low temperature offset interrupt in the LVTS Mediatek driver to
     prevent multiple interrupts from triggering when the system is at
     its normal functionning temperature (Nícolas F. R. A. Prado)

   - Enable interrupts in the LVTS Mediatek driver only on sensors that
     are in use (Nícolas F. R. A. Prado)

   - Add the BCM74110 compatible DT binding and the corresponding code
     to support a chip based on a different process node than previous
     chips (Florian Fainelli)

   - Correct indentation and style in DTS example (Krzysztof Kozlowski)

   - Unify hexadecimal annotatation in the rcar_gen3 driver (Niklas
     Söderlund)

   - Factor out the code logic to read fuses on Gen3 and Gen4 in the
     rcar_gen3 thermal driver (Niklas Söderlund)

   - Drop unused driver data from the QCom's spmi temperature alarm
     driver (Johan Hovold)"

* tag 'thermal-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/drivers/qcom-spmi-temp-alarm: Drop unused driver data
  thermal: rcar_gen3: Reuse logic to read fuses on Gen3 and Gen4
  thermal: rcar_gen3: Use lowercase hex constants
  dt-bindings: thermal: Correct indentation and style in DTS example
  thermal/drivers/brcmstb_thermal: Add support for BCM74110
  dt-bindings: thermal: Update for BCM74110
  thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors
  thermal/drivers/mediatek/lvts: Start sensor interrupts disabled
  thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold
  thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold
  thermal/drivers/mediatek/lvts: Disable monitor mode during suspend
  thermal: core: Remove duplicate struct declaration
  thermal/drivers/rockchip: Add missing rk3328 mapping entry
  thermal/drivers/tsens: Add TSENS enable and calibration support for V2
  dt-bindings: thermal: tsens: Add ipq5332, ipq5424 compatible
  thermal/drivers/qoriq: Power down TMU on system suspend
  thermal/drivers/qoriq: Use dev_err_probe() simplify the code
2025-04-01 16:51:44 -07:00
Linus Torvalds
1df7752800 I3C for 6.15
Core:
  - Fix a possible NULL pointer dereference due to IBI coming when the target
    driver is not yet probed.
 
 Drivers:
  - mipi-i3c-hci: Use I2C DMA-safe api
  - svc: add Nuvoton npcm845 support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmfsY9QACgkQY6TcMGxw
 OjKS1g//aSPDC6TjvHLlm2AZnbTmbf0Z5skABHeL1043g5Q5tutTw/+4NtB/qOEe
 +q8gLpdjU3Vm1npKKRX5t+ywWgNRV9Sq7FaBMDz2USAPMTMKjKCEatx/6p2/zaai
 DX6ZHoVuVin35GcK3xszQ/SdY2saH+J0fFoX2ab7taLjbNt10GOFGzRyM6GErmW5
 Zn9gOkD9SudYnjfnpfdHLMFA4MhTqhTKGEH1fZVRbahR9dc21xkf75+fWe2ggqmT
 O3H7Vdf8OFihMhESjZ9XX6HMf8o9gkmR/XJt430daA1++XwGDciTGDXD9LrYwgU5
 fJI36hN5W2kTgAwIQJE/fCock0WhMpnIOQEethX/9QVNNPqLyHGBPk4zXyAPmpan
 O0eQueMLZoMxL/q4iThMDWWQAO5U91YOCeQ1eFha1dVGu/njrav3H+t0yZTqUu9I
 BOhdnk7g3FehR/aRADjTeA7VJpS3LPFVznpXKF1sx6OfqGPdeTJi8+5uRibdVfVS
 kRQG8DWuvpkvSeIOHCWHW/N6G0FML/hNKGqXFmVZsMrzwG7A5oCZ6irJiRorKc94
 aO3Q4rZvR2tFoEtDgHQAOMCfMos/KbtJZ+d7bPQr+aCyA67tbC+eXD5/K6+kgVc+
 l9ZIqMeWQfH8omg6drrgfjqCcDAly2Yga7I4NoWTLdYo2VjPWXA=
 =pT5I
 -----END PGP SIGNATURE-----

Merge tag 'i3c/for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "The silvaco driver gets support for the integration of the IP in the
  Nuvoton npcm845 SoC. There is also a fix for a possible NULL pointer
  dereference that can happen with early IBIs. Summary:

  Core:

    - Fix a possible NULL pointer dereference due to IBI coming when the
      target driver is not yet probed.

  Drivers:

    - mipi-i3c-hci: Use I2C DMA-safe api

    - svc: add Nuvoton npcm845 support"

* tag 'i3c/for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: Add NULL pointer check in i3c_master_queue_ibi()
  i3c: master: Drop duplicate check before calling OF APIs
  i3c: master: svc: Fix implicit fallthrough in svc_i3c_master_ibi_work()
  i3c: master: svc: Fix missing STOP for master request
  i3c: master: svc: Use readsb helper for reading MDB
  i3c: master: svc: Fix missing the IBI rules
  i3c: master: svc: Fix i3c_master_get_free_addr return check
  i3c: master: svc: Fix npcm845 DAA process corruption
  i3c: master: svc: Fix npcm845 invalid slvstart event
  i3c: master: svc: Fix npcm845 FIFO empty issue
  i3c: master: svc: Add support for Nuvoton npcm845 i3c
  dt-bindings: i3c: silvaco: Add npcm845 compatible string
  dt-bindings: i3c: dw: Add power-domains
  i3c: master: svc: Flush FIFO before sending Dynamic Address Assignment(DAA)
  i3c: mipi-i3c-hci: Use I2C DMA-safe api
  i3c: Remove the const qualifier from i2c_msg pointer in i2c_xfers API
  MAINTAINERS: Add Frank Li to Silvaco I3C
  MAINTAINERS: Remove Conor Culhane from Silvaco I3C
2025-04-01 16:47:47 -07:00
Linus Torvalds
696c45bcc3 linux-watchdog 6.15-rc1 tag
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iEYEABECAAYFAmfpF+QACgkQ+iyteGJfRsrwWwCfRGcEokqFsbjVOKGcE8HKISMp
 hnMAoL7AGOrZdcgYYVlebyMvcMjMGb3z
 =YR76
 -----END PGP SIGNATURE-----

Merge tag 'linux-watchdog-6.15-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - Add watchdog driver for Lenovo SE30 platform

 - Add support for Allwinner A523

 - Add i.MX94 support

 - watchdog framework: Convert to use device property

 - renesas,wdt: Document RZ/G3E support

 - Various other fixes and improvemenents

* tag 'linux-watchdog-6.15-rc1' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: sunxi_wdt: Add support for Allwinner A523
  dt-bindings: watchdog: sunxi: add Allwinner A523 compatible string
  watchdog: aspeed: fix 64-bit division
  watchdog: npcm: Remove unnecessary NULL check before clk_prepare_enable/clk_disable_unprepare
  dt-bindings: watchdog: renesas,wdt: Document RZ/G3E support
  watchdog: Convert to use device property
  watchdog: lenovo_se30_wdt: include io.h for devm_ioremap()
  dt-bindings: watchdog: fsl-imx7ulp-wdt: Add i.MX94 support
  watchdog: nic7018_wdt: tidy up ACPI ID table
  watchdog: s3c2410_wdt: Fix PMU register bits for ExynosAutoV920 SoC
  watchdog: lenovo_se30_wdt: Watchdog driver for Lenovo SE30 platform
  watchdog: Enable RZV2HWDT driver depend on ARCH_RENESAS
  watchdog: cros-ec: Add newlines to printks
  watchdog: aspeed: Update bootstatus handling
2025-04-01 16:33:36 -07:00
Florian Fainelli
d669101052
spi: bcm2835: Do not call gpiod_put() on invalid descriptor
If we are unable to lookup the chip-select GPIO, the error path will
call bcm2835_spi_cleanup() which unconditionally calls gpiod_put() on
the cs->gpio variable which we just determined was invalid.

Fixes: 21f252cd29f0 ("spi: bcm2835: reduce the abuse of the GPIO API")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250401224238.2854256-1-florian.fainelli@broadcom.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-02 00:04:40 +01:00
T Pratham
8b46fdaea8 lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
The split_sg_phys function was incorrectly setting the offsets of all
scatterlist entries (except the first) to 0.  Only the first scatterlist
entry's offset and length needs to be modified to account for the skip. 
Setting the rest entries' offsets to 0 could lead to incorrect data
access.

I am using this function in a crypto driver that I'm currently developing
(not yet sent to mailing list).  During testing, it was observed that the
output scatterlists (except the first one) contained incorrect garbage
data.

I narrowed this issue down to the call of sg_split().  Upon debugging
inside this function, I found that this resetting of offset is the cause
of the problem, causing the subsequent scatterlists to point to incorrect
memory locations in a page.  By removing this code, I am obtaining
expected data in all the split output scatterlists.  Thus, this was indeed
causing observable runtime effects!

This patch removes the offending code, ensuring that the page offsets in
the input scatterlist are preserved in the output scatterlist.

Link: https://lkml.kernel.org/r/20250319111437.1969903-1-t-pratham@ti.com
Fixes: f8bcbe62acd0 ("lib: scatterlist: add sg splitting function")
Signed-off-by: T Pratham <t-pratham@ti.com>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kamlesh Gurudasani <kamlesh@ti.com>
Cc: Praneeth Bajjuri <praneeth@ti.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:20:46 -07:00
Kent Overstreet
e2a33a2a32 lib/sort.c: add _nonatomic() variants with cond_resched()
bcachefs calls sort() during recovery to sort all keys it found in the
journal, and this may be very large - gigabytes on large machines.

This has been causing "task blocked" warnings, so needs a
cond_resched().

[kent.overstreet@linux.dev: fix kerneldoc]
  Link: https://lkml.kernel.org/r/cgsr5a447pxqomc4gvznsp5yroqmif4omd7o5lsr2swifjhoic@yzjjrx2bvrq7
Link: https://lkml.kernel.org/r/20250326152606.2594920-1-kent.overstreet@linux.dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:20:46 -07:00
Nicolas Schier
c8b6d5dd34 mailmap: add an entry for Nicolas Schier
Add mapping to linux.dev address as mail usage at work is limited and my
mail provider for private mails is suffering from its own domain name
block lists.

Link: https://lkml.kernel.org/r/20250326-mailmap-add-entry-for-nicolas-v1-1-3c26579a7fdf@fjasle.eu
Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:20:45 -07:00
Jeff Xu
e20706d538 mseal sysmap: add arch-support txt
Add Documentation/features/core/mseal_sys_mappings/arch-support.txt

N/A: the arch is 32bits only and mseal is not supported in 32 bits,
     therefore N/A (until mseal is available in 32 bits kernel).

[jeffxu@chromium.org: update to v3]
  Link: https://lkml.kernel.org/r/20250324151537.1106542-2-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250321032627.4147562-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Eric Dumaze <edumazet@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Meghana Malladi <m-malladi@ti.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:17 -07:00
Heiko Carstens
24e3f9fbbd mseal sysmap: enable s390
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on s390, covering the
vdso.

[hca@linux.ibm.com: update supported architectures]
  Link: https://lkml.kernel.org/r/20250317131917.1332402-1-hca@linux.ibm.com
Link: https://lkml.kernel.org/r/20250311123326.2686682-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00
Jeff Xu
b481341e4c selftest: test system mappings are sealed
Add sysmap_is_sealed.c to test system mappings are sealed.

Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in
config file.

Link: https://lkml.kernel.org/r/20250305021711.3867874-8-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00
Jeff Xu
a8c15bb400 mseal sysmap: update mseal.rst
Update memory sealing documentation to include details about system
mappings.

Link: https://lkml.kernel.org/r/20250305021711.3867874-7-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00
Jeff Xu
3d38922abf mseal sysmap: uprobe mapping
Provide support to mseal the uprobe mapping.

Unlike other system mappings, the uprobe mapping is not established during
program startup.  However, its lifetime is the same as the process's
lifetime.  It could be sealed from creation.

Test was done with perf tool, and observe the uprobe mapping is sealed.

Link: https://lkml.kernel.org/r/20250305021711.3867874-6-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00
Jeff Xu
0061b6e162 mseal sysmap: enable arm64
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on arm64, covering the
vdso, vvar, and compat-mode vectors and sigpage mappings.

Production release testing passes on Android and Chrome OS.

Link: https://lkml.kernel.org/r/20250305021711.3867874-5-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:15 -07:00
Jeff Xu
3049def198 mseal sysmap: enable x86-64
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64, covering the
vdso, vvar, vvar_vclock.

Production release testing passes on Android and Chrome OS.

Link: https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:15 -07:00
Heiko Carstens
1d6fad7b84 mseal sysmap: generic vdso vvar mapping
With the introduction of the generic vdso data storage the VM_SEALED_SYSMAP
vm flag must be moved from the architecture specific
_install_special_mapping() call [1] [2] which maps the vvar mapping to
generic code.

[1] https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com
[2] https://lkml.kernel.org/r/20250305021711.3867874-5-jeffxu@google.com

Link: https://lkml.kernel.org/r/20250311123326.2686682-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:15 -07:00
Jeff Xu
7b0141daf3 selftests: x86: test_mremap_vdso: skip if vdso is msealed
Add code to detect if the vdso is memory sealed, skip the test if it is.

Link: https://lkml.kernel.org/r/20250305021711.3867874-3-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:15 -07:00
Jeff Xu
5796d3967c mseal sysmap: kernel config and header change
Patch series "mseal system mappings", v9.

As discussed during mseal() upstream process [1], mseal() protects the
VMAs of a given virtual memory range against modifications, such as the
read/write (RW) and no-execute (NX) bits.  For complete descriptions of
memory sealing, please see mseal.rst [2].

The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system.  For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.

The system mappings are readonly only, memory sealing can protect them
from ever changing to writable or unmmap/remapped as different attributes.

System mappings such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode), are created by the kernel during
program initialization, and could be sealed after creation.

Unlike the aforementioned mappings, the uprobe mapping is not established
during program startup.  However, its lifetime is the same as the
process's lifetime [3].  It could be sealed from creation.

The vsyscall on x86-64 uses a special address (0xffffffffff600000), which
is outside the mm managed range.  This means mprotect, munmap, and mremap
won't work on the vsyscall.  Since sealing doesn't enhance the vsyscall's
security, it is skipped in this patch.  If we ever seal the vsyscall, it
is probably only for decorative purpose, i.e.  showing the 'sl' flag in
the /proc/pid/smaps.  For this patch, it is ignored.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations.  UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings. 
Consequently, this feature cannot be universally enabled across all
systems.  As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.

To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag.  Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime.  The existence of this flag for an architecture implies that it
does not require the remapping of thest system mappings during process
lifetime, so sealing these mappings is safe from a kernel perspective.

This version covers x86-64 and arm64 archiecture as minimum viable feature.

While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel.  Other
architectures can choose whether or not to adopt this feature.  Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace.  The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4].  The UML kernel can add this
support when KUnit tests require it [5].

In this version, we've improved the handling of system mapping sealing
from previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it.  This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.

Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker.  This approach revealed several
issues:

- The PT_LOAD header may report an incorrect length for vdso, (smaller
  than its actual size).  The dynamic linker, which relies on PT_LOAD
  information to determine mapping size, would then split and partially
  seal the vdso mapping.  Since each architecture has its own vdso/vvar
  code, fixing this in the kernel would require going through each
  archiecture.  Our initial goal was to enable sealing readonly mappings,
  e.g.  .text, across all architectures, sealing vdso from kernel since
  creation appears to be simpler than sealing vdso at glibc.

- The [vvar] mapping header only contains address information, not
  length information.  Similar issues might exist for other special
  mappings.

- Mappings like uprobe are not covered by the dynamic linker, and there
  is no effective solution for them.

This feature's security enhancements will benefit ChromeOS, Android, and
other high security systems.

Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
  i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
  Android to ensure the sealing doesn't affect the functionality of
  Chromebook and Android phone.

I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
  normal operations such as browsing the web, open/edit doc are OK.

Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]


This patch (of 7):

Provide infrastructure to mseal system mappings.  Establish two kernel
configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future
patches.

Link: https://lkml.kernel.org/r/20250305021711.3867874-1-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250305021711.3867874-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:14 -07:00
Qi Zheng
02d9e1a204 mm: pgtable: remove tlb_remove_page_ptdesc()
The tlb_remove_ptdesc()/tlb_remove_table() is specially designed for page
table pages, and now all architectures have been converted to use it to
remove page table pages.  So let's remove tlb_remove_page_ptdesc(), it
currently has no users and should not be used for page table pages.

Link: https://lkml.kernel.org/r/3df04c8494339073b71be4acb2d92e108ecd1b60.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:14 -07:00
Qi Zheng
f1fdec956f x86: pgtable: convert to use tlb_remove_ptdesc()
The x86 has already been converted to use struct ptdesc, so convert it to
use tlb_remove_ptdesc() instead of tlb_remove_table().

Link: https://lkml.kernel.org/r/36ad56b7e06fa4b17fb23c4fc650e8e0d72bb3cd.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:14 -07:00
Qi Zheng
4239c198e8 riscv: pgtable: unconditionally use tlb_remove_ptdesc()
To support fast gup, the commit 69be3fb111e7 ("riscv: enable
MMU_GATHER_RCU_TABLE_FREE for SMP && MMU") did the following:

1) use tlb_remove_page_ptdesc() for those platforms which use IPI to
   perform TLB shootdown

2) use tlb_remove_ptdesc() for those platforms which use SBI to perform
   TLB shootdown

The tlb_remove_page_ptdesc() is the wrapper of the tlb_remove_page().  By
design, the tlb_remove_page() should be used to remove a normal page from
a page table entry, and should not be used for page table pages.

The tlb_remove_ptdesc() is the wrapper of the tlb_remove_table(), which is
designed specifically for freeing page table pages.  If the
CONFIG_MMU_GATHER_TABLE_FREE is enabled, the tlb_remove_table() will use
semi RCU to free page table pages, that is:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

If the CONFIG_MMU_GATHER_TABLE_FREE is disabled, the tlb_remove_table()
will fall back to pagetable_dtor() + tlb_remove_page().

For case 1), since we need to perform TLB shootdown before freeing the
page table page, the local_irq_save() in fast gup can block the freeing
and protect the fast gup page walker.  Therefore we can ensure safety by
just using tlb_remove_page_ptdesc().  In addition, we can also the
tlb_remove_ptdesc()/tlb_remove_table() to achieve it, and it doesn't
matter whether CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected.  And in
theory, the performance of freeing pages asynchronously via RCU will not
be lower than synchronous free.

For case 2), since local_irq_save() only disable S-privilege IPI irq but
not M-privilege's, which is used by the SBI implementation to perform TLB
shootdown, so we must select CONFIG_MMU_GATHER_RCU_TABLE_FREE and use
tlb_remove_ptdesc() to ensure safety.  The riscv selects this config for
SMP && MMU, the CONFIG_RISCV_SBI is dependent on MMU.  Therefore, only the
UP system may have the situation where CONFIG_MMU_GATHER_RCU_TABLE_FREE is
disabled but CONFIG_RISCV_SBI is enabled.  But there is no freeing vs fast
gup race in the UP system.

So, in summary, we can use tlb_remove_ptdesc() to support fast gup in all
cases, and this interface is specifically designed for page table pages. 
So let's use it unconditionally.

Link: https://lkml.kernel.org/r/9025595e895515515c95e48db54b29afa489c41d.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:14 -07:00
Qi Zheng
e3ecf7c7d0 mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
Now, the nine architectures of csky, hexagon, loongarch, m68k, mips,
nios2, openrisc, sh and um do not select CONFIG_MMU_GATHER_RCU_TABLE_FREE,
and just call pagetable_dtor() + tlb_remove_page_ptdesc() (the wrapper of
tlb_remove_page()).  This is the same as the implementation of
tlb_remove_{ptdesc|table}() under !CONFIG_MMU_GATHER_TABLE_FREE, so
convert these architectures to use tlb_remove_ptdesc().

The ultimate goal is to make the architecture only use tlb_remove_ptdesc()
or tlb_remove_table() for page table pages.

[zhengqi.arch@bytedance.com: v2]
  Link: https://lkml.kernel.org/r/20250303072603.45423-1-zhengqi.arch@bytedance.com
[akpm@linux-foundation.org: remove trailing semi in arch/loongarch/include/asm/pgalloc.h]
Link: https://lkml.kernel.org/r/19db3e8673b67bad2f1df1ab37f1c89d99eacfea.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:13 -07:00
Qi Zheng
1a03c275a3 mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
All callers of tlb_remove_ptdesc() pass it a pointer of struct ptdesc, so
let's change the pt parameter from void * to struct ptdesc * to perform a
type safety check.

Link: https://lkml.kernel.org/r/60bb44299cf2d731df6592e446e7f694054d0dbe.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:13 -07:00
Qi Zheng
f21bb37afb mm: pgtable: make generic tlb_remove_table() use struct ptdesc
Patch series "remove tlb_remove_page_ptdesc()", v2.

As suggested by Peter Zijlstra below [1], this series aims to remove
tlb_remove_page_ptdesc().

: Fundamentally tlb_remove_page() is about removing *pages* as from a PTE,
: there should not be a page-table anywhere near here *ever*.
:
: Yes, some architectures use tlb_remove_page() for page-tables too, but
: that is more or less an implementation detail that can be fixed.

After this series, all architectures use tlb_remove_table() or
tlb_remove_ptdesc() to remove the page table pages.  In the future, once
all architectures using tlb_remove_table() have also converted to using
struct ptdesc (eg.  powerpc), it may be possible to use only
tlb_remove_ptdesc().

[1] https://lore.kernel.org/linux-mm/20250103111457.GC22934@noisy.programming.kicks-ass.net/


This patch (of 6):

Now only arm will call tlb_remove_ptdesc()/tlb_remove_table() when
CONFIG_MMU_GATHER_TABLE_FREE is disabled.  In this case, the type of the
table parameter is actually struct ptdesc * instead of struct page *.

Since struct ptdesc still overlaps with struct page and has not been
separated from it, forcing the table parameter to struct page * will not
cause any problems at this time.  But this is definitely incorrect and
needs to be fixed.  So just like the generic __tlb_remove_table(), let
generic tlb_remove_table() use struct ptdesc by default when
CONFIG_MMU_GATHER_TABLE_FREE is disabled.

Link: https://lkml.kernel.org/r/cover.1740454179.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/5be8c3ab7bd68510bf0db4cf84010f4dfe372917.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:13 -07:00
Wei Yang
1b3d3e9f4a microblaze/mm: put mm_cmdline_setup() in .init.text section
As reported by lkp, there is a section mismatch of mm_cmdline_setup() and
memblock.  The reason is we don't specify the section of
mm_cmdline_setup() and gcc put it into .text.unlikely.

As mm_cmdline_setup() is only used in mmu_init(), which is in .init.text
section, put mm_cmdline_setup() into it too.

Link: https://lkml.kernel.org/r/20250328010136.13139-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202503241259.kJV3U7Xj-lkp@intel.com/
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:13 -07:00
Jinjiang Tu
9342bc134a mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
We triggered the below BUG:

 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
 head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
 page_type: f4(hugetlb)
 page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
 ------------[ cut here ]------------
 kernel BUG at ./include/linux/page-flags.h:310!
 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : const_folio_flags+0x3c/0x58
 lr : const_folio_flags+0x3c/0x58
 Call trace:
  const_folio_flags+0x3c/0x58 (P)
  do_migrate_range+0x164/0x720
  offline_pages+0x63c/0x6fc
  memory_subsys_offline+0x190/0x1f4
  device_offline+0xc0/0x13c
  state_store+0x90/0xd8
  dev_attr_store+0x18/0x2c
  sysfs_kf_write+0x44/0x54
  kernfs_fop_write_iter+0x120/0x1cc
  vfs_write+0x240/0x378
  ksys_write+0x70/0x108
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x48/0x10c
  el0_svc_common.constprop.0+0x40/0xe0

When allocating a hugetlb folio, between the folio is taken from buddy and
prep_compound_page() is called, start_isolate_page_range() and
do_migrate_range() is called.  When do_migrate_range() scans the head page
of the hugetlb folio, the compound_head field isn't set, so scans the tail
page next.  And at this time, the compound_head field of tail page is set,
folio_test_large() is called by tail page, thus triggers VM_BUG_ON().

To fix it, get folio refcount before calling folio_test_large().

Link: https://lkml.kernel.org/r/20250324131750.1551884-1-tujinjiang@huawei.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Fixes: b62b51d2d159 ("mm: memory_hotplug: remove head variable in do_migrate_range()")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:12 -07:00
Mike Rapoport (Microsoft)
38c5ecaadd MAINTAINERS: mm: add entry for secretmem
Link: https://lkml.kernel.org/r/20250326215541.1809379-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:12 -07:00
Mike Rapoport (Microsoft)
6985850f3e MAINTAINERS: mm: add entry for numa memblocks and numa emulation
Link: https://lkml.kernel.org/r/20250326215541.1809379-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:12 -07:00
Mike Rapoport (Microsoft)
8871b533ef MAINTAINERS: mm: add entry for execmem
Link: https://lkml.kernel.org/r/20250326215541.1809379-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:12 -07:00
Mike Rapoport (Microsoft)
59aa44d1ee MAINTAINERS: fixup USERFAULTFD entry
Patch series "MAINTAINERS: add my isub-entries to MM part."

Following discussion at LSF/MM/BPF I'm adding execmem, secretmem and
numa memblocks sub-entries for MEMORY MANAGEMENT in MAINTAINERS.


This patch (of 4):

Change title to "MEMORY MANAGEMENT - USERFAULTFD" and make it sub-topic in
memory management and add missing include/linux/userfaultfd_k.h and
mailing list

Link: https://lkml.kernel.org/r/20250326215541.1809379-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250326215541.1809379-2-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Li Wang
983e760bcd selftest/mm: va_high_addr_switch: add ppc64 support check
Add PPC64 Radix MMU support to the va_high_addr_switch.sh by introducing
check_supported_ppc64().  The function verifies:

  - 5-level paging (PGTABLE_LEVELS >= 5) enable in kernel config
  - Radix MMU (required for PPC64 5-level translation)
  - HugePages availability (needed for some tests)

If any check fails, the test is skipped (ksft_skip).  This ensures
compatibility with Power9/Power10 systems running in Radix MMU mode.

Avoid failures on 4-level paging system:

  # mmap(NULL, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(LOW_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2*HUGETLB_SIZE, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT , 2*HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED

Link: https://lkml.kernel.org/r/20250327114813.25980-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Mike Rapoport (Microsoft)
7790c9c926 memblock: don't release high memory to page allocator when HIGHMEM is off
Nathan Chancellor reports the following crash on a MIPS system with
CONFIG_HIGHMEM=n:

  Linux version 6.14.0-rc6-00359-g6faea3422e3b (nathan@ax162) (mips-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Fri Mar 21 08:12:02 MST 2025
  earlycon: uart8250 at I/O port 0x3f8 (options '38400n8')
  printk: legacy bootconsole [uart8250] enabled
  Config serial console: console=ttyS0,38400n8r
  CPU0 revision is: 00019300 (MIPS 24Kc)
  FPU revision is: 00739300
  MIPS: machine is mti,malta
  Software DMA cache coherency enabled
  Initial ramdisk at: 0x8fad0000 (5360128 bytes)
  OF: reserved mem: Reserved memory: No reserved-memory node in the DT
  Primary instruction cache 2kB, VIPT, 2-way, linesize 16 bytes.
  Primary data cache 2kB, 2-way, VIPT, no aliases, linesize 16 bytes
  Zone ranges:
    DMA      [mem 0x0000000000000000-0x0000000000ffffff]
    Normal   [mem 0x0000000001000000-0x000000001fffffff]
  Movable zone start for each node
  Early memory node ranges
    node   0: [mem 0x0000000000000000-0x000000000fffffff]
    node   0: [mem 0x0000000090000000-0x000000009fffffff]
  Initmem setup node 0 [mem 0x0000000000000000-0x000000009fffffff]
  On node 0, zone Normal: 16384 pages in unavailable ranges
  random: crng init done
  percpu: Embedded 3 pages/cpu s18832 r8192 d22128 u49152
  Kernel command line: rd_start=0xffffffff8fad0000 rd_size=5360128  console=ttyS0,38400n8r
  printk: log buffer data + meta data: 32768 + 102400 = 135168 bytes
  Dentry cache hash table entries: 65536 (order: 4, 262144 bytes, linear)
  Inode-cache hash table entries: 32768 (order: 3, 131072 bytes, linear)
  Writing ErrCtl register=00000000
  Readback ErrCtl register=00000000
  Built 1 zonelists, mobility grouping on.  Total pages: 16384
  mem auto-init: stack:all(zero), heap alloc:off, heap free:off
  Unhandled kernel unaligned access[#1]:
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00359-g6faea3422e3b #1
  Hardware name: mti,malta
  $ 0   : 00000000 00000001 81cb0880 00129027
  $ 4   : 00000001 0000000a 00000002 00129026
  $ 8   : ffffdfff 80101e00 00000002 00000000
  $12   : 81c9c224 81c63e68 00000002 00000000
  $16   : 805b1e00 00025800 81cb0880 00000002
  $20   : 00000000 81c63e64 0000000a 81f10000
  $24   : 81c63e64 81c63e60
  $28   : 81c60000 81c63de0 00000001 81cc9d20
  Hi    : 00000000
  Lo    : 00000000
  epc   : 814a227c __free_pages_ok+0x144/0x3c0
  ra    : 81cc9d20 memblock_free_all+0x1d4/0x27c
  Status: 10000002        KERNEL EXL
  Cause : 00800410 (ExcCode 04)
  BadVA : 00129026
  PrId  : 00019300 (MIPS 24Kc)
  Modules linked in:
  Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 81f10000 805a9e00 81c80000 00000000 00000002 814aa240 000003ff 00000400
          00000000 81f10000 81c9c224 00003b1f 81c80000 81c63e60 81ca0000 81c63e64
          81f10000 0000000a 0000001f 81cc9d20 81f10000 81cc96d8 00000000 81c80000
          81c9c224 81c63e60 81c63e64 00000000 81f10000 00024000 00028000 00025c00
          90000000 a0000000 00000002 00000017 00000000 00000000 81f10000 81f10000
          ...
  Call Trace:
  [<814a227c>] __free_pages_ok+0x144/0x3c0
  [<81cc9d20>] memblock_free_all+0x1d4/0x27c
  [<81cc6764>] mm_core_init+0x100/0x138
  [<81cb4ba4>] start_kernel+0x4a0/0x6e4

  Code: 1080ffd5  02003825  2467ffff <8ce30000> 7c630500  1060ffd4  00000000  8ce30000  7c630180

The crash happens because commit 6faea3422e3b ("arch, mm: streamline
HIGHMEM freeing") too eagerly frees high memory to the page allocator even
when HIGHMEM is disabled.

Make sure that when CONFIG_HIGHMEM=n the high memory is not released to the
page allocator.

Link: https://lore.kernel.org/all/20250323190647.GA1009914@ax162
Link: https://lkml.kernel.org/r/20250325114928.1791109-3-rppt@kernel.org
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Mike Rapoport (Microsoft)
2ebc3b68ac mm/mm_init: init holes in the end of the memory map for FLATMEM
Patch series "mm: fixes for fallouts from mem_init() cleanup".

These are the fixes for fallouts from mem_init() cleanup reported by
Nathan Chancellor and kbuild.  The details are in the commit messages.


This patch (of 2):

Kernel test robot reports the following crash on 32-bit system with
FLATMEM and DEBUG_VM_PGFLAGS enabled:

[    0.478822][    T0] kernel BUG at include/linux/page-flags.h:536!
[    0.479312][    T0] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
[    0.479768][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00357-g8268af309d07 #1
[    0.480470][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.481260][ T0] EIP: reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.481683][ T0] Code: 5d c3 01 f1 89 c8 ba e1 38 f4 c3 e8 1e 37 8e fc 0f 0b b8 90 e2 62 c4 e8 e2 05 5e fc 01 f1 89 c8 ba be 85 f7 c3 e8 04 37 8e fc <0f> 0b b8 80 e2 62 c4 e8 c8 05 5e fc 55 89 e5 53 57 56 83 ec 10 89
[    0.483177][    T0] EAX: 00000000 EBX: c425df50 ECX: 00000000 EDX: 00000000
[    0.483712][    T0] ESI: 017ffc00 EDI: ffffffff EBP: c425df34 ESP: c425df2c
[    0.484248][    T0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
[    0.484846][    T0] CR0: 80050033 CR2: 00000000 CR3: 04b48000 CR4: 00000090
[    0.485376][    T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.485907][    T0] DR6: fffe0ff0 DR7: 00000400
[    0.486253][    T0] Call Trace:
[ 0.486494][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:478)
[ 0.486822][ T0] ? die (arch/x86/kernel/dumpstack.c:?)
[ 0.487099][ T0] ? do_trap (arch/x86/kernel/traps.c:? arch/x86/kernel/traps.c:197)
[ 0.487409][ T0] ? do_error_trap (arch/x86/kernel/traps.c:217)
[ 0.487752][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.488153][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.488490][ T0] ? handle_invalid_op (arch/x86/kernel/traps.c:254)
[ 0.488869][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.489271][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:316)
[ 0.489619][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1055)
[ 0.489996][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.490332][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.490733][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.491068][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.491470][ T0] memmap_init_reserved_pages (mm/memblock.c:2203)
[ 0.491887][ T0] free_low_memory_core_early (mm/memblock.c:?)
[ 0.492302][ T0] memblock_free_all (mm/memblock.c:2272 include/linux/atomic/atomic-arch-fallback.h:546 include/linux/atomic/atomic-long.h:123 include/linux/atomic/atomic-instrumented.h:3261 include/linux/mm.h:67 mm/memblock.c:2273)
[ 0.492659][ T0] mem_init (arch/x86/mm/init_32.c:735)
[ 0.492952][ T0] mm_core_init (mm/mm_init.c:2730)
[ 0.493271][ T0] start_kernel (init/main.c:958)
[ 0.493604][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79)
[ 0.493969][ T0] startup_32_smp (arch/x86/kernel/head_32.S:292)

The crash happens because after commit 8268af309d07 ("arch, mm: set
max_mapnr when allocating memory map for FLATMEM") max_mapnr is rounded up
to MAX_ORDER_NR_PAGES and the pages in the end of the memory map are
passing pfn_valid() check in reserve_bootmem_region().

Make sure that that pages in the end of the memory map are initialized,
just like the pages in the end of the last section for SPARSEMEM.

Link: https://lkml.kernel.org/r/20250325114928.1791109-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250325114928.1791109-2-rppt@kernel.org
Fixes: 8268af309d07 ("arch, mm: set max_mapnr when allocating memory map for FLATMEM")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503241424.d16223ec-lkp@intel.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Peter Xu
4a0cb63144 MAINTAINERS: add peterx as userfaultfd reviewer
Add an entry for userfaultfd and make myself a reviewer of it, just in
case it helps people manage the cc list.

I named it MEMORY USERFAULTFD, could be a bad name, but then it can be
together with the MEMORY* entries when everything is in alphabetic order,
which is definitely a benefit.

The line may not change much on how I'd work with userfaultfd; I think
I'll do the same as before..  But maybe it still, more or less, adds some
responsibility on top, indeed.

[akpm@linux-foundation.org: add include/linux/userfaultfd_k.h, per Mike]
[akpm@linux-foundation.org: fix misordering]
Link: https://lkml.kernel.org/r/20250322002124.131736-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:10 -07:00
Ye Liu
bd145bdd26 mm/page_alloc: replace flag check with PageHWPoison() in check_new_page_bad()
This patch replaces the direct check for the __PG_HWPOISON flag with the
PageHWPoison() macro, improving code readability and maintaining
consistency with other parts of the memory management code.

Link: https://lkml.kernel.org/r/20250320063346.489030-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:10 -07:00