How to Get sel4test Running on NVIDIA Jetson Orin Nano 8GB
_end
symbol) turned out to be genuine dead ends and have been moved to the Dead Ends section. The final
file count dropped from 25 to 22 (kernel 12→11, elfloader 8→6, util_libs unchanged).
Please contact me if you are interested to find out more. Thanks, Alex.
A complete step-by-step guide for porting the seL4 microkernel to the NVIDIA Jetson Orin Nano 8GB Developer Kit. Covers hardware setup, serial console wiring, cross-compiling sel4test, and booting from UEFI. All 141 compiled tests pass deterministically with stock NVIDIA firmware.
The port changes 3 seL4 repositories (github.com/potanin, orin-nano branch). Stock NVIDIA firmware works unmodified. sel4test and seL4_libs run unmodified from upstream.
Based on TX2 (closest existing seL4 platform)
The seL4 manual's
porting guide
recommends starting from the closest existing platform and customising only what
actually differs. For T234 (the SoC under both the Orin Nano and AGX Orin), the closest
existing seL4 platform is tx2 — Tegra186, same Tegra family, same
TKE timer block, same NV UART register layout, same SMMUv2/GICv3 conventions.
Following the porting guide:
- util_libs platform headers: clock.h, i2c.h, reset.h are verbatim TX2 copies. sel4test never exercises clocks / I2C / resets through libplatsupport, so the contents are irrelevant for our build; the files just need to exist at the per-plat path because the libplatsupport sources do unconditional #includes. Bringing them across literally is the seL4 manual's recommended pattern.
- Two files customised: serial.h (3-line change: default UART becomes UART-C, not UART-A; AGX/Nano expose TCU debug on UART-C) and timer.h (1-line change: NV_TMR_PATH points at the TKE block at 0x2080000, the T234 address, not 0x3020000 which is TX2's). Every other constant is exactly TX2's.
- nvidia/timer.c: we extend the existing #ifdef CONFIG_PLAT_TX2 branch to also cover the orin platforms (one line). The upstream source already has a comment saying "If the platform specific code becomes any larger, then it should be considered moving into a per platform nv_timer_plat_init function"; until that refactor happens, extending the conditional is the Tegra-family-hygiene fix.
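One way to confirm which per-platform headers are still verbatim TX2 copies is to compare the two directories file by file. The sketch below is illustrative only: the function name is made up, and it assumes the tx2 headers sit in a directory that mirrors the orin-nano one (for example libplatsupport/plat_include/tx2/platsupport/plat/).

```sh
# Report, for each header in the orin-nano directory, whether it is
# byte-identical to the tx2 header of the same name.
# Usage: compare_plat_headers <tx2_dir> <orin_dir>
compare_plat_headers() {
    tx2_dir=$1
    orin_dir=$2
    for f in "$orin_dir"/*.h; do
        [ -e "$f" ] || continue              # directory has no headers
        name=$(basename "$f")
        if [ ! -f "$tx2_dir/$name" ]; then
            echo "$name: no tx2 counterpart"
        elif cmp -s "$tx2_dir/$name" "$f"; then
            echo "$name: verbatim"
        else
            # Count lines that are new or changed on the orin-nano side.
            changed=$(diff "$tx2_dir/$name" "$f" | grep -c '^>' || true)
            echo "$name: $changed line(s) differ"
        fi
    done
}
```

Run against the two plat directories, this should report clock.h, i2c.h and reset.h as verbatim and show small line counts for serial.h and timer.h if the description above holds.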
The kernel-platform tree (kernel/src/plat/orin-nano/) is similarly modelled
on src/plat/tx2/: same declare_platform pattern, GICv3, A72-class
CPU constants, l2c_nop.c shim. The DTS overlay and the chosen device-tree
paths are AGX/Nano-specific because the board layout differs, but the cmake structure is
Tegra-family-standard.
Two categories of changes — and why the distinction matters
Every change in this port lives in one of two buckets, with very different blast radius:
- Platform-specific files: under kernel/src/plat/orin-nano/, libplatsupport/plat_include/orin-nano/, libsel4/sel4_plat_include/orin-nano/, kernel/tools/dts/orin-nano.dts, and tools/seL4/elfloader-tool/src/plat/orin-nano/. These compile only when KernelPlatform=orin-nano; other platforms never see them. Safe to be as platform-specific as we like.
- Shared-infrastructure changes: modifications under kernel/src/arch/arm/, kernel/include/arch/arm/, kernel/src/drivers/serial/, elfloader-tool/src/arch-arm/, libplatsupport/src/mach/nvidia/, etc. Every other ARMv8 / Tegra-family platform compiles these. We work to keep this column as small as possible.
The two repository-structure tables below are split on this line. The category-1
tables list files that live under per-plat directories — "safe" changes. The
category-2 table lists modifications to shared code — the ones to scrutinise.
Where possible we move things from category 2 to category 1; the most recent example
is the plat_console_putchar override, which used to be in shared elfloader
code (efi_init.c) and now lives entirely in
src/plat/orin-nano/console.c as a strong override of an upstream
WEAK symbol. Several more candidates exist (init_downpages, the elfloader
cache discipline, the 2 MiB low-address reservation), each gated on a single-line
__attribute__((weak)) upstream marker.
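Whether a strong override really replaced a weak default can be checked from the symbol table. The sketch below reads standard nm output (address, type letter, name) on stdin and classifies one symbol; the function name is hypothetical, and which object or ELF file to inspect depends on your build tree.

```sh
# Classify a symbol from nm output: weak (W, w, V, v), undefined (U),
# strong (any other type letter), or absent.
# Usage: <nm output> | symbol_binding <symbol>
symbol_binding() {
    awk -v sym="$1" '
        $NF == sym {
            type = $(NF - 1)
            if (type == "W" || type == "w" || type == "V" || type == "v")
                print "weak"
            else if (type == "U")
                print "undefined"
            else
                print "strong"
            found = 1
            exit
        }
        END { if (!found) print "absent" }
    '
}
```

For instance, piping the output of the cross toolchain's nm for the per-plat console object into `symbol_binding plat_console_putchar` should print "strong", while the upstream default object should print "weak".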
1. Hardware
| Item | Notes |
|---|---|
| Jetson Orin Nano 8GB Developer Kit | The carrier board with the Orin Nano module. Other Orin variants may work but are untested. |
| USB-UART adapter (3.3V) | CP2102, FTDI FT232R, or CH340. Must be 3.3V logic — 5V will damage the Jetson. |
| MicroSD card (32GB+) | For JetPack Linux installation and as the UEFI boot medium. |
| Ethernet cable | Connects the Jetson to your LAN for HTTP boot. |
| DisplayPort monitor + USB keyboard | For initial UEFI configuration. Not needed after one-time setup. |
| DC power supply | 19V barrel jack (included with Dev Kit), or USB-C PD. |
| Host machine | Linux x86_64 (Ubuntu 22.04 recommended). For cross-compiling seL4. |
| Dupont jumper wires | 3 female-to-female wires for the serial console. |
2. Serial Console
The serial console uses UARTC on the J14 button header — the small header near the three buttons on the carrier board (not the 40-pin GPIO header).
| J14 Pin | Signal | Wire Color | Connect to Adapter |
|---|---|---|---|
| Pin 3 | RXD | Green | Adapter TX |
| Pin 4 | TXD | White | Adapter RX |
| Pin 7 | GND | Black | Adapter GND |
J14 Button Header (looking at the board, buttons nearest to you):
Pin 1 Pin 2
Pin 3 Pin 4 <- RXD (3), TXD (4)
Pin 5 Pin 6
Pin 7 Pin 8 <- GND (7)
Pin 9 Pin 10
Pin 11 Pin 12
DO NOT connect VCC/3.3V from the adapter to the Jetson.
Serial parameters: 115200 baud, 8N1. Connect with:
picocom -b 115200 /dev/ttyUSB0
(Your device may be /dev/ttyUSB0, /dev/ttyUSB1, or
/dev/ttyACM0. Use dmesg | tail after plugging in the adapter.)
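If it is unclear which node the adapter received, a small helper can list the candidates. This is only a sketch; the function name is illustrative, and the optional directory argument exists mainly so the helper can be tried against a scratch directory.

```sh
# Print the USB serial device nodes present under a /dev-style directory.
# Usage: list_serial_candidates [dev_dir]
list_serial_candidates() {
    dev_dir=${1:-/dev}
    for node in "$dev_dir"/ttyUSB* "$dev_dir"/ttyACM*; do
        # An unmatched glob stays literal, so test that the node exists.
        [ -e "$node" ] && echo "$node"
    done
    return 0
}
```

Running it before and after plugging in the adapter shows which node is new.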
3. Jetson Linux
Before building seL4 you need JetPack 6.x on the Jetson. Follow NVIDIA's Getting Started guide to flash your Dev Kit. Stock firmware works unmodified.
Once Linux is running, extract the hardware device tree and other reference information used to create the seL4 platform port:
# Copy the device tree blob from the running Jetson
sudo cat /sys/firmware/fdt > tegra234-orin-nano.dtb
# Decompile to human-readable DTS (on host or Jetson)
dtc -I dtb -O dts -o tegra234-orin-nano.dts tegra234-orin-nano.dtb
# Save the kernel config (useful reference for clocks, drivers, etc.)
zcat /proc/config.gz > kernel-config
# Dump the memory map
cat /proc/iomem > iomem.txt
The DTS in the seL4 kernel repo (tools/dts/orin-nano.dts, 11,514 lines)
is the verbatim output of this extraction. The seL4 overlay
(overlay-orin-nano.dts) selects the relevant nodes — UART, GIC,
timer, and memory regions — from this full device tree.
Key information extracted from Linux:
- Memory regions: /proc/iomem shows DRAM at 0x80000000–0xbe000000 and 0xc2000000–0x100000000, with the 64 MB firmware carve-out gap between them
- UARTC address: 0x0c280000, clock ID 157, in the always-on (AON) power domain
- GIC addresses: GICD at 0x0f400000, GICR at 0x0f440000
- Timer PPIs: Secure=13, Non-secure=14, Virtual=11, Hyp=10
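The addresses above can be cross-checked against the decompiled DTS. The sketch below relies on the device tree convention that a node's unit address is its base address in hex without leading zeros; the function names are illustrative, and the node name in front of the `@` depends on the DTS.

```sh
# Convert a physical address such as 0x0c280000 to a DTS unit address (c280000).
unit_address() {
    printf '%x\n' "$(($1))"
}

# Print the node header lines (with line numbers) whose unit address matches.
# Usage: find_dts_node <file.dts> <physical address>
find_dts_node() {
    dts=$1
    addr=$(unit_address "$2")
    grep -n "@$addr {" "$dts"
}
```

For example, `find_dts_node tegra234-orin-nano.dts 0x0c280000` should point at the UARTC node if the extraction above matches your board.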
4. Building seL4
Install build dependencies per the seL4
host dependencies
guide, including the AArch64 cross-compiler (gcc-aarch64-linux-gnu).
Clone Repositories
3 forked repos (github.com/potanin, orin-nano branch) plus upstream dependencies:
mkdir -p ~/jetson/potanin-git && cd ~/jetson/potanin-git
# Forked repos (github.com/potanin, branch: orin-nano)
git clone -b orin-nano https://github.com/potanin/seL4.git kernel
mkdir -p tools
git clone -b orin-nano https://github.com/potanin/seL4_tools.git tools/seL4
mkdir -p projects
git clone -b orin-nano https://github.com/potanin/util_libs.git projects/util_libs
# Upstream repos (pinned commits)
git clone https://github.com/seL4/seL4_libs.git projects/seL4_libs
git clone https://github.com/seL4/sel4test.git projects/sel4test
git clone https://github.com/seL4/musllibc.git projects/musllibc
git clone https://github.com/seL4/sel4_projects_libs.git projects/sel4_projects_libs
git clone https://github.com/seL4/sel4runtime.git projects/sel4runtime
git clone https://github.com/nanopb/nanopb.git tools/nanopb
git clone https://github.com/riscv/opensbi.git tools/opensbi
# Required build symlinks
ln -sf projects/sel4test/easy-settings.cmake easy-settings.cmake
ln -sf tools/seL4/cmake-tool/init-build.sh init-build.sh
ln -sf tools/seL4/cmake-tool/griddle griddle
Build
cd ~/jetson/potanin-git
mkdir build-orin && cd build-orin
../init-build.sh \
-DPLATFORM=orin-nano \
-DAARCH64=TRUE \
-DARM_HYP=ON \
-DSIMULATION=FALSE \
-DRELEASE=FALSE
ninja
Output: build-orin/images/sel4test-driver-image-arm-orin-nano —
a PE/COFF EFI executable (~6 MB).
For a clean rebuild, rm -rf build-orin and re-run init-build.sh + ninja.
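A quick sanity check on the build output is to look for the DOS "MZ" signature that every PE/COFF image starts with. The sketch below checks only those first two bytes, so it can tell an EFI image from a plain ELF but does not validate the rest of the header; the function name is illustrative.

```sh
# Succeed if the file begins with the "MZ" signature of a PE/COFF image.
# Usage: is_pe_image <file>
is_pe_image() {
    magic=$(dd if="$1" bs=1 count=2 2>/dev/null)
    [ "$magic" = "MZ" ]
}
```

Usage: `is_pe_image build-orin/images/sel4test-driver-image-arm-orin-nano && echo "looks like an EFI image"`.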
5. Booting
Option A: HTTP Boot (Recommended)
HTTP boot lets you rebuild on the host and immediately reboot the Jetson without touching the SD card.
# Serve the EFI binary from the host machine
mkdir -p /srv/tftp
cp build-orin/images/sel4test-driver-image-arm-orin-nano /srv/tftp/sel4.efi
cd /srv/tftp && python3 -m http.server 8080
One-time UEFI setup: Connect DisplayPort + keyboard. Press ESC during
boot to enter UEFI Setup. Add an HTTP boot entry pointing to
http://<host-ip>:8080/sel4.efi. Set it as the first boot option.
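Because the Jetson fetches whatever is in the served directory, it helps to make the copy step explicit after each rebuild. The helper below is a hypothetical sketch that copies the image only when it differs from the served copy and reports what it did.

```sh
# Copy the image to the served path unless an identical copy is already there.
# Usage: publish_image <built_image> <served_path>
publish_image() {
    src=$1
    dest=$2
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        echo "unchanged"
        return 0
    fi
    mkdir -p "$(dirname "$dest")" && cp "$src" "$dest" && echo "updated"
}
```

Usage: `publish_image build-orin/images/sel4test-driver-image-arm-orin-nano /srv/tftp/sel4.efi`. An "unchanged" result right after a rebuild suggests the build did not actually produce a new image.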
Option B: SD Card Boot
# Format SD card as FAT32, then:
sudo mkdir -p /mnt/sd
sudo mount /dev/sdX1 /mnt/sd
sudo mkdir -p /mnt/sd/EFI/BOOT
sudo cp build-orin/images/sel4test-driver-image-arm-orin-nano \
/mnt/sd/EFI/BOOT/BOOTAA64.EFI
sudo umount /mnt/sd
Expected Output
The elfloader runs silently — UEFI's page tables do not map the UART MMIO region, so there is no serial output until the kernel initializes its own UART driver:
Booting all finished, dropped to user space
Starting test suite sel4test
Starting test 0: Test that there are tests
Starting test 1: SYSCALL0000
...
Test suite passed. 141 tests passed. 42 tests disabled.
All is well in the universe
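When runs are repeated, a captured serial log can be checked mechanically instead of by eye. The sketch below assumes the log contains the two final lines shown above in that wording; the function name and the way the log is captured are up to you.

```sh
# Succeed and print the number of passed tests if the log shows a clean run.
# Usage: check_sel4test_log <logfile>
check_sel4test_log() {
    log=$1
    # The final banner is only printed after the whole suite has passed.
    grep -q 'All is well in the universe' "$log" || return 1
    # Extract N from "Test suite passed. N tests passed. ..."
    sed -n 's/.*Test suite passed\. \([0-9][0-9]*\) tests passed\..*/\1/p' "$log"
}
```

Usage: `check_sel4test_log boot.log` prints the pass count and returns non-zero if the final banner is missing.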
6. Technical Details
UEFI boots to EL2 (not EL1), so seL4 must build with ARM_HYP=ON.
The T234 has 6 Cortex-A78AE cores (ARMv8.2-A); seL4 runs single-core (non-SMP).
Memory Map
| Region | Start | End | Size |
|---|---|---|---|
| Region 1 | 0x80000000 | 0xbe000000 | 992 MB |
| Gap (CO:43) | 0xbe000000 | 0xc2000000 | 64 MB |
| Region 2 | 0xc2000000 | 0x100000000 | 992 MB |
The gap at 0xbe000000 is firmware carve-out CO:43 (allocated by MB1,
hardware-protected by the SNOC bus fabric). Any CPU access triggers an uncorrectable
RAS error. Physical addresses 0x0–0x200000 are also reserved to prevent device
untypeds covering firmware-protected addresses.
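The sizes in the table follow directly from the start and end addresses. As a worked check, this sketch does the arithmetic in the shell (the helper name is illustrative):

```sh
MIB=$((1024 * 1024))

# Print the size in MiB of the half-open range [start, end).
# Usage: region_mib <start> <end>
region_mib() {
    echo $(( ($2 - $1) / MIB ))
}

region_mib 0x80000000 0xbe000000    # Region 1
region_mib 0xbe000000 0xc2000000    # Carve-out gap
region_mib 0xc2000000 0x100000000   # Region 2
region_mib 0x0 0x200000             # Low-address reservation
```

The four calls print 992, 64, 992 and 2, matching the table and the 2 MiB reservation.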
Two categories of changes — and why the distinction matters
The 22 file operations split cleanly into two groups, with very different blast radius:
- Platform-specific changes: anything that lives under src/plat/orin-nano/ (kernel) or libplatsupport/plat_include/orin-nano/ (util_libs) or src/plat/orin-nano/ (elfloader). These touch only this platform's tree; builds that select a different KernelPlatform never see them. We can be as AGX/Nano-specific as we want without worrying about regressions elsewhere.
- Shared-infrastructure changes: modifications to files in src/arch/arm/, elfloader-tool/src/arch-arm/, libplatsupport/src/mach/nvidia/, etc. Every other ARMv8 platform compiles these. Even guarded by #ifdef CONFIG_PLAT_ORIN_*, they enlarge the surface area maintainers have to review and risk subtle regressions on platforms we don't have hands on to test.
Where possible, we move things from group (2) into group (1). The
plat_console_putchar override is one example — it used to live in
efi_init.c (shared elfloader code, modified). It now lives in
tools/seL4/elfloader-tool/src/plat/orin-nano/console.c as a strong override
of the upstream WEAK default. Zero changes to shared elfloader files for
that fix. Several more candidates exist (init_downpages, the elfloader cache
discipline, the 2 MiB low-address reservation, the TX2 timer-init extension) and each
could be moved with a one-line __attribute__((weak)) upstream marker. That
reduces the shared-code modifications from 14 to ~6 at the cost of trivial upstream PRs.
Fixes That Should Go Upstream
These are the genuine shared-infrastructure changes — group (2) above. Several affect other platforms beyond the T234. All four below have been re-validated during the AGX Orin bring-up, where each one can be empirically attributed to a specific failure mode it eliminates:
- SDRAM-only kernel window (vspace.c): upstream map_kernel_window() maps the entire PADDR_BASE–PADDR_TOP range as Normal memory. On any platform where non-DRAM addresses are protected, speculative accesses trigger faults. Our port maps only avail_p_regs[]. Upstream PR #1516 addresses kernel virtual layout but does not yet restrict the physical window to SDRAM.
- Identity-map full PA window in the elfloader (mmu.c): upstream init_downpages maps only ~2 MiB around the elfloader's _text. The post-MMU jump to the kernel image lands in unmapped memory and silently faults. Replace with 1 GiB PUD blocks covering the full 512 GiB PA range. This is THE silent-boot fix on T234. Affects any platform where the elfloader and kernel image are not within the same 2 MiB region.
- PoU → PoC cache maintenance for page tables (machine.h, objecttype.c, vspace.c): page table creation uses cleanCacheRange_PoU (dc cvau) and runtime PTE updates use cleanByVA_PoU. These only clean to Point of Unification, which does not reach a system-level cache sitting outside the CPU complex. The hardware page table walker reads from DRAM, so it misses dirty PTEs stuck in the SLC. Fix: use cleanCacheRange_RAM (dc cvac, to PoC) for page table creation and cleanInvalByVA (dc civac) for PTE updates. Affects any platform with a system-level cache beyond PoU (e.g. ARM DSU-AE + external LLC). AGX bring-up confirmed this is required: without it, the kernel boots silently and never reaches its first printf.
- Non-shareable page tables in UP builds (vspace.c): upstream uses SMP_TERNARY(SMP_SHARE, 0) for kernel page table shareability, which evaluates to non-shareable in uniprocessor builds. On hardware with a system-level cache, the page table walker participates in the inner shareable domain and does not see PTEs flushed to a non-shareable address. Fix: force inner shareable for kernel page tables regardless of SMP configuration. Affects any UP build on hardware where the page table walker is in the IS domain.
- Elfloader set/way flush does not reach system-level caches (sys_boot.c, mmu-hyp.S): the elfloader uses dc cisw (clean+invalidate by set/way) to flush caches before enabling the MMU. Set/way operations only reach CPU-managed caches, not an external system-level cache. Page table entries get stuck in the SLC and the hardware walker reads stale data from DRAM; the post-MMU-enable BLR x_entry jumps to whatever zero-or-garbage address the walker returns. Fix: flush page tables with dc cvac (to PoC) while caches are still ON, then skip the set/way flush in disable_caches_hyp. Affects any platform with a system-level cache beyond L3.
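The 1 GiB block mapping in the second fix comes down to simple index arithmetic. As a worked example, and assuming the usual 4 KiB translation granule (512 entries per table, each level-1 PUD entry covering 2^30 bytes), the sketch below computes which PUD entry covers a given physical address; the helper names are illustrative and not taken from the elfloader source.

```sh
# Index of the 1 GiB PUD entry covering a physical address (0..511).
pud_index() {
    echo $(( ($1 >> 30) & 0x1ff ))
}

# Base address of the 1 GiB block containing a physical address.
pud_block_base() {
    printf '0x%x\n' $(( ($1 >> 30) << 30 ))
}

pud_index 0x80000000        # start of DRAM region 1
pud_index 0xc2000000        # start of DRAM region 2
pud_index 0x0c280000        # UARTC MMIO
pud_block_base 0xc2000000
```

The calls print 2, 3, 0 and 0xc0000000. With 512 such entries a single table spans 512 × 1 GiB = 512 GiB, which is the full range the fix maps.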
7. Repository Structure
kernel/ — potanin/seL4 (11 files)
| File | Change |
|---|---|
| src/plat/orin-nano/config.cmake | New platform declaration: Cortex-A78AE (A72 proxy), GICv3 |
| src/plat/orin-nano/overlay-orin-nano.dts | Device tree overlay: UARTC, dual memory regions with 64 MB carve-out gap |
| tools/dts/orin-nano.dts | Full device tree extracted from running Linux on the hardware |
| libsel4/sel4_plat_include/orin-nano/sel4/plat/api/constants.h | Platform constants header (required by build system) |
| tools/hardware.yml | Add nvidia,tegra194-hsuart to the seL4 hardware database. Compat-list addition, doesn't change behaviour for existing platforms. |
| src/drivers/serial/tegra_omap3_dwapb.c | Add UART init (FIFO, DLAB, divisor) + TX timeout to the shared 8250-style driver used by Tegra/OMAP/DW-APB platforms. Changes driver behaviour for all consumers. |
| src/drivers/serial/config.cmake | Wire tegra194-hsuart compatible into the shared serial driver registration. Compat-string registration, doesn't change behaviour for existing platforms. |
| src/arch/arm/64/kernel/vspace.c | Map only SDRAM regions in kernel window instead of full PA range; prevents speculative RAS errors. Arguably an upstream-correct fix for any platform with firmware-protected pages. |
| src/arch/arm/kernel/boot.c | Reserve physical addresses 0x0–0x200000 to prevent device untypeds covering firmware-protected memory. Guarded by #ifdef CONFIG_PLAT_ORIN_NANO; candidate for plat-isolation via a plat_pre_create_untypeds() hook. |
| include/arch/arm/arch/machine.h | cleanCacheRange_RAM (dc cvac to PoC) for page table cleaning. PoU does not reach the T234 SLC; any platform with a system-level cache beyond PoU benefits. |
| src/arch/arm/64/object/objecttype.c | Same PoU→PoC cache fix for user page table object creation |
tools/seL4/ — potanin/seL4_tools (6 files)
| File | Change |
|---|---|
| elfloader-tool/src/plat/orin-nano/console.c | No-op plat_console_putchar override; strong override of upstream WEAK default. UEFI doesn't map UART MMIO at 0x0c280000, so the override silences the elfloader's pre-MMU banner. Tier A: previously lived in efi_init.c as shared code; migrated to plat dir. |
| elfloader-tool/src/arch-arm/sys_boot.c | Flush page tables to DRAM with dc cvac (bypasses SLC); reordered boot flow for T234 cache maintenance. Candidate for plat-isolation via a plat_pre_disable_caches() hook. |
| elfloader-tool/src/arch-arm/armv/armv8-a/64/mmu-hyp.S | Rewrite cache disable to skip set/way flush (SLC unreachable); register-only MMU transition. Candidate for plat-isolation via .weak markers on disable_caches_hyp + arm_enable_hyp_mmu. |
| elfloader-tool/src/arch-arm/64/mmu.c | Identity-map the full 512 GiB PA range as 1 GiB PUD blocks (prevents speculative translation faults on T234). The silent-boot fix. Candidate for plat-isolation via __attribute__((weak)) on init_downpages. |
| elfloader-tool/src/drivers/uart/8250-uart.c | Add nvidia,tegra194-hsuart compatible string to the shared 8250 driver's match table. Compat-list addition. |
| cmake-tool/helpers/application_settings.cmake | Add orin-nano to the EFI boot platform list. Compat-list addition. |
projects/util_libs/ — potanin/util_libs (6 files)
| File | Change |
|---|---|
| libplatsupport/plat_include/orin-nano/platsupport/plat/clock.h | verbatim from tx2, no changes |
| libplatsupport/plat_include/orin-nano/platsupport/plat/i2c.h | verbatim from tx2, no changes |
| libplatsupport/plat_include/orin-nano/platsupport/plat/reset.h | verbatim from tx2, no changes |
| libplatsupport/plat_include/orin-nano/platsupport/plat/serial.h | copy of tx2's, 3-line change: PS_SERIAL_DEFAULT / DEFAULT_SERIAL_PADDR / DEFAULT_SERIAL_INTERRUPT → UART-C (AGX/Nano debug TCU); TX2 uses UART-A |
| libplatsupport/plat_include/orin-nano/platsupport/plat/timer.h | copy of tx2's, 1-line change: NV_TMR_PATH = /bus@0/timer@2080000 (T234 TKE) vs /timer@3020000 (TX2) |
| libplatsupport/src/mach/nvidia/timer.c | extend the existing #ifdef CONFIG_PLAT_TX2 branch to also cover the orin platforms (one line). Tegra-family conditional widening; upstream source already flags this as a planned refactor. |
8. Dead Ends
The port took roughly 120 iterations over 15 phases. Many of those iterations explored approaches that turned out to be wrong. Documenting them here so others can avoid the same traps.
Wrong UART: UARTA and the BPMP keepalive saga
The biggest time sink. We initially used UARTA at
0x03100000 (40-pin GPIO header), which required a custom
pinmux overlay flashed into MB1. It worked at first, but the UARTA clock
died after 5–10 minutes — BPMP power management aggressively
gates clocks when no Linux driver holds a reference.
This led to building an entire BPMP IPC keepalive subsystem inside the seL4 kernel: IVC (Inter-VM Communication) protocol, HSP doorbell interrupts, MRQ_CLK messages, timer-tick integration. Over 40 versions (v70–v112) were spent tuning this. Results were non-deterministic (25 to 122 tests per boot depending on BPMP firmware state). Calling BPMP IPC from the timer interrupt blocked the kernel for ~10 ms and violated real-time scheduling guarantees. A fire-and-forget variant corrupted the IVC TX/RX counters.
The fix was trivial: switch to UARTC at
0x0c280000 on the J14 button header. UARTC is in the
Always-On (AON) power domain and stays alive without any keepalive.
All BPMP kernel code was deleted.
TCU (Tegra Combined UART) via USB
Before physical UART worked, we tried using the TCU — the Orin
Nano’s default console, which operates as a USB gadget through
the XUSB controller. After ExitBootServices the USB stack
is gone, so no USB device appears on the host. Dead end.
Cross-boot diagnostics via SRAM and IVC shared memory
With no serial output after ExitBootServices, we tried
writing diagnostic values to persistent memory (SRAM at
0x40010000, IVC shared memory regions) and reading them
back on the next boot. SRAM wasn’t mapped in UEFI (crashed on access).
IVC regions were cleared each boot. EFI SetVariable
didn’t work after our MMU changes. Each attempt required two power
cycles and none persisted reliably.
Garbled serial blamed on pinmux
When garbled bytes appeared on UARTA, investigation focused on pinmux
configuration. A GPIO bit-bang test toggled pins directly — no
output, because PADCTL is firewalled on T234 (CPU cannot read/write
pinmux registers). The actual cause was a baud rate mismatch: UEFI
released the UART clock reference at ExitBootServices,
changing the effective clock rate. Fix: read UEFI’s divisor
registers before ExitBootServices and re-send
CLK_ENABLE afterward.
CBB ERD and IOB/ACI ERRCTLR — masking vs fixing RAS
We tried masking RAS errors at the bus fabric level. Setting the CBB
(Control Backbone) Error Response Disable register at
0x13a3a004 converted bus errors into benign responses.
But the IOB (I/O Bridge) generates its own RAS error independently of CBB.
The error path is: SLC eviction → writeback → SCF → IOB
→ RAS. CBB is downstream of where the IOB error fires.
Writing directly to IOB/ACI ERRCTLR registers failed too — they are Secure-only. Writes from NS-EL2 are silently ignored.
SDEI handlers to catch RAS
Registered SDEI (Software Delegated Exception Interface) handlers for all 13 RAS events. The handlers were called successfully. But ATF unconditionally powers off the faulting core after dispatching the SDEI event, regardless of the handler’s return value. The core was killed anyway.
Custom BL31 — unnecessary in the end
Built custom NVIDIA ATF from source, modifying
tegra234_ras_handler() to log errors instead of killing the core.
Required a complex QSPI flashing journey (USB Recovery Mode, A/B boot slots,
TOS image packing). Achieved deterministic 141/141 pass — but treated the
symptom (core kill) rather than the cause (speculative accesses to protected memory).
Three software-only fixes later eliminated the RAS errors entirely: MT_NORMAL identity mapping in the elfloader, SDRAM-only kernel window, and 2 MiB low-address reservation. Stock BL31 restored.
40-bit PA override + S2_START_L1 cmake rename (confirmed unnecessary)
Original port added a KernelPlatformPASizeBitsOverride mechanism in
src/arch/arm/config.cmake to force the kernel to a 40-bit PA layout, and a
partner change renamed the ARM_HYPERVISOR_SUPPORT check on the
AARCH64_VSPACE_S2_START_L1 line to KernelArmHypervisorSupport.
During AGX bring-up both were tested in isolation. Without PASizeBitsOverride,
the kernel uses A72's native 44-bit PA — sel4test passes 141/141.
The S2_START_L1 cmake rename only matters when KernelArmPASizeBits40 is set
(the if condition is dead code otherwise), so removing the override removes the
reason for the rename. Both changes have been reverted; src/arch/arm/config.cmake
is at upstream HEAD with no diff. Saves one modified kernel file.
FEAT_CCIDX CCSIDR parsing (confirmed unnecessary)
The elfloader's cache set/way code reads CCSIDR_EL1 in 32-bit format. ARMv8.2+
adds FEAT_CCIDX which can extend that register to 64-bit with different field widths,
and the original port modified elfloader-tool/include/arch-arm/64/mode/assembler.h
to parse both layouts. On the AGX (also A78AE) we proved the parsing change is never reached
on the actual boot path: the only place we keep set/way operations is the optional flush in
disable_caches_hyp, which we replaced with dc cvac/dsb sy
anyway (see the SLC fix above). With set/way removed from the boot path, FEAT_CCIDX-style
CCSIDR parsing is moot. assembler.h reverted to upstream HEAD.
EFI _end symbol addition (confirmed unnecessary)
The original port added a fabricated _end symbol to
elfloader-tool/src/binaries/efi/gnuefi/elf_aarch64_efi.lds, on the theory
that NVIDIA's UEFI PE/COFF loader expected it. AGX bring-up boots without this addition
— _end is already provided by other parts of the build. The linker script
modification is reverted to upstream HEAD.
XN-only (execute-never) mapping
Mapped the full PA range with UXN=1 instead of leaving non-DRAM unmapped. XN prevents speculative instruction fetches but not speculative data reads. The A78AE still speculatively reads data from Normal-mapped addresses. RAS errors persisted. Only leaving non-DRAM regions completely unmapped works.
PE/COFF sensitivity misdiagnosis
Removing UEFI ConOut code from efi_init.c
caused a Synchronous Exception. We initially blamed NVIDIA’s PE/COFF
loader being sensitive to binary layout changes. The actual cause: the
default weak plat_console_putchar calls
uart_8250_putchar(), which writes to UART MMIO at
0x0c280000. UEFI doesn’t map that address → data abort.
Fix: override with a no-op (7 lines).
Implementation-defined CPU registers
Attempted to modify speculative behavior via ACTLR_EL2 and other implementation-defined registers. On the A78AE (r0p1), ATF traps and silently discards all EL2 writes to these registers. No CPU-level workarounds are possible from EL2.
MCE ROC_FLUSH_CACHE SMC calls
Tried T186-era MCE commands (ROC_FLUSH_CACHE,
ROC_CLEAN_CACHE) via SMC to flush the SLC. Not supported
on T234 — both return SMC_UNK. The SLC cannot be
flushed from NS-EL2 on T234.
dc ivac to discard dirty SLC lines
Tried dc ivac (invalidate without clean) to discard dirty
cache lines at protected addresses without triggering a writeback. The
A78AE implementation cleans dirty lines before invalidating —
dc ivac triggers the same writeback as dc civac.
IOB RAS fires, core killed.
QSPI flash from the running system
During the custom BL31 phase, we tried every approach to write QSPI from
a running OS: Linux MTD, /dev/mem, UEFI Shell mm,
UEFI FirmwareManagement Protocol, NorFlashDxe, OP-TEE TA. All failed
because the T234 CBB firewall blocks all Non-Secure CPU access to the QSPI
controller at 0x03270000. Only USB Recovery Mode works (BROM
exposes USB, MB1/MB2 have Secure access to QSPI).
BPMP IPC for UARTC clock enable
After switching to UARTC, we added BPMP IPC code to the elfloader to
explicitly enable the UARTC clock (CLK_ENABLE + SET_RATE + RESET_DEASSERT)
before ExitBootServices. ~60 lines of shared-memory IPC and
doorbell code. Turned out to be completely redundant — firmware
(MB1/MB2/BPMP/SPE) already enables UARTC from power-on (serial output
appears at timestamp 0000.066, long before our EFI application loads).
Removed.
Have questions or feedback? Comment on this post on LinkedIn.