> I'm mystified what this "memory training" even is.
Modern high-speed links are very finicky, to the extent that various parameters (timing, etc.) on each end of the link can't be hardcoded. Link training is the part of link initialization where the firmware on each end of the link tries out various parameters in order to enable full-speed operation.
> Modern high-speed links are very finicky, to the extent that various parameters (timing, etc.) on each end of the link can't be hardcoded. Link training is the part of link initialization where the firmware on each end of the link tries out various parameters in order to enable full-speed operation.
Exactly. Initialising DDR memory in any kind of system is a bit of a black art for this reason, even when the chips are soldered to the same PCB as the SoC, let alone when there are two unknown PCBs and two sockets in the mix.
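To make "trying out various parameters" a bit more concrete, here is a toy sketch of one common shape such a search can take: sweep a delay setting, record which values work, and settle in the middle of the widest working range. Everything here is an assumption for illustration: `link_ok`, `num_taps` and the fake pass range are made up, and real training code deals with many more parameters per lane.

```python
def find_passing_window(results):
    """Return (start, end) of the longest run of True values, or None."""
    best = None
    start = None
    # The appended False is a sentinel that closes a run ending at the last tap.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def train_delay(link_ok, num_taps):
    """Sweep every delay tap, then pick the centre of the widest passing window."""
    results = [link_ok(tap) for tap in range(num_taps)]
    window = find_passing_window(results)
    if window is None:
        raise RuntimeError("no delay setting passed; link cannot be trained")
    lo, hi = window
    return (lo + hi) // 2


# Hypothetical stand-in for the hardware: pretend only taps 9..20 sample
# the data correctly.
def fake_link_ok(tap):
    return 9 <= tap <= 20


best_tap = train_delay(fake_link_ok, num_taps=32)
print(best_tap)
```

Picking the centre rather than the first passing value leaves margin on both sides, which matters because the passing range drifts with temperature and voltage.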
On essentially every PC that supports different sizes of RAM modules the firmware has to jump through somewhat interesting hoops in order to configure/enable the memory controller. DDR4 training is just one part of this. Typical flow goes something like this:
* All cores are brought out of reset. All cores start executing code from the end of the 32-bit physical address space; hopefully some EPROM with firmware lives there. (One thing of note: yes, Intel says that there is no such thing as "unreal mode", but everything after the i386 boots into unreal mode.)
* The initial state of EDX signifies the position of the CPU in an SMP system. One core is designated as the system processor; all others are application processors and thus do minimal APIC initialization, then halt and wait for an IPI.
* The system processor (SP) continues executing its firmware, and its first order of business is to configure the cache controllers so as to get even a few kilobytes of writable SRAM for its data area.
* SP initializes enough of the on-board PCIe devices to be able to access SMBus/SPD.
* SP scans SPD to get the list of populated DIMMs, their sizing and timing.
* SP converts the SPD data into memory controller configuration and writes it there (note that at this point we are still executing in place (XIP) from slow-ish SPI flash and have no RAM to speak of, so the code that does this has to be very efficient, which is the reason for various RAM incompatibilities, as a somewhat common approach is to just hardcode expected configurations into the firmware).
* SP checks NVRAM to see whether the DIMM configuration is the same as on the previous boot; if it is, it skips the next step.
* If it changed, the SP runs the DDR4 link training algorithm. This essentially involves measuring the lengths of the wires from the memory controller to the individual DRAM chips. The idea is that it is impossible to match them well enough for the frequencies involved, so the deliberate difference is compensated for in logic and software (this also saves space on the PCBs of both the motherboard and the DIMMs themselves).
* The result of training is written into the configuration registers of the DRAM chips and the memory controller (and saved into NVRAM for the next boot, as training is somewhat slow).
* We have RAM.
* Firmware completes initialization of PCIe topology, enables LPC bridge.
* Firmware does PCI(e) resource allocation. (The "PNP OS installed" switch in many BIOS configurations controls whether it configures all PCI devices or only those that it deems relevant.)
* We have something that looks like a PC.
* What one would call a traditional PC-style POST happens now. The first phase of ROM scan (in the BIOS/CSM case) causes the VGA controller to come out of reset. The display turns on and the user sees something like "Copyright (C) NVidia/AMD" in the top-left corner.
* Firmware draws its splashscreen/UI.
* Firmware completes all other initialization tasks (now running in a somewhat sane environment). This involves entering protected mode and, in the CSM/BIOS case, exiting it again (sometimes even multiple times). Various data structures that are going to be passed to the OS get filled in. The list of boot devices is built. (This is the step that the user perceives as "POST".)
* Bootloader gets loaded from whatever device was selected and control is passed to it.
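As a small worked example of the "end of the 32-bit physical address space" in the first step: the x86 reset vector is 0xFFFFFFF0, 16 bytes below 4 GiB. The sketch below assumes the flash chip is mapped so that its last byte lands at the very top of that space (the usual arrangement, but still an assumption here), and `flash_window` and `reset_vector_offset` are names made up for illustration.

```python
FOUR_GIB = 1 << 32
RESET_VECTOR = 0xFFFFFFF0  # first instruction fetch after reset on x86


def flash_window(flash_size):
    """Physical address range of a flash chip mapped to end at 4 GiB (assumed)."""
    base = FOUR_GIB - flash_size
    return base, FOUR_GIB - 1


def reset_vector_offset(flash_size):
    """Offset inside the flash image where the first instruction must sit."""
    base, _ = flash_window(flash_size)
    return RESET_VECTOR - base


flash_size = 16 * 1024 * 1024  # hypothetical 16 MiB SPI flash
base, top = flash_window(flash_size)
offset = reset_vector_offset(flash_size)
print(hex(base), hex(top), hex(offset))
```

In other words, under that assumption the first instruction the CPU fetches is in the last 16 bytes of the image, which only leaves room for a jump to the real entry point.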
[Edit: and to make it even more interesting, modern CPUs often need a microcode update to be applied somewhere pretty early in this process]
[Edit2: it is my understanding that on at least some x86 server platforms the memory initialization is done by the BMC, although I'm not sure how exactly that works... bitbanging the DDR4 controller over JTAG?]
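The "check NVRAM, skip training if nothing changed" steps in the flow above boil down to a cache keyed on the installed DIMMs. A rough sketch, assuming the firmware fingerprints the raw SPD contents per slot; the dict standing in for NVRAM, the SHA-256 fingerprint, `run_training` and the placeholder SPD bytes are all hypothetical, and real firmware would more likely compare serial numbers or a CRC.

```python
import hashlib


def spd_fingerprint(spd_by_slot):
    """Hash slot numbers plus raw SPD bytes so any DIMM change alters the result."""
    h = hashlib.sha256()
    for slot in sorted(spd_by_slot):
        data = spd_by_slot[slot]
        h.update(slot.to_bytes(1, "little"))
        h.update(len(data).to_bytes(2, "little"))
        h.update(data)
    return h.digest()


def memory_init(spd_by_slot, nvram, run_training):
    """Return (training_result, trained_now), reusing cached results if valid."""
    fingerprint = spd_fingerprint(spd_by_slot)
    cached = nvram.get("training")
    if cached is not None and cached["fingerprint"] == fingerprint:
        return cached["result"], False  # same DIMMs as last boot: skip training
    result = run_training(spd_by_slot)  # the slow path
    nvram["training"] = {"fingerprint": fingerprint, "result": result}
    return result, True


# Toy usage: a dict plays the role of NVRAM, and training just counts calls.
training_runs = []

def fake_training(spd_by_slot):
    training_runs.append(sorted(spd_by_slot))
    return {"dqs_delay_tap": 14}  # made-up result

nvram = {}
dimms = {0: b"placeholder SPD for DIMM A", 2: b"placeholder SPD for DIMM B"}
first = memory_init(dimms, nvram, fake_training)   # cold boot: trains
second = memory_init(dimms, nvram, fake_training)  # same DIMMs: cached
dimms[2] = b"placeholder SPD for DIMM C"           # user swaps a module
third = memory_init(dimms, nvram, fake_training)   # trains again
```

This is also why the first boot after swapping RAM tends to take noticeably longer than the ones after it.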
Right, that is how Intel does it. On AMD systems the PSP (which is an ARM core integrated on the same die with its own on-chip SRAM) starts by loading its firmware from flash, initializes DRAM, does some other things, loads UEFI into main memory and only then releases the x86 cores from reset.
Sidenote: Intel does that first stage before PCI/LPC root init using a bootrom on the CPU which also contains the hash of the RSA public key required to be used on the signed firmware it loads next. The enforcement policy on that depends on BootGuard fusing and the CSME/SPS.
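The "hash of the public key" part is worth spelling out: the ROM doesn't need to store the whole key, only enough to recognise it. A minimal sketch of just that check, assuming SHA-256 and treating the key as an opaque blob; `key_is_trusted` and the byte strings are made up, and the actual RSA signature verification that would follow is deliberately left out.

```python
import hashlib
import hmac


def key_is_trusted(public_key_blob, fused_hash):
    """Accept the key shipped with the firmware only if its hash matches ROM/fuses."""
    digest = hashlib.sha256(public_key_blob).digest()
    # compare_digest does a constant-time comparison of the two digests.
    return hmac.compare_digest(digest, fused_hash)


# Hypothetical values: the vendor's key and the hash burned in at manufacturing.
vendor_key = b"example vendor public key bytes"
fused_hash = hashlib.sha256(vendor_key).digest()

trusted = key_is_trusted(vendor_key, fused_hash)
rejected = key_is_trusted(b"some other key", fused_hash)
```

Only once the key itself is accepted would the signature on the next firmware stage be checked against it; what happens on a mismatch is the policy part mentioned above.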
If you search for XIP, there are a few resources out there, but mostly around Intel's FPGA offerings. I would wager most of the documentation is behind NDAs for AMD and Intel respectively.
Which is a meaningless philosophical exercise, as you still have the blob running anyway. (And now you cannot update it because it is in write-protected memory, uh.)
It's not running on the main core either way. It's a blob which is loaded into the DDR controller, and for some reason the FSF thinks it is wrong to memcpy that blob using the main CPU, but fine if you memcpy it using a secondary core.
> Where's this memory training fit in?
I believe it happens even before POST.