12 - A translation and repost of part10-multicore

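Anything not in black text is my own addition.
All figures belong to the original article.
The original text is at the end of this post.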
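Using multiple CPU cores
I suggested that, rather than using DMA to move the data, we could use a second CPU core to play the audio while our main core keeps running. I also said this would be hard on the Raspberry Pi 4, and it is.
I wrote this code with reference to Sergey Matyukevich's work (https://github.com/s-matyukevich/raspberry-pi-os/tree/master/src/lesson02), for which I am very grateful. It needed some changes to make sure the secondary cores start at the right time. This code isn't particularly "safe", but it is enough to prove the concept in principle.
You need to modify the config.txt file on your SD card to include these lines:
kernel_old=1
disable_commandline_tags=1
arm_64bit=1
Probably the most important of these is the kernel_old=1 directive. It tells the bootloader to expect the kernel at offset 0x00000 instead of 0x80000. Accordingly, we need to remove this line from link.ld:
. = 0x80000;     /* Kernel load address for AArch64 */
It also means the secondary cores are not locked for us at boot, so we can still access them (more on this later).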
Setting up the main timer
There is one more piece of setup that we now have to take care of ourselves: establishing the main timer. We add the following #define block to the top of boot.S:
#define LOCAL_CONTROL   0xff800000
#define LOCAL_PRESCALER 0xff800008
#define OSC_FREQ        54000000
#define MAIN_STACK      0x400000
LOCAL_CONTROL is the address of the ARM_CONTROL register. At the top of our _start section we set it to 0, which effectively tells the ARM main timer to use the crystal oscillator as its clock source and sets the increment to 1.
ldr     x0, =LOCAL_CONTROL   // Sort out the timer
str     wzr, [x0]
Next we set the prescaler; think of it as another clock divider. Setting it this way effectively makes the divisor 1 (that is, it has no effect):
mov     w1, 0x80000000
str     w1, [x0, #(LOCAL_PRESCALER - LOCAL_CONTROL)]
You should remember from part9 that the expected oscillator frequency is 54 MHz. We set it with the following lines:
ldr     x0, =OSC_FREQ
msr     cntfrq_el0, x0
msr     cntvoff_el2, xzr
Our timer is now set up the way we need it.
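Booting the main core
We go on to check our processor ID as usual. If it is 0 we are on the main core, and we jump forward to label 2. This time we have to set the stack pointer a little differently. We can't place it below our code, because the code now sits at 0x00000! Instead, we use the MAIN_STACK address defined at the top earlier.
// Set stack to start somewhere safe
mov     sp, #MAIN_STACK
We then clear the BSS as usual and jump to the main() function in our C code. If it ever returns, we branch back to 1 and halt the core.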
Setting up the secondary cores
Previously we halted the other cores outright with the infinite loop at label 1. Now, instead, each core watches a value at its own designated memory address. These values are initialised to 0 in boot.S and are named spin_cpu0-3. If a value becomes non-zero, that is the signal for the core to wake up, jump to the address stored there and execute whatever code it finds. Once that code returns, we start looping and watching the value again.
    adr     x5, spin_cpu0        // Base watch address
1:  wfe
    ldr     x4, [x5, x1, lsl #3] // Add (8 * core_number) to the base address and load what's there into x4
    cbz     x4, 1b               // Loop if zero, otherwise continue
    ldr     x2, =__stack_start   // Get ourselves a fresh stack - location depends on CPU core asking
    lsl     x1, x1, #9           // Multiply core_number by 512
    add     x3, x2, x1           // Add to the address
    mov     sp, x3
    mov     x0, #0               // Zero registers x0-x3, just in case
    mov     x1, #0
    mov     x2, #0
    mov     x3, #0
    br      x4                   // Run the code at the address in x4
    b       1b
You'll notice that we set the stack pointer somewhere else, and that each core has its own designated stack address. We establish the necessary pointers to a safe memory area by adding the following to link.ld:
.cpu1Stack :
{
    . = ALIGN(16);               /* 16-byte aligned */
    __stack_start  = .;          /* Pointer to the start */
    . = . + 512;                 /* 512 bytes long */
    __cpu1_stack  = .;           /* Pointer to the end (stack grows down) */
}
.cpu2Stack :
{
    . = . + 512;
    __cpu2_stack  = .;
}
.cpu3Stack :
{
    . = . + 512;
    __cpu3_stack  = .;
}
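Phew! That's it for the bootloader code. If you use the new bootloader with your existing code, the Raspberry Pi should boot and run as before. We now need to implement the signalling required to execute code on these secondary cores, which are now at our disposal.
Waking the secondary cores from C
Take a look at multicore.c.
Here we duplicate two functions for each core: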
void start_core1(void (*func)(void))
{
    store32((unsigned long)&spin_cpu1, (unsigned long)func);
    asm volatile ("sev");
}
void clear_core1(void)
{
    store32((unsigned long)&spin_cpu1, 0);
}
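First, start_core1() uses the store32() function (also in multicore.c) to write an address to our predefined spin_cpu1 memory location. This makes the value non-zero and tells core 1 where to jump when it wakes. Because we put the core to sleep with wfe (Wait For Event), we use sev (Set Event) to wake it again.
Second, clear_core1() can be called by the executing function to reset spin_cpu1 to 0, so that the core does not jump again when that code returns.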
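More main()s!
Finally, we look at kernel.c, where we have a single main(), and also:
core0_main() - increments a progress bar every second (roughly)
core1_main() - has a two-step progress bar: at 50% it plays an audio sample using the CPU, then jumps straight to 100% when playback is done
core2_main() - sets up a DMA audio transfer, then increments a progress bar every half second, jumping to 100% when playback finishes
core3_main() - increments a progress bar every quarter second (roughly)
main() is core 0's entry point. It eventually falls through to core0_main(), but not before it has kicked off core3_main() and core1_main() by passing them to their respective start functions. When core1_main() finishes, it starts core2_main().
When you run this, you can see these functions running in parallel on their respective cores. Welcome to symmetric multi-processing!
If all you see on boot is the rainbow screen, first try updating your firmware with rpi-update from the official Raspberry Pi operating system.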

In the upcoming part11, we will put all of this together to make a multi-core version of the Breakout game.
The original text follows

Using multiple CPU cores
Instead of a background DMA transfer, I suggested that we might use a second CPU core to play the audio whilst our main core continues on. I also said it would be hard on the Raspberry Pi 4... and it is.
I wrote this code with reference to [Sergey Matyukevich's work](https://github.com/s-matyukevich/raspberry-pi-os/tree/master/src/lesson02), for which I am very grateful. It did need some modification to ensure the secondary cores are woken up when the time is right. This code isn't particularly "safe" yet, but it's good enough to prove the concept in principle.
You'll need to modify your _config.txt_ file on your SD card to include the following lines:
```c
kernel_old=1
disable_commandline_tags=1
arm_64bit=1
```
Perhaps the most important here is the `kernel_old=1` directive. This tells the bootloader to expect the kernel at offset `0x00000` instead of `0x80000`. As such, we'll need to remove this line from our _link.ld_:
```c
. = 0x80000;     /* Kernel load address for AArch64 */
```
It also won't lock the secondary cores for us on boot, so we will still be able to access them (more on this later).
Setting up the main timer
There is one other important piece of setup that we'll need to take care of ourselves now - establishing the main timer. We add the following `#define` block to the top of _boot.S_:
```c
#define LOCAL_CONTROL   0xff800000
#define LOCAL_PRESCALER 0xff800008
#define OSC_FREQ        54000000
#define MAIN_STACK      0x400000
```
`LOCAL_CONTROL` is the address of the ARM_CONTROL register. At the top of our `_start:` section we'll set this to zero, effectively telling the ARM main timer to use the crystal clock as a source and set the increment value to 1:
```c
ldr     x0, =LOCAL_CONTROL   // Sort out the timer
str     wzr, [x0]
```
We go on to set the prescaler - think of this as another clock divisor equivalent. Setting it thus will effectively make this divisor 1 (i.e. it will have no effect):
```c
mov     w1, 0x80000000
str     w1, [x0, #(LOCAL_PRESCALER - LOCAL_CONTROL)]
```
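As a rough sanity check on that value, here is a host-side C sketch of the arithmetic. It assumes the prescaler obeys the rule Broadcom documents for the earlier BCM2836 local timer (output = input × prescaler / 2^31); whether the Pi 4 register behaves identically is an assumption, and the names `prescaled_hz` and `check_prescaler` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: timer frequency after the prescaler.
 * ASSUMPTION: BCM2836-style rule, f_out = f_in * prescaler / 2^31. */
uint64_t prescaled_hz(uint64_t input_hz, uint32_t prescaler)
{
    return (input_hz * (uint64_t)prescaler) >> 31;
}

/* Self-check of the arithmetic under that assumption. */
void check_prescaler(void)
{
    /* 0x80000000 is exactly 2^31, so the effective divisor is 1. */
    assert(prescaled_hz(54000000u, 0x80000000u) == 54000000u);
    /* Half that value would halve the timer frequency. */
    assert(prescaled_hz(54000000u, 0x40000000u) == 27000000u);
}
```

Under this reading, writing `0x80000000` leaves the 54 MHz input untouched, which matches the "no effect" description above.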
You should remember the expected oscillator frequency of 54 MHz from part9. We set this with the following lines:
```c
ldr     x0, =OSC_FREQ
msr     cntfrq_el0, x0
msr     cntvoff_el2, xzr
```
Our timer is now as we need it.
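With the frequency fixed, converting between elapsed time and counter ticks is plain integer arithmetic. The following host-side sketch assumes the counter really does advance at `OSC_FREQ`; the helper names `ticks_for_us` and `us_for_ticks` are illustrative and do not come from the tutorial code.

```c
#include <assert.h>
#include <stdint.h>

#define OSC_FREQ 54000000ULL /* same value that is written to cntfrq_el0 */

/* Hypothetical helper: counter ticks that elapse in `us` microseconds. */
uint64_t ticks_for_us(uint64_t us)
{
    return us * OSC_FREQ / 1000000ULL;
}

/* Hypothetical helper: whole microseconds represented by `ticks`. */
uint64_t us_for_ticks(uint64_t ticks)
{
    return ticks * 1000000ULL / OSC_FREQ;
}

/* Self-check of the conversions. */
void check_timer_maths(void)
{
    assert(ticks_for_us(1) == 54);              /* 54 ticks per microsecond */
    assert(ticks_for_us(1000000) == 54000000);  /* one second */
    assert(us_for_ticks(54000000) == 1000000);
    assert(us_for_ticks(27) == 0);              /* integer division truncates */
}
```

A delay loop on the real hardware would compare successive counter reads against a tick count computed this way.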
Booting the main core
We go on to check the processor ID as we always have. If it's zero then we're on the main core and we jump forward to label `2:`. This time, we have to set our stack pointer slightly differently. We can't set it below our code, because it's at 0x00000 now! Instead, we use the address we defined earlier as `MAIN_STACK` at the top:
```c
// Set stack to start somewhere safe
mov     sp, #MAIN_STACK
```
We then continue to clear the BSS as always, and jump to our `main()` function in C code. If it does happen to return, we branch back to `1:` to halt the core.
Setting up the secondary cores
Previously, we've unequivocally halted the other cores by spinning them in an infinite loop at label `1:`. Instead, each core will now watch a value at its own designated memory address, initialised to zero at the bottom of _boot.S_, and named as `spin_cpu0-3`. If this value goes non-zero, then that's a signal to wake up and jump to that memory location, executing whatever code is there. Once that code returns, we start looping and watching all over again.
```c
    adr     x5, spin_cpu0        // Base watch address
1:  wfe
    ldr     x4, [x5, x1, lsl #3] // Add (8 * core_number) to the base address and load what's there into x4
    cbz     x4, 1b               // Loop if zero, otherwise continue
    ldr     x2, =__stack_start   // Get ourselves a fresh stack - location depends on CPU core asking
    lsl     x1, x1, #9           // Multiply core_number by 512
    add     x3, x2, x1           // Add to the address
    mov     sp, x3
    mov     x0, #0               // Zero registers x0-x3, just in case
    mov     x1, #0
    mov     x2, #0
    mov     x3, #0
    br      x4                   // Run the code at the address in x4
    b       1b
```
You'll notice that we've set our stack pointer elsewhere, and each core has its own designated stack address. This is to avoid it conflicting with activity on the other cores. We establish the necessary pointers to a safe memory area by adding the following to our _link.ld_:
```c
.cpu1Stack :
{
    . = ALIGN(16);               /* 16-byte aligned */
    __stack_start  = .;          /* Pointer to the start */
    . = . + 512;                 /* 512 bytes long */
    __cpu1_stack  = .;           /* Pointer to the end (stack grows down) */
}
.cpu2Stack :
{
    . = . + 512;
    __cpu2_stack  = .;
}
.cpu3Stack :
{
    . = . + 512;
    __cpu3_stack  = .;
}
```
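The address arithmetic in the assembly can be checked on a host machine with a small C sketch. It assumes the three stack sections end up back to back in memory, so each core's stack top sits 512 bytes above the previous one; the helper names and the base addresses used in the self-check are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 512u /* per-core stack size from link.ld (1 << 9) */
#define SLOT_BYTES  8u   /* one 64-bit spin slot per core (1 << 3) */

/* Mirrors `ldr x4, [x5, x1, lsl #3]`: address of a core's spin slot. */
uintptr_t spin_slot_addr(uintptr_t spin_base, unsigned core)
{
    return spin_base + (uintptr_t)core * SLOT_BYTES;
}

/* Mirrors `lsl x1, x1, #9` followed by `add`: initial stack pointer. */
uintptr_t core_stack_top(uintptr_t stack_start, unsigned core)
{
    return stack_start + (uintptr_t)core * STACK_BYTES;
}

/* Self-check with made-up base addresses. */
void check_core_layout(void)
{
    const uintptr_t base = 0x10000; /* stand-in for __stack_start */
    assert(core_stack_top(base, 1) == base + 512);  /* expected __cpu1_stack */
    assert(core_stack_top(base, 2) == base + 1024); /* expected __cpu2_stack */
    assert(core_stack_top(base, 3) == base + 1536); /* expected __cpu3_stack */
    assert(spin_slot_addr(0x2000, 0) == 0x2000);
    assert(spin_slot_addr(0x2000, 3) == 0x2000 + 24);
}
```

Since the stack grows down, each core starts at the top of its own 512-byte region and works towards the region below it, which is worth keeping in mind before calling anything stack-hungry on a secondary core.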
Phew! That's it for the bootloader code. If you use this new bootloader with your existing code, the RPi4 should boot and run as before. We now need to go on to implement the signalling required to execute code on these secondary cores which are now at our disposal.
Waking the secondary cores from C
Check out _multicore.c_.
Here we essentially duplicate two functions for each core:
```c
void start_core1(void (*func)(void))
{
    store32((unsigned long)&spin_cpu1, (unsigned long)func);
    asm volatile ("sev");
}
void clear_core1(void)
{
    store32((unsigned long)&spin_cpu1, 0);
}
```
```
The first, `start_core1()`, uses the `store32()` function (also in _multicore.c_) to write an address to our predefined `spin_cpu1` memory location. This takes it non-zero, telling core 1 where to jump to when it wakes. Since we put it to sleep with a `wfe` (Wait For Event) instruction, we use a `sev` (Set Event) instruction to wake it again.
The second, `clear_core1()`, can be used by an executing function to reset `spin_cpu1` to zero, so the core won't jump again when the executing code returns.
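To see the handshake as a whole, here is a host-side simulation in C. It is a sketch only: an ordinary array stands in for `spin_cpu0-3`, a function call stands in for the woken core's loop, and there is no real `wfe`/`sev`. The names `spin_table`, `start_core`, `clear_core`, `poll_core` and `demo_task` are hypothetical and are not taken from _multicore.c_.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 4

typedef void (*core_fn)(void);

/* Stand-in for spin_cpu0-3: one slot per core, NULL means "keep sleeping". */
core_fn volatile spin_table[NUM_CORES];

/* Generalised start_coreN(): publish the entry point for `core`.
 * On real hardware this store would be followed by a `sev` instruction. */
void start_core(unsigned core, core_fn func)
{
    spin_table[core] = func;
}

/* Generalised clear_coreN(): stop the core from running the function again. */
void clear_core(unsigned core)
{
    spin_table[core] = NULL;
}

/* One pass of a secondary core's loop after a wake-up: run the published
 * function if there is one. Returns 1 if something ran, 0 otherwise. */
int poll_core(unsigned core)
{
    core_fn entry = spin_table[core];
    if (entry == NULL)
        return 0;
    entry();
    return 1;
}

int demo_runs; /* counts how often demo_task() has executed */

/* Example task: does its work, then clears its own slot before returning. */
void demo_task(void)
{
    demo_runs++;
    clear_core(1);
}

/* Self-check of the protocol. */
void check_spin_table(void)
{
    int before = demo_runs;
    assert(poll_core(1) == 0);        /* nothing published yet */
    start_core(1, demo_task);
    assert(poll_core(1) == 1);        /* the task runs once... */
    assert(poll_core(1) == 0);        /* ...and has cleared its own slot */
    assert(demo_runs == before + 1);
}
```

The simulation shows why the clear step matters: without the `clear_core(1)` call inside the task, every later pass of the loop would find the slot still set and run the task again.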
More main()'s please!
Finally, we look at _kernel.c_, where we now have a single `main()`, but also:
 * `core0_main()` - increments a progress bar every 1 second (roughly)
 * `core1_main()` - has a two-step progress bar, playing an audio sample using the CPU at 50%, jumping straight to 100% when done
 * `core2_main()` - sets a DMA audio transfer, then increments a progress bar every half second (roughly), jumping to 100% as playback finishes
 * ... and `core3_main()` - increments a progress bar every quarter second (roughly)
`main()` is core 0's entry point, which ultimately falls through to `core0_main()`, but not before it kicks off `core3_main()` and `core1_main()` by passing them to their respective start functions. When `core1_main()` finishes, it kicks off `core2_main()`.
_As you run this, you'll see that these functions run in parallel on their respective cores. Welcome to symmetric multi-processing!_
**If all you see on boot is the rainbow screen, try first updating your firmware using** `rpi-update` **from Raspbian.**

Coming up in part 11, we'll put all of this work together for a multi-core version of our Breakout game.