PL and OS

Hacking software and hardware at BU

SMP and cache coherency on Cortex-A9 MPCore

Last time I described in brief how to boot auxiliary processor cores on a Cortex-A9 MPCore Panda Board. I didn’t go into too much depth about the synchronization mechanisms however. Turns out it is rather tricky to do synchronization in the intermediate states between no cache coherency and full cache coherency with hardware support. Originally I had been enabling onboard caches (actually, just L1 with separate I and D, for OMAP4460) on CPU0 prior to SMP initialization and attempting to bring up the auxiliary CPUs using a global variable for synchronization until they were ready to enable caches themselves. This led to all sorts of wacky problems with memory not appearing as it should. I tried various schemes of cache invalidation and clean-up before finally just disabling caches for the purpose of bootstrapping auxiliary CPUs.

Diagram of hypothesis regarding coherency bug

 

The trouble is, eventually those caches do have to get enabled. Now I had the auxiliary CPU enabling caches before CPU0 got around to doing it. This created another weird bug with the spinlock used to regulate the serial port debugging output. While CPU0 had caches disabled, it was working fine. But CPU1 had caches enabled and got stuck because it could never find a moment when the lock was considered “free.” Then when CPU0 finally enabled caches, it ended up with a bogus cache line which claimed that CPU0 had already grabbed the lock. This triggered my deadlock detection code. My hypothesis is that as soon as CPU0 enabled its cache, it started receiving bogus cache line updates from CPU1 — which was stuck in a world where CPU0 always had control over the lock. I decided to dodge the whole problem by simply forcing the secondary CPU to wait until the primary CPU had enabled caches, before attempting to use synchronization primitives.

With both CPUs having caching enabled, as well as the SCU, and the SMP bit, the spinlock implementation began to work correctly again, allowing both CPUs to spit out logging information over the serial port without trampling on each other.

Categories: hacking

Leave a Reply