PL and OS

Hacking software and hardware at BU

Virtual Memory, Interrupts and Timers

ARM has a fairly straightforward MMU design, but with a couple of twists that are unfamiliar to me as a longtime x86 programmer. The page table structures are similar but sized differently: each first-level entry controls a 1MB region instead of 4MB. There is also a 64kB page option at the second level, but I am not going to bother with it and will stick to 4kB pages. One small oddity is that access permissions can be set down to the 1kB level, since there are 4 AP fields per page table entry.
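To make the sizes concrete, here is a minimal sketch of how a 32-bit virtual address splits up under this layout, assuming 1MB first-level regions and 4kB second-level pages as described above. The names `va_split_t` and `split_va` are made up for illustration.

```c
#include <stdint.h>

/* Pieces of a virtual address for a two-level walk with
   1MB first-level regions and 4kB second-level pages. */
typedef struct {
    uint32_t l1_index;  /* which 1MB region (4096 possible entries) */
    uint32_t l2_index;  /* which 4kB page inside that region (256 entries) */
    uint32_t offset;    /* byte offset inside the 4kB page */
} va_split_t;

static va_split_t split_va(uint32_t va)
{
    va_split_t s;
    s.l1_index = va >> 20;            /* top 12 bits */
    s.l2_index = (va >> 12) & 0xFFu;  /* next 8 bits */
    s.offset   = va & 0xFFFu;         /* low 12 bits */
    return s;
}
```

For example, `split_va(0x80123456)` gives first-level index 0x801, second-level index 0x23 and offset 0x456. On non-PAE x86 the split would be 10/10/12 bits instead of 12/8/12, which is where the 4MB versus 1MB difference comes from.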

The major twist is the notion of “Domains,” which adds another layer of protection that is integrated with the MMU. Basically, there are up to 16 Domains that you can program. Each first-level entry (a 1MB section, or a pointer to a second-level table) is assigned to a domain, so every page beneath it inherits that domain. A central domain register contains 2 bits per domain and controls access to every page, on top of the existing page table access permissions. So conceivably, you could share a TTB (translation table base) between multiple tasks, and then use domains to protect one from the other. Kind of like segments, in a way, but not limited to contiguous regions.
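As a rough illustration of the 2-bits-per-domain register, the sketch below computes a new register value with one domain's field changed. The helper name `dacr_set` and the enum are hypothetical. The field encodings (0 = no access, 1 = client, 3 = manager) follow the ARM architecture documentation as I understand it, so check them against the manual for your core.

```c
#include <stdint.h>

/* Per-domain access modes (2 bits each in the domain register). */
enum domain_access {
    DOMAIN_NO_ACCESS = 0x0, /* every access to the domain faults */
    DOMAIN_CLIENT    = 0x1, /* page table permissions are checked */
    DOMAIN_MANAGER   = 0x3  /* page table permissions are ignored */
};

/* Return a copy of `dacr` with the field for `domain` (0..15)
   replaced by `mode`. Pure arithmetic; writing the result to the
   hardware register (CP15 c3) is left to the caller. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain,
                         enum domain_access mode)
{
    unsigned shift = (domain & 0xFu) * 2u;
    dacr &= ~(0x3u << shift);          /* clear the old 2-bit field */
    dacr |= (uint32_t)mode << shift;   /* insert the new mode */
    return dacr;
}
```

Under the shared-TTB idea, a context switch would then amount to building a value where the incoming task's domain is "client" and the other tasks' domains are "no access", and loading that into the register.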

I’m not sure yet whether it is worthwhile to do something like that. There is another hardware feature which makes use of domains to provide something called the “Fast Context Switch Extension.” Basically it allows you to assign a hardware process ID and avoid TLB flushes during context switches. But there are some limitations, including: 32MB max virtual size and no more than 16 process IDs. I need to do some microbenchmarks before I determine whether to pursue this. To do that I will need to get interrupts and timers working.

The OMAP353x interrupt controller is pretty well described in the TRM. There are 96 lines, though one of them covers all “external interrupt sources” whatever that may end up being. There are two interrupt inputs to the ARM processor itself: FIQ and IRQ. Programming it is pretty straightforward to start. I’ve just set all the lines to go to IRQ for the time being. In the future, there are a bunch of interesting possibilities: priorities, preemption and FIQ support. For now, I just need a way to get the timer interrupt.
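With 96 lines, per-line settings such as masking cannot fit in a single 32-bit register. The sketch below assumes the controller packs them into three 32-bit banks; that is an assumption here, not something stated above, and the register names and offsets should come from the TRM. The helper `irq_locate` is hypothetical.

```c
#include <stdint.h>

#define IRQ_LINES      96u  /* number of interrupt lines, per the text */
#define LINES_PER_BANK 32u  /* assumed: one bit per line in 32-bit banks */

/* Find which bank register and which bit correspond to an interrupt
   line. Returns 0 on success, -1 if the line number is out of range. */
static int irq_locate(unsigned line, unsigned *bank, uint32_t *mask)
{
    if (line >= IRQ_LINES)
        return -1;
    *bank = line / LINES_PER_BANK;
    *mask = 1u << (line % LINES_PER_BANK);
    return 0;
}
```

For example, line 37 lands in bank 1 with mask 0x20. Unmasking a timer interrupt would then be a single write of that mask to the appropriate bank register.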

While looking at the GCC documentation for function attributes, I discovered that it knows about ARM interrupt handlers already. That means I don’t have to write assembly veneers for the C functions, which saved me some trouble. The only major trouble I had was setting up the interrupt vector table. It’s actually a piece of code: the ARM processor transfers control to some offset in the table and expects your code to do the right thing at that point. Usually the right thing is to load the PC with the address of the actual interrupt handler. In x86, I am accustomed to encoding the address of a far jump directly into the instruction. However, with ARM’s 32-bit fixed-length instructions, this is basically impossible except for limited cases. ARM assemblers will actually create a “pool” of 32-bit constants that you use in your program, and then load them using PC-relative addressing. It’s a little bit of extra “magic” that they do for convenience, but it is a surprise if you are not used to having assemblers insert extra data into your code. In this case, the surprise got me because when I copied the interrupt table into place, I forgot to copy the “pool” that went with it. This resulted in unexpected “prefetch aborts” which is ARM-inology for “Page fault caused by instruction fetch.”
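One way to avoid the forgotten-pool problem is to build the vectors and their address pool as a single block, so they are always copied together. The sketch below is an illustration under assumptions, not the code from this project: it fills eight vectors with `ldr pc, [pc, #0x18]` (encoded as 0xE59FF018) and places eight handler addresses right behind them. The names `build_vector_block` and `vector_pool_offset` are made up.

```c
#include <stdint.h>

#define NUM_VECTORS    8u
#define LDR_PC_PC_0x18 0xE59FF018u  /* ARM encoding of: ldr pc, [pc, #0x18] */

/* Fill a 16-word block: 8 identical load instructions followed by the
   8 handler addresses they load. Copying all 16 words to the vector
   base keeps the instructions and their constants together. */
static void build_vector_block(uint32_t block[2 * NUM_VECTORS],
                               const uint32_t handlers[NUM_VECTORS])
{
    for (unsigned i = 0; i < NUM_VECTORS; i++) {
        block[i] = LDR_PC_PC_0x18;
        block[NUM_VECTORS + i] = handlers[i];
    }
}

/* Byte offset that vector i actually loads from. In ARM state the PC
   reads as the instruction's address plus 8, so the instruction at
   offset 4*i fetches from 4*i + 8 + 0x18 = 4*i + 0x20. */
static uint32_t vector_pool_offset(unsigned i)
{
    return i * 4u + 8u + 0x18u;
}
```

Because every vector uses the same displacement, slot i of the pool sits exactly 0x20 bytes after vector i, and the whole thing is 64 bytes to copy. Handler addresses are held as `uint32_t` here because the target is a 32-bit ARM.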

Getting timers to work took me on a bit of a runaround. There are actually two subsystems that I am dealing with here: performance monitoring and general-purpose timers. The perfmon system in Cortex-A8 (ARMv7) processors is controlled via coprocessor 15. Originally, I had picked up bad information from one of the manuals. Unfortunately, QEMU does not support perfmon, so I had to wait until I tried it on the real HW to discover that I was causing “undefined instruction” exceptions. Except that I was getting “data abort” exceptions instead, which puzzled me until I remembered that I had not yet set up a stack for “undefined instruction.” The HW was telling me that the CP15 functionality I was invoking did not exist. I tracked down the correct information and plugged it in: no more crashes, but no performance monitoring either. I was just trying to count the number of processor cycles that had passed, which is one of the built-in counter sources. Then I noticed that there was yet another register which mentioned “Enable Clock Counter” in its description. Strange, but it worked: to enable the feature, I had to set it in two places.
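The "two places" can be sketched roughly as follows. This is my reading of the ARMv7 performance monitor registers (a control register with a global enable bit, plus a separate count-enable register with a bit for the cycle counter), not necessarily the exact sequence used here. The function names are hypothetical, and the CP15 accesses only compile on a 32-bit ARM target.

```c
#include <stdint.h>

#define PMCR_ENABLE     (1u << 0)   /* global enable for all counters */
#define PMCR_CCNT_RESET (1u << 2)   /* reset the cycle counter to zero */
#define CNTENS_CCNT     (1u << 31)  /* cycle counter bit in count-enable reg */

#if defined(__arm__)
/* Place 1: control register. Place 2: count-enable set register. */
static inline void ccnt_start(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));
    pmcr |= PMCR_ENABLE | PMCR_CCNT_RESET;
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(pmcr));
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(CNTENS_CCNT));
}

/* Read the 32-bit cycle counter. */
static inline uint32_t ccnt_read(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#endif

/* Cycles between two readings; unsigned subtraction gives the right
   answer even if the 32-bit counter wrapped once in between. */
static uint32_t ccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Setting only the global enable leaves the cycle counter stuck at zero, which would match the "no crashes, but no counting" symptom described above.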

GP Timers are a bit more involved. There are 11 of them, scattered throughout various “Power Domains” for some reason. Some have different capabilities than the others. There is also a 32kHz timer register, which counts the number of ticks since reset. Convenient: I timed my perfmon counter against it and found that approximately 720 million cycles pass per second. This corresponds to the advertised value of 720 MHz, where “MHz” means 10^6 hertz as usual. The odd one out is the “32kHz” timer, which actually runs at 32,768 Hz (32*2^10) rather than 32,000 Hz. Oh well.
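The calibration itself is just a ratio. A minimal sketch, assuming the 32kHz timer ticks 32,768 times per second as noted above; the function name `cycles_per_second` is made up for illustration.

```c
#include <stdint.h>

#define SYNC_TIMER_HZ 32768u  /* 32 * 2^10 ticks per second */

/* Estimate the cycle counter frequency from two deltas measured over
   the same interval: cycles elapsed and 32kHz ticks elapsed.
   Returns 0 if no ticks elapsed (nothing to divide by). */
static uint64_t cycles_per_second(uint32_t cycle_delta, uint32_t tick_delta)
{
    if (tick_delta == 0)
        return 0;
    /* Widen before multiplying so the product cannot overflow. */
    return ((uint64_t)cycle_delta * SYNC_TIMER_HZ) / tick_delta;
}
```

Measuring over half a second (16,384 ticks) and seeing 360,000,000 cycles would give 720,000,000 cycles per second. Longer intervals reduce the error from the tick boundaries, but the 32-bit cycle counter wraps after roughly six seconds at this speed.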

The GP Timers can be programmed to raise an interrupt on overflow, or when they match a certain value. There are also lots of other little features I will need to check out at some point. The first thing I tested was the clock rate of GP timer 1. I expected it to be 32kHz, which according to the manual is the reset default, but the timer was ticking approximately 100 times faster than that. After spending a long time in the TRM (these manuals do not make it easy!) I found a register called CM_CLKSEL_WKUP, which has a bit controlling the clock source for GP timer 1. It can be sourced from either the 32K_FCLK (default) or the SYS_CLK. I checked it, and on my system it was set to SYS_CLK instead of 32K_FCLK. Since SYS_CLK is driving the processor, it is a lot faster, which explains the speedy tick rate. I do not know why it was not set at the stated default. But now I have a working, predictable timer.
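For the overflow mode, the value to load depends directly on the clock rate, which is why the wrong clock source matters so much. A minimal sketch, assuming a 32-bit counter that counts up and overflows after 0xFFFFFFFF (an assumption; check the timer chapter of the TRM). The helper `overflow_reload` is hypothetical.

```c
#include <stdint.h>

/* Compute the value to load into an up-counting 32-bit timer so that
   it overflows after `period_us` microseconds at `clock_hz`.
   The tick count is rounded down and clamped to at least 1. */
static uint32_t overflow_reload(uint32_t clock_hz, uint32_t period_us)
{
    uint64_t ticks = ((uint64_t)clock_hz * period_us) / 1000000u;
    if (ticks == 0)
        ticks = 1;
    if (ticks > 0xFFFFFFFFu)
        ticks = 0xFFFFFFFFu;
    /* 2^32 - ticks, computed with unsigned wraparound. */
    return (uint32_t)(0u - (uint32_t)ticks);
}
```

At 32,768 Hz a 10ms period is 327 ticks, so the load value is 0xFFFFFEB9. If the timer is really being fed by a much faster clock, the same load value produces a proportionally shorter period, which is the kind of symptom described above.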

All I need is a primitive context switching experiment and I can do the microbenchmarks I had set out to do.

Categories: hacking

Discussion

Gregg Schroeder March 3rd, 2014 at 13:14

Hello, I found your article here interesting. I am attempting to get interrupts working correctly on a beagleboard and am having some difficulty with the assembly portion. I have GPTimer1 working and firing an interrupt at 1ms rate. I am needing about 4 task rate intervals. 25ms tasks, 50ms tasks, 100ms tasks, and 1s tasks. I thought it would be simpler to stick with a single timer for this, and just do context switching to keep track of current task state and reinitialization. The faster task rates must have priority over the slower ones, so there will end up being a fair amount of context switching. Did you get any further on this project, or would you have any suggestions for task scheduling methods?
Thank you much!

Matthew March 3rd, 2014 at 13:28

Hi Gregg. You will probably want to take a look at http://github.com/mrd/puppy which has a round robin scheduler implemented.

Task switching is a fine option. What you are describing sounds a bit like a rate-monotonic scheduler, which is implemented in my follow-up OS, http://github.com/mrd/terrier

There are other things going on with Terrier, so it probably won’t be useful to you in its full form. But you are welcome to take ideas from it. You might want to go back in its git history, because it used to have a rate-monotonic scheduler written in C, rather than ATS.
