Tuesday, March 12, 2013

pm-cpuidle: High level code flow from bootup

When the kernel boots, it starts as a single thread with PID 0, named "swapper". This thread eventually calls cpu_idle() (an architecture-specific function) and becomes the idle task.

init/main.c
start_kernel(void)
{
  ...
  rest_init();

  /* rest_init() is the last call, made after all the other inits */
}


rest_init()
{
  ...
  /* We need to spawn init first so that it obtains pid 1, however
   * the init task will end up wanting to create kthreads, which, if
   * we schedule it before we create kthreadd, will OOPS.
   */
  kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);    /* PID = 1 */
  ...
  pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);   /* PID = 2 */
  ...
  /*
   * The boot idle thread must execute schedule()
   * at least once to get things moving:
   */
  init_idle_bootup_task(current);
  schedule_preempt_disabled();
  /* Call into cpu_idle with preempt disabled */
  cpu_idle();     /* runs as PID = 0 */

}

arch/arm/kernel/process.c
cpu_idle()
{
  ...
  /* endless idle loop with no priority at all */
  while (1) {

    ...
    while (!need_resched()) {
      ...
      if (cpuidle_idle_call())
        pm_idle();
      /*
       * If CONFIG_CPU_IDLE is enabled, cpuidle_idle_call() returns 0
       * in the non-error case, so pm_idle() is not called: choosing
       * and entering the sleep state is done inside
       * cpuidle_idle_call() itself.
       */
      ...
    }
  }
}

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
pm_idle()
 cpu_do_idle() => cpu_v7_do_idle

arch/arm/mm/proc-v7.S
 ENTRY(cpu_v7_do_idle)
    dsb           @ WFI may enter a low-power mode
    wfi
    mov     pc, lr
 ENDPROC(cpu_v7_do_idle)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


drivers/cpuidle/cpuidle.c
int cpuidle_idle_call(void)
{
  ...
  /* ask the governor for the next state */
  next_state = cpuidle_curr_governor->select(drv, dev);
  /*
   * The current cpuidle governor identifies the appropriate idle
   * state, using the platform's idle-state data to pick it.
   */
  ...
  entered_state = cpuidle_enter_ops(dev, drv, next_state);
  /*
   * Enter the idle state suggested by the current cpuidle governor.
   * Platform-specific code comes into the picture here to put the
   * CPU into that state.
   */
  ...
}



[Based on Linux kernel v3.4]

Saturday, March 9, 2013

ARM Linux: Switching page tables during context switch


Every process (except kernel threads) in Linux has a page table of its own.

For 32-bit ARM without LPAE, 2-level page table is used.
For 32-bit ARM with LPAE, 3-level page table is used.
 
In arch/arm/mm/proc-v7.S:
#ifdef CONFIG_ARM_LPAE
#include "proc-v7-3level.S"
#else
#include "proc-v7-2level.S"
#endif


 

Let's look at the case of 32-bit ARM without LPAE, which uses a 2-level page table:
  • Level-1: pgd: page global directory
  • Level-2: pte: page table entry

The pgd base of each process is reachable from its task_struct through the mm pointer:

struct task_struct {   :include/linux/sched.h
    ...
    struct mm_struct *mm;
    ...
};

struct mm_struct {     :include/linux/mm_types.h
    ...
    pgd_t *pgd;
    ...
};

TTBR: Translation Table Base Register (see the ARM Architecture Reference Manual for details about this register)
LPAE: Large Physical Address Extension

Whenever a context switch happens, the physical address of the next process's pgd has to be loaded into TTBR0. This is not done when switching to a kernel thread: kernel threads do not have an mm_struct of their own, so they borrow the previous task's address space (active_mm). Here is the code flow for how it is done.

schedule() :kernel/sched/core.c
     __schedule()

__schedule() :kernel/sched/core.c
    context_switch()

context_switch() :kernel/sched/core.c
    switch_mm()


switch_mm() :arch/arm/include/asm/mmu_context.h
    cpu_switch_mm(next->pgd, next);

    #define cpu_switch_mm(pgd,mm) cpu_do_switch_mm(virt_to_phys(pgd),mm)
    #define cpu_do_switch_mm processor.switch_mm

    processor.switch_mm is cpu_v7_switch_mm() in case of ARMv7

        struct processor { :arch/arm/include/asm/proc-fns.h
        ...
        /*
         * Set the page table
         */
        void (*switch_mm)(unsigned long pgd_phys, struct mm_struct *mm);
        ...
        };
 
    When the kernel is built for a single CPU type (MULTI_CPU not defined),
    arch/arm/include/asm/glue-proc.h binds the name directly instead:
       #define CPU_NAME cpu_v7
       #define cpu_do_switch_mm   __glue(CPU_NAME,_switch_mm)


Here is the relevant information from the ARM Architecture Reference Manual for accessing TTBR0:
CP15 register: c2
Register name: TTBR0, Translation Table Base Register 0
Affected operation: MCR p15, 0, <Rd>, c2, c0, 0
 
cpu_v7_switch_mm(pgd_phys, mm) :arch/arm/mm/proc-v7-2level.S
    ...
    mcr     p15, 0, r0, c2, c0, 0           @ set TTB 0

This is the instruction that sets TTBR0 to point to the page table of the next process; r0 carries the first argument, the physical address of its pgd.

[Based on Linux kernel v3.4]