Chapter 4. Linux Kernel

In this chapter, you will not only learn about the Linux kernel in general, but also about specific aspects of it. The chapter starts with a short presentation of the history of Linux and its role, and then continues with an explanation of its various features. The steps used to interact with the Linux kernel sources will not be omitted: you will be presented not only with the steps necessary to obtain a Linux kernel image from source code, but also with information about what porting the kernel to a new ARM machine implies, and with some of the methods used to debug the various problems that can appear when working with the Linux kernel sources in general. In the end, the context switches to the Yocto Project to show how the Linux kernel can be built for a given machine, and also how an external module can be integrated and used later from a root filesystem image.

This chapter will give you an idea of the Linux kernel and the Linux operating system. This presentation would not be complete without the historical component. Linux and UNIX are usually placed in the same historical context: although the Linux kernel appeared in 1991 and the Linux operating system quickly became an alternative to the UNIX operating system, the two operating systems are members of the same family. Taking this into consideration, the history could not have started anywhere other than with the UNIX operating system. This means that we need to go back more than 40 years in time, to be more precise, about 45 years, to 1969, when Dennis Ritchie and Ken Thompson started the development of UNIX.

The predecessor of UNIX was Multiplexed Information and Computing Service (Multics), a multiuser operating system project that was not in its best shape at the time. Since Multics had become a nonviable solution for the Bell Laboratories Computer Sciences Research Center in the summer of 1969, a filesystem design was born, and it later became what is known today as UNIX. Over time, it was ported to multiple machines due to its design and the fact that the source code was distributed alongside it. The most prolific contributor to UNIX was the University of California, Berkeley, which also developed its own UNIX version, called Berkeley Software Distribution (BSD), first released in 1977. Until the 1990s, multiple companies developed and offered their own distributions of UNIX, their main inspirations being Berkeley or AT&T. All of them helped UNIX become a stable, robust, and powerful operating system. Among the features that made UNIX strong as an operating system, the following can be mentioned:

  • UNIX is simple. The number of system calls it offers is reduced to only a couple of hundred, and their design is basic.
  • Everything is regarded as a file in UNIX, which makes the manipulation of data and devices simpler and minimizes the number of system calls needed for interaction.
  • Fast process creation through the fork() system call.
  • The UNIX kernel and its utilities are written in the C language, a property that makes them easily portable and accessible.
  • Simple and robust interprocess communication (IPC) primitives help in the creation of fast and simple programs that accomplish only one thing in the best available manner.

Nowadays, UNIX is a mature operating system with support for features such as virtual memory, TCP/IP networking, demand paging, preemptive multiprocessing, and multithreading. Its spread is wide, ranging from small embedded devices to systems with hundreds of processors. Its development has moved past the idea that UNIX is a research project, and it has become a general-purpose operating system that practically fits any need. All this has happened due to its elegant design and proven simplicity. It was able to evolve without losing its capability to remain simple.

Linux appeared as an alternative solution to a UNIX variant called Minix, an operating system created for teaching purposes that lacked easy interaction with its source code. Any changes made to the source code were not easily integrated and distributed because of Minix's license. Linus Torvalds first started working on a terminal emulator to connect to other UNIX systems at his university. Within the same academic year, the emulator evolved into a full-fledged UNIX kernel, and he released it for everyone to use in 1991.

One of the most attractive features of Linux is that it is an open source operating system whose source code is available under the GNU GPL license. When writing the Linux kernel, Linus Torvalds used the best design choices and features from the UNIX variants available at the time as a source of inspiration. Its license is what has propelled it into becoming the powerhouse it is today. It has engaged a large number of developers who have helped with code enhancements, bug fixing, and much more.

Today, Linux is an experienced operating system that is able to run on a multitude of architectures. It is able to run on devices that are even smaller than a wristwatch as well as on clusters of supercomputers. It is the sensation of our days and is being adopted by companies and developers around the world in an increasingly diversified manner. The interest in the Linux operating system is very strong, and this implies not only diversity, but also a great number of benefits, ranging from security, new features, and embedded solutions to server options, and many more.

Linux has become a truly collaborative project developed by a huge community over the Internet. Although a great number of changes have been made inside this project, Linus has remained its creator and maintainer. Change is a constant factor in everything around us, and this also applies to Linux and its maintainership: the stable kernel releases are now maintained by Greg Kroah-Hartman, who has already been doing this work for a couple of years. It may seem that in the period when only Linus was around, the Linux kernel community was a loose-knit group of developers, perhaps because of Linus' harsh comments that are known worldwide. Since Greg was appointed maintainer of the stable releases, this image has started fading gradually. I am looking forward to the years to come.

The role of the Linux kernel

With an impressive number of code lines, the Linux kernel is one of the most prominent open source projects and, at the same time, the largest one available. The Linux kernel is a piece of software that helps with interfacing the hardware, being the lowest-level code that runs in everyone's Linux operating system. It is used as an interface for the user space applications, as described in the following diagram:

The main roles of the Linux kernel are as follows:

  • It provides a set of portable hardware and architecture APIs that offer user space applications the possibility of using the necessary hardware resources
  • It helps with the management of hardware resources, such as the CPU, input/output peripherals, and memory
  • It is used for the management of concurrent accesses and the usage of the necessary hardware resources by different applications

To make sure that the preceding roles are well understood, an example will be very useful. Let's consider that in a given Linux operating system, a number of applications need access to the same resource, such as a network interface or a device. For this scenario, the kernel needs to multiplex the resource in order to make sure that all the applications have access to it.

Delving into the features of the Linux kernel

This section will introduce a number of features available inside the Linux kernel. It will also cover information about each of them, how they are used, what they represent, and any other relevant information regarding each specific functionality. This presentation familiarizes you with the main role of some of the features available inside the Linux kernel, as well as with the Linux kernel and its source code in general.

On a more general note, some of the most valuable features that the Linux kernel has are as follows:

  • Stability and reliability
  • Scalability
  • Portability and hardware support
  • Compliance with standards
  • Interoperability between various standards
  • Modularity
  • Ease of programming
  • Comprehensive support from the community
  • Security

The preceding features do not constitute actual functionalities, but they have helped the project along its development process and are still helping it today. Having said this, there are a lot of features that are implemented, such as the fast user space mutex (futex), netfilter, the Simplified Mandatory Access Control Kernel (SMACK), and so on. A complete list of these can be accessed and studied at http://en.wikipedia.org/wiki/Category:Linux_kernel_features.

Memory mapping and management

When discussing memory in Linux, we can refer to both physical and virtual memory. Compartments of the RAM are used to contain the Linux kernel variables and data structures, with the rest of the memory being used for dynamic allocations, as described here:

The physical memory management defines the algorithms and data structures that maintain the memory, and it operates at the page level, relatively independently of the virtual memory. Here, each physical page has a struct page descriptor associated with it that is used to incorporate information about the physical page. Some of the fields of this structure are as follows:

  • _count: This represents the page counter. When it reaches the 0 value, the page is added to the free pages list.
  • virtual: This represents the virtual address associated with a physical page. The ZONE_DMA and ZONE_NORMAL pages are always mapped, while the ZONE_HIGHMEM pages are not always mapped.
  • flags: This represents a set of flags that describe the attributes of the page.

The zones of the physical memory were mentioned previously. The physical memory is split up into multiple nodes that have a common physical address space and fast local memory access. The smallest of them is ZONE_DMA, between 0 and 16 MB. The next is ZONE_NORMAL, which is the LowMem area between 16 MB and 896 MB, and the largest one is ZONE_HIGHMEM, which is between 896 MB and 4 GB/64 GB. This information is visible in both the preceding and following images:

The virtual memory is used both in user space and in kernel space. Allocating a memory zone implies the allocation of a physical page as well as the allocation of an address space area; this is done both in the page table and in the internal structures available inside the operating system. The usage of the page table differs from one architecture type to another. On Complex Instruction Set Computing (CISC) architectures, the page table is used directly by the processor, but on Reduced Instruction Set Computing (RISC) architectures, the page table is used by the core for page lookup and translation lookaside buffer (TLB) add operations. Each zone descriptor is used for zone mapping. It specifies whether the zone is mapped for usage by a file, whether it is read-only, copy-on-write, and so on. The address space descriptor is used by the operating system to maintain high-level information.

Memory allocation differs between the user space context and the kernel space context because the kernel space is not able to allocate memory as easily. This difference is mostly due to the fact that error management in the kernel context is not easily done, or at least not in the same way as in the user space context. This is one of the problems that will be presented in this section, along with its solutions, because it helps readers understand how memory management is done in the context of the Linux kernel.

The methods used by the kernel for memory handling are the first subject that will be discussed here. This is done to make sure that you understand the methods used by the kernel to obtain memory. Although the smallest addressable unit of a processor is the byte, for the Memory Management Unit (MMU), the unit responsible for virtual-to-physical address translation and for maintaining the system's page tables, the smallest addressable unit is the page. A page's size varies from one architecture to another: most 32-bit architectures use 4 KB pages, whereas the 64-bit ones usually have 8 KB pages. For the Atmel SAMA5D3-Xplained board, the definition of the struct page structure is as follows:

struct page {
        unsigned long           flags;
        atomic_t                _count;
        atomic_t                _mapcount;
        struct address_space    *mapping;
        void                    *virtual;
        unsigned long           debug_flags;
        void                    *shadow;
        int                     _last_nid;
};

These are some of the most important fields of the page structure. The flags field, for example, represents the status of the page; it holds information such as whether the page is dirty, locked, or in another valid state. The values associated with this field are defined inside the include/linux/page-flags-layout.h header file. The virtual field represents the virtual address associated with the page, and _count represents the reference count of the page, which is usually accessed indirectly through the page_count() function. All the other fields can be found inside the include/linux/mm_types.h header file.

The kernel divides the physical memory into various zones, mostly because there are pages in the physical memory that are not accessible for a number of tasks. For example, there are hardware devices that can perform DMA only by interacting with one zone of the physical memory, simply called ZONE_DMA. On x86 architectures, it covers the 0-16 MB range.

There are four main memory zones available, plus two other less notable ones, defined inside the kernel sources in the include/linux/mmzone.h header file. The zone mapping is also architecture-dependent. For the Atmel SAMA5D3-Xplained board, we have the following zones defined:

enum zone_type {
#ifdef CONFIG_ZONE_DMA
        /*
         * ZONE_DMA is used when there are devices that are not able
         * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
         * carve out the portion of memory that is needed for these devices.
         * The range is arch specific.
         *
         * Some examples
         *
         * Architecture         Limit
         * ---------------------------
         * parisc, ia64, sparc  <4G
         * s390                 <2G
         * arm                  Various
         * alpha                Unlimited or 0-16MB.
         *
         * i386, x86_64 and multiple other arches
         *                      <16M.
         */
        ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
        /*
         * x86_64 needs two ZONE_DMAs because it supports devices that are
         * only able to do DMA to the lower 16M but also 32 bit devices that
         * can only do DMA areas below 4G.
         */
        ZONE_DMA32,
#endif
        /*
         * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
         * performed on pages in ZONE_NORMAL if the DMA devices support
         * transfers to all addressable memory.
         */
        ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
        /*
         * A memory area that is only addressable by the kernel through
         * mapping portions into its own address space. This is for example
         * used by i386 to allow the kernel to address the memory beyond
         * 900MB. The kernel will set up special mappings (page
         * table entries on i386) for each page that the kernel needs to
         * access.
         */
        ZONE_HIGHMEM,
#endif
        ZONE_MOVABLE,
        __MAX_NR_ZONES
};

There are allocations that require interaction with more than one zone. One such example is a normal allocation that is able to use either ZONE_DMA or ZONE_NORMAL. ZONE_NORMAL is preferred because it does not interfere with direct memory accesses, though when memory is at full usage, the kernel might use other available zones besides the ones it uses in normal scenarios. Each zone is described by a struct zone structure that holds the zone's relevant information. For the Atmel SAMA5D3-Xplained board, this structure is as shown here:

struct zone {
        unsigned long   watermark[NR_WMARK];
        unsigned long   percpu_drift_mark;
        unsigned long   lowmem_reserve[MAX_NR_ZONES];
        unsigned long   dirty_balance_reserve;
        struct per_cpu_pageset __percpu *pageset;
        spinlock_t        lock;
        int        all_unreclaimable;
        struct free_area        free_area[MAX_ORDER];
        unsigned int            compact_considered;
        unsigned int            compact_defer_shift;
        int                     compact_order_failed;
        spinlock_t              lru_lock;
        struct lruvec           lruvec;
        unsigned long         pages_scanned;
        unsigned long         flags;
        unsigned int        inactive_ratio;
        wait_queue_head_t       * wait_table;
        unsigned long         wait_table_hash_nr_entries;
        unsigned long         wait_table_bits;
        struct pglist_data    *zone_pgdat;
        unsigned long         zone_start_pfn;
        unsigned long         spanned_pages;
        unsigned long         present_pages;
        unsigned long         managed_pages;
        const char              *name;
};

As you can see, the structure that defines a zone is an impressive one. Some of the most interesting fields are the watermark array, which contains the minimum, low, and high watermarks for the defined zone; the present_pages attribute, which represents the available pages within the zone; the name field, which holds the name of the zone; and others, such as the lock field, a spin lock that shields the zone structure from simultaneous access. All the other fields can be identified inside the corresponding include/linux/mmzone.h header file for the Atmel SAMA5D3 Xplained board.

With this information available, we can move ahead and find out how the kernel implements memory allocation. All the functions necessary for memory allocation and memory interaction in general are inside the linux/gfp.h header file. Some of these functions are:

struct page * alloc_pages(gfp_t gfp_mask, unsigned int order)

This function is used to allocate physical pages at contiguous locations. At the end, the return value is a pointer to the first page structure if the allocation is successful, or NULL if errors occur:

void * page_address(struct page *page)

This function is used to get the logical address for a corresponding memory page:

unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)

This one is similar to the alloc_pages() function, the difference being that the return value is the logical address of the first requested page instead of a struct page pointer:

unsigned long __get_free_page(gfp_t gfp_mask)
struct page * alloc_page(gfp_t gfp_mask)

The preceding two functions are wrappers over the similar ones presented before, the difference being that they return information for only one page (the order argument implicitly has the zero value):

unsigned long get_zeroed_page(unsigned int gfp_mask)

The preceding function does what the name suggests: it returns a page filled with zero values. The difference between this function and the __get_free_page() function is that the page is filled with zero values before being handed over:

void __free_pages(struct page *page, unsigned int order)
void free_pages(unsigned long addr, unsigned int order)
void free_page(unsigned long addr)

The preceding functions are used for freeing the given allocated pages. The passing of the pages should be done with care because the kernel is not able to check the information it is provided with.
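To tie the preceding allocation and freeing primitives together, here is a minimal, hypothetical kernel module sketch that allocates two contiguous pages, uses them, and releases them again. The module name and variable names are made up for the example; only the __get_free_pages() and free_pages() calls described above are assumed:

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/string.h>

static unsigned long buffer;    /* logical address of the allocated pages */

static int __init alloc_example_init(void)
{
        /* allocate 2^1 = 2 contiguous pages */
        buffer = __get_free_pages(GFP_KERNEL, 1);
        if (!buffer)
                return -ENOMEM; /* allocation failed, nothing to free */

        /* the memory can now be used like any other kernel buffer */
        memset((void *)buffer, 0, 2 * PAGE_SIZE);
        return 0;
}

static void __exit alloc_example_exit(void)
{
        /* the order passed here must match the one used at allocation time */
        free_pages(buffer, 1);
}

module_init(alloc_example_init);
module_exit(alloc_example_exit);
MODULE_LICENSE("GPL");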

Page cache and page writeback

Usually, the disk is slower than the physical memory, which is one of the reasons why memory is preferred over disk storage. The same applies to the processor's cache levels: the closer a cache resides to the processor, the faster the I/O access to it is. The process that moves data from the disk into the physical memory is called page caching. The inverse process is defined as page writeback. These two notions will be presented in this subsection, mainly in the context of the kernel.

When the kernel services a read() system call, it first verifies whether the data is present in the page cache. Finding the page inside the RAM is called a cache hit. If the data is not available there, it needs to be read from the disk, and this is called a cache miss.

When the kernel issues the write() system call, there are multiple possibilities for cache interaction with regard to this system call. The easiest one is to not cache the write operations and only keep the data on the disk; this scenario is called no-write cache. When the write operation updates the physical memory and the disk data at the same time, the operation is called write-through cache. The third option is represented by the write-back cache, where the page is marked as dirty. It is added to the dirty list and, over time, it is written to the disk and marked as not dirty. Here, dirty means not yet synchronized with the disk, so the best synonym for the dirty keyword would be the unsynchronized keyword.

The process address space

Besides its own physical memory, the kernel is also responsible for the user space processes and their memory management. The memory allocated for each user space process is called the process address space, and it contains the virtual memory addressable by a given process. It also contains the related addresses used by the process in its interaction with the virtual memory.

Usually, a process receives a flat 32-bit or 64-bit address space, its size being dependent on the architecture type. However, there are operating systems that allocate a segmented address space. Threads are offered the possibility of sharing an address space between them. Although a process can access a large memory space, it usually has permission to access only an interval of memory. This is called a memory area, and it means that a process can only access a memory address situated inside a valid memory area. If it somehow tries to access a memory address outside of its valid memory areas, the kernel will kill the process with the Segmentation fault notification.

A memory area contains the following:

  • The text section maps the executable code
  • The data section maps the initialized global variables
  • The bss section maps the uninitialized global variables
  • The zero page section is used for the user space process stack
  • The text, bss, and data-specific sections of shared libraries
  • Mapped files
  • Anonymous memory mappings, usually linked with functions such as malloc()
  • Shared memory segments

A process address space is defined inside the Linux kernel source through a memory descriptor. This structure, called struct mm_struct, is defined inside the include/linux/mm_types.h header file and contains information relevant to a process address space, such as the number of processes that use the address space, the list of memory areas, the last memory area that was used, the number of memory areas available, and the start and end addresses of the code, data, heap, and stack sections.
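As an illustration, here is a minimal, hypothetical kernel module sketch that prints a few of these memory descriptor fields for the current process. The module name and message format are invented for the example; the fields themselves (mm_users, map_count, start_code, and end_code) belong to struct mm_struct as described above:

#include <linux/module.h>
#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/atomic.h>

static int __init mm_example_init(void)
{
        struct mm_struct *mm = current->mm;    /* NULL for kernel threads */

        if (!mm)
                return 0;

        pr_info("address space users: %d, memory areas: %d\n",
                atomic_read(&mm->mm_users), mm->map_count);
        pr_info("code section: 0x%lx - 0x%lx\n",
                mm->start_code, mm->end_code);
        return 0;
}

static void __exit mm_example_exit(void)
{
}

module_init(mm_example_init);
module_exit(mm_example_exit);
MODULE_LICENSE("GPL");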

A kernel thread has no process address space associated with it; for kernel threads, the memory descriptor field of the process descriptor is NULL. In this way, the kernel indicates that a kernel thread does not have a user context. A kernel thread only has access to the same memory as all the other processes. A kernel thread does not have any pages in user space or access to the user space memory.

Since the processor works only with physical addresses, translations between virtual and physical memory need to be made. These operations are done by the page tables, which split the virtual addresses into smaller components with associated indexes that are used for pointing purposes. In the majority of the available boards, and architectures in general, the page table lookup is handled by the hardware; the kernel is responsible for setting it up.

Process management

A process, as presented previously, is a fundamental unit in a Linux operating system and, at the same time, a form of abstraction. It is, in fact, a program in execution, but a program by itself is not a process. It needs to be in an active state and have associated resources. A process is able to become a parent by using the fork() function, which spawns a child process. Both the parent and the child processes reside in separate address spaces, but both of them initially have the same content. The exec() family of functions is the one that is able to execute a different program, create an address space, and load it inside that address space.
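A short user space sketch shows how the two calls are typically combined; the program being executed, /bin/ls, is only an arbitrary choice for the example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
        pid_t pid = fork();         /* duplicate the current process */

        if (pid < 0) {
                perror("fork");
                exit(EXIT_FAILURE);
        }

        if (pid == 0) {
                /* child: replace its address space with a new program */
                execl("/bin/ls", "ls", "-l", (char *)NULL);
                perror("execl");    /* reached only if execl() fails */
                exit(EXIT_FAILURE);
        }

        /* parent: wait for the child to terminate */
        waitpid(pid, NULL, 0);
        return 0;
}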

When fork() is used, the resources that the parent process has are reproduced for the child. This function is implemented in a very interesting manner; it uses the clone() system call, which, at its base, contains the copy_process() function. This function does the following:

  • Calls the dup_task_struct() function to create a new kernel stack. The task_struct and thread_info structures are created for the new process.
  • Checks that the child does not go beyond the limits of the memory area.
  • The child process distinguishes itself from its parent.
  • The child state is set to TASK_UNINTERRUPTIBLE to make sure it does not run yet.
  • Flags are updated.
  • A PID is associated with the child process.
  • The flags that are already set are inspected, and proper action is performed with respect to their values.
  • The cleanup is performed at the end, and a pointer to the child process is returned.

Threads in Linux are very similar to processes. They are viewed as processes that share various resources, such as the memory address space, open files, and so on. The creation of a thread is similar to that of a normal task, the exception being the clone() function, which is passed flags that specify the shared resources. For example, the clone function call for a thread is clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0), while for a normal fork it looks similar to clone(SIGCHLD, 0).

The notion of kernel threads appeared as a solution to problems involving tasks running in the background of the kernel context. The kernel thread does not have an address space and is only available inside the kernel context. It has the same properties as a normal process, but is only used for special tasks, such as ksoftirqd, flush, and so on.

At the end of its execution, the process needs to be terminated so that its resources can be freed, and the parent of the executing process needs to be notified about this. The method most commonly used to terminate a process is by calling the exit() system call. A number of steps are performed during this process:

  1. The PF_EXITING flag is set.
  2. The del_timer_sync() function is called to remove the kernel timers.
  3. The acct_update_integrals() function is called to write the accounting and logging information.
  4. exit_mm() is called to release the mm_struct structure of the process.
  5. exit_sem() is called to dequeue the process from the IPC semaphore queues.
  6. The exit_files() and exit_fs() functions are called to remove the links to various file descriptors.
  7. The task exit code is set.
  8. exit_notify() is called to notify the parent and set the task exit state to EXIT_ZOMBIE.
  9. schedule() is called to switch to a new process.

After the preceding steps are performed, the object associated with the task is freed and the task becomes unrunnable. Its memory exists solely as information for its parent. After the parent announces that this information is of no use to it, this memory is freed for the system to use.

Process scheduling

The process scheduler decides which resources are allocated to a runnable process. It is a piece of software that is responsible for multitasking and for resource allocation to various processes, and it decides how to best distribute the resources and the processor time. It also decides which process should run next.

The first design of the Linux scheduler was very simplistic. It was not able to scale properly when the number of processes increased, so starting from the 2.5 kernel version, a new scheduler was developed. It is called the O(1) scheduler, and it offers a constant-time algorithm for time slice calculation and a run queue defined on a per-processor basis. Although it is well suited for large servers, it is not the best solution for a normal desktop system. From the 2.6 kernel version, improvements have been made to the O(1) scheduler, such as the fair scheduling concept that later materialized, from kernel version 2.6.23, into the Completely Fair Scheduler (CFS), which became the de facto scheduler.

The CFS has a simple idea behind it. It behaves as if we had a perfect multitasking processor where each process gets a 1/n slice of the processor's time, with this time slice being incredibly small; the n value represents the number of running processes. Con Kolivas is the Australian programmer who contributed to the fair scheduling implementation, also known as the Rotating Staircase Deadline Scheduler (RSDL). Its implementation requires a self-balancing red-black tree for the priorities and also a time slice calculated at the nanosecond level. Similarly to the O(1) scheduler, CFS applies the notion of weight, which implies that some processes wait more than others. This is based on the weighted fair queuing algorithm.

A process scheduler constitutes one of the most important components of the Linux kernel because it defines the user interaction with the operating system in general. The Linux kernel CFS is the scheduler that appeals to developers and users because it offers scalability and performance with the most reasonable approach.

System calls

For processes to interact with the system, an interface should be provided that gives user space applications the possibility of interacting with the hardware and with other processes. This is the role of system calls. They are used as an interface between the hardware and the user space. They are also used to ensure stability, security, and abstraction in general. They constitute a common layer that, alongside traps and exceptions, forms an entry point into the kernel, as described here:

The interaction with most of the system calls that are available inside the Linux system is done using the C library. They are able to define a number of arguments and return a value that reveals whether they were successful or not. A value of zero usually means that the execution ended with success, and in case errors appear, an error code will be available inside the errno variable. When a system call is done, the following steps are followed:

  1. The switch into kernel mode is made.
  2. Any restrictions to the kernel space access are eliminated.
  3. The stack from the user space is passed into the kernel space.
  4. Any arguments from the user space are checked and copied into the kernel space.
  5. The associated routine for the system call is identified and run.
  6. The switch to the user space is made and the execution of the application continues.

A system call has a syscall number associated with it, which is a unique number used as a reference for the system call and which cannot be changed (once assigned, it cannot be reused even if the system call itself is removed). A symbolic constant for each system call number is available in the <sys/syscall.h> header file. To check for the existence of a system call, sys_ni_syscall() is used, which returns the ENOSYS error for an invalid system call.
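A small user space sketch illustrates both the symbolic syscall numbers and the errno convention mentioned above; the use of SYS_getpid and of an invalid syscall number here is purely illustrative:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
        /* invoke a system call directly through its syscall number */
        long pid = syscall(SYS_getpid);
        printf("getpid() via syscall number %d returned %ld\n",
               SYS_getpid, pid);

        /* an invalid syscall number fails with ENOSYS (sys_ni_syscall) */
        if (syscall(-1) == -1)
                printf("invalid syscall: %s\n", strerror(errno));

        return 0;
}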

The virtual file system

The Linux operating system is able to support a large variety of filesystem options. This is possible due to the existence of the Virtual File System (VFS), which is able to provide a common interface for a large number of filesystem types and handle the system calls relevant to them.

The filesystem types supported by the VFS can be put in these three categories:

  • Disk-based filesystems: These manage the memory on a local disk or on devices used for disk emulation. Some of the most well known ones are:
    • Linux filesystems, such as Second Extended Filesystem (Ext2), Third Extended Filesystem (Ext3), and Fourth Extended Filesystem (Ext4)
    • UNIX filesystems, such as the sysv filesystem, UFS, the Minix filesystem, and so on
    • Microsoft filesystems, such as MS-DOS, NTFS (available since Windows NT), and VFAT (available since Windows 95)
    • The ISO9660 CD-ROM filesystem and the Universal Disk Format (UDF) DVD filesystem
    • Proprietary filesystems, such as the ones from Apple, IBM, and other companies
  • Network filesystems: These allow access to various filesystem types over a network, on other computers. One of the most well known ones is NFS. Of course, there are others, but they are not as well known. These include the Andrew filesystem (AFS), Novell's NetWare Core Protocol (NCP), Constant Data Availability (Coda), and so on.
  • Special filesystems: The /proc filesystem is the perfect example for this category of filesystems. This category of filesystems enables easier access for system applications to interrogate the kernel's data structures and implement various features.

The virtual filesystem system call implementation is very well summarized in this image:

In the preceding image, it can be seen how easily a copy is handled from one filesystem type to another. It only uses the basic open(), close(), read(), and write() functions available for all other filesystem interactions; however, underneath, each of them implements the specific functionality for the chosen filesystem. For example, the open() system call calls sys_open(), which takes the same arguments as open() and returns the same result. The difference between sys_open() and open() is that sys_open() is a more permissive function.

All the other three system calls have corresponding sys_read(), sys_write(), and sys_close() functions that are called internally.
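The following user space sketch copies a file using nothing but these four calls; the same code works regardless of whether the source and destination reside on ext4, VFAT, NFS, or any other filesystem supported by the VFS. The file paths are placeholders:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        char buffer[4096];
        ssize_t bytes;

        /* the VFS dispatches these calls to the underlying filesystem */
        int in = open("/mnt/source/file.txt", O_RDONLY);
        int out = open("/mnt/destination/file.txt",
                       O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (in < 0 || out < 0) {
                perror("open");
                return 1;
        }

        while ((bytes = read(in, buffer, sizeof(buffer))) > 0)
                write(out, buffer, bytes);

        close(in);
        close(out);
        return 0;
}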

Interrupts

An interrupt is a representation of an event that changes the succession of instructions performed by the processor. Interrupts imply an electric signal generated by the hardware to signal an event that has happened, such as a key press, a reset, and so on. Interrupts are divided into several categories depending on their reference system, as follows:

  • Software interrupts: These are usually exceptions triggered by the processor itself or requested explicitly by user space programs
  • Hardware interrupts: These are signals generated by external devices to indicate events asynchronously

The Linux interrupt handling layer offers an abstraction of interrupt handling for various device drivers through comprehensive API functions. It is used to request, enable, disable, and free interrupts, making sure that portability is guaranteed on multiple platforms. It handles all available interrupt controller hardware.

The generic interrupt handling uses the __do_IRQ() handler, which is able to deal with all the available types of interrupt logic. The handling layer is divided into two components:

  • The top half component is used to respond to the interrupt
  • The bottom half component is scheduled by the top half to run at a later time

The difference between them is that all the available interrupts are permitted to act during the bottom half context. This helps the top half respond to another interrupt while the bottom half is still working, which means that the top half is able to save its data in a specific buffer, and it permits the bottom half to operate in a safe environment.

For the bottom half processing, there are four defined mechanisms available:

  • Softirqs
  • Tasklets
  • Work queues
  • Kernel threads

The available mechanisms are well presented here:

Although the model for the top and bottom half interrupt mechanism looks simple, it has a very complicated function calling mechanism model. This example shows this fact for the ARM architecture:

For the top half component of the interrupt, there are three levels of abstraction in the interrupt source code. The first one is the high-level driver API that has functions such as request_irq(), free_irq(), disable_irq(), enable_irq(), and so on. The second one is represented by the high-level IRQ flow handlers, a generic layer with predefined or architecture-specific interrupt flow handlers assigned to respond to various interrupts during device initialization or boot time. It defines a number of predefined functions, such as handle_level_irq(), handle_simple_irq(), handle_percpu_irq(), and so on. The third is represented by the chip-level hardware encapsulation. It defines the struct irq_chip structure that holds chip-relevant functions used in the IRQ flow implementation. Some of the functions are irq_ack(), irq_mask(), and irq_unmask().

A module is required to register an interrupt channel and release it afterwards. The supported interrupt lines are numbered from 0 to the number of IRQs minus 1, information that is available inside the <asm/irq.h> header file. When the registration is done, a handler flag is passed to the request_irq() function to specify the interrupt handler's type (a registration sketch is shown after the following list), as follows:

  • SA_SAMPLE_RANDOM: This indicates that the interrupt can contribute to the entropy pool, that is, a pool with bits that possess strong random properties, by sampling unpredictable events, such as mouse movements, inter-key press times, disk interrupts, and so on
  • SA_SHIRQ: This shows that the interrupt can be shared between devices
  • SA_INTERRUPT: This indicates a fast interrupt handler, so interrupts are disabled on the current processor; this does not represent a very desirable situation
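Here is a minimal, hypothetical registration sketch. The interrupt line, device name, and cookie are made up for the example; note that recent kernels use the IRQF_* flags (for example, IRQF_SHARED) instead of the older SA_* names listed above:

#include <linux/interrupt.h>
#include <linux/module.h>

#define EXAMPLE_IRQ 42    /* hypothetical interrupt line */

static irqreturn_t example_handler(int irq, void *dev_id)
{
        /* top half: acknowledge the event and defer the real work */
        return IRQ_HANDLED;
}

static int __init irq_example_init(void)
{
        /* register a shared handler for the interrupt line */
        return request_irq(EXAMPLE_IRQ, example_handler, IRQF_SHARED,
                           "example-device", (void *)&example_handler);
}

static void __exit irq_example_exit(void)
{
        free_irq(EXAMPLE_IRQ, (void *)&example_handler);
}

module_init(irq_example_init);
module_exit(irq_example_exit);
MODULE_LICENSE("GPL");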

Bottom halves

The first mechanism that will be discussed regarding bottom half interrupt handling is softirqs. They are rarely used but can be found in the Linux kernel source code inside the kernel/softirq.c file. When it comes to implementation, they are statically allocated at compile time. They are created when an entry is added in the include/linux/interrupt.h header file, and the system information they provide is available inside the /proc/softirqs file. Although not used too often, they can be executed after exceptions, interrupts, and system calls, as well as when the ksoftirqd daemon is run by the scheduler.

Next on the list are tasklets. Although they are built on top of softirqs, they are more commonly used for bottom half interrupt handling. Here are some of the reasons why this is done:

  • They are very fast
  • They can be created and destroyed dynamically
  • They have atomic and nonblocking code
  • They run in a soft interrupt context
  • They run on the same processor that they were scheduled for

Tasklets have a struct tasklet_struct structure available. These are also available inside the include/linux/interrupt.h header file, and unlike softirqs, tasklets are non-reentrant.

Third on the list are work queues, which represent a different way of deferring work compared with the previously presented mechanisms. The main differences are as follows:

  • They are able to run at the same time on more than one CPU
  • They are allowed to go to sleep
  • They run in process context
  • They can be scheduled or preempted

Although they might have a latency that is slightly bigger than that of tasklets, the preceding qualities are really useful. Work queues are built around the struct workqueue_struct structure, available inside the kernel/workqueue.c file.

The last and newest addition to the bottom half mechanism options is represented by kernel threads, which operate entirely in kernel mode since they are created and destroyed by the kernel. They appeared during the 2.6.30 kernel release and have the same advantages as work queues, along with some extra features, such as the possibility of having their own context. It is expected that kernel threads will eventually replace work queues and tasklets, since they are similar to user space threads. A driver might want to request a threaded interrupt handler. All it needs to do in this case is use request_threaded_irq() in a similar way to request_irq(). The request_threaded_irq() function offers the possibility of passing both a handler and a thread_fn in order to split the interrupt handling code into two parts. The handler acts as a quick check that verifies whether the interrupt came from its device; if that is the case, it returns IRQ_WAKE_THREAD to wake up the handler thread and execute thread_fn.
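A minimal, hypothetical sketch of such a threaded interrupt handler is shown below; the interrupt line and device name are placeholders:

#include <linux/interrupt.h>
#include <linux/module.h>

#define EXAMPLE_IRQ 42    /* hypothetical interrupt line */

static irqreturn_t example_quick_check(int irq, void *dev_id)
{
        /* top half: verify that our device raised the interrupt,
           then ask the core to wake the handler thread */
        return IRQ_WAKE_THREAD;
}

static irqreturn_t example_thread_fn(int irq, void *dev_id)
{
        /* bottom half: runs in process context, so it may sleep */
        return IRQ_HANDLED;
}

static int __init threaded_irq_example_init(void)
{
        return request_threaded_irq(EXAMPLE_IRQ, example_quick_check,
                                    example_thread_fn, IRQF_ONESHOT,
                                    "example-device", NULL);
}

static void __exit threaded_irq_example_exit(void)
{
        free_irq(EXAMPLE_IRQ, NULL);
}

module_init(threaded_irq_example_init);
module_exit(threaded_irq_example_exit);
MODULE_LICENSE("GPL");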

Methods to perform kernel synchronization

The number of requests the kernel deals with can be compared with the number of requests a server has to receive. This situation can lead to race conditions, so a good synchronization method is required. A number of policies define the way the kernel behaves by defining a kernel control path. Here is an example of a kernel control path:

The preceding image offers a clear picture as to why synchronization is necessary. For example, a race condition can appear when more than one kernel control path is interlinked. To protect these critical regions, a number of measures should be taken. Also, it should be taken into consideration that an interrupt handler cannot be interrupted and softirqs should not be interleaved.

A number of synchronization primitives have been born:

  • Per-CPU variables: This is one of the simplest and most efficient synchronization methods. It multiplies a data structure so that one instance is available for each CPU.
  • Atomic operations: This refers to atomic read-modify-write instructions.
  • Memory barriers: These guarantee that the operations placed before the barrier are all finished before the operations after it are started.
  • Spin locks: These represent a type of lock that implements busy waiting.
  • Semaphores: This is a form of locking that implements sleeping, or blocking, waiting.
  • Seqlocks: These are similar to spin locks, but are based on an access counter.
  • Local interrupt disabling: This forbids the use of functions that can be postponed on a single CPU.
  • Read-copy-update (RCU): This is a method designed to protect data structures that are mostly read. It offers lock-free access to shared data structures through pointers.

The preceding methods try to prevent race condition situations. It is the job of the developer to identify and solve all the eventual synchronization problems that might appear.
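As a short illustration, here is a hypothetical kernel module sketch that protects a shared counter with a spin lock and updates another counter with an atomic operation; the names and counters are invented for the example:

#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/atomic.h>

static DEFINE_SPINLOCK(stats_lock);      /* protects packets_processed */
static unsigned long packets_processed;
static atomic_t irq_events = ATOMIC_INIT(0);

/* may be called concurrently from several kernel control paths */
static void account_packet(void)
{
        unsigned long flags;

        /* the irqsave variant is used so that an interrupt handler
           taking the same lock on this CPU cannot deadlock */
        spin_lock_irqsave(&stats_lock, flags);
        packets_processed++;
        spin_unlock_irqrestore(&stats_lock, flags);
}

static void account_irq_event(void)
{
        /* a single counter can use an atomic operation, no lock needed */
        atomic_inc(&irq_events);
}

static int __init sync_example_init(void)
{
        account_packet();
        account_irq_event();
        pr_info("packets: %lu, irq events: %d\n",
                packets_processed, atomic_read(&irq_events));
        return 0;
}

static void __exit sync_example_exit(void)
{
}

module_init(sync_example_init);
module_exit(sync_example_exit);
MODULE_LICENSE("GPL");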

Timers

Around the Linux kernel, there are a great number of functions that are influenced by time. From the scheduler to the system uptime, they all require a time reference, which includes both absolute and relative time. For example, an event that needs to be scheduled for the future represents a relative time, which, in fact, implies that there is a method used to count time.

The timer implementation can vary depending on the type of the event. The periodical implementations are defined by the system timer, which issues an interrupt at a fixed period of time. The system timer is a hardware component that issues a timer interrupt at a given frequency to update the system time and execute the necessary tasks. Another one that can be used is the real-time clock, which is a chip with a battery attached that keeps counting time long after the system was shut down. Besides the system time, there are dynamic timers available that are managed by the kernel dynamically to plan events that run after a particular time has passed.

The timer interrupt has an occurrence window; for ARM, it occurs 100 times per second. This is called the system timer frequency or tick rate, and its unit of measurement is hertz (Hz). The tick rate differs from one architecture to another: while most of them use the value of 100 Hz, there are others that use values such as 1024 Hz, for example, the Alpha and Itanium (IA-64) architectures. The default value, of course, can be changed and increased, but this action has its advantages and disadvantages.

Some of the advantages of higher frequency are:

  • Timers are executed more accurately and in larger numbers
  • System calls that use a timeout are executed in a more precise manner
  • Uptime measurements and other similar measurements become more precise
  • The preemption of processes is more accurate

The disadvantage of a higher frequency, on the other hand, is a higher overhead. The processor spends more time in the timer interrupt context; also, an increase in power consumption takes place because more computing is done.

The total number of ticks counted on a Linux operating system from the time it started booting is stored in a variable called jiffies, declared inside the include/linux/jiffies.h header file. At boot time, this variable is initialized to zero, and one is added to its value each time a timer interrupt happens. Therefore, the actual system uptime can be calculated as jiffies/Hz.
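A short, hypothetical kernel snippet shows the usual way of working with jiffies through the helpers from include/linux/jiffies.h; the one-second timeout is arbitrary:

#include <linux/module.h>
#include <linux/jiffies.h>

static int __init jiffies_example_init(void)
{
        /* compute a deadline one second from now, expressed in ticks */
        unsigned long timeout = jiffies + msecs_to_jiffies(1000);

        pr_info("uptime in seconds: %lu\n", jiffies / HZ);

        /* time_after() handles the jiffies wrap-around correctly */
        if (time_after(jiffies, timeout))
                pr_info("the deadline has already passed\n");
        else
                pr_info("the deadline is still in the future\n");

        return 0;
}

static void __exit jiffies_example_exit(void)
{
}

module_init(jiffies_example_init);
module_exit(jiffies_example_exit);
MODULE_LICENSE("GPL");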

Linux kernel interaction

Until now, you were introduced to some of the features of the Linux kernel. Now, it is time to present more information about the development process, the versioning scheme, community contributions, and the interaction with the Linux kernel.

The development process

The Linux kernel is a well-known open source project. To make sure that developers know how to interact with it, information about how git interaction is done with this project will be presented, together with some information about its development and release procedures. The project has evolved, and its development processes and release procedures have evolved with it.

Before presenting the actual development process, a bit of history will be necessary. Until the 2.6 version of the Linux kernel project, one release was made every two or three years, and each of them was identified by an even middle number, such as 1.0.x, 2.0.x, and 2.6.x. The development branches were instead defined using odd middle numbers, such as 1.1.x, 2.1.x, and 2.5.x, and they were used to integrate various features and functionalities until a major release was prepared and ready to be shipped. All the minor releases had names such as 2.6.32 and 2.2.23, and they were released between major release cycles.

This way of working was kept up until the 2.6.0 version, when a large number of features were added to the kernel during every minor release, and all of them were very well put together so as not to require the branching out of a new development branch. This implied a faster release pace with more features available. So, the following changes have appeared since the release of the 2.6.14 kernel:

  • Each new minor release version, such as 2.6.x, starts with a two-week merge window in which a number of features can be introduced in the next release
  • This merge window is closed with a release test version called 2.6.(x+1)-rc1
  • Then a six-to-eight-week bug fixing period follows, during which all the bugs introduced by the added features should be fixed
  • In the bug fixing interval, tests are run on the release candidate and the 2.6.(x+1)-rcY test versions are released
  • After the final tests are done and the last release candidate is considered sufficiently stable, a new release is made with a name such as 2.6.(x+1), and the process starts over again

This process worked great, but the only problem was that bug fixes were only released for the latest stable version of the Linux kernel. People needed long-term support versions and security updates for their older versions, general information about which versions were supported for a long time, and so on.

This process changed over time, and in July 2011, the 3.0 Linux kernel version appeared. It came with a couple of small changes designed to adjust the way the interaction was done in order to address the previously mentioned requests. The changes were made to the numbering scheme, as follows:

  • The kernel official versions would be named 3.x (3.0, 3.1, 3.2, and so on)
  • The stable versions would be named 3.x.y (3.0.1, 3.1.3, and so on)

Although it only removed one digit from the numbering scheme, this change was necessary because it marked the 20th anniversary of the Linux kernel.

Since a great number of patches and features are included in the Linux kernel every day, it becomes difficult to keep track of all the changes, and of the bigger picture in general. This changed over time because sites such as http://kernelnewbies.org/LinuxChanges and http://lwn.net/ appeared to help developers keep in touch with the world of the Linux kernel.

Besides these links, the git version control system can offer much needed information. Of course, this requires a clone of the Linux kernel sources to be available on the workstation. Some of the commands that offer a great deal of information are:

  • git log: This lists all the commits, with the latest situated at the top of the list
  • git log -p: This lists all the commits together with their corresponding diffs
  • git tag -l: This lists the available tags
  • git checkout <tagname>: This checks out a branch or tag from a working repository
  • git log v2.6.32..master: This lists all the changes between the given tag and the latest version
  • git log -p v2.6.32..master MAINTAINERS: This lists all the differences in the MAINTAINERS file between the two given references

Of course, this is just a small list with helpful commands. All the other commands are available at http://git-scm.com/docs/.

Kernel porting

The Linux kernel offers support for a large variety of CPU architectures. Each architecture and each individual board have their own maintainers, and this information is available inside the MAINTAINERS file. Also, the differences between board ports are mostly given by the architecture, PowerPC being very different from ARM or x86. Since the development board that this book focuses on is an Atmel board with an ARM Cortex-A5 core, this section will try to focus on the ARM architecture.

The main focus in our case is the arch/arm directory, which contains subdirectories such as boot, common, configs, crypto, firmware, kernel, kvm, lib, mm, net, nwfpe, oprofile, tools, vfp, and xen. It also contains an important number of directories specific to different CPU families, such as the mach-* directories and the plat-* directories. The first, mach-*, category contains support for a CPU and the boards that use that CPU, and the second, plat-*, category contains platform-specific code. One example is plat-omap, which contains common code for both mach-omap1 and mach-omap2.

The development for the ARM architecture has undergone a great change since 2011. Until then, ARM did not use a device tree, because it needed to keep a large portion of the code inside the mach-* specific directories, and for each board that had support inside the Linux kernel, a unique machine ID was assigned, and a machine structure containing board-specific information and a set of callbacks was associated with it. The boot loader passed this machine ID to a specific ARM register, and in this way, the kernel knew the board.

The increase in popularity of the ARM architecture came with a refactoring of the work and the introduction of the device tree, which dramatically reduced the amount of code available inside the mach-* directories. If the SoC is supported by the Linux kernel, then adding support for a board is as simple as defining a device tree in the arch/arm/boot/dts directory with an appropriate name (for example, <soc-name>-<board-name>.dts) and including the relevant dtsi files, if necessary. Make sure that you build the device tree blob (DTB) by including the device tree into arch/arm/boot/dts/Makefile, and add the missing device drivers for the board.

In the eventuality that the board does not have support inside the Linux kernel, the appropriate additions would be required inside the mach-* directory. Inside each mach-* directory, there are three types of files available:

  • Generic code files: These usually have a single-word name, such as clock.c, led.c, and so on
  • CPU-specific code: This is for the machine ID and usually has the <machine-ID>*.c form - for example, at91sam9263.c, at91sam9263_devices.c, sama5d3.c, and so on
  • Board-specific code: This is usually defined as board-*.c, such as board-carmeva.c, board-pcontrol-g20.c, and so on

For a given board, the proper configuration should be made first inside the arch/arm/mach-*/Kconfig file; for this, the machine ID should be identified for the board CPU. After the configuration is done, the compilation can begin, so arch/arm/mach-*/Makefile should also be updated with the files required to ensure board support. Another step is represented by the machine structure that defines the board and the machine type number, which need to be defined in the board-<machine>.c file.

The machine structure uses two macros: MACHINE_START and MACHINE_END. Both are defined inside arch/arm/include/asm/mach/arch.h and are used to define the machine_desc structure. The machine type number is available inside the arch/arm/tools/mach-types file. This file is used to generate the include/asm-arm/mach-types.h file for the board.
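As an illustration of how the two macros are typically used, here is a hypothetical, stripped-down board file sketch. The board name and the callback functions are invented, and the exact set of machine_desc fields varies between kernel versions:

#include <asm/mach/arch.h>
#include <asm/mach-types.h>

static void __init example_board_map_io(void)
{
        /* set up the static I/O mappings for the SoC */
}

static void __init example_board_init_irq(void)
{
        /* initialize the interrupt controller */
}

static void __init example_board_init(void)
{
        /* register the board-specific devices */
}

/* MACH_TYPE_EXAMPLE_BOARD would have to exist in arch/arm/tools/mach-types */
MACHINE_START(EXAMPLE_BOARD, "Hypothetical example board")
        .map_io         = example_board_map_io,
        .init_irq       = example_board_init_irq,
        .init_machine   = example_board_init,
MACHINE_END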

Note

The updated number list of the machine type is available at http://www.arm.linux.org.uk/developer/machines/download.php.

When the boot process starts, in the first case, only the DTB needs to be passed to the boot loader and loaded to initialize the Linux kernel, while in the second case, the machine type number needs to be loaded in the R1 register. In the early boot process, __lookup_machine_type looks for the machine_desc structure and loads it to initialize the board.

Community interaction

After this information has been presented to you, if you are eager to contribute to the Linux kernel, then this section should be read next. If you really want to contribute to the Linux kernel project, then a few steps should be performed before starting this work. This is mostly related to documentation and investigation of the subject. No one wants to send a duplicate patch or replicate the work of someone else in vain, so a search on the Internet on the topic of your interest could save a lot of time. Another piece of useful advice is that, after you have familiarized yourself with the subject, you should avoid sending a workaround. Try to get to the root of the problem and offer a solution. If not, report the problem and describe it thoroughly. If a solution is found, then make both the problem and the solution available in the patch.

One of the most valuable things in the open source community is the help you can get from others. Share your questions and issues, but do not forget to mention the solution as well. Ask the questions on the appropriate mailing lists and try to avoid the maintainers, if possible. They are usually very busy and have hundreds or thousands of e-mails to read and reply to. Before asking for help, try to research the question you want to raise; it will help when formulating it, and it could even offer an answer. Use IRC, if available, for smaller questions, and lastly, but most importantly, try not to overdo it.

When you are preparing a patch, make sure that it is done on the corresponding branch, and also that you read the Documentation/BUG-HUNTING file first. Identify bug reports, if any, and make sure you link your patch to them. Do not hesitate to read the Documentation/SubmittingPatches guidelines before sending. Also, do not send your changes before testing them properly. Always sign your patches and make the first description line as suggestive as possible. When sending the patches, find the appropriate mailing lists and maintainers and wait for the replies. Address the comments and resubmit the patch if needed, until it is considered acceptable.

Kernel sources

The official location of the Linux kernel sources is http://www.kernel.org, but there are numerous other git repositories where developers work on their features or even maintain their own versions.

Although the Linux core contains the scheduler, memory management, and other features, it is quite small in size. It is the extremely large number of device drivers, supported architectures and boards, filesystems, network protocols, and all the other components that made the size of the Linux kernel really big. This can be seen by taking a look at the size of the Linux source directories.

The Linux source code structure contains the following directories:

  • arch: This contains architecture-dependent code
  • block: This contains the block layer core
  • crypto: This contains cryptographic libraries
  • drivers: This gathers all the device driver implementations, with the exception of the sound drivers
  • fs: This gathers all the available filesystem implementations
  • include: This contains the kernel headers
  • init: This has the Linux initialization code
  • ipc: This holds the interprocess communication implementation code
  • kernel: This is the core of the Linux kernel
  • lib: This contains various helper libraries, such as zlib, crc32, and so on
  • mm: This contains the source code for memory management
  • net: This offers access to all the network protocol implementations supported inside Linux
  • samples: This presents a number of sample implementations, such as kfifo, kobject, and so on
  • scripts: This contains the scripts used by the kernel build system as well as other helper tools
  • security: This contains the security framework implementations, such as apparmor, selinux, smack, and so on
  • sound: This contains sound drivers and support code
  • usr: This contains the sources used to generate the initramfs cpio archive
  • virt: This holds the source code for the virtualization support
  • COPYING: This contains the Linux license and defines the copying conditions
  • CREDITS: This lists Linux's main contributors
  • Documentation: This contains corresponding documentation of kernel sources
  • Kbuild: This represents the top-level kernel build system
  • Kconfig: This is the top-level descriptor for configuration parameters
  • MAINTAINERS: This is a list of the maintainers of each kernel component
  • Makefile: This represents the top-level makefile
  • README: This file describes what Linux is; it is the starting point for understanding the project
  • REPORTING-BUGS: This offers information regarding the bug report procedure

As can be seen, the source code of the Linux kernel is quite large, so a browsing tool is required. There are a number of tools that can be used, such as Cscope, Kscope, or the web-based Linux Cross Reference (LXR). Cscope is a large project that can also be used through extensions for vim and emacs.

Configuring kernel

Before building a Linux kernel image, the proper configuration needs to be done. This is hard, taking into consideration that we have access to hundreds or even thousands of components, such as drivers, filesystems, and other items. A selection process takes place inside the configuration stage, and this is possible with the help of dependency definitions. The user defines a number of options that are enabled in order to select the components that will be used to build a Linux kernel image for a specific board.

All the configurations specific to a supported board are located inside a configuration file, simply named .config, which is situated at the same level as the previously presented files and directories. Each entry usually takes the form configuration_key=value. There are, of course, dependencies between these configurations, so they are defined inside the Kconfig files.
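
As an illustration, a short excerpt from a hypothetical .config file could look like the following; the keys shown are examples and will differ from board to board:

CONFIG_ARM=y
CONFIG_MODULES=y
CONFIG_USB_GADGET=m
CONFIG_NR_CPUS=4
CONFIG_LOCALVERSION="-custom"
# CONFIG_DEBUG_INFO is not set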

Here are the value types available for a configuration key:

  • bool: These options can have a true or false value
  • tristate: Besides the true and false values, these options can also be enabled as a module
  • int: These values are not that widespread, but they usually have a well-established value range
  • string: These values are also not the most widespread ones, but they usually contain some pretty basic information

With regard to the Kconfig files, there are two options available for expressing dependencies. The first one, depends on, makes option A visible only when option B is enabled; the second one, select, enables option A automatically whenever the option that selects it is enabled.
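
A short sketch of how the two keywords might appear in a Kconfig file follows; the option names are hypothetical:

config OPTION_A
	bool "Option A"
	depends on OPTION_B
	select OPTION_C
	help
	  Option A is visible only when OPTION_B is enabled and, once
	  enabled, it automatically enables OPTION_C.

config OPTION_B
	bool "Option B"

config OPTION_C
	bool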

Manually editing the .config file is the worst configuration option for a developer, mostly because dependencies between some of the configurations can easily be missed. I would suggest that developers use the make menuconfig command instead, which launches a text console tool for the configuration of a kernel image.

Compiling and installing the kernel

After the configuration is done, the compilation process can be started. A piece of advice I would like to give is to use as many threads as possible if the host machine offers this possibility, because it speeds up the build process. An example of the command that starts the build process is make -j 8.

At the end of the build process, a vmlinux image is offered, and some architecture-dependent images are also made available inside the architecture-specific directories; for the ARM architecture, the result is available as arch/arm/boot/*Image. Also, the Atmel SAMA5D3-Xplained board will offer a specific device tree file, available as arch/arm/boot/dts/*.dtb. The vmlinux image is an ELF file with debug information that cannot be used for booting and is only useful for debug purposes, so the arch/arm/boot/*Image file is the one used for booting.
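
A quick way to check the resulting artifacts after a build could look like this (the paths are the standard ones for ARM builds):

file vmlinux                   # ELF image with debug information, not bootable
ls arch/arm/boot/zImage        # compressed bootable kernel image
ls arch/arm/boot/dts/*.dtb     # device tree blobs, if they were built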

The installation is the next step, as it is when development is done for any other application. The same also applies to the Linux kernel, although in an embedded environment this step may seem somewhat unnecessary. For Yocto enthusiasts, this step is also available; during it, the kernel sources and headers are properly set up so that they can be used by the packages that depend on them, and the results are stored for the deploy step.

The kernel modules, as mentioned in the cross-compilation chapter, need access to the kernel sources for their build and are therefore handled separately. They can be installed using the make modules_install command, which places them inside the /lib/modules/<linux-kernel-version> directory together with all the module dependencies, symbols, and aliases.
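
When building for an embedded target, the modules are usually installed into a staging directory rather than into the host's /lib/modules directory; INSTALL_MOD_PATH is the standard kbuild variable for this, and the destination path below is only an example:

make modules
make INSTALL_MOD_PATH=/path/to/target/rootfs modules_install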

Cross-compiling the Linux kernel

In embedded development, the compilation process implies cross-compilation, the most visible difference from native compilation being that the toolchain binaries are prefixed with the name of the target architecture. The prefix setup can be done using the ARCH variable, which defines the name of the target board architecture, and the CROSS_COMPILE variable, which defines the prefix of the cross-compilation toolchain. Both of them are defined in the top-level Makefile.

The best option is to set these variables as environment variables to make sure that a make process is not run for the host machine. Although this only works in the current terminal, it is the best solution when no automation tool, such as the Yocto Project, is available for these tasks. It is not recommended, though, to export these variables from .bashrc if you plan to use more than one toolchain on the host machine.
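
A typical session could therefore look like the following sketch; the toolchain prefix and the defconfig name are assumptions and depend on the cross-compiler installed on the host and on the targeted board family:

export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabihf-   # example toolchain prefix
make sama5_defconfig                        # example default configuration
make menuconfig                             # optional: adjust the configuration
make -j 8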

Devices and modules

As I mentioned previously, the Linux kernel has a lot of kernel modules and drivers that are already implemented and available inside its source code; a number of them are even maintained outside the Linux kernel tree. Keeping drivers as loadable modules, rather than building them into the kernel image, not only reduces the kernel size, but also reduces the boot time, because they are not initialized at boot time; instead, they are loaded at the request of the user. The only caveat is that loading and unloading modules requires root access.

When loading and interacting with the Linux kernel modules, having the logging information available is very useful; the same goes for the kernel module dependencies. The logging information is available through the dmesg command, and the level of logging can be configured manually using the loglevel kernel parameter, or console logging can be disabled with the quiet parameter. Information about kernel module dependencies is available inside the /lib/modules/<kernel-version>/modules.dep file.

For module interaction, multiple utilities are available, each used for a different operation: modinfo is used to gather information about a module, and insmod is able to load a module when the full path to the kernel object is given. A similar utility is modprobe, the difference being that with modprobe the full path is not necessary, because it is responsible for loading the dependencies of the chosen kernel object before loading the module itself. Another piece of functionality that modprobe offers is the -r option: it is the remove functionality, which removes the module along with its unused dependencies. An alternative to this is the rmmod utility, which removes modules that are no longer used. The last utility available is lsmod, which lists the loaded modules.
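
The following sketch shows how these utilities could be used; the hello_world module name refers to the example module built later in this section, and the commands need root privileges:

modinfo hello_world.ko       # show the license, parameters, and dependencies
insmod ./hello_world.ko      # load the module, given the full path to the .ko file
lsmod                        # list the currently loaded modules
rmmod hello_world            # remove the module if it is no longer used
modprobe <module-name>       # load an installed module and its dependencies by name
modprobe -r <module-name>    # remove the module together with its unused dependencies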

The simplest kernel module example that can be written looks something like this:

#define MODULE
#define LINUX
#define __KERNEL__

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static int hello_world_init(void)
{
   printk(KERN_ALERT "Hello world!\n");
   return 0;
}

static void hello_world_exit(void)
{
   printk(KERN_ALERT "Goodbye!\n");
}

module_init(hello_world_init);
module_exit(hello_world_exit);

MODULE_LICENSE("GPL");

This is a simple hello world kernel module. Useful information that can be gathered from the preceding example is that every kernel module needs a start function, defined in the preceding example as hello_world_init(), which is called when the module is inserted, and a cleanup function, here called hello_world_exit(), which is called when the module is removed.

Since Linux kernel version 2.2, it has been possible to use the __init and __exit macros in this way:

static int __init hello_world_init (void)
static void __exit hello_world_exit (void)

The preceding macros allow the marked code to be removed: the first function is discarded after initialization is complete, and the second one is dropped entirely when the module is built into the Linux kernel sources.

Note

More information about the Linux kernel modules can be found in the Linux Kernel Module Programming Guide available at http://www.tldp.org/LDP/lkmpg/2.6/html/index.html.

As mentioned previously, a kernel module is not only available inside the Linux kernel tree, but also outside of it. For a built-in kernel module, the compile process is similar to that of the other available kernel modules, and a developer can take inspiration from one of them. A kernel module maintained outside of the Linux kernel tree, on the other hand, requires access to the Linux kernel sources or the kernel headers for its build process.

For a kernel module available outside of the Linux kernel sources, a Makefile example is available, as follows:

KDIR := <path/to/linux/kernel/sources>
PWD := $(shell pwd)

obj-m := hello_world.o

all:
	$(MAKE) ARCH=arm CROSS_COMPILE=<arm-cross-compiler-prefix> -C $(KDIR) M=$(PWD) modules
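
With this Makefile placed next to hello_world.c, the module can be built from its own directory as in the following sketch; the resulting hello_world.ko can then be loaded with insmod as described earlier:

make    # builds hello_world.ko against the kernel tree pointed to by KDIR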

For a module that is implemented inside the Linux kernel tree, a configuration for the module needs to be made available inside the corresponding Kconfig file. Also, the Makefile next to that Kconfig file needs to be updated to let the build system know that, when the configuration for the module is enabled, its sources need to be built. We will see an example of this kind for a kernel device driver here.

An example of the Kconfig file is as follows:

config HELLO_WORLD_TEST
	tristate "Hello world module test"
	help
	  To compile this driver as a module, choose M here;
	  otherwise, choose Y.

An example of the Makefile is as follows:

obj-$(CONFIG_HELLO_WORLD_TEST) += hello_world.o

In both of these examples, the source code file is hello_world.c, and the resulting kernel module, if it is not built-in, is called hello_world.ko.

A driver is usually used as an interface either with a framework that exposes a number of hardware features, or with a bus interface used to detect and communicate with the hardware.

Since there are multiple scenarios in which a device driver is used, three device model structures are available:

  • struct bus_type: This represents the types of busses, such as I2C, SPI, USB, PCI, MMC, and so on
  • struct device_driver: This represents the driver used to handle a specific device on a bus
  • struct device: This is used to represent a device connected to a bus

An inheritance mechanism is used to create specialized structures from more generic ones, such as struct device_driver and struct device, for every bus subsystem. The bus driver is responsible for representing each type of bus and for matching the corresponding device drivers with the detected devices, detection being done through an adapter driver. For nondiscoverable devices, a description is made inside the device tree or in the source code of the Linux kernel. These devices are handled by the platform bus, which supports platform drivers and, in return, handles platform devices.
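
To make the platform bus case more concrete, the following is a minimal sketch of a platform driver for a hypothetical device named hello-platform; the matching platform device would be declared in the device tree or in board code, and the probe and remove callbacks are where a real driver would map its resources:

#include <linux/module.h>
#include <linux/platform_device.h>

static int hello_probe(struct platform_device *pdev)
{
	/* Called when a matching platform device is found on the platform bus */
	dev_info(&pdev->dev, "hello-platform device probed\n");
	return 0;
}

static int hello_remove(struct platform_device *pdev)
{
	/* Called when the device is removed or the driver is unloaded */
	dev_info(&pdev->dev, "hello-platform device removed\n");
	return 0;
}

static struct platform_driver hello_platform_driver = {
	.probe  = hello_probe,
	.remove = hello_remove,
	.driver = {
		.name  = "hello-platform",
		.owner = THIS_MODULE,
	},
};

/* Registers the driver at module load time and unregisters it on unload */
module_platform_driver(hello_platform_driver);

MODULE_LICENSE("GPL");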

Debugging a kernel

Debugging the Linux kernel is not the easiest of tasks, but it needs to be done to make sure that the development process moves forward. Understanding the Linux kernel is, of course, one of the prerequisites. Some bugs are very hard to solve and may stay inside the Linux kernel for a long period of time.

For most of the trivial ones, some of the following steps should be taken. First, identify the bug properly; this is not only useful when defining the problem, but also helps with reproducing it. The second step involves finding the source of the problem; here, I am referring to the first kernel version in which the bug was reported. Good knowledge about the bug and the source code of the Linux kernel is always useful, so make sure that you understand the code before you start working on it.

The bugs inside the Linux kernel are widely spread. They vary from a variable not being stored properly to race conditions or hardware management problems, and they have widely varying manifestations and chains of events. However, debugging them is not as difficult as it sounds; apart from some specific problems, such as race conditions and timing constraints, debugging is very similar to the debugging of any large user space application.

The first, easiest, and handiest method to debug the kernel is the one that involves the use of the printk() function. It is very similar to the printf() C library function and, although old and not recommended by some, it does the trick. The newer, preferred method involves the usage of the pr_*() functions, such as pr_emerg(), pr_alert(), pr_crit(), pr_debug(), and so on. Another method involves the usage of the dev_*() functions, such as dev_emerg(), dev_alert(), dev_crit(), dev_dbg(), and so on. They correspond to each logging level; the debug variants only produce output when debugging is enabled, that is, when DEBUG is defined or dynamic debug is used.

Note

More information about the pr_*() and dev_*() family of functions can be found inside the Linux kernel source code at Documentation/dynamic-debug-howto.txt. You can also find more information about loglevel at Documentation/kernel-parameters.txt.

When a kernel oops appears, it signals that the kernel has made a mistake. Not being able to fix itself or kill itself, it offers access to a bunch of information, such as useful error messages, the contents of the registers, and back trace information.

The Magic SysRq key is another method used in debugging. It is enabled by the CONFIG_MAGIC_SYSRQ configuration option and can be used to debug and rescue kernel information, regardless of the kernel's activity. It offers a series of command-line options that can be used for various actions, ranging from changing the nice level to rebooting the system. Also, it can be toggled on or off by changing the value in the /proc/sys/kernel/sysrq file. More information about the system request key can be found at Documentation/sysrq.txt.
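
As a small illustration, enabling the interface and requesting a dump of the current tasks could look like this, run as root on the target:

echo 1 > /proc/sys/kernel/sysrq      # enable all SysRq functions
echo t > /proc/sysrq-trigger         # dump the current tasks and their state to the kernel log
dmesg | tail                         # inspect the resulting output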

Although Linus Torvalds and the Linux community do not believe that the existence of a kernel debugger does much good to the project, and a better understanding of the code is the best approach for any project, there are still some debugger solutions available. The GNU debugger (gdb) is the first one, and it can be used in the same way as for any other process. Another one is kgdb, a patch over gdb that permits debugging over serial connections.

If none of the preceding methods help solve the problem and you've tried everything but can't seem to arrive at a solution, then you can contact the open source community for help. There will always be developers there willing to lend you a hand.

Note

To acquire more information related to the Linux kernel, there are a couple of books that can be consulted. I will mention a few of them here: Embedded Linux Primer by Christopher Hallinan, Linux Kernel Development by Robert Love, Linux Kernel In A Nutshell by Greg Kroah-Hartman, and last but not least, Understanding the Linux Kernel by Daniel P. Bovet and Marco Cesati.

The Yocto Project reference

Moving on to the Yocto Project, there are recipes available for every kernel version used inside the BSP support of each supported board, as well as recipes for kernel modules that are built outside the Linux kernel source tree.

The Atmel SAMA5D3-Xplained board uses the linux-yocto-custom kernel. This is defined inside the conf/machine/sama5d3-xplained.conf machine configuration file using the PREFERRED_PROVIDER_virtual/kernel variable. No PREFERRED_VERSION is mentioned, so the latest version is preferred; in this case, we are talking about the linux-yocto-custom_3.10.bb recipe.

The linux-yocto-custom_3.10.bb recipe fetches the kernel sources from Linus Torvalds' git repository. Taking a quick look at the sources once the do_fetch task has finished, it can be observed that the Atmel repository was, in fact, fetched instead. The answer is available inside the linux-yocto-custom_3.10.bbappend file, which provides another SRC_URI location. Another useful piece of information available in the bbappend file is that the SAMA5D3 Xplained machine is listed as a COMPATIBLE_MACHINE:

KBRANCH = "linux-3.10-at91"
SRCREV = "35158dd80a94df2b71484b9ffa6e642378209156"
PV = "${LINUX_VERSION}+${SRCPV}"

PR = "r5"

FILESEXTRAPATHS_prepend := "${THISDIR}/files/${MACHINE}:"

SRC_URI = "git://github.com/linux4sam/linux-at91.git;protocol=git;branch=${KBRANCH};nocheckout=1"
SRC_URI += "file://defconfig"

SRCREV_sama5d4-xplained = "46f4253693b0ee8d25214e7ca0dde52e788ffe95"

do_deploy_append() {
  if [ "x${UBOOT_FIT_IMAGE}" = "xyes" ]; then
    DTB_PATH="${B}/arch/${ARCH}/boot/dts/"
    if [ ! -e "${DTB_PATH}" ]; then
      DTB_PATH="${B}/arch/${ARCH}/boot/"
    fi

    cp ${S}/arch/${ARCH}/boot/dts/${MACHINE}*.its ${DTB_PATH}
    cd ${DTB_PATH}
    mkimage -f ${MACHINE}.its ${MACHINE}.itb
    install -m 0644 ${MACHINE}.itb ${DEPLOYDIR}/${MACHINE}.itb
    cd -
  fi
}

COMPATIBLE_MACHINE = "(sama5d4ek|sama5d4-xplained|sama5d3xek|sama5d3-xplained|at91sam9x5ek|at91sam9rlek|at91sam9m10g45ek)"

The recipe first defines repository-related information through variables such as SRC_URI and SRCREV. It also indicates the branch of the repository through the KBRANCH variable, as well as the defconfig that needs to be placed into the source code to define the .config file. As seen in the recipe, there is also an update made to the do_deploy task so that the kernel recipe adds the FIT image (.itb) to the tmp/deploy/image/sama5d3-xplained directory alongside the kernel image and other binaries.

The kernel recipe inherits the kernel.bbclass and kernel-yocto.bbclass files, which define most of its task actions. Since it also generates a device tree, it needs access to linux-dtb.inc, which is available inside the meta/recipes-kernel/linux directory. The information available in the linux-yocto-custom_3.10.bb recipe is rather generic and is overridden by the bbappend file, as can be seen here:

SRC_URI = "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git;protocol=git;nocheckout=1"

LINUX_VERSION ?= "3.10"
LINUX_VERSION_EXTENSION ?= "-custom"

inherit kernel
require recipes-kernel/linux/linux-yocto.inc

# Override SRCREV to point to a different commit in a bbappend file to
# build a different release of the Linux kernel.
# tag: v3.10 8bb495e3f02401ee6f76d1b1d77f3ac9f079e376
SRCREV = "8bb495e3f02401ee6f76d1b1d77f3ac9f079e376"

PR = "r1"
PV = "${LINUX_VERSION}+git${SRCPV}"

# Override COMPATIBLE_MACHINE to include your machine in a bbappend
# file. Leaving it empty here ensures an early explicit build failure.
COMPATIBLE_MACHINE = "(^$)"

# module_autoload is used by the kernel packaging bbclass
module_autoload_atmel_usba_udc = "atmel_usba_udc"
module_autoload_g_serial = "g_serial"

After the kernel is built by running the bitbake virtual/kernel command, the kernel image will be available inside the tmp/deploy/image/sama5d3-xplained directory under the zImage-sama5d3-xplained.bin name, which is a symbolic link to the file with the full name, which contains a longer identifier. The kernel image was deployed here from the location where the Linux kernel tasks were executed. The simplest method to discover that location is to run bitbake -c devshell virtual/kernel. A development shell will then be available to the user for direct interaction with the Linux kernel source code and access to the task scripts. This method is preferred because the developer has access to the same environment as bitbake.

A kernel module, on the other hand, shows a different kind of behavior if it is not built inside the Linux kernel source tree. For modules that are built outside of the source tree, a new recipe needs to be written, that is, a recipe that inherits another bitbake class, this time called module.bbclass. One example of an external Linux kernel module is available inside the meta-skeleton layer, in the recipes-kernel/hello-mod directory:

SUMMARY = "Example of how to build an external Linux kernel module"
LICENSE = "GPLv2"
LIC_FILES_CHKSUM = "file://COPYING;md5=12f884d2ae1ff87c09e5b7ccc2c4ca7e"

inherit module

PR = "r0"
PV = "0.1"

SRC_URI = "file://Makefile \
           file://hello.c \
           file://COPYING \
          "

S = "${WORKDIR}"

# The inherit of module.bbclass will automatically name module packages with
# "kernel-module-" prefix as required by the oe-core build environment.

As mentioned in the last two comment lines of the external Linux kernel module example, every kernel module, whether external or internal, is packaged with the kernel-module- prefix. This makes sure that when the kernel-modules value is added to the IMAGE_INSTALL variable, all the kernel modules available inside the /lib/modules/<kernel-version> directory are installed into the image. The kernel module recipe is very similar to any other available recipe, the major difference being the inherited module class, as shown in the inherit module line.
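
For example, to pull the example module, or all built modules, into a root filesystem image, a line similar to the following could be added to conf/local.conf or to an image recipe; the kernel-module-hello package name is an assumption based on the hello-mod recipe shown above:

IMAGE_INSTALL_append = " kernel-module-hello"
# or, to install every module built for the kernel:
IMAGE_INSTALL_append = " kernel-modules"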

Inside the Yocto Project, there are multiple commands available to interact with the kernel and kernel module recipes. The simplest command is, of course, bitbake <recipe-name>, but for the Linux kernel, there are a number of commands available to make the interaction easier. The most used one is the bitbake -c menuconfig virtual/kernel operation, which offers access to the kernel configuration menu.

Besides already known tasks, such as configure, compile, and devshell, which are used mostly in the development process, there are others, such as diffconfig, which uses the diffconfig script available in the Linux kernel scripts directory. The difference between the Yocto Project implementation and the script available in the Linux kernel is the fact that the former adds a kernel config fragment creation phase. These config fragments are used to add kernel configurations into the .config file as part of the automation process.
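
Putting the tasks mentioned above together, a typical kernel configuration workflow could look like this sketch:

bitbake virtual/kernel                   # build and deploy the kernel for the selected machine
bitbake -c menuconfig virtual/kernel     # adjust the kernel configuration interactively
bitbake -c diffconfig virtual/kernel     # generate a config fragment from the changes
bitbake -c devshell virtual/kernel       # open a development shell in the kernel work directory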

Summary

In this chapter, you learned about the Linux kernel in general, its features, and the methods of interacting with it. There was also information about debugging and porting. All this was done to make sure that you get enough information about the whole ecosystem before interacting with it. It is my opinion that if you understand the whole picture first, it becomes easier to focus on the more specific things. This is also one of the reasons that the Yocto Project reference was kept toward the end. You were introduced to how a Linux kernel recipe and an external Linux kernel module are defined and later used by a given machine. More information on the Linux kernel will also be available in the next chapter, which will gather all the previously presented information and show you how a developer can interact with a Linux operating system image.

Besides this information, the next chapter will also explain the organization of the root filesystem and the principles behind it, its content, and device drivers. BusyBox is another interesting subject that will be discussed, as will the various filesystems that are supported. Since a root filesystem tends to become large, information about what a minimal filesystem should look like will also be presented. Having said this, we shall proceed to the next chapter.