X-Git-Url: http://git.annexia.org/?p=virt-mem.git;a=blobdiff_plain;f=HACKING;h=610d9763162274b9309bb21dd12726e4610198b3;hp=2fb11795d6624bbac87ce30157338897d987b236;hb=5465f41dc973c040cc5abd423640b2d4a118a159;hpb=5a960be177bdfbbb1d62f96490e355e3e3e54f12

diff --git a/HACKING b/HACKING
index 2fb1179..610d976 100644
--- a/HACKING
+++ b/HACKING
@@ -34,6 +34,11 @@ extract/
    subdirectories here correspond to the different Linux
    distributions and methods of getting at their kernels.
 
+extract/codegen/
+
+ - Tools to turn the kernel database into generated code which parses
+   the kernel structures.
+
 General structure of lib/virt_mem.ml
 ------------------------------------
 
@@ -45,21 +50,108 @@ which gets successively enhanced with extra data along the way:
      process, load kernel images
 
               |
-              |   (passes a 'Virt_mem_types.image0')
+              |
               V
 
      Find kernel symbols
 
               |
-              |   (enhanced into a 'Virt_mem_types.image1')
+              |
               V
 
      Find kernel version (uname)
 
               |
-              |   (enhanced into a 'Virt_mem_types.image2')
+              |
+              V
+
+     Find task_structs, net_devices, etc.
+
+              |
+              |
               V
 
      Call tool's "run" function.
 
-Tools can register other callbacks which get called at earlier stages.
\ No newline at end of file
+Tools can register other callbacks which get called at earlier stages.
+
+How it works
+------------
+
+(1) Getting the kernel image
+
+This is pretty easy (on QEMU/KVM anyway): There is a QEMU monitor
+command which reads out memory from the guest, and this is made
+available through the virDomainMemoryPeek call in libvirt.
+
+Kernel images are generally located at a small number of known addresses
+(eg. 0xC010_0000 on x86).
+
+(2) Getting the kernel symbols.
+
+The Linux kernel contains two tables of kernel symbols - the usual
+kernel symbols used for exporting symbols to loadable modules, and
+'kallsyms' which is used for error reporting.  (Of course, specific
+Linux kernels may be compiled without one or other of these tables).
+
+The functions in modules lib/virt_mem_ksyms.ml and
+lib/virt_mem_kallsyms.ml deal with searching kernel memory for these
+two tables.
+
+(3) Getting the kernel version.
+
+The kernel has the kernel version information compiled in at a known
+symbol address, so once we have the kernel symbols it is relatively
+straightforward to get the kernel version.
+
+See lib/virt_mem_utsname.ml.
+
+(4) Process table / memory / network info etc.
+
+Note that we have the kernel symbols and the kernel version (and that
+information is pretty reliable).
+
+If we take the process table as an example, then it consists of a
+linked list of 'struct task_struct', starting at the symbol
+'init_task' (which corresponds to the "hidden" PID 0 / swapper task),
+and linked through a doubly-linked list in the 'tasks' member of this
+structure.
+
+We have the location of 'init_task', but struct task_struct varies
+greatly depending on: word size, kernel version, CONFIG_* settings,
+and vendor/additional patches.
+
+The problem is to work out the "shape" of task_struct, and we do this
+in two different ways:
+
+(Method 1) Precompiled task_struct.  We can easily and reliably
+determine the Linux kernel version (see virt-uname).  In theory we
+could compile a list of known kernel versions, check out their sources
+beforehand, and find the absolute layout of the task_struct (eg. using
+CIL).  This method would only work for known kernel versions, but has
+the advantage that all fields in the task_struct would be known.
+
+(Method 2) Fuzzy matched task_struct.  The task_struct has a certain
+internal structure which is stable even over many kernel revisions.
+For example, groups of pointers always occur together.  We search
+through init_task looking for these characteristic features and where
+a pointer is found to another task_struct we search that (recursively)
+on the assumption that those contain the same features at the same
+location.  This works well for pointers, but not so well for finding
+other fields (eg. uids, process name, etc).  We can defray the cost of
+searches by caching the results between runs.
+
+Currently we use Method 1, deriving the database of known kernels from
+gdb debugging information generated by Linux distributions when they
+build their kernels, and processing that with 'pahole' (from the dwarves
+library).
+
+We have experimented with Method 2.  Currently work on it is postponed
+to a research project for a keen student at some point in the near
+future.  There are some early implementations of method 2 if you look
+back over the version control history.
+
+The database of known kernels is stored in the kernels/ subdirectory.
+
+The functions to build the database by extracting debug information
+from Linux distributions are stored in the extract/ subdirectory.
\ No newline at end of file
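
Illustration of step (4), Method 1. The following is a minimal sketch, not
code from virt-mem (which is written in OCaml), of walking the process
table once the layout of task_struct is known. It assumes a 32-bit
little-endian guest. The field offsets, the LAYOUT table and the helper
names (read_u32, walk_tasks, make_task) are all hypothetical; real offsets
would come from the database of known kernels. Guest memory is simulated
with a byte string instead of being read through virDomainMemoryPeek.

```python
import struct

# Hypothetical task_struct layout for one kernel build. Real offsets vary
# with word size, kernel version, CONFIG_* settings and vendor patches.
LAYOUT = {
    "tasks_next": 4,   # offset of tasks.next inside task_struct
    "pid": 12,         # offset of the 32-bit pid field
    "comm": 16,        # offset of the 16-byte process name
}

def read_u32(mem, base, addr):
    """Read a little-endian 32-bit word at guest address addr."""
    return struct.unpack_from("<I", mem, addr - base)[0]

def walk_tasks(mem, base, init_task, layout=LAYOUT):
    """Return (pid, name) for each task_struct on the circular list."""
    result = []
    seen = set()           # also guards against a corrupted list
    task = init_task
    while task not in seen:
        seen.add(task)
        pid = read_u32(mem, base, task + layout["pid"])
        start = task - base + layout["comm"]
        name = mem[start:start + 16].split(b"\0", 1)[0].decode("ascii", "replace")
        result.append((pid, name))
        # tasks.next points at the 'tasks' member of the next task_struct,
        # not at its first byte, so subtract the member offset.
        next_member = read_u32(mem, base, task + layout["tasks_next"])
        task = next_member - layout["tasks_next"]
    return result

def make_task(next_ptr, prev_ptr, pid, name):
    """Build a fake 32-byte task_struct: state, tasks.next, tasks.prev, pid, comm."""
    return struct.pack("<IIII16s", 0, next_ptr, prev_ptr, pid, name)

# Simulated guest memory holding three tasks linked in a ring.
BASE = 0xC0100000
addrs = [BASE, BASE + 32, BASE + 64]
names = [b"swapper", b"init", b"kthreadd"]
mem = b"".join(
    make_task(addrs[(i + 1) % 3] + 4, addrs[(i - 1) % 3] + 4, i, names[i])
    for i in range(3)
)

tasks = walk_tasks(mem, BASE, init_task=BASE)
```

The walk starts at the address of 'init_task' and stops when the list
wraps around to a task it has already visited. With a different kernel
only the LAYOUT table would change, which is the point of keeping the
layouts in a separate database.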