SystemTap embedded C

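  • SystemTap supports embedded C, which lets us implement functionality that SystemTap's bundled tapsets do not provide.
  • Reusing the example above: we previously looked up d_name from the inode to print a file name, but that printed only the bare name without its path, which is hard to read. Here we use embedded C to print the file's absolute path instead.
  • The kernel already has a function that produces a file's path, d_path(). It is implemented in /usr/src/debug/kernel-2.6.32-358.18.1.el6/linux-2.6.32-358.18.1.el6.x86_64/fs/dcache.c and declared in the header /usr/src/debug/kernel-2.6.32-358.18.1.el6/linux-2.6.32-358.18.1.el6.x86_64/include/linux/dcache.h: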
/**
 * d_path - return the path of a dentry
 * @path: path to report
 * @buf: buffer to return value in
 * @buflen: buffer length
 *
 * Convert a dentry into an ASCII path name. If the entry has been deleted
 * the string " (deleted)" is appended. Note that this is ambiguous.
 *
 * Returns a pointer into the buffer or an error code if the path was
 * too long. Note: Callers should use the returned pointer, not the passed
 * in buffer, to use the name! The implementation often starts at an offset
 * into the buffer, and may leave 0 bytes at the start.
 *
 * "buflen" should be positive.
 */
char *d_path(const struct path *path, char *buf, int buflen)  
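  • Next we need d_path()'s three arguments. The key one, const struct path *path, is simply the file's f_path member, i.e. $file->f_path in the earlier example (&f->f_path in C); buf and buflen can be a freshly allocated page and PAGE_SIZE.
  • There is no need to write this from scratch: an existing example, task_file_handle_d_path, can be adapted into the following script: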
%{
#include <linux/file.h>
#include <linux/dcache.h>
#include <linux/fdtable.h>
%}
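function task_file_handle_d_path:string (task:long, file:long) %{ /* pure */
        struct task_struct *p = (struct task_struct *)((long)STAP_ARG_task);
        struct file *f = (struct file *)((long)STAP_ARG_file);
        struct files_struct *files;
        char *page = NULL;
        char *path = NULL;

        rcu_read_lock();
        /* kread() jumps to the CATCH_DEREF_FAULT() handler if reading p->files faults;
           GFP_ATOMIC because we must not sleep inside the RCU read-side section */
        if ((files = kread(&p->files)) &&
            (page = (char *)__get_free_page(GFP_ATOMIC))) {
                /* the result may start at an offset into page: use the returned pointer */
                path = d_path(&f->f_path, page, PAGE_SIZE);
                if (path && !IS_ERR(path)) {
                        snprintf(STAP_RETVALUE, MAXSTRINGLEN, "%s", path);
                }
        }
        CATCH_DEREF_FAULT();
        /* reached on success and after a fault: always release page and lock */
        if (page) free_page((unsigned long)page);
        rcu_read_unlock();
%}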
probe kernel.function("vfs_read").return {
        # skip stapio, SystemTap's own userspace process, so we do not trace our own reads
        if (execname() != "stapio") {
                task = pid2task(pid())
                printf("%s[%ld], %ld, %s\n", execname(), pid(), $file->f_path->dentry->d_inode->i_ino, task_file_handle_d_path(task, $file))
        }
}
probe timer.s(2)
{
        exit()    # stop tracing after 2 seconds
}
  • Embedded C is only allowed in guru mode, so the script must be run with stap's -g option. Output:
sshd[10841], 5197, /dev/ptmx  
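  • Embedded C code must be enclosed in %{ and %}, both for top-level blocks such as the #include lines and for function bodies.
  • The /* pure */ marker tells the translator that the function has no side effects, so a call whose result the script does not use may be optimized away entirely.
  • Arguments of a script-level function are accessed in embedded C through the STAP_ARG_ prefix: the task parameter above becomes STAP_ARG_task. The return value is written to STAP_RETVALUE.
  • Embedded C runs in kernel space, where a bad pointer dereference is not contained the way it is for a user process and can crash the whole machine. Read kernel memory through kread() whenever possible: it traps faulting reads and jumps to the CATCH_DEREF_FAULT() handler instead.
  • When reading RCU-protected kernel data, such as the task's files_struct here, hold rcu_read_lock() for the duration and release it with rcu_read_unlock() afterwards. Code inside this section must not sleep, which is why the page is allocated with GFP_ATOMIC.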