Lab5 文件系统

到 Lab4 为止，我们已经基本处理了操作系统中内存和进程的管理，现在已经能够执行基础的功能了。然而实际上我们并不能把所有数据都存放在内存中，这时我们就需要引入文件系统这一概念，把数据交由磁盘进行存储。

Lab5 中主要涉及到以下内容：

硬件外设与读写驱动
磁盘结构与驱动
文件系统服务进程操作

硬件外设与读写驱动

本次实验中，我们要实现的驱动都使用 MMIO 技术进行编写，也就是直接向内存中某个空间写入以达成对外设的访问。具体到我们的 MOS 实验中，需要访问的外设共有三个：控制台、硬盘、时钟。在这一部分，我们需要实现的就是供这些外设使用的读写驱动（接口），为后面的文件系统的访问做准备。

*	* ---------------------------------*
*	|   device   | start addr | length |
*	* -----------+------------+--------*
*	|  console   | 0x10000000 | 0x20   | (dev_cons.h)
*	|  IDE disk  | 0x13000000 | 0x4200 | (dev_disk.h)
*	|    rtc     | 0x15000000 | 0x200  | (dev_rtc.h)
*	* ---------------------------------*

首先我们需要回顾一下之前完成的实验代码：

我们在 Lab2 中已经完成了对实时钟 rtc 的写入，并且只需要在内核态提供为数不多的操作，不需要处理用户态试图访问、修改时钟的情况，也就不需要额外编写驱动。
在 Lab1 中，我们通过内核态中的 printcharc 函数实现了 printk 函数，并使之可以在用户态使用（写外设）；并且内核中的 kern/console.c 也实现了单个字符的读取（读外设）。实际上这也就是一种驱动。

那么我们也就剩下一种外设缺少驱动程序了，那就是 IDE 磁盘，它恰好是文件管理系统中的关键部分。

硬件外设的访问

在 Report 中我们已经知道，访问外设需避免向 Cache 中写入，所以我们通过访问 kseg1 段，来实现对物理内存的访问。例如 console 设备物理地址为 DEV_CONS_ADDRESS，那么我们要写入的虚拟地址应该为 DEV_CONS_ADDRESS | KSEG1 。

内核态实现

内核态中，我们访问的虚拟地址空间不受限制，所以可以直接访问对应地址。

例如函数 printcharc 就通过直接访问内存实现向控制台的输入：

void printcharc(char ch) {
    *((volatile char *)(KSEG1 + DEV_CONS_ADDRESS + DEV_CONS_PUTGETCHAR)) = ch;
}
// 注意使用 volatile 关键字以避免写入操作被优化

用户态实现 - Exercise 5.1 5.2

由于我们采用微内核的思路进行设计，所以文件系统需要工作于用户态中，那么就必须要实现用户态中的读写操作。MOS 采用系统调用的方式，先陷入内核态，再直接利用上一种方法读写，最后完成信息等的传递。这也就是 Exercise 5.1 + 5.2 的实现内容

/* Overview:
 *  从 va 处读数据，并写入指定外设的地址
 *
 * Post-Condition:
 *  Data within [va, va+len) is copied to the physical address 'pa'.
 *  Return 0 on success.
 *  Return -E_INVAL on bad address.
 *
 * Hint: 使用 kseg1 段虚拟地址访问外设.
 * Hint: 使用 'is_illegal_va_range' 检测 va 有效性.
 * Hint: You MUST use 'memcpy' to copy data after checking the validity.
 *
 *  All valid device and their physical address ranges:
 *	* ---------------------------------*
 *	|   device   | start addr | length |
 *	* -----------+------------+--------*
 *	|  console   | 0x10000000 | 0x20   | (dev_cons.h)
 *	|  IDE disk  | 0x13000000 | 0x4200 | (dev_disk.h)
 *	|    rtc     | 0x15000000 | 0x200  | (dev_rtc.h)
 *	* ---------------------------------*
 */
int sys_write_dev(u_int va, u_int pa, u_int len) {
	/* Exercise 5.1: Your code here. (1/2) */
	if (is_illegal_va_range(va, len)) {
		return -E_INVAL;
	}

	if (pa < 0x10000000 || pa > 0x15000200) {
		return -E_INVAL;
	}

	// TEST 5-1: Forget to think about the illegal_pa in this part
	if ((pa >= 0x10000021 && pa < 0x13000000) || 
		(pa >= 0x13004201 && pa < 0x15000000)) {
		return -E_INVAL;
	}

	if ((pa >= 0x10000000 && pa <= 0x10000020 && (pa + len) > 0x10000020) ||
	    (pa >= 0x13000000 && pa <= 0x13004200 && (pa + len) > 0x13004200) ||
		(pa >= 0x15000000 && pa <= 0x15000200 && (pa + len) > 0x15000200)) {
		return -E_INVAL;
	}

	memcpy((void *) (KSEG1 + pa), (const void *) va, len);

	return 0;
}

/* Overview:
 *  从外设指定地址中读取数据
 *
 * Hint: You MUST use 'memcpy' to copy data after checking the validity.
 */
int sys_read_dev(u_int va, u_int pa, u_int len) {
	/* Exercise 5.1: Your code here. (2/2) */
	if (is_illegal_va_range(va, len)) {
		return -E_INVAL;
	}

	if (pa < 0x10000000 || pa > 0x15000200) {
		return -E_INVAL;
	}

	if ((pa >= 0x10000021 && pa < 0x13000000) ||
		(pa >= 0x13004201 && pa < 0x15000000)) {
		return -E_INVAL;
	}

	if ((pa >= 0x10000000 && pa <= 0x10000020 && (pa + len) > 0x10000020) ||
	    (pa >= 0x13000000 && pa <= 0x13004200 && (pa + len) > 0x13004200) ||
		(pa >= 0x15000000 && pa <= 0x15000200 && (pa + len) > 0x15000200)) {
		return -E_INVAL;
	}

	memcpy((void *) va, (const void *) (KSEG1 + pa), len);

	return 0;
}

Exercise 5.2 是实现在用户态中的 msyscall 接口，实现方式与其他系统调用类似。

IDE 磁盘与读写

在 MOS 系统中，GXemul 为我们提供了模拟的 IDE 磁盘，每次读写的单位是一个扇区（512 Byte）。和其他的外设一样，在指定位置读写可以实现和磁盘的交互，地址表如下：

偏移	效果	位宽
0x0000	写：下一次读写操作的偏移字节数	4 Byte
0x0008	写：设置高 32 位偏移	4 Byte
0x0010	写：下一次读写的磁盘号	4 Byte
0x0020	写：开始操作（0读/1写）	4 Byte
0x0030	读：上一次操作的返回值（0成功/非0失败）	4 Byte
0x4000 - 0x414f	读/写：512 Byte 的一个扇区	none

读操作后需要从缓冲区取出数据，写操作前需要事先写入缓冲区内

内核态实现

课程组在指导书中给出了内核态的访问实现，但实际上我们的 MOS 中并不需要这一函数去实现驱动，因为我们采用微内核架构，将访问磁盘的操作交给用户态完成，这里的代码只起到示例作用。

// 定义 - C
extern int read_sector(int diskno, int offset);
// 实现 - MIPS
LEAF(read_sector)
    sw a0, 0xB3000010  # choose the IDE id, must 0 in our MOS
    sw a1, 0xB3000000  # offset in the disk
    li t0, 0
    sw t0, 0xB3000020  # launch a 'read' action
    lw v0, 0xB3000030  # get the result of the action
    nop
    jr ra
    nop
END(read_sector)

用户态实现 - Exercise 5.3

在 MOS 中，实际完成对磁盘读写操作的是 fs/ide.c 中的两个函数，他们可在用户态使用，充当磁盘驱动。在这里我们就相当于复刻了前面的内核态驱动，使用 C 语言替代 MIPS，并使用系统调用来完成写入地址的操作。

// Overview:
//  read data from IDE disk.
//
// Parameters:
//  diskno: disk number.
//  secno: start sector number.
//  dst: destination for data read from IDE disk.
//  nsecs: the number of sectors to read.
//
// Hint: Use syscalls to access device registers and buffers.
// Sample: ide_read(0, blockno * SECT2BLK, va, SECT2BLK);

void ide_read(u_int diskno, u_int secno, void *dst, u_int nsecs) {
	u_int begin = secno * BY2SECT;
	u_int end = begin + nsecs * BY2SECT;

	//sample: ide_read(0, blockno * SECT2BLK, va, SECT2BLK);

	for (u_int off = 0; begin + off < end; off += BY2SECT) {
		uint32_t temp = diskno;
		/* Exercise 5.3: Your code here. (1/2) */
		panic_on(-E_INVAL == syscall_write_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_ID), sizeof(temp))); 
        //select disk_id
        
		temp = begin + off;
		panic_on(-E_INVAL == syscall_write_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_OFFSET), sizeof(temp))); 
        // select reading offset_address
        
		temp = DEV_DISK_OPERATION_READ;
		panic_on(-E_INVAL == syscall_write_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_START_OPERATION), sizeof(temp))); 
        // select operation
        
		panic_on(-E_INVAL == syscall_read_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_STATUS), sizeof(temp))); 
        // get reading result
        
		if (temp == 0) {
			panic_on(temp == 0);
		} else {
			panic_on(-E_INVAL == syscall_read_dev((void *) ((u_int) dst + off), (DEV_DISK_ADDRESS | DEV_DISK_BUFFER), DEV_DISK_BUFFER_LEN));
            // pull from buffer
		}
	}
}

ide_write 同理，不再添加注释

// Overview:
//  write data to IDE disk.
void ide_write(u_int diskno, u_int secno, void *src, u_int nsecs) {
	u_int begin = secno * BY2SECT;
	u_int end = begin + nsecs * BY2SECT;

	for (u_int off = 0; begin + off < end; off += BY2SECT) {
		uint32_t temp = diskno;
		/* Exercise 5.3: Your code here. (2/2) */
		panic_on(-E_INVAL == syscall_write_dev(src + off, (DEV_DISK_ADDRESS | DEV_DISK_BUFFER), BY2SECT));
		panic_on(-E_INVAL == syscall_write_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_ID), sizeof(temp)));
		temp = begin + off;
		panic_on(-E_INVAL == syscall_write_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_OFFSET), sizeof(temp)));
		temp = DEV_DISK_OPERATION_WRITE;
		panic_on(-E_INVAL == syscall_write_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_START_OPERATION), sizeof(temp)));
		panic_on(-E_INVAL == syscall_read_dev(&temp, (DEV_DISK_ADDRESS | DEV_DISK_STATUS), sizeof(temp)));
		if (temp == 0) {
			panic_on(temp == 0);
		}
	}
}

需要注意的是，在函数中设置读写的偏移量时（例如 read 的 26 行），这里的偏移量不是 off 提供的偏移，而是读写地址相对于磁盘起始的偏移，也就是 secno * BY2SECT + off，如图：

完成这两个函数后，我们就可以通过在用户态调用 ide_read/write 来对磁盘进行操作了，这里的读写操作并不限制我们读写的大小（扇区数），但一定是整数个扇区（section）。通过对磁盘的自由读写，我们就能在此基础上建立合适的文件系统了。

硬盘结构

首先我们要区分两种磁盘的常见空间单位：

扇区（section）：物理上划分磁盘的一个单位，大小通常为 512 Byte
磁盘块（block）：为了便于磁盘管理而虚拟出的一个单元，在 MOS 中大小为 4 KB

struct Block {
	uint8_t data[BY2BLK];
	uint32_t type;
} disk[NBLOCK];

通常情况下我们以磁盘块作为逻辑上管理磁盘的单位，每个 Block 的大小为 4100 字节

其次是两个特殊的磁盘块，分别代表磁盘上的第0块和第1块：引导扇区+分区表、超级块（Super Block）

引导扇区+分区表：这部分在理论课的第二单元系统引导中已经涉及，主要与 BootLoader 和硬件处理相关，我们在 Gxemul 中不需要考虑这部分
超级块：包含磁盘的一些判断信息，同时充当当前磁盘的根目录

struct Super {
    u_int s_magic;         // FS_MAGIC
    u_int s_nblocks;       // number of blocks = 1024
    struct File s_root;    // root directory of the disk
};

其中 s_root 的类型为 FTYPE_DIR，f_name 为 "/"

在 MOS 中，我们通过设置结构体 Block 的数组 disk 来对磁盘进行表示，每个 Block 就代表一个磁盘块，data是具体的空间，type是磁盘块用途/状态的标记。

磁盘状态与更改

tools/fsformat.c

在实验中的 IDE 磁盘中，我们占用了一些磁盘块用来充当标记是否已分配磁盘块的位图数组，通过读取数组的状态来得知磁盘块的分配情况，这点与内存管理中我们使用的链表不同。

通常情况下这部分磁盘块紧跟磁盘的超级块

磁盘文件的创建

我们使用 tools/fsformat.c 的 main 函数来在 Linux 中创建一个可供我们 MOS 使用的磁盘镜像文件。Usage: fsformat <img-file> [files or directories]... 意味着函数的 argv[1] 代表着生成镜像的路径，后面的每个参数都代表准备写入的文件/目录路径。

我们来分析一下 fsformat.c 是怎样创建一个磁盘镜像的。

int main(int argc, char **argv) {
	static_assert(sizeof(struct File) == BY2FILE);
	init_disk();  // 初始化磁盘

	if (argc < 3) {
		fprintf(stderr, "Usage: fsformat <img-file> [files or directories]...\n");
		exit(1);
	}

	for (int i = 2; i < argc; i++) {
		char *name = argv[i];
		struct stat stat_buf;
		int r = stat(name, &stat_buf);
		assert(r == 0);
		if (S_ISDIR(stat_buf.st_mode)) {
			printf("writing directory '%s' recursively into disk\n", name);
			write_directory(&super.s_root, name); // 向 root 写入该目录/文件
		} else if (S_ISREG(stat_buf.st_mode)) {
			printf("writing regular file '%s' into disk\n", name);
			write_file(&super.s_root, name);
		} else {
			fprintf(stderr, "'%s' has illegal file mode %o\n", name, stat_buf.st_mode);
			exit(2);
		}
	}

	flush_bitmap();
	finish_fs(argv[1]);

	return 0;
}

程序逻辑如下：
磁盘空间分配：在 main 开始前，创建全局数组 disk 代表磁盘的全部空间，相当于在程序内存当中开辟了一块磁盘等待处理
写入前的初始化：检查 File 大小符合规格后，调用 init_disk 函数对磁盘块信息进行初始化。随后执行参数检查，不符合规格将无法创建磁盘镜像
文件信息获取：遍历每个文件，获取文件属性（主要区分是目录文件还是常规文件）
文件写入：获取文件属性后在不同的 if-else 分支中执行不同的写入函数
封装磁盘镜像：全部文件写入后，执行函数对位图进行更新，同时封装磁盘镜像，结束程序

最后，在 make 的过程中会编译 fsformat.c 并调用程序，过程中的命令行与输出，fs 目录下的 Makefile 如下：

$ ../tools/fsformat ../target/fs.img \
	$(printf '%s\n' rootfs/motd rootfs/newmotd   | awk -F/ '{ ns[$NF]=$0 } END { for (n in ns) { print ns[n] } }')
writing regular file 'rootfs/newmotd' into disk
writing regular file 'rootfs/motd' into disk

include $(user_dir)/include.mk
USERLIB     := $(addprefix $(user_dir)/, $(USERLIB))
USERAPPS    := $(addprefix $(user_dir)/, $(USERAPPS))
FSIMGFILES  := rootfs/motd rootfs/newmotd $(USERAPPS) $(fs-files)

image: $(tools_dir)/fsformat
	dd if=/dev/zero of=../target/fs.img bs=4096 count=1024 2>/dev/null
	# using awk to remove paths with identical basename from FSIMGFILES
	$(tools_dir)/fsformat ../target/fs.img \
		$$(printf '%s\n' $(FSIMGFILES) | awk -F/ '{ ns[$$NF]=$$0 } END { for (n in ns) { print ns[n] } }')

具体而言，Makefile 会把 rootfs/motd rootfs/newmotd USERAPPS fs-files 这几类文件写入磁盘镜像中，但是在 user/include.mk 中，Lab5 处并没有定义 USERAPP，所以只有前两个文件被写入磁盘（）

写入前的初始化 - init_disk

在 init_disk 中，我们需要对位图和特殊的磁盘块进行初始化。

初始化引导扇区块（Step 1）和超级块（Step 3）
将所有位图需要的磁盘块标记为 BMAP 类
对这些块的 data 域进行初始化（全置为可用）
修订位图数组：删去那些本不存在、又被位图初始化的磁盘块

void init_disk() {
	int i, diff;

	// Step 1: Mark boot sector block.
	disk[0].type = BLOCK_BOOT;

	// Step 2: Initialize boundary.
	nbitblock = (NBLOCK + BIT2BLK - 1) / BIT2BLK;
	nextbno = 2 + nbitblock;

	// Step 2: Initialize bitmap blocks.
	for (i = 0; i < nbitblock; ++i) {
		disk[2 + i].type = BLOCK_BMAP;
	}
	for (i = 0; i < nbitblock; ++i) {
		memset(disk[2 + i].data, 0xff, BY2BLK);
	}
	if (NBLOCK != nbitblock * BIT2BLK) {
		diff = NBLOCK % BIT2BLK / 8;
		memset(disk[2 + (nbitblock - 1)].data + diff, 0x00, BY2BLK - diff);
	}

	// Step 3: Initialize super block.
	disk[1].type = BLOCK_SUPER;
	super.s_magic = FS_MAGIC;
	super.s_nblocks = NBLOCK;
	super.s_root.f_type = FTYPE_DIR;
	strcpy(super.s_root.f_name, "/");
}
#define FS_MAGIC 0x68286097 // Everyone's favorite OS class （6.828是吧

由于我们不应修改引导扇区的内容，所以在 Step 1 中只进行了状态的标记；超级块则应该在 Step 3 中把包括状态在内的多个字段初始化。
Step 2 中，nbitblock 代表的是位图所需的磁盘块数量，最后的一个 if 判断将未对齐的尾端进行删除。

文件信息获取 - struct stat

struct stat 这个结构体是用来描述一个 Linux 系统文件系统中的文件属性的结构。可以通过 stat 函数获取文件的所有相关信息，一般情况下，我们关心文件大小和创建时间、访问时间、修改时间。

int stat(const char *path, struct stat *buf); // 返回值为 0 说明成功获取
int fstat(int fdnum, struct Stat *stat);

struct stat {
        mode_t     st_mode;       // - 文件对应的模式，文件，目录等
        ino_t      st_ino;        // inode节点号
        dev_t      st_dev;        // 设备号码
        dev_t      st_rdev;       // 特殊设备号码
        nlink_t    st_nlink;      // 文件的连接数
        uid_t      st_uid;        // 文件所有者
        gid_t      st_gid;        // 文件所有者对应的组
        off_t      st_size;       // 普通文件，对应的文件字节数
        time_t     st_atime;      // - 文件最后被访问的时间
        time_t     st_mtime;      // - 文件内容最后被修改的时间
        time_t     st_ctime;      // - 文件状态改变时间
        blksize_t st_blksize;     // 文件内容对应的块大小
        blkcnt_t   st_blocks;     // 伟建内容对应的块数量
      };

S_ISDIR(stat_buf.st_mode) 是一个宏，查询当前的结构体是否为目录（isDir），ISREG 同理，说明是常规文件，我们据此将文件分类

文件写入

我们实现了两种写入函数，分别处理目录文件和常规文件的写入。这两类函数的基础是 POSIX 标准库函数，我们在 Linux 环境中利用这些函数把待写入的文件转写到磁盘镜像中

// Overview:
//  Write directory to disk under specified dir.
//  Notice that we may use POSIX library functions to operate on
//  directory to get file infomation.
void write_directory(struct File *dirf, char *path) {
	DIR *dir = opendir(path); // 获取路径所在信息
	if (dir == NULL) {
		perror("opendir");
		return;
	}
	struct File *pdir = create_file(dirf); // 创建目录文件
	strncpy(pdir->f_name, basename(path), MAXNAMELEN - 1);
	if (pdir->f_name[MAXNAMELEN - 1] != 0) { // 过程中的目录名称超长度
		fprintf(stderr, "file name is too long: %s\n", path);
		// File already created, no way back from here.
		exit(1);
	}
	pdir->f_type = FTYPE_DIR; // 设置目录类型
	for (struct dirent *e; (e = readdir(dir)) != NULL;) { // 递归地获取所有当前目录下的文件，并创建目录与文件
		if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0) {
			char *buf = malloc(strlen(path) + strlen(e->d_name) + 2);
			sprintf(buf, "%s/%s", path, e->d_name);
			if (e->d_type == DT_DIR) {
				write_directory(pdir, buf);
			} else {
				write_file(pdir, buf);
			}
			free(buf);
		}
	}
	closedir(dir);
}

readdir 函数循环读取当前目录中的下一个目录进入点，大概可以理解为遍历内容中的文件/目录。需要注意，代表当前目录的 . 和父目录的 .. 也在循环中可被读取，所以循环中要把他们去掉。

writefile 逻辑相对简单：

// Write file to disk under specified dir.
void write_file(struct File *dirf, const char *path) {
	int iblk = 0, r = 0, n = sizeof(disk[0].data);
	struct File *target = create_file(dirf);

	/* in case `create_file` is't filled */
	if (target == NULL) {
		return;
	}

	int fd = open(path, O_RDONLY);

	// Get file name with no path prefix.
	const char *fname = strrchr(path, '/');
	if (fname) {
		fname++;
	} else {
		fname = path;
	}
	strcpy(target->f_name, fname);

	target->f_size = lseek(fd, 0, SEEK_END);
	target->f_type = FTYPE_REG;

	// Start reading file. 从 Linux 的环境中读到 disk 中存储
	lseek(fd, 0, SEEK_SET);
	while ((r = read(fd, disk[nextbno].data, n)) > 0) {
		save_block_link(target, iblk++, next_block(BLOCK_DATA));
	}
	close(fd); // Close file descriptor.
}

其中也同样调用了一些本文件的函数，做一些基本介绍

// Overview:
//  Allocate an unused 'struct File' under the specified directory.
// Hint:
//  Use 'make_link_block' to allocate a new block for the directory if there are no existing unused
//  'File's.
struct File *create_file(struct File *dirf) {
	int nblk = dirf->f_size / BY2BLK;

	// Step 1: Iterate through all existing blocks in the directory.
	for (int i = 0; i < nblk; ++i) {
		int bno; // the block number
		// If the block number is in the range of direct pointers (NDIRECT), get the 'bno'
		// directly from 'f_direct'. Otherwise, access the indirect block on 'disk' and get
		// the 'bno' at the index.
		/* Exercise 5.5: Your code here. (1/3) */
		if (i < NDIRECT) {
			bno = dirf->f_direct[i];
		} else {
			bno = ((u_int *)(disk[dirf->f_indirect].data))[i];
		}
		// Get the directory block using the block number.
		struct File *blk = (struct File *)(disk[bno].data);

		// Iterate through all 'File's in the directory block.
		for (struct File *f = blk; f < blk + FILE2BLK; ++f) {
			// If the first byte of the file name is null, the 'File' is unused.
			// Return a pointer to the unused 'File'.
			/* Exercise 5.5: Your code here. (2/3) */
			if (f->f_name[0] == 0) {
				return f;
			}
		}
	}
	// Step 2: If no unused file is found, allocate a new block using 'make_link_block' function
	// and return a pointer to the new block on 'disk'.
	/* Exercise 5.5: Your code here. (3/3) */
	int bno = make_link_block(dirf, dirf->f_size / BY2BLK);
	struct File *blk = (struct File *)(disk[bno].data);
	return blk;

	return NULL;
}