下载本文档
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、FISH: Linux System Calls for FPGA AcceleratorsKevin NamECE Department University of Toronto Toronto, Canadakevin.nammail.utoronto.caBlair FortECE Department University of Toronto Toronto, Canada forteecg.utoronto.caStephen Brown ECE Department University of Toronto Toronto, Canadabrowneecg.toronto.e
2、ducalls, each of which provides a particular kernel resource. Each call is assigned a unique system call ID.When a program makes a system call, it must use the Application Binary Interface (ABI) provided by Linux. The ABI enforces a calling convention that requires the user program to store argument
3、s to the system call in appropriate CPU registers, and call the CPU-architecture-specific system call instruction whilst providing the system call ID. At this point the CPU changes from user mode to privileged mode, and the ID is used to jump to the appropriate subroutine that handles the system cal
4、l being requested.The goal of this work is to provide an interface that FPGA accelerators can use to request system calls from the Linux kernel, similar to what an ABI provides for software programs.AbstractThis paper presentsthe FISH (FPGA-InitiatedSoftware-Handled) framework which allows FPGA acce
5、lerators to make system calls to the Linux operating system in CPU-FPGA systems. A special FISH Linux kernel module running on the CPU provides a system call interface for FPGA accelerators, much like the ABI which exists for software programs. We provide a proof- of-concept implementation of this f
6、ramework running on the Intel Cyclone V SoC device, and show that an FPGA accelerator can seamlessly make system calls as if it were the host program. We see the FISH framework being especially useful for high-level synthesis (HLS) by making it possible to synthesize software code that contains syst
7、em calls.I. INTRODUCTIONTightly-coupled CPU-FPGA devices, such as the Intel/Altera SoC 1 and the Xilinx SoC/MPSoC 2 execute hybrid applications which often run on top of Linux as a Linux host program + FPGA accelerator combination. These systems tend to contain resources that are controlled by the L
8、inux kernel, such as memory, storage, and peripherals. If the Linux host program needs one of these resources, it makes a Linux system call which grants it temporary privileged access. If an FPGA accelerator needs one of these resources, it is unclear how it would gain access. The ability to make sy
9、stem calls from an FPGA accelerator (or some equivalent method to access system resources) is clearly a potential benefit in CPU-FPGA systems, and has been studied in the past 3 4.A. System Calls in FPGA High-Level SynthesisWhen creating a hybrid application, developers can use high-level synthesis
10、(HLS) frameworks, such as the Intel OpenCL SDK 5 or LegUp 6. Since the input to high-level synthesis is usually high-level software-like code, there could be a natural inclination to include system calls in the accelerated code. This is because software code tends to contain many system calls (often
11、 implicitly) to carry out common tasks like I/O, memory allocation, timers, etc. The inclination toward system calls could be especially strong when the HLS user is software-oriented. To this end, the ability to make system calls from the FPGA could ultimately allow system call-laden software code t
12、o be accelerated by a high-level synthesis tool.B. The Anatomy of the Linux System CallThe Linux Programmers Manual describes the system call as the fundamental interface between an application and the Linux kernel 7. Linux provides hundreds of different systemII. THE FPGA-INITIATED SOFTWARE-HANDLED
13、 (FISH) FRAMEWORK FOR SYSTEM CALLSWe propose a framework that provides Linux system calls to FPGA accelerators in CPU-FPGA systems. At a high level, the proposed framework consists of three logical components:1.2.3.Linux host program FPGA acceleratorFISH Linux kernel moduleThe host program runs on t
14、he Linux operating system, launches the FPGA accelerator. When the FPGA accelerator needs to make a system call, it requests it from the FISH Linux kernel module (kmodule). The FISH kmodule carries out the system call on behalf of the accelerator. When the system call is complete, the FISH kmodule s
15、ignals the accelerator, and computation on the accelerator resumes.A. A Standard System Call Interface for FPGA AcceleratorsParallels can be drawn between our framework and the traditional ABI used by software, as shown in Figure 1. The accelerator is analogous to the user program and both the accel
16、erator and the user program must prepare for a system call by providing data (system call ID and system call arguments). While the data is the same, the medium for transfer is different. The user program writes the data to the CPU registers, but the accelerator uses some global memory accessible by
17、the CPU. Once the data is ready, the accelerator defers control to the FISH kmodule, similar to how the user program defers control to kernel-mode execution of system call handling code placed in memory by the Linux kernel.B. FPGA to Kernel Module CommunicationThe framework requires a signaling meth
18、od between the FPGA and the FISH kernel module for two purposes: to signal the start of a system call (step 2), and to signal the completion of the system call (step 7).In addition to signaling, the following data must also be passed between the FPGA and the kernel (step 3):.Host programs PID
19、 System call IDArguments to the system call Data buffersFigure 1: Parallels between the traditional system call convention and the FPGA-initiated, SW-handled conventionThe host programs PID is passed so that the kmodule is aware of which host program is associated with the accelerator. The system ca
20、ll ID is passed to identify which system call is being requested. Some system calls have input arguments which have to be passed from the accelerator to the FISH kmodule. In the event that an argument is a pointer to a buffer, memory for the buffers must be allocated and initialized by the accelerat
21、or.In the diagram shown in Figure 2, the data is transferred through system memory. This is a natural way to pass data, as CPU-FPGA systems generally have a global memory which is accessible by both CPU and FPGA. With other system topologies, alternative media may be used.C. Handling the System Call
22、 in the FISH Kernel ModuleIn the FISH framework, the FISH kmodule is used as the intermediary between system resources/services and the FPGA accelerator. This kmodule runs on the CPU in privileged mode in place of the accelerator, and executes the code needed to consume system resources and services
23、.The FISH kernel module remains asleep until it is awoken by the accelerator (step 2). It then determines which system call is being invoked (via system call ID), and the process for which this system call is being invoked (via PID). The kernel module then calls modified versions of the Linux ABI su
24、broutines that handle the requested system calls. As these system-call subroutines were originally written to be called by a user process, they must first be modified before the FISH kernel module can call them. An example of a necessary modification is when a context-specific call is made by the su
25、broutine (such as one requesting data or settings specific to the current process), which would have different effects being called from a kernel module context rather than from a user process running in kernel mode.D. Handling System Calls with Persistent EffectsSome system calls have a persistent
26、effect on the calling process. For example, a call to open() allocates a file descriptor until the process exits or until close() is called. The FISH kmodule needs to provide this persistence. Further, for the system calls to behave seamlessly, the persistent effect must be visible to both the host
27、program and the FPGA accelerator. In other words, a file opened by the FPGA accelerator must be accessible by the host, and vice versa.To keep track of the running processes and their various state data, the Linux kernel uses a table of task_struct objects. Every process has a task_struct object whi
28、ch keeps track ofWhen the FPGA accelerator initiates a system call, it is handled by the process shown in Figure 2. The steps in the process are described below:1.Accelerator stores in memory the data needed by the FISH kmodule to handle the system call: hostprograms PID, system call ID, arguments t
29、o system call, and buffersAccelerator signals the FISH kmodule to start system callFISH kmodule reads the information in memorythe2.the3.4.FISH kmodule consumes system resources on behalf of the acceleratorFISH kmodule modifies the host programs task_struct object to ensure persistent change as a re
30、sult of the system callThe result of the system call (if applicable) is written to the buffer in memoryFISH kmodule signals the FPGA accelerator that the system call has finishedAccelerator reads the results of the system call from the buffer in memory (if applicable)The persistent effects of the sy
31、stem call are visible to the host program in its execution further down the line..9.Figure 2: Steps for handling an FPGA-initiated system callprocess-specific information like execution state, file descriptor tables, child processes, etc. To achieve the required persistent effects of system c
32、alls, the FISH kmodule modifies the host processs task_struct object (step 5). Any system call further down the line consults the task_struct ensuring the persisting effects of prior system calls.III. IMPLEMENTING THE FRAMEWORKThis section describes our proof-of-concept implementation of the propose
33、d framework on an Intel DE1-SoC board containing the Intel Cyclone V SoC (which includes an ARM Cortex-A9 Dual-Core CPU 925 MHz) and 1GB of DDR3 800 MHz memory. A custom Linux distribution with kernel version 3.18 was used. The DDR3 memory was used as the global memory to transfer data between the C
34、PU and the FPGA. The Cyclone V SoCs Generic Interrupt Controller (GIC) that allows FPGA to CPU interrupts was used for FPGA- to-FISH kmodule signaling. LegUP HLS 6 was used to create the accelerator HDL which was then hand-modified to initiate three common system calls open(), close(), and write().
35、These system calls provide access to the filesystem from the FPGA, which is has been of interest to researchers in the past 8.A. Allocating the Shared Data BuffersIn order to pass data between the accelerator and the FISH kernel module, the DE1-SoC boards DDR3 memory was used as the shared global me
36、mory. Since the Linux kernel manages this memory, it was necessary to allocate buffers before transferring data. Using a malloc() was problematic, as the allocated memory could be paged and scattered in memory, and the LegUP-generated accelerator cannot read discontiguous pages in memory. To solve t
37、his problem, Linuxs Contiguous Memory Allocator (CMA) was used. Note that contiguous memory is not a general requirement of the FISH framework. For example, some hardware architectures may provide accelerators access to an MMU allowing paged memory access.B. Implementing the AcceleratorThe LegUp HLS
38、 tool was used as a starting point to generate the accelerator. The C-language function shown in Figure 3 was accelerated using the HLS filewrite()int fd = open(“test.txt”, O_WRONLY, S_IRWXU); write(fd, “Hello! The accelerator wrote this file!”, 40); close(fd);return fd;Figure 3: C Function
39、 that was accelerated using LegUp HLSregister is used by the FISH kernel module to pass on the return value of the completed system call to the accelerator. A write to this register doubles as the finished signal, telling the accelerator that the system call has completed.The LegUp tool instantiates
40、 a Verilog module for each C- language function that is synthesized. Since LegUp does not support synthesis of system calls, it instantiates an empty Verilog submodule in place of the system call but generates the rest of the accelerator circuit, including the logic for controlling the empty submodu
41、les and for consuming the submodules results. The empty submodules for open(), close(), and write() were therefore implemented by hand. These submodules were made to set the accelerators registers (mentioned above), initialize argument buffers in the CMA memory if required, and send the interrupt re
42、quest to the kernel to trigger the system call.C. Implementing the FISH Kernel ModuleThe FISH kernel module was implemented to handle the system calls requested by the accelerator. When the kernel module is loaded into the kernel, it registers a threaded IRQ handler for the IRQ coming from the accel
43、erator. A threaded interrupt handler is itself interruptible, which is necessary for calling some of the system call handler functions. For example, system call code that requires thread synchronization functionality can only run if the thread itself can be interrupted (and de-scheduled) by the Linu
44、x scheduler. Handling the system call tasks in a thread is also beneficial to the overall system performance compared to having more uninterruptible high-priority code. The irq handler function first reads the accelerators registers and uses the PID to get the task_struct object corresponding to the
45、 host program. It then uses the System Call ID to call the appropriate handler function (for example, sys_open_handler as shown in Figure 4).Abbreviated code for sys_open_handler is shown in Figure4. The host programs task_struct object is used to get its files table (files_struct). Using a modified
46、 version of the Linux function alloc_fd, a file descriptor is allocated. The file is then opened and installed into the host programs file descriptors table using fd_ sys_open_handler(struct task_struct* ts, char* filename, int flags, int mode)struct files_struct* files = ts-files;int fd
47、= alloc_fd(files, 0, rlimit(RLIMIT_NOFILE), flags); char filename_buf256; struct file * f;/ Get full file path (build it relative to host program CWD) build_filepath(filename_buf, filename);/ Open the file and install itf = filp_open(filename_buf, flags, mode); fsnotify_open(f);fd_install(files, fd,
48、 f); return fd;Figure 4: Abbreviated code for the open system call handlerLegUp creates a memory-mapped register interface for theaccelerator which the host program uses to control it.Inaddition to the default registers, the following were added:.PIDSystem Call IDSystem Call Arguments 0 3 Sys
49、tem Call Return ValueIV. RESULTSThe latencies of the software and accelerator-initiated versions of the three system calls open(), close(), and write() were measured. The latencies of the software-only systemThe PID, SC ID, and SC arguments registers are used to provide their namesake data to the FI
50、SH kmodule. The return valuecalls were measured using the ARM processors timers, accessed with gettimeofday(). The latencies of the accelerator- initiated system calls were measured using the Intel SignalTap signal analyzer to count the number of accelerator clock cycles elapsed between sending the
51、start signal to the system call submodule, and receiving the finished signal. This gave the “HW-Initiated (Total)” times shown in Figure 5. Similarly, a cycle count was also recorded between sending the interrupt to the FISH kernel module and receiving back a response, to measure how long the kernel
52、 module took to handle each system call. This time is shown as the “HW-Initiated (Kmodule)” time in Figure 5.Latencies of SW and HW-Initiated System Callsapplication. Using PushPush, an application of interleaved software and hardware computation can be created, meaning the software portion could ma
53、ke system calls. ReconOS, HThreads, and PushPush offer a system call mechanism that effectively uses a user-level thread to make the call on behalf of the FPGA.BORPH 4 is a modified Linux operating system that presents hardware interfaces in FPGA which serve as message passers to the Linux kernel. T
54、heir work focuses on providing UNIX file system access to FPGA circuits 8. An important difference from FISH is that BORPHs interface provides direct FPGA to OS filesystem access, with no interaction between a host program and the accelerator.VI. CONCLUSIONThis paper presented a kernel module-based
55、approach for providing a Linux system call interface to FPGA accelerators in CPU-FPGA systems. Using a proof-of-concept implementation, we showed that this approach can deliver good performance with latencies that are not too far from system calls in software. With further work, we believe that this
56、 approach can be useful in many situations where the Linux OS is used on CPU-FPGA systems. Future work could include integrating this mechanism into a high-level synthesis tool, and expanding the library of HDL code and kernel module subroutines to support additional system calls.VII. REFERENCES1008
57、0Pure-SW6040HW-Initiated(Total)HW-Initiated (Kmodule)200OpenCloseWriteSystem CallFigure 5: Latencies of Pure-SW and HW-initiated system callsThe measured latencies show that the open(), close(), and write() FPGA-initiated system calls take roughly 1x-2x as long to finish compared to their software-o
58、nly counterparts. These numbers show that the FISH framework is capable of providing an System Call interface with reasonably low overhead.V. RELATED WORKOther works have studied FPGA-to-OS interfaces running on CPU-FPGA systems. For example, ReconOS 9 is a RTOS which provides system resources to hardware in the FPGA. Their approach uses a hardware interface in the FPGA which communicates with an interrupt handler on the CPU side, much like our work. Instead of using the interrupt handler to handle the system call request immediat
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 湖南省电力设计院有限公司2026届春季校园招聘考试参考题库及答案解析
- 2026上半年四川事业单位统考青白江区考试招聘13人考试备考题库及答案解析
- 2026河北衡水恒通热力有限责任公司公开招聘工作人员28名笔试参考题库及答案解析
- 拖拉机底盘部件装试工冲突解决测试考核试卷含答案
- 2026江西吉安市吉水县面向社会招聘农村(社区)“多员合一岗”工作人员146人考试参考题库及答案解析
- 2026广东广州番禺中学编外教师招聘1人笔试模拟试题及答案解析
- 养路机械安全使用审批办法培训课件
- 2026中智集团诚聘贸易运营经理1人考试备考题库及答案解析
- 2026上海地铁招聘地铁人员150人笔试参考题库及答案解析
- 2026年自贡市事业单位面向退役士兵招聘工作人员(20人)笔试备考题库及答案解析
- 2026甘肃平凉华亭市招聘社区工作者10人考试参考试题及答案解析
- 优先内部采购制度
- 国开2026年春季《形势与政策》大作业答案
- 浙江省嘉兴市2025-2026学年高二上学期期末地理试题卷
- 2026金华兰溪市机关事业单位编外招聘20人考试备考试题及答案解析
- 基于数字孪生技术的草原监测与智能放牧管理系统研究
- 2026年南京机电职业技术学院单招职业技能考试题库及答案详解(历年真题)
- (2026年春新版)人教版三年级英语下册全册教学设计
- 2025年沙洲职业工学院单招职业技能考试题库附答案
- GB/T 11918.4-2025工业用插头、固定式或移动式插座和器具输入插座第4部分:有或无联锁带开关的插座
- 微软Dynamics 365系统方案
评论
0/150
提交评论