Main Page
From REuP Project
Revision as of 14:12, 13 July 2010
Introduction
For many years, the applications of FPGA devices have not been limited to prototyping or small-volume production. Since dynamic reconfiguration of FPGAs became commercially available, various concepts have emerged for using it to accelerate many types of computation. Initially, FPGAs were used as coprocessors that supported conventional sequential processors in the most demanding calculations, where parallel data processing inside the FPGA was very helpful. Since advanced FPGAs are now equipped with processor cores, sequential processing and parallel coprocessing can be implemented on a single die, which significantly alleviates the interconnection problems between these two modules. In this paper, a completely novel solution is proposed. It exploits the data-processing parallelism and resource reuse of dynamically reconfigurable FPGAs without affecting either the development of new applications or the migration (porting) of existing ones. The proposed solution is an FPGA-based implementation of a general-purpose processor in which there are no sequentially executed instructions; instead, all tasks are implemented in reconfigurable hardware in a completely automatic way. The whole array of configurable blocks can be modified at runtime, except for a small part responsible for reconfiguration management.
General concept
Because the processor proposed in this paper is intended to be general-purpose, it should be able to execute any user application. Since this processor is essentially a fully reconfigurable FPGA, without any conventional hard or even soft processor core, the application to be executed should be stored in memory as a set of bitstreams that implement its required functionality when loaded into the FPGA in the correct sequence. The same applies to the operating system, shared libraries, interrupt handlers, drivers, etc. Each bitstream must be loaded into the FPGA on demand, when the operations it implements are to be performed immediately or in the near future. When the operations performed by a given part of the FPGA are finished, that part can be reconfigured with a different bitstream providing new functionality. This means that the FPGA should be partially reconfigurable and should support very efficient reconfiguration techniques. Dedicating a part of the FPGA to each thread allows many threads to run concurrently, without time-division multiplexing.
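The on-demand loading and per-thread use of FPGA space described above can be illustrated with a small model. This is a minimal Python sketch under assumptions that are not in the text: the fabric is divided into equally sized, interchangeable regions, and `Fabric`, `load`, `release` and the bitstream names are hypothetical, not part of any real FPGA API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Fabric:
    """Hypothetical partially reconfigurable FPGA divided into equal regions."""
    num_regions: int
    # Region index -> (thread, bitstream) configured there, or None if the region is free.
    regions: List[Optional[Tuple[str, str]]] = field(init=False)

    def __post_init__(self) -> None:
        self.regions = [None] * self.num_regions

    def load(self, thread: str, bitstream: str) -> Optional[int]:
        """Configure the first free region on demand; return its index, or None if full."""
        for i, slot in enumerate(self.regions):
            if slot is None:
                self.regions[i] = (thread, bitstream)
                return i
        return None

    def release(self, region: int) -> None:
        """The region's operations are finished, so it may receive a new bitstream."""
        self.regions[region] = None

    def running_threads(self) -> Set[str]:
        """Threads with at least one configured region; these run concurrently."""
        return {slot[0] for slot in self.regions if slot is not None}


fabric = Fabric(num_regions=3)
a = fabric.load("A", "A.part0")
fabric.load("B", "B.part0")
print(sorted(fabric.running_threads()))  # ['A', 'B']
fabric.release(a)                        # A's first piece has finished
print(fabric.load("A", "A.part1") == a)  # True: the freed region is reused
```

Threads A and B occupy separate regions at the same time, so no time slicing is involved; reuse happens only at region granularity.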
To manage thread execution (in other words, to load the required bitstreams into the proper regions of the FPGA), a special control block is required. This block is the only static module in the FPGA: its configuration cannot be modified at runtime. It must be stressed that this block does not need to perform any kind of scheduling. The dependencies between bitstreams should be determined during generation of the bitstream set and stored in memory together with the bitstreams. Apart from loading bitstreams, the only action the control block can take is to suspend a thread's execution when there is no free space to load its next bitstream.
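A minimal sketch of such a control block follows. It assumes, hypothetically, a fixed number of interchangeable regions, a successor table `SUCCESSORS` produced when the bitstream set was generated, and first-in-first-out resumption of suspended threads; none of these names or policies come from the project itself. The point it illustrates is that the block only looks up stored dependencies and suspends threads when space runs out; it never reorders work.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical dependency table, fixed when the bitstream set was generated.
SUCCESSORS: Dict[str, List[str]] = {
    "main.p0": ["main.p1"],
    "main.p1": [],
}


class ControlBlock:
    """Sketch of the static control block: load a bitstream or suspend the thread."""

    def __init__(self, num_regions: int, successors: Dict[str, List[str]]) -> None:
        self.free: List[int] = list(range(num_regions))
        self.successors = successors
        self.loaded: Dict[int, Tuple[str, str]] = {}         # region -> (thread, bitstream)
        self.suspended: Deque[Tuple[str, str]] = deque()     # threads waiting for space

    def request(self, thread: str, bitstream: str) -> Optional[int]:
        """Load the bitstream into a free region, or suspend the thread if none is free."""
        if not self.free:
            self.suspended.append((thread, bitstream))
            return None
        region = self.free.pop(0)
        self.loaded[region] = (thread, bitstream)
        return region

    def finish(self, region: int) -> List[str]:
        """Free a finished region, resume one suspended thread (assumed FIFO),
        and return the successors stored with the finished bitstream."""
        _, bitstream = self.loaded.pop(region)
        self.free.append(region)
        if self.suspended:
            self.request(*self.suspended.popleft())
        return self.successors.get(bitstream, [])


cb = ControlBlock(num_regions=1, successors=SUCCESSORS)
r = cb.request("T1", "main.p0")  # loaded into region 0
cb.request("T2", "main.p0")      # no room, so T2 is suspended
print(cb.finish(r))              # ['main.p1'] (a stored dependency, not a scheduling decision)
print(cb.loaded)                 # {0: ('T2', 'main.p0')}
```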
The functionality of most threads is too complex to be implemented in the FPGA all at once. It must therefore be split into a sequence of smaller pieces, called partitions. The partitions should be implemented in the FPGA so that the subsequent partition is already configured before the current one finishes. If a partition ends with a jump to one of two possible successors, both successors should be prepared speculatively for further execution. To obtain sufficient flexibility, it should be possible to place a partition in any part of the FPGA. Current commercially available reconfiguration techniques, such as those provided by Xilinx, do not offer this flexibility and cannot be used efficiently for this purpose; an efficient implementation of the proposed processor therefore requires a dedicated integrated circuit to be designed.
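The chaining rule above (configure every possible successor before the current partition finishes, and allow placement anywhere) can be sketched as a walk over a partition graph. This Python sketch is illustrative only: the graph `GRAPH`, the `run` function and the `outcomes` list, which stands in for data-dependent branch results, are hypothetical, and releasing a not-taken partition immediately is an assumed policy.

```python
from typing import Dict, List, Tuple

# Hypothetical partition graph: a partition ends either by falling through to
# one successor or with a conditional jump to one of two successors.
GRAPH: Dict[str, List[str]] = {
    "entry": ["loop"],
    "loop": ["loop", "exit"],  # jump back into the loop or leave it
    "exit": [],
}


def run(graph: Dict[str, List[str]], start: str, outcomes: List[int],
        num_regions: int = 4) -> Tuple[List[str], int]:
    """Walk the partition graph, configuring every possible successor while the
    current partition is still running. `outcomes` supplies the branch (0 or 1)
    taken at each conditional jump. Returns the executed partitions and the
    number of bitstream loads."""
    free = list(range(num_regions))  # relocatable: any region can hold any partition
    placed = {start: free.pop()}     # partition name -> region index
    trace: List[str] = []
    loads = 1
    branch = iter(outcomes)
    current = start
    while True:
        trace.append(current)
        successors = graph[current]
        # Speculatively prepare all successors before `current` finishes.
        for s in successors:
            if s not in placed:
                placed[s] = free.pop()
                loads += 1
        if not successors:
            return trace, loads
        nxt = successors[0] if len(successors) == 1 else successors[next(branch)]
        # Free the finished partition and any prepared successor that was not taken.
        for p in [current] + successors:
            if p != nxt and p in placed:
                free.append(placed.pop(p))
        current = nxt


trace, loads = run(GRAPH, "entry", outcomes=[0, 0, 1])
print(trace)  # ['entry', 'loop', 'loop', 'loop', 'exit']
print(loads)  # 5
```

Because the not-taken `exit` partition is discarded at once, it is reloaded on every loop iteration; keeping speculative partitions resident while regions are still free would be an obvious refinement of this assumed policy.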
The crucial element of the proposed general-purpose reconfigurable processor is the automatic transformation of any software operation into a hardware partition. An approach based on intermediate code or assembly language has been chosen. It allows advanced front-end tools, such as those in the GNU Compiler Collection (GCC), to be used to reduce complex high-level programming constructs to simple assembler instructions.
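To illustrate the kind of reduction a compiler front end performs, the sketch below uses Python's standard `ast` module to flatten an arithmetic expression into three-address instructions with one operator each, similar in spirit to GIMPLE statements. It is not GCC's algorithm; the `lower` function, the opcode names and the `t0, t1, ...` temporaries are hypothetical.

```python
import ast
from typing import List, Tuple

Instr = Tuple[str, str, str, str]  # (opcode, destination, source1, source2)

# Hypothetical opcode names for the supported binary operators.
OPCODES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def lower(expr: str) -> List[Instr]:
    """Flatten an arithmetic expression into three-address instructions,
    one operator per instruction, using temporaries t0, t1, ..."""
    code: List[Instr] = []

    def walk(node: ast.AST) -> str:
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            a, b = walk(node.left), walk(node.right)  # operands first
            dst = f"t{len(code)}"
            code.append((OPCODES[type(node.op)], dst, a, b))
            return dst
        raise ValueError(f"unsupported construct: {ast.dump(node)}")

    walk(ast.parse(expr, mode="eval").body)
    return code


for instr in lower("(a + b) * (c - 4)"):
    print(*instr)
# add t0 a b
# sub t1 c 4
# mul t2 t0 t1
```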
Each assembler instruction can easily be transformed into HDL statements. The data dependencies can be extracted from the assembler code or from the compiler's intermediate representation (e.g. GIMPLE). Compiler back-end tools can then automatically generate a corresponding HDL description, from which synthesis tools produce optimized hardware implementing the functionality of the converted software operation.
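The back-end step can be sketched on a hypothetical three-address format: data dependencies are read off by checking which operands were defined by earlier instructions, and each instruction becomes one Verilog net assignment. The opcode-to-operator table, the 32-bit width, the module name and the output port `y` are assumptions for illustration; a real back end would also have to handle memory accesses, control flow and timing.

```python
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, str, str, str]  # (opcode, destination, source1, source2)

# Hypothetical mapping from three-address opcodes to Verilog operators.
VERILOG_OP = {"add": "+", "sub": "-", "mul": "*"}


def dependencies(code: List[Instr]) -> Dict[str, Set[str]]:
    """Map each destination to the earlier results it reads (its data dependencies)."""
    defined: Set[str] = set()
    deps: Dict[str, Set[str]] = {}
    for _, dst, a, b in code:
        deps[dst] = {s for s in (a, b) if s in defined}
        defined.add(dst)
    return deps


def to_verilog(name: str, code: List[Instr], width: int = 32) -> str:
    """Emit a combinational Verilog module with one net assignment per instruction.
    Undefined, non-numeric operands become inputs; the last result drives `y`."""
    defined = {dst for _, dst, _, _ in code}
    inputs = sorted({s for _, _, a, b in code for s in (a, b)
                     if s not in defined and not s.isdigit()})
    rng = f"[{width - 1}:0]"
    ports = ", ".join(f"input {rng} {i}" for i in inputs)
    lines = [f"module {name}({ports}, output {rng} y);"]
    for op, dst, a, b in code:
        lines.append(f"  wire {rng} {dst} = {a} {VERILOG_OP[op]} {b};")
    lines.append(f"  assign y = {code[-1][1]};")
    lines.append("endmodule")
    return "\n".join(lines)


CODE: List[Instr] = [("add", "t0", "a", "b"), ("sub", "t1", "c", "4"), ("mul", "t2", "t0", "t1")]
print(dependencies(CODE))  # t0 and t1 have no dependencies; t2 needs both
print(to_verilog("expr", CODE))
```

Since `t0` and `t1` depend on nothing computed earlier, synthesis can implement them as parallel logic, and only `t2` has to wait for both; this is the parallelism the dependency extraction exposes.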