Project Description

From REuP Project

Revision as of 13:23, 14 July 2010 by Apiotro

Instructionless processor architecture using dynamically reconfigurable logic

Abstract

In this project we propose a general-purpose processor implemented in a dynamically reconfigurable FPGA. The novelty of the presented solution lies in the absence of typical sequential processing: all operations are realized in parallel in hardware. At the same time, the new architecture does not require any modification of the software development process.

Introduction

For many years, the applications of FPGA devices have not been limited to prototyping or small-volume production. Since dynamic reconfiguration techniques for FPGAs became commercially available, various concepts have emerged for using them to accelerate many types of computation. Initially, FPGAs were used as coprocessors supporting conventional sequential processors in the most demanding calculations, where parallel data processing inside the FPGA was very helpful. Since advanced FPGAs are equipped with processor cores, sequential processing and parallel coprocessing can be implemented on a single die, significantly alleviating the interconnection problems between these two modules. In [1], a slightly modified UNIX OS running on the PowerPC core of a Virtex device allows programmers to treat tasks implemented in the reconfigurable hardware in the same way as software tasks running on the PowerPC. Nevertheless, developing a new application or porting an existing one to use such a powerful architecture efficiently requires a fairly low-level programming approach, which is far from current standards and trends.

Active research is currently being carried out in the domain of Application-Specific Instruction-set Processors (ASIPs), in which the pipeline structure can be customized and used by the program through custom instructions. An extension of this approach is the No-Instruction-Set Computer (NISC). The NISC compiler maps the application directly to the datapath [2]. It can achieve better parallelism and resource utilization than conventional instruction-set-based compilers; speedups of up to 70% compared with an instruction-set-based compiler have been achieved. The pipeline structure, however, remains constant throughout the entire execution of the program.

In this paper a completely novel solution is proposed. It exploits the parallelism of data processing and the resource reuse offered by dynamically reconfigurable FPGAs, without any impact on the development of new applications or the migration (porting) of existing ones. The proposed solution is an FPGA-based implementation of a general-purpose processor in which there are no sequentially executed instructions; instead, all tasks are implemented in the reconfigurable hardware in a completely automatic way. The whole array of configurable blocks can be modified at runtime, except for a small part responsible for reconfiguration management.

The next chapter presents the main idea of a fully reconfigurable general-purpose processor. Then some technical issues, such as partitioning, software-to-hardware conversion and reconfiguration, are discussed in more detail. Before the conclusions, some considerations on adapting the operating system to the proposed processor architecture are included.

General concept

Since the processor proposed in this paper is intended to be a general-purpose one, it should be able to execute any user application. Because this processor is a kind of fully reconfigurable FPGA, without any typical hard or even soft processor core, the application to be executed should be stored in memory as a set of bitstreams that implement its required functionality when loaded into the FPGA in the correct sequence. The same must apply to the operating system, shared libraries, interrupt handlers, drivers, etc. Each bitstream must be loaded into the FPGA on demand, when the operations corresponding to this bitstream are to be performed now or in the near future. This issue is explained in more detail in the next chapter. For general considerations it is enough to assume that, when the operations performed by a given part of the FPGA are finished, this part can be configured with a different bitstream providing new functionality. This means that the FPGA should be partially reconfigurable and should support very efficient reconfiguration techniques, which are discussed later.
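The idea of an application stored as a set of bitstreams can be illustrated with a small software model. The following Python sketch is only an illustration under stated assumptions: the names `Bitstream`, `Region`, `successors` and `run_application` are hypothetical and do not come from the project, and actual FPGA reconfiguration is replaced by simple bookkeeping. It shows a single FPGA part being reused for consecutive bitstreams as each one finishes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bitstream:
    """Hypothetical record for one stored bitstream."""
    name: str
    # Names of the bitstreams that must be loaded after this one finishes.
    successors: List[str] = field(default_factory=list)


@dataclass
class Region:
    """Hypothetical model of one partially reconfigurable part of the FPGA."""
    index: int
    loaded: Optional[str] = None  # name of the bitstream currently configured


def run_application(store: Dict[str, Bitstream], entry: str, region: Region) -> List[str]:
    """Load bitstreams on demand into one region, reusing it after each finishes."""
    trace = []
    pending = [entry]
    while pending:
        name = pending.pop(0)
        region.loaded = name      # stands in for partial reconfiguration
        trace.append(name)        # the operations of this bitstream would run here
        pending.extend(store[name].successors)
        region.loaded = None      # operations finished, the region is free again
    return trace


# Example: an application split into three bitstreams.
store = {
    "init": Bitstream("init", ["compute"]),
    "compute": Bitstream("compute", ["output"]),
    "output": Bitstream("output"),
}
region = Region(index=0)
trace = run_application(store, "init", region)
```

In this model the load order is fixed entirely by the stored `successors` lists, which mirrors the assumption that the correct sequence is known before execution starts.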

Devoting a part of the FPGA to each thread allows many threads to run concurrently (Figure 1), without time division. This is the first source of data-processing acceleration in the proposed architecture. Similar acceleration is obtained in traditional multi-core processors, but this approach is much more flexible and scalable.

To manage thread execution (in other words, to load the required bitstreams into the proper FPGA partitions), a special Control Unit is required. This block is the only static module in the FPGA; its configuration cannot be modified at runtime. It must be stressed that this block does not need to perform any kind of scheduling. The required dependencies between bitstreams should be determined during generation of the bitstream set and stored in memory together with the bitstreams. The Control Unit can only suspend the execution of a thread if there is no room to load a new bitstream.
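The behaviour described above can be sketched as a small simulation. This is a minimal model under assumptions, not the project's design: the class `ControlUnit`, its methods `request_load` and `finish`, and the `next_of` dependency table are hypothetical names, and resuming suspended threads in first-in, first-out order is an assumption made only to keep the example deterministic. The unit makes no scheduling decisions; it follows the precomputed dependencies and suspends a thread only when no partition is free.

```python
from collections import deque


class ControlUnit:
    """Hypothetical model of the static block that loads bitstreams into partitions."""

    def __init__(self, num_partitions):
        # Each entry is None (free) or a (thread, bitstream) pair.
        self.partitions = [None] * num_partitions
        # Threads waiting for a free partition, in arrival order (assumption).
        self.suspended = deque()

    def request_load(self, thread, bitstream):
        """Place the bitstream in a free partition, or suspend the thread.

        Returns the partition index, or None if the thread was suspended.
        """
        for i, occupant in enumerate(self.partitions):
            if occupant is None:
                self.partitions[i] = (thread, bitstream)
                return i
        self.suspended.append((thread, bitstream))
        return None

    def finish(self, index, next_of):
        """Free a partition whose operations are done and continue its thread.

        next_of is the dependency table determined when the bitstream set
        was generated: it maps a bitstream to its successor, or to None.
        """
        thread, bitstream = self.partitions[index]
        self.partitions[index] = None
        if self.suspended:
            # Resume the longest-waiting suspended thread in the freed partition.
            self.partitions[index] = self.suspended.popleft()
        successor = next_of.get(bitstream)
        if successor is not None:
            self.request_load(thread, successor)


# Example: two partitions, three threads.
next_of = {"a1": "a2", "a2": None, "b1": None, "c1": None}
cu = ControlUnit(num_partitions=2)
slot_a = cu.request_load("A", "a1")   # placed in partition 0
slot_b = cu.request_load("B", "b1")   # placed in partition 1
slot_c = cu.request_load("C", "c1")   # no room: thread C is suspended
cu.finish(1, next_of)                 # B finishes, C takes over partition 1
cu.finish(0, next_of)                 # A's a1 finishes, a2 is loaded next
```

After these calls partition 0 holds thread A's second bitstream and partition 1 holds thread C's bitstream, with no thread left suspended.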

Figure 1 (image: Conc_threads.jpg)
