Papers
From REuP Project
(Difference between revisions)
Revision as of 12:59, 7 December 2010 Apiotro (Talk | contribs) (→Diploma Thesis - Porting the GCC-Backend to a VLIW-Architecture) ← Previous diff |
Revision as of 07:43, 18 March 2011 Apiotro (Talk | contribs) (→Generating the Communications Infrastructure for Module-based Dynamic Reconfiguration of FPGAs) Next diff → |
||
Line 57: | Line 57: | ||
* Authors: Shannon Koh | * Authors: Shannon Koh | ||
* Year: 2008 | * Year: 2008 | ||
- | * File: [[Media:Phdkoh.pdf|download]] | ||
* Abstract: Current approaches to supporting module-based FPGA reconfiguration focus on various aspects and sub-problems in the area but do not combine to form a coherent, top-down methodology that factors low-level device parameters into every step of the design flow. This thesis proposes such a top-down methodology from application specification to low-level implementation, centered around examining the problem of generating a point-to-point communications infrastructure to support the changing interfaces of dynamically placed modules. Low-level implementation parameters are considered at every stage to ensure that area, timing and budget constraints of the application are met. The approach advocates the regular layout of modules surrounded by a wiring harness supporting the communications for those modules, and thus provides an advanced understanding of how to implement the “fixed wiring harness" model of reconfigurable computing proposed by Brebner. Results have shown that compared to flattened netlists the regularity of the layout does not impose significant overheads on critical path delays. At high communication densities it can even result in lower delays. The core of the methodology is an infrastructure generation process that allocates modules to slots and merges configuration graphs to form wiring harnesses that support the communications for these merged configurations. This thesis suggests methods and evaluates algorithms for configuration graph merging so as to reduce run-time reconfiguration overheads. Initial experiments with a greedy merging algorithm performed on an optical flow application resulted in a substantial reduction of 64% in reconfiguration time. The effects of graph merging with the initial greedy algorithm and an improved dynamic programming algorithm were explored for a range of device sizes and architectural parameters. Results show that configuration merging using the greedy method results in significant reductions to the reconfiguration delay. The dynamic programming algorithm provides consistent improvements above and beyond the savings provided by the greedy method. In addition, a strong correlation was identified between the quality of front-end design activities such as partitioning and the effectiveness of back-end implementations. The methodology is integrated into the Xilinx commercial tool flow for partial reconfiguration, and is effective for implementing applications for module-based FPGA reconfiguration where the modules and their communications requirements are known at design time. It also allows a system designer to consider alternate device sizes and parameters until a set is found that satisfies the application constraints. | * Abstract: Current approaches to supporting module-based FPGA reconfiguration focus on various aspects and sub-problems in the area but do not combine to form a coherent, top-down methodology that factors low-level device parameters into every step of the design flow. This thesis proposes such a top-down methodology from application specification to low-level implementation, centered around examining the problem of generating a point-to-point communications infrastructure to support the changing interfaces of dynamically placed modules. Low-level implementation parameters are considered at every stage to ensure that area, timing and budget constraints of the application are met. The approach advocates the regular layout of modules surrounded by a wiring harness supporting the communications for those modules, and thus provides an advanced understanding of how to implement the “fixed wiring harness" model of reconfigurable computing proposed by Brebner. Results have shown that compared to flattened netlists the regularity of the layout does not impose significant overheads on critical path delays. At high communication densities it can even result in lower delays. The core of the methodology is an infrastructure generation process that allocates modules to slots and merges configuration graphs to form wiring harnesses that support the communications for these merged configurations. This thesis suggests methods and evaluates algorithms for configuration graph merging so as to reduce run-time reconfiguration overheads. Initial experiments with a greedy merging algorithm performed on an optical flow application resulted in a substantial reduction of 64% in reconfiguration time. The effects of graph merging with the initial greedy algorithm and an improved dynamic programming algorithm were explored for a range of device sizes and architectural parameters. Results show that configuration merging using the greedy method results in significant reductions to the reconfiguration delay. The dynamic programming algorithm provides consistent improvements above and beyond the savings provided by the greedy method. In addition, a strong correlation was identified between the quality of front-end design activities such as partitioning and the effectiveness of back-end implementations. The methodology is integrated into the Xilinx commercial tool flow for partial reconfiguration, and is effective for implementing applications for module-based FPGA reconfiguration where the modules and their communications requirements are known at design time. It also allows a system designer to consider alternate device sizes and parameters until a set is found that satisfies the application constraints. | ||
* Our comments: | * Our comments: |
Revision as of 07:43, 18 March 2011
2010
The 2010 report on R&D in ICT in the European Union
- Authors: Geomina Turlea, Daniel Nepelski, Giuditta de Prato, Sven Lindmark, Andrea de Panizza, Lucio Picci, Paul Desruelle, David Broster
- Year: 2010
- Our comments:
Dynamically Reconfigurable Systems
- Authors: Platzner, Marco; Teich, Jürgen; Wehn, Norbert
- Year: 2010
- ISBN: 978-90-481-3484-7
- About this book: Dynamically Reconfigurable Systems is the first ever to focus on the emerging field of Dynamically Reconfigurable Computing Systems. While programmable logic and design-time configurability are well elaborated and covered by various texts, this book presents a unique overview over the state of the art and recent results for dynamic and run-time reconfigurable computing systems. Dynamically Reconfigurable Systems targets graduate students, researchers and practitioners alike. It is highly recommended as reading material for the advanced graduate level and for researchers starting to explore dynamic and run-time reconfigurable systems. Reconfigurable hardware is not only of utmost importance for large manufacturers and vendors of microelectronic devices and systems, but also a very attractive technology for smaller and medium-sized companies. Hence, Dynamically Reconfigurable Systems also addresses researchers and engineers actively working in the field and provides them with information on the newest developments and trends in dynamic and run-time reconfigurable systems.
- Our comments:
2009
An Adaptable Task Manager for Reconfigurable Architecture Kernels
- Authors: Yuriy Shiyanovskii, Francis Wolff, Chris Papachristou, Dan Weyer
- This paper appears in: Adaptive Hardware and Systems, 2009. AHS 2009. NASA/ESA Conference on
- Year: 2009
- Abstract: Self-reconfigurable hardware is a new emerging technology which will enable adaptation of computing systems to changing environments.This paper deals with the design of architecture kernels for an autonomous on-board system and the development of an adaptation manager for real-time scheduling of the reconfigurable hardware fabric.Our approach employs a reconfigurable computer architecture with two key layers: the adaptation manager and the real time configuration kernel. This provides significant advantages in terms of flexibility, scalability, cost, and compatibility with embedded technology. Some preliminary results are presented.
- Our comments:
A Middleware Aided Robust and Fault Tolerant Dynamic Reconfigurable Architecture
- Authors: Dadji Y., Osterloh B., Michalik H.
- This paper appears in: Reconfigurable Mechanisms and Robots, 2009. ReMAR 2009. ASME/IFToMM International Conference on
- Year: 2009
- ISBN: 978-88-89007-37-2
- Abstract: Dynamic reconfiguration enhances embedded system with at run-time adaptive functionality and is an improvement in terms of resource utilization and system adaptability. SRAM-based FPGAs provides a dynamic reconfigurable platform with high logic density. The requirements for such an embedded high flexible system based on FPGAs are robustness and reliability to prevent operation interrupts or even system failures. The complexity of a dynamic reconfigurable system with adaptive processing module demands high effort for the user. Therefore a high level abstraction of the communication issues is required to support application development by an appropriate middleware. To achieve such a flexible embedded system we present our network-on-chip (NoC) approach system-on-chip wire (SoCWire) and outline its performance and suitability for robust dynamic reconfigurable systems. Furthermore we introduce a suitable embedded middleware concept to support the system reconfiguration and the software application development process.
- Our comments:
The Design and Implementation of Hardware Task Configuration Management Unit on Dynamically Reconfigurable SoC
- Authors: Xiao-Wei Wang ; Wei-Nan Chen ; Ying Wang ; Hong-Jun You ; Cheng-Lian Peng ;
- This paper appears in: Embedded Software and Systems, 2009. ICESS '09. International Conference on
- Year: 2009
- ISBN: 978-1-4244-4359-8
- Abstract: The advantages and the flexibility introduced into the hardware implementation by partial dynamic reconfiguration have rapidly changed the design flow of embedded systems. Configuration management is an important issue in operating system for dynamically reconfigurable system-on-chip. Reconfiguration overhead affects the performance of reconfigurable system. This paper presents a hardware implemented efficient configuration management unit. Wet define our operating system framework based on unified multitask programming model for reconfigurable system-on-chip at first. Then the detailed design and implementation of the configuration management unit are given. Finally a use case is presented, which shows the efficiency of the configuration.
- Our comments:
Exploiting Application Data-Parallelism on Dynamically Reconfigurable Architectures: Placement and Architectural Considerations
- Authors: Banerjee, S. ; Bozorgzadeh, E. ; Dutt, N. ;
- This paper appears in: Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
- Year: 2009
- ISSN : 1063-8210
- Abstract: Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step to validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data-points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case-study confirms that blindly maximizing data-parallelism can result in schedules even worse than that generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very important, we demonstrate that our approach is well-suited for true on-demand computing with detailed execution time estimates on a typical embedded processor. Heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.
- Our comments:
2008
Generating the Communications Infrastructure for Module-based Dynamic Reconfiguration of FPGAs
- Authors: Shannon Koh
- Year: 2008
- Abstract: Current approaches to supporting module-based FPGA reconfiguration focus on various aspects and sub-problems in the area but do not combine to form a coherent, top-down methodology that factors low-level device parameters into every step of the design flow. This thesis proposes such a top-down methodology from application specification to low-level implementation, centered around examining the problem of generating a point-to-point communications infrastructure to support the changing interfaces of dynamically placed modules. Low-level implementation parameters are considered at every stage to ensure that area, timing and budget constraints of the application are met. The approach advocates the regular layout of modules surrounded by a wiring harness supporting the communications for those modules, and thus provides an advanced understanding of how to implement the “fixed wiring harness" model of reconfigurable computing proposed by Brebner. Results have shown that compared to flattened netlists the regularity of the layout does not impose significant overheads on critical path delays. At high communication densities it can even result in lower delays. The core of the methodology is an infrastructure generation process that allocates modules to slots and merges configuration graphs to form wiring harnesses that support the communications for these merged configurations. This thesis suggests methods and evaluates algorithms for configuration graph merging so as to reduce run-time reconfiguration overheads. Initial experiments with a greedy merging algorithm performed on an optical flow application resulted in a substantial reduction of 64% in reconfiguration time. The effects of graph merging with the initial greedy algorithm and an improved dynamic programming algorithm were explored for a range of device sizes and architectural parameters. Results show that configuration merging using the greedy method results in significant reductions to the reconfiguration delay. The dynamic programming algorithm provides consistent improvements above and beyond the savings provided by the greedy method. In addition, a strong correlation was identified between the quality of front-end design activities such as partitioning and the effectiveness of back-end implementations. The methodology is integrated into the Xilinx commercial tool flow for partial reconfiguration, and is effective for implementing applications for module-based FPGA reconfiguration where the modules and their communications requirements are known at design time. It also allows a system designer to consider alternate device sizes and parameters until a set is found that satisfies the application constraints.
- Our comments:
Reconfigurable computing- The Theory and Practice of FPGA-Based Computation
- Authors: Scott Hauck and Andre DeHon
- Year: 2008
- ISBN: 978-0123705228
- Ksiazka jest calkiem nowa (2008 rok) i ma prawie 1000 stron wiec raczej nikt nie da rady przeczytac w calosci. Wydaje mi sie jednak, ze kazdy moze znalezc cos dla siebie. Jesli ktos chce sobie odswiezyc wiadomosci o FPGA, to polecam rozdzial 1. W rozdzialach 2 i 3 sa w skrocie przedstawione przyklady architektur reprogramowalnych. Rozdzial 4 mowi o rekonfiguracji, rozdzial 7 dotyczy kompilacji, rozdzial 11 systemow operacyjnych itd.
- Our comments:
Runtime Filesystem Support for Reconfigurable FPGA Hardware Processes in BORPH
- Authors: Hayden Kwok-Hay So and Robert Brodersen
- This paper appears in: Field-Programmable Custom Computing Machines, 2008. FCCM '08. 16th International Symposium on
- Year: 2008
- ISBN: 978-0-7695-3307-0
- Abstract: This paper presents the design of BORPH's file system layer for FPGA-based reconfigurable computers. BORPH provides user FPGA designs that execute as hardware processes access to the general file system using familiar UNIX file I/O semantics. Such capability provides FPGA designers an intuitive interface not only for regular file I/O, but also for representing streaming hardware/software and hardware/hardware communication using UNIX pipes. Design trade-offs among system manageability, user usability and application performance are explored. A case of mixed hardware/software video processing is presented as a proof-of-concept.
- Our comments:
Hardware Supported Task Scheduling on Dynamically Reconfigurable SoC Architectures
- Authors: Zexin Pan and Wells, B.E.
- This paper appears in: Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
- Year: 2008
- ISBN: 1063-8210
- Abstract: Dynamically reconfigurable system-on-a-chip (RSoC) technology features embedded microprocessors that are dispersed on the same die with significant amounts of programmable logic fabric. In this paper, we present a strategy to solve the recently emerging problem of how to utilize the flexible but still limited RSoC resources in an effective manner for a multi-task application. The major contribution of this paper is the development of a dynamic task scheduling algorithm that can be implemented in fixed or reconfigurable hardware that will perform the online scheduling of task systems onto the RSoC type architecture. The results from extensive simulations demonstrate the benefits of the proposed dynamic scheduling approach as compared with that of other static scheduling techniques taken from the technical literature.
- Our comments:
Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor
- Authors: Papakonstantinou, A. ; Deming Chen ; Wen-Mei Hwu ;
- This paper appears in: Application Specific Processors, 2008. SASP 2008. Symposium on
- Year: 2008
- ISBN: 978-1-4244-2333-0
- Abstract: Different approaches have been proposed over the years for automatically transforming high-level-languages (HLL) descriptions of applications into custom hardware implementations. Most of these approaches however are confined by basic block level parallelism described within the CDFGs (control-data flow graphs). In this work we propose a new high-level synthesis flow which can leverage instruction-level parallelism (ILP) beyond the boundary of the basic blocks. We extract statistical parallelism from the applications through the use of Superblocks and Hyperblocks formed by advanced front-end compilation techniques. The output of the front-end compilation is then used in our high-level synthesis in order to map the application onto a new domain-specific architecture named EPOS (explicitly parallel operations system). EPOS is a stylized micro-code driven processor equipped with novel architectural features that help take advantage of the instruction-level parallelism generated in the front-end compilation. A novel forwarding-path optimization engine is also employed during the high-level synthesis flow in order to minimize the long interconnection wires and the multiplexers in the processor. To evaluate the EPOS processor, we compare its performance with a previous domain-specific processor NISC on a common set of benchmarks. Experimental results show that significant performance gain (3.45times on average) is obtained compared to NISC.
- Our comments:
A Unified HW/SW Operating System for Partially Runtime Reconfigurable FPGA based Computer Systems
- Authors: Qingxu Deng, Yi Zhang, Nan Guan1 and Zonghua Gu
- Year: 2008
- Abstract: Partially Runtime-Reconfigurable (PRTR) FPGAs allow hardware tasks to be placed and removed dynamically at runtime. We present an OS for hybrid computing systems consisting of both CPUs and PRTR FPGAs. The OS is based on Linux, and provides unified interfaces for both HW and SW processes to ease the design of such hybrid systems. The scheduler of HW processes is implemented on the hardware, to alleviate the performance penalty of the time-consuming HW task scheduling algorithms.
- Our comments:
VEBoC: Variation and Error-Aware Design for Billions of Devices on a Chip
- Authors: Shoaib Akram, Scott Cromar, Gregory Lucas, Alexandros Papakonstantinou, Deming Chen
- This paper appears in: Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific
- Year: 2008
- ISBN: 978-1-4244-1921-0
- Description: Zalaczony artykul przedstawia cos podobnego do tego, co chcemy zrealizowac, choc troche innego w koncepcji - application-specific multicores.
- Abstract: Billions of devices on a chip is around the corner and the trend of deep submicron (DSM) technology scaling will continue for at least another decade. Meanwhile, designers also face severe on-chip parameter variations, soft/hard errors, and high leakage power. How to use these billions of devices to deliver power-efficient, high-performance, and yet error-resilient computation is a challenging task. In this paper, we attempt to demonstrate some of our perspectives to address these critical issues. We elaborate on variation-aware synthesis, holistic error modeling, reliable multicore, and synthesis for application-specific multicore. We also present some of our insights for future reliable computing.
- Our comments:
2007
An empirical comparison of ANSI-C to VHDL compilers:SPARK, ROCCC and DWARV
- Authors: Arcilio J. Virginia, Yana D. Yankova, Koen L.M. Bertels
- Year: 2007
- Abstract: Custom computing machines combine the flexibility of the general purpose processor with the high performance of the application tailored hardware. This combination results in a computing platform that allows for performance improvements to a wide range of applications. Nevertheless in order to develop applications for such a platform, a software designer currently needs also hardware design skills. In order to hide the hardware details from the designers and to assist in the non-trivial development process sevreral tools that translate ANSI-C to VHDL are proposed. In this paper we compare three such tools, SPARK, ROCCC and DWARV. The comparison is based on the performance of designs generated by the compilers and the experiences gained during the acquisition of empirical data.
- Our comments:
FPGA-friendly Code Compression for Horizontal Microcoded Custom IPs
- Authors: Bita Gorjiara, Daniel Gajski
- This paper appears in: International Symposium on Field Programmable Gate Arrays Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
- Year: 2007
- ISBN: 978-1-59593-600-4
- Abstract: Shrinking time-to-market and high demand for productivity has driven traditional hardware designers to use design methodologies that start from high-level languages. However, meeting timing constraints of automatically generated IPs is often a challenging and time-consuming task that must be repeated every time the specification is modified. To address this issue, a new generation of IP-design technologies that is capable of generating custom datapaths as well as programming an existing one is developed. These technologies are often based on Horizontal Microcoded Architectures. Large code size is a well-know problem in HMAs, and is referred to as "code bloating" problem.In this paper, we study the code size of one of the new HMA-based technologies called NISC. We show that NISC code size can be several times larger than a typical RISC processor, and we propose several low-overhead dictionary-based code compression techniques to reduce the code size. Our compression algorithm leverages the knowledge of "don't care" values in the control words to better compress the content of dictionary memories. Our experiments show that by selecting proper memory architectures the code size of NISC can be reduced by 70% (i.e. 3.3 times) at cost of only 9% performance degradation. We also show that some code compression techniques may increase number of utilized block RAMs in FPGA-based implementations. To address this issue, we propose combining dictionaries and implementing them using embedded dual-port memories.
- Our comments:
Borph: an operating system for fpga-based reconfigurable computers
- Authors: So, Hayden Kwok-Hay
- Year: 2007
- ISSN : 978-0-549-52957-6
- Abstract: Reconfigurable computing is a promising technology to meet future computational demand by leveraging flexibilities and the high degree of parallelism found in reconfigurable hardware fabrics, such as field programmable gate arrays (FPGAs). However, despite their promising performance researchers have demonstrated, reconfigurable computers are yet to be widely adopted. One reason is the lack of a common and intuitive operating system for these platforms.This dissertation work explores the design and implementation trade-offs of an operating system for FPGA-based reconfigurable computers, BORPH, the Berkeley Operating system for ReProgrammable Hardware. The goal of this work is to explore and demonstrate the feasibility of providing a systematic and easy to understand view into reconfigurable computers through OS support without incurring significant performance penalties. BORPH provides kernel support for FPGA applications by extending a standard Linux operating system. It establishes the notion of hardware process for executing user FPGA applications. Users therefore compile and execute hardware designs on FPGA resources the same way they run software programs on conventional processor-based systems. BORPH offers run-time general file system support to hardware processes as if they were software. The unified file interface allows hardware and software processes to communicate via standard UNIX file pipes. Furthermore, a virtual file system is built to allow access to memories and registers defined in the FPGA, providing communication links between hardware and software. The functions of BORPH are demonstrated on a BEE2 compute module. Performances of BORPH are measured to identify bottlenecks of our system. The clean OS kernel/user separation of BORPH has allowed us to improve overall system performance without affecting existing user designs. Furthermore, BORPH's unified runtime environment has enabled designers to make fair and end-to-end comparisons among software/hardware implementations of the same application. Most importantly, since the introduction of BORPH to our FPGA-based platform, we have observed increased productivity among high-level application developers who have little experience in FPGA application design.
- Our comments:
Adapting and automating XILINX's partial reconfiguration flow for multiple module implementations
- Authors: Rainer Scholz
- This paper appears in: Lecture Notes In Computer Science Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
- Year: 2007
- ISSN : :0302-9743 , 978-3-540-71430-9
- Abstract: In this paper, we present a modification of XILINX's Partial Reconfiguration Design Flow. Starting with either HDL-Design files or synthesised netlists, the presented flow generates all partial as well as the complete configuration bitstreams. In contrast to the established XILINX design flows, our flow is completely automated by a generator. By checking partial reconfiguration constraints it assists the user to avoid typical errors in module and bus macro placement. Compared with the PlanAhead partial reconfiguration flow, it is a single flow for generating multiple implementation for each reconfigurable area.
- Our comments:
ATHER: Dynamic and Self adaptive Middleware
- Authors: Muhammad, F. ; Muller, F. ; Auguin, M. ;
- This paper appears in: Multitopic Conference, 2007. INMIC 2007. IEEE International
- Year: 2007
- Abstract: The development of reconfigurable devices that could make themselves domain-specialized at run time is becoming more and more common. Future reconfigurable architecture will have these computing devices as basic blocks, and reconfigurable architecture could make assemblies of these devices, on the fly, to execute concurrent applications. The migration from completely generic lookup tables and highly connected routing fabrics to self adaptive specialized coarse-grain reconfigurable devices and very structured communication resources presents designers with the problem of how to best customize the system based upon anticipated usage. Then there is a need of not only exploiting parallelism from applications at micro-thread level, dynamically, but system also starves for a dynamic and self adaptive middleware to schedule these micro-threads on thousands of such computing devices. This paper focuses at the problem of dynamic allocation and scheduling of resources to numbers of applications on such architecture.
- Our comments:
Scheduling of periodic tasks on a dynamically reconfigurable device using timed discrete event systems
- Authors: Ushio, T. ; Onogi, K. ;
- This paper appears in: Discrete Event Systems, 2006 8th International Workshop on
- Year: 2007
- ISBN: 1-4244-0053-8
- Abstract: A dynamically reconfigurable device is a device which can change its hardware configuration arbitrarily often in order to achieve the desired performance and functions. Since several tasks are executed on the device concurrently, scheduling of both task execution and reconfiguration is an important problem. In our model, the dynamically reconfigurable device is represented by a two-level hierarchical automaton, and execution of each of periodic tasks is represented by a timed discrete event system. We propose a composition rule to get an automaton, which represents non-preemptive execution of periodic tasks on the dynamically reconfigurable device. We introduce a method to get a feasible scheduling by using state feedback control.
- Our comments:
2006
A Hardware Resource Management System for Adaptive Computing on Dynamically Reconfigurable Devices
- Authors: Toshiyuki Ito ; Kazuya Mishou ; Yuichi Okuyama ; Kenichi Kuroda ;
- This paper appears in: Frontier of Computer Science and Technology, 2006. FCST '06. Japan-China Joint Workshop on
- Year: 2006
- ISBN: 0-7695-2721-3
- Abstract: This paper proposes a realization method of the computer system with dynamical hardware-resource allocation on dynamically reconfigurable devices. The system consists of two or more parts and they can change the number of processing units according to each processing load. In the system, there is a competition problem between these parts. In order to solve this problem, we investigate required functions of resource management units on a simple processing model. This model is an adapted load balancing model consisting of an upper management unit, two management units and processing units shared by them.
- Our comments:
Invited Paper: Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs
- Authors: Lysaght, P. ; Blodget, B. ; Mason, J. ; Young, J. ; Bridgford, B. ;
- This paper appears in: Field Programmable Logic and Applications, 2006. FPL '06. International Conference on
- Year: 2006
- ISBN: 1-4244-0312-X
- Abstract: The paper describes architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs. These are augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules. A new CAD tool flow to automate the methodology is also presented. The new tools initially target the Virtex-II, Virtex-II Pro and Virtex-4 families and are derived from Xilinx's commercial CAD tools.
- Our comments:
ReConfigME: A Detailed Implementation of an Operating System for Reconfigurable Computing
- Authors: Grant Wigley, David Kearney and Mark Jasiunas
- This paper appears in: Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International Issue Date : 25-29 April 2006 On page(s): 8 pp.
- ISBN: 1-4244-0054-6
- Year: 2006
- Abstract: Reconfigurable computing applications have traditionally had the exclusive use of the field programmable gate array, primarily because the logic densities of the available devices have been relatively similar in size compared to the application. But with the modern FPGA expanding beyond 10 million system gates, and through the use of dynamic reconfiguration, it has become feasible for several applications to share a single high density device. However, developing applications that share a device is difficult as the current design flow assumes the exclusive use of the FPGA resources. As a consequence, the designer must ensure that resources have been allocated for all possible combinations of loaded applications at design time. If the sequence of application loading and unloading is not known in advance, all resource allocation cannot be performed at design time because the availability of resources changes dynamically. In this paper, we present an implementation of an operating system that has the ability to share its FPGA resources dynamically among multiple executing applications.
- Our comments:
A Unified Hardware/Software Runtime Environment for FPGABased Reconfigurable Computers using BORPH
- Authors: Hayden KwokHay So, Artem Tkachenko and Robert Brodersen
- This paper appears in: Hardware/Software Codesign and System Synthesis, 2006. CODES+ISSS '06. Proceedings of the 4th International Conference
- Year: 2006
- Abstract: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse and rapid application development. Performance of our current implementation and our experience with developing a real-time wireless digital signal processing system based on BORPH will be presented.
- Our comments:
2005
Utilizing Horizontal and Vertical Parallelism with a No-Instruction-Set Compiler for Custom Datapaths
- Authors: Mehrdad Reshadi, Bita Gorjiara, Daniel Gajski
- This paper appears in: Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005. Proceedings. 2005 IEEE International Conference on
- Year: 2005
- ISBN: 0-7695-2451-6
- Abstract: Performance of programs can be improved by utilizing their horizontal and vertical parallelism. In some processors (VLIW based), compiler can utilize horizontal parallelism by controlling the schedule of independent operations. Vertical parallelism is utilized through pipelining. However, in all processors, structure of pipeline is fixed and compiler has no control over it. In application-specific-instruction set-processors (ASIPs), pipeline structure can be customized and utilized in the program through custom instructions. Practical constraints on the instruction decoder limit the number and complexity of custom instructions in ASIPs. Detecting the frequent and beneficial custom instructions and incorporating them in the compiler are complex and sometimes very time consuming tasks. In this paper, we present an architecture that does not limit the number of custom functionalities that can be implemented on its datapath. Instead of using custom instructions and then relying on the decoder in hardware to generate the control signals, we generate the control signal values in compiler. Since there are no predefined instructions in this architecture, we call it no-instruction-set-computer (NISC). The NISC compiler maps the application directly on the datapath. It has complete fine grain control over datapath and hence can very well utilize resources in the hardware as well as horizontal and vertical parallelism in the program. We also explain the algorithm for mapping the CDFG of a program on a given datapath in NISC. Using our algorithm and a NISC architecture with the datapath of a MIPS, we achieved up to 70% speedup over the traditional MIPS compiler. In another experiment, we started from a base architecture and customized it by adding resources and interconnect to increase its horizontal and vertical parallelism. The algorithm achieved up to 15.5 times speedup by utilizing the available parallelism in the program and the datapath.
- Our comments:
Custom Processor Design Using NISC: A Case-Study on DCT algorithm
- Authors: Bita Gorjiara, Daniel Gajski
- This paper appears in: Embedded Systems for Real-Time Multimedia, 2005. 3rd Workshop on
- Year: 2005
- Abstract: Designing application-specific instruction-set processors (ASIPs) usually requires designing a custom datapath, and modifying instruction-set, instruction decoder, and compiler. A new alternative to ASIPs is no-instruction-set-computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath. The compiler analyzes the datapath and extracts possible operations and data flows. The NISC approach simplifies and accelerates the task of custom processor design. In this paper, we present a case-study of designing a custom datapath for a 2D DCT algorithm. We applied several optimization techniques such as software transformations, operation chaining, datapath pipelining, controller pipelining, and functional unit customization to improve the quality of the design. Most of the techniques are general and can be applied to other applications. The result of synthesizing our final custom datapath on a Xilinx FPGA shows 7.14 times performance improvement, 1.64 times power reduction, 12.5 times energy savings, and more than 3 times area reduction compared to a softcore MIPS implementation.
- Our comments: No-Instruction-Set-Computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath.
Object Oriented Programming Constructs' in VHSIC Hardware Description Language ‘Why & How’
- Authors: Deepak Jain
- This paper appears in: Journal of Theoretical and Applied Information Technology
- Year: 2005
- Abstract: Object Oriented Programming Structure (OOPS) has proved its importance in software development in terms of advantages like Abstraction, Encapsulation, Polymorphism, Concurrency, Modularity and Reusability. Also the Object Oriented codes are found to be more verifiable & maintainable. Hence they allow reduction in efforts for development, testing & maintenance of the software. In current scenario, digital-VLSI design life cycle begins with modeling using some Hardware Description Language (HDL) followed by functional verification of the HDL-model by its simulation. Often, VLSI-developers show interest in getting software that simulates the functional behavior of the hardware for its analysis from different points of concern. For the sack of effort minimization in co-designing of Digital VLSI chips and their simulating software, it is of interest to introduce automation in code conversion from HDL to OOPS and vise versa. Author’s efforts in this direction are summarized in this document. The outcome of this paper may be developed as a code converter from C++ to VHDL and vise versa.
- Our comments:
2004
Diploma Thesis - Porting the GCC-Backend to a VLIW-Architecture
- Authors: Jan Parthey
- Year: 2004
- Abstract: Digital Signal Processors (DSP) are widely used today, e.g. in sound and video processing. In order to accelerate the design process of applications, large parts of DSP programs can normally be written in C and only a few performance-critical functions need to be written in Assembler. Of course, an essential prerequisite for this way of software design is a C compiler with target support for the envisaged DSP platform. The TMS320-C6000, henceforth denoted as ‘C6x’, is a DSP series designed by Texas Instruments. Commercial C compilers for the ‘C6x’ are available from the manufacturer. However, as explained below, there would be many advantages in having a compiler available that would be free in the sense defined by the Free Software Foundation [3]. At the time of starting this work, the author was (and still is) not aware of any such compiler being available and therefore deemed it worthwhile to make an effort in this direction. This diploma thesis discusses the implementation of basic support for the C6x platform into GCC 3.3 [11], which is achieved by using the mechanisms provided by GCC for porting to new target platforms. As a prerequisite, fundamental concepts of the inner workings of GCC have to be investigated. The documentation [7] distributed with GCC provides much of the necessary information. The present work uses this information and puts it into the context of the ‘C6x’ architecture. Some details, however, remain unclear in [7] and will be investigated in this thesis by means of the GCC sources. Based on the results of this, it will be discussed which design options were available and which were chosen in implementing a ‘C6x’ backend. Finally, directions for future work will be shown.
- Our comments:
Operating Systems for FPGA Based Computers and Their Memory Management
- Authors: Klaus Danne
- Year: 2004
- Abstract: We introduce the concept of an operating system for platforms that consist beside memory and peripheral devices of FPGAs as the only computational resource. Applications can be developed independent from each other and due to device drivers with little dependency on the platform. The OS supports the multitasking execution of applications using static as well as dynamic resource assignment. A main focus of the paper is the management of the resource memory. Memory management as part of the OS is introduced which allows multiple tasks to access the same memory banks using virtual addressing and dynamic memory allocation. Access conflicts are solved by a priority based scheduling. Since no microprocessor is part of the system, the entire OS including its memory management is executed on the FPGAs.
- Our comments:
Task scheduling for heterogeneous reconfigurable computers
- Authors: Ahmadinia, A. ; Bobda, C. ; Koch, D. ; Majer, M. ; Teich, J. ;
- This paper appears in: Integrated Circuits and Systems Design, 2004. SBCCI 2004. 17th Symposium on
- Year: 2004
- ISSN : 1-58113-947-0
- Abstract: We consider the problem of executing a dynamically changing set of tasks on a reconfigurable system, made upon a processor and a reconfigurable device. Task execution on such a platform is managed by a scheduler that can allocate tasks either to the processor or to the reconfigurable device. The scheduler can be seen as part of an operating system running on the software or as core in the reconfigurable device. For each tasks to be executed on reconfigurable device, an equivalent implementation exists as rectangular block in a database. This block has to be placed on the device at run-time. A placer is responsible for the placement of tasks received from the scheduler on the reconfigurable device. However, the placement of tasks on the reconfigurable device cannot be successful if enough space is not available on the device to hold the task. In this case, the scheduler receive an acknowledgment from the placer and decide either to preempt a running task or to run the task on software. We present in this work an implementation of a placer module as well as investigations on task preemption. The two modules are part of an operating system for reconfigurable system currently under development.
- Our comments:
The MOLEN Polymorphic Processor
- S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, E. Moscu Panainte
- This paper appears in: Computers, IEEE Transactions on
- Year: 2004
- ISSN : 0018-9340
- Abstract: In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program: In our proposal, for a given instruction set architecture, a onetime instruction set extension of eight instructions, is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations, SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64-3.18 and between 1.56-1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
- Our comments:
2003
GENERIC and GIMPLE: A New Tree Representation for Entire Functions
- Authors: Jason Merrill
- Year: 2003
- Abstract: The tree SSA project requires a tree representation of functions for the optimizers to operate on. There was an existing functions-as-trees representation shared by the C and C++ front ends, and another used by the Java front end, but neither was adequate for use in optimization. In this paper, we will discuss the design of GENERIC, the new language-independent tree representation, and GIMPLE, the reduced subset used during optimization.
- Our comments:
Asynchronous PipeRench: architecture and performance evaluations
- Authors: Kagotani, H. ; Schmit, H. ;
- This paper appears in: Field-Programmable Custom Computing Machines, 2003. FCCM 2003. 11th Annual IEEE Symposium on
- Year: 2003
- ISBN: 0-7695-1979-2
- Abstract: PipeRench is a configurable architecture that has the unique ability to virtualize an application using dynamic reconfiguration. This paper investigates the potential benefits and costs of implementing this architecture using an asynchronous methodology. Since clock distribution and gating are relatively easy in the synchronous PipeRench, we focus on the benefit due to decreased timing pessimism in an asynchronous implementation. Two architectures for fully asynchronous implementation are considered. PE-based asynchronous implementation yields approximately 80% improvement in performance per stripe. This implementation, however, requires significant increases in configuration storage and wire count. A few particular features of the architecture, such as the crossbar interconnect structure within the stripe, are primarily responsible for this growth in configuration bits and wires. These features, however, are the primary aspects of the PipeRench architecture that make it a good compilation target.
- Our comments:
2002
Research Issues in Operating Systems for Reconfigurable Computing
- Authors: Grant B. Wigley and David A. Kearney
- This paper appears in: In Proceedings of the International Conference on Engineering of Reconfigurable System and Algorithms(ERSA)
- Year: 2002
- Abstract: As the number of system gates available on reconfigurable platforms increase beyond 20 million, the issue of the management of these resources and their sharing among may applications and users will become more of a concern. In this paper we describe the research issues for managing these resources in an operating system for a reconfigurable computer. We also detail a feasible set of components for the operating system and a feasible software architecture We show there is no current operating system implementation with these components. We propose a number of performance metrics which we believe are important measures of the quality of an operating system implementation. These include fragmentation of area, algorithm performance and application performance. We complete the paper with a status report on our implementation of an operating system for a reconfigurable computer.
- Our comments:
PipeRench: A virtualized programmable datapath in 0.18 micron technology
- Authors: Schmit, H. ; Whelihan, D. ; Tsai, A. ; Moe, M. ; Levine, B. ; Reed Taylor, R. ;
- This paper appears in: Custom Integrated Circuits Conference, 2002. Proceedings of the IEEE 2002
- Year: 2002
- ISBN: 0-7803-7250-6
- Abstract: PipeRench is a programmable datapath that can be used to accelerate numerically intensive applications. The unique aspect of PipeRench is its ability to virtualize hardware through self-managed dynamic reconfiguration. This capability provides application portability and scalability without redesign or recompilation. This paper describes the implementation of PipeRench in a 0.18 micron process. The implementation has 3.65 million transistors and runs at 120 MHz. Performance is competitive with high-end commercial DSP architectures and more than five times faster than a commercial microprocessor. Executing at 33 MHz, an FIR filter without virtualization consumes 519 mW. When virtualization is required, the implementation consumes approximately 675 mW.
- Our comments:
2001
Optimization of Dynamic Hardware Reconfigurations
- Authors: Jürgen Teich and Sándor P. Fekete and Jörg Schepers
- This paper appears in: The Journal of Supercomputing Volume 19, Number 1 / May, 2001
- Year: 2001
- ISBN: 0920-8542 (Print) 1573-0484 (Online)
- Abstract: Recent generations of Field Programmable Gate Arrays (FPGA) allow the dynamic reconfiguration of cells on the chip during run-time. For a given problem consisting of a set of tasks with computation requirements modeled by rectangles of cells, several optimization problems such as finding the array of minimal size to accomplish the tasks within a given time limit are considered. Existing approaches based on ILP formulations to solve these problems as multi-dimensional packing problems turn out not to be applicable for problem sizes of interest. Here, a breakthrough is achieved in solving these problems to optimality by using the new notion of packing classes. It allows a significant reduction of the search space such that problems of the above type may be solved exactly using a special branch-and-bound technique. We validate the usefulness of our method by providing computational results.
- Our comments:
2000
A review of high-level synthesis for dynamically reconfigurable FPGAs
- Authors: Xuejie Zhanga and Kam W. Nga
- This paper appears in: Microprocessors and Microsystems Volume 24, Issue 4, 1 August 2000, Pages 199-211
- Year: 2000
- ISSN: 0141-9331
- Abstract: Dynamically Reconfigurable Field Programmable Gate Arrays (DR FPGAs) change many of the basic assumptions of what hardware is. DR FPGA-based dynamically reconfigurable computing has become a powerful methodology for achieving high performance while minimizing the resource required in the implementation of many applications. The key to harnessing the power of DR FPGAs for most applications is to develop high-level synthesis tools for transforming automatically an algorithmic level behavioral specification into DR FPGA configurations. In this paper we survey the current state-of-the-art in high-level synthesis techniques for dynamically reconfigurable systems. The differences in high-level synthesis technology between classical systems and dynamically reconfigurable systems are discussed. Then, we describe the basic tasks in the high-level synthesis of dynamically reconfigurable systems. Finally, techniques that have been developed in the past few years for the high-level synthesis of dynamically reconfigurable systems are presented.
- Our comments:
Multitasking on FPGA Coprocessors
- Authors: Simmler, H. and Levinson, L. and Manner, Reinhard
- This paper appears in: FPL '00: Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
- Year: 2000
- ISBN: 3-540-67899-9
- Abstract: Multitasking on an FPGA-based processor is one possibility to explore the eficacy of reconfigurable computing. Conventional computers and operating systems have demonstrated the many advantages of sharing computational hardware by several tasks over time. The ability to do run-time configuration and readback of FPGAs in a coprocessor architecture allows investigating the problems of implementing realistic multitasking. This paper explores the control software required to support task switching for an application split over the host processor { coprocessor boundary as well as the requirements and features of context saving and restoring in the FPGA coprocessor context. An FPGA coprocessor designed especially to support multitasking of such applications is described.
- Our comments:
Static and Dynamic Reconfigurable Designs for a 2D Shape-Adaptive DCT
- Authors: Gause, Jorn and Cheung, Peter Y. K. and Luk, Wayne
- This paper appears in: Lecture Notes In Computer Science; Vol. 1896 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications table of contents
- Year: 2000
- ISSN: 3-540-67899-9
- Abstract: This paper presents two reconfigurable design approaches for a two dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT). The SA-DCT is an example of a new type of multimedia video processing algorithm where the computations performed are data dependent. A static design, where the configuration does not change during execution of the task, is presented. The use of a data dependence graph (DDG) is proposed which represents the computations and input signals required to calculate a particular output signal depending on a variable input parameter. By re-structuring the DDG and exploiting possible sharing of FPGA resources for different entities within the SA-DCT, it is demonstrated that the area required for an implementation can be significantly reduced. An alternative dynamic approach is also introduced where the FPGA’s configuration may change over time. This is well suited to using dynamically reconfigurable logic but suffers from long reconfiguration time if current FPGAs are used.
- Our comments:
1999
Temporal partitioning and scheduling data flow graphs for reconfigurable computers
- Authors: Purna, K.M.G. ; Bhatia, D. ;
- This paper appears in: Computers, IEEE Transactions on
- Year: 1999
- ISSN: 0018-9340
- Abstract: FPGA-based configurable computing machines are evolving rapidly. They offer the ability to deliver very high performance at a fraction of the cost when compared to supercomputers. The first generation of configurable computers (those with multiple FPGAs connected using a specific interconnect) used statically reconfigurable FPGAs. On these configurable computers, computations are performed by partitioning an entire task into spatially interconnected subtasks. Such configurable computers are used in logic emulation systems and for functional verification of hardware. In general, configurable computers provide the ability to reconfigure rapidly to any desired custom form. Hence, the available resources can be reused effectively to cut down the hardware costs and also improve the performance. In this paper, we introduce the concept of temporal partitioning to partition a task into temporally interconnected subtasks. Specifically, we present algorithms for temporal partitioning and scheduling data flow graphs for configurable computers. We are given a configurable computing unit (RPU) with a logic capacity of SRPU and a computational task represented by an acyclic data flow graph G=(V, E). Computations with logic area requirements that exceed SRPU cannot be completely mapped on a configurable computer (using traditional spatial mapping techniques). However, a temporal partitioning of the data flow graph followed by proper scheduling can facilitate the configurable computer based execution. Temporal partitioning of the data flow graph is a k-way partitioning of G=(V, E) such that each partitioned segment will not exceed SRPU in its logic requirement. Scheduling assigns an execution order to the partitioned segments so as to ensure proper execution. Thus, for each segment in {s1,s2,...,s k}, scheduling assigns a unique ordering Si-j,1⩽i⩽k,1⩽j⩽k, such that the computation would execute in proper sequential order as defined by the flow graph G=(V, E)
- Our comments:
1997
VHDL Modeling of Fast Dynamic Reconfiguration on Novel Multicontext RAM-based Field Programmable Devices
- Authors: Julio Faura , Juan Manuel Moreno , Jordi Madrenas , Josep Maria Insenser
- This paper appears in: n VHDL User's Forum in Europe
- Year: 1997
- Abstract: We describe in this paper how VHDL greatly helps in modeling the dynamic reconfiguration of a novel Field Programmable Device (FPD) and the applications especially suitable for it. This dynamic reconfiguration methodology is based on multicontext operation, that is, having several copies of the configuration memory controlling the programmable features of the device. Configuration dynamic management is carried out by an onchip microprocessor. We model all these interface operations - programmable hardware, configuration memory and internal microprocessor - with behavioral VHDL benches acting over the configuration signals on achitecturally described VHDL blocks. We also describe how a novel analysis and synthesis tool could be written to map general purpose applications on dynamically reconfigurable arrays like the one we present.
- Our comments:
Incremental reconfiguration for pipelined applications
- Authors: Schmit, H.
- This paper appears in: FPGAs for Custom Computing Machines, 1997. Proceedings., The 5th Annual IEEE Symposium on
- Year: 1997
- ISBN: 0-8186-8159-4
- Abstract: This paper examines the implementation of pipelined applications using run-time reconfiguration. Throughput and latency of pipelined applications can be significantly improved when reconfiguration is performed at the level of individual pipeline stages, as opposed to configuration of the entire FPGA. If reconfiguration and execution can be performed simultaneously, the performance of a pipelined application approaches its theoretical maximum. This paper proposes a new FPGA configuration mechanism, called striping, that supports pipeline stage reconfiguration and simultaneous configuration and execution. Additionally, the use of the pipeline stage as the atomic unit of reconfiguration introduces a design abstraction that enables the development families of upwardly-compatible FPGAs and virtual hardware design.
- Our comments: