Papers

From REuP Project

Revision as of 10:23, 15 July 2010; view current revision
←Older revision | Newer revision→
Jump to: navigation, search

Contents

Research Issues in Operating Systems for Reconfigurable Computing

  • Authors: Grant B. Wigley and David A. Kearney
  • This paper appears in: In Proceedings of the International Conference on Engineering of Reconfigurable System and Algorithms(ERSA)
  • Year: 2002
  • Abstract: As the number of system gates available on reconfigurable platforms increase beyond 20 million, the issue of the management of these resources and their sharing among may applications and users will become more of a concern. In this paper we describe the research issues for managing these resources in an operating system for a reconfigurable computer. We also detail a feasible set of components for the operating system and a feasible software architecture We show there is no current operating system implementation with these components. We propose a number of performance metrics which we believe are important measures of the quality of an operating system implementation. These include fragmentation of area, algorithm performance and application performance. We complete the paper with a status report on our implementation of an operating system for a reconfigurable computer.
  • Our comments:
Download

ReConfigME: A Detailed Implementation of an Operating System for Reconfigurable Computing

  • Authors: Grant Wigley, David Kearney and Mark Jasiunas
  • This paper appears in: Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International Issue Date : 25-29 April 2006 On page(s): 8 pp.
  • ISBN: 1-4244-0054-6
  • Year: 2006
  • Abstract: Reconfigurable computing applications have traditionally had the exclusive use of the field programmable gate array, primarily because the logic densities of the available devices have been relatively similar in size compared to the application. But with the modern FPGA expanding beyond 10 million system gates, and through the use of dynamic reconfiguration, it has become feasible for several applications to share a single high density device. However, developing applications that share a device is difficult as the current design flow assumes the exclusive use of the FPGA resources. As a consequence, the designer must ensure that resources have been allocated for all possible combinations of loaded applications at design time. If the sequence of application loading and unloading is not known in advance, all resource allocation cannot be performed at design time because the availability of resources changes dynamically. In this paper, we present an implementation of an operating system that has the ability to share its FPGA resources dynamically among multiple executing applications.
  • Our comments:
Download

Operating Systems for FPGA Based Computers and Their Memory Management

  • Authors: Klaus Danne
  • Year: 2004
  • Abstract: We introduce the concept of an operating system for platforms that consist beside memory and peripheral devices of FPGAs as the only computational resource. Applications can be developed independent from each other and due to device drivers with little dependency on the platform. The OS supports the multitasking execution of applications using static as well as dynamic resource assignment. A main focus of the paper is the management of the resource memory. Memory management as part of the OS is introduced which allows multiple tasks to access the same memory banks using virtual addressing and dynamic memory allocation. Access conflicts are solved by a priority based scheduling. Since no microprocessor is part of the system, the entire OS including its memory management is executed on the FPGAs.
  • Our comments:
Download

An Adaptable Task Manager for Reconfigurable Architecture Kernels

  • Authors: Yuriy Shiyanovskii, Francis Wolff, Chris Papachristou, Dan Weyer
  • This paper appears in: Adaptive Hardware and Systems, 2009. AHS 2009. NASA/ESA Conference on
  • Year: 2009
  • Abstract: Self-reconfigurable hardware is a new emerging technology which will enable adaptation of computing systems to changing environments.This paper deals with the design of architecture kernels for an autonomous on-board system and the development of an adaptation manager for real-time scheduling of the reconfigurable hardware fabric.Our approach employs a reconfigurable computer architecture with two key layers: the adaptation manager and the real time configuration kernel. This provides significant advantages in terms of flexibility, scalability, cost, and compatibility with embedded technology. Some preliminary results are presented.
  • Our comments:
Download

VEBoC: Variation and Error-Aware Design for Billions of Devices on a Chip

  • Authors: Shoaib Akram, Scott Cromar, Gregory Lucas, Alexandros Papakonstantinou, Deming Chen
  • This paper appears in: Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific
  • Year: 2008
  • ISBN: 978-1-4244-1921-0
  • Description: Zalaczony artykul przedstawia cos podobnego do tego, co chcemy zrealizowac, choc troche innego w koncepcji - application-specific multicores.
  • Abstract: Billions of devices on a chip is around the corner and the trend of deep submicron (DSM) technology scaling will continue for at least another decade. Meanwhile, designers also face severe on-chip parameter variations, soft/hard errors, and high leakage power. How to use these billions of devices to deliver power-efficient, high-performance, and yet error-resilient computation is a challenging task. In this paper, we attempt to demonstrate some of our perspectives to address these critical issues. We elaborate on variation-aware synthesis, holistic error modeling, reliable multicore, and synthesis for application-specific multicore. We also present some of our insights for future reliable computing.
  • Our comments:
Download

A Unified Hardware/Software Runtime Environment for FPGABased Reconfigurable Computers using BORPH

  • Authors: Hayden KwokHay So, Artem Tkachenko and Robert Brodersen
  • This paper appears in: Hardware/Software Codesign and System Synthesis, 2006. CODES+ISSS '06. Proceedings of the 4th International Conference
  • Year: 2006
  • Abstract: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse and rapid application development. Performance of our current implementation and our experience with developing a real-time wireless digital signal processing system based on BORPH will be presented.
  • Our comments:
Download

Generating the Communications Infrastructure for Module-based Dynamic Reconfiguration of FPGAs

  • Authors: Shannon Koh
  • Year: 2003
  • File: download
  • Abstract: I would like to thank my supervisor, Dr. Oliver Diessel, for his unwavering support in this project. His supervision was exemplary, he made the entire experience of getting a PhD rich and full, and he encouraged me to think critically in ways I would not have previously imagined. I would also like to thank my co-supervisor Prof. Sri Parameswaran for his excellent insights. I would like to thank my wife, Molly Hu, for all her support and understanding throughout the pursuit of my degree, especially at the most critical moments. I would also like to thank my mother who encouraged me to pursue my PhD and supported me throughout. I would like to thank all of my fellow PhD students on the 5th floor in the Architecture Group, especially Jorgen Peddersen, who, not only being the best friend one might have, also encouraged me to think. The rest of my fellow students Jeremy Chan, Krutartha Patel, Anjelo Ambrose, Carol He, Michael Chong all made my research experience in UNSW Sydney the best anyone could have. Last but not least I would like to thank the Australian Government for the Australian Postgraduate
  • Our comments:
Download

The MOLEN Polymorphic Processor

  • S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, E. Moscu Panainte
  • This paper appears in: Computers, IEEE Transactions on
  • Year: 2004
  • ISSN : 0018-9340
  • Abstract: In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program: In our proposal, for a given instruction set architecture, a onetime instruction set extension of eight instructions, is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations, SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64-3.18 and between 1.56-1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
  • Our comments:
Download

Custom Processor Design Using NISC: A Case-Study on DCT algorithm

  • Authors: Bita Gorjiara, Daniel Gajski
  • This paper appears in: Embedded Systems for Real-Time Multimedia, 2005. 3rd Workshop on
  • Year: 2005
  • Abstract: Designing application-specific instruction-set processors (ASIPs) usually requires designing a custom datapath, and modifying instruction-set, instruction decoder, and compiler. A new alternative to ASIPs is no-instruction-set-computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath. The compiler analyzes the datapath and extracts possible operations and data flows. The NISC approach simplifies and accelerates the task of custom processor design. In this paper, we present a case-study of designing a custom datapath for a 2D DCT algorithm. We applied several optimization techniques such as software transformations, operation chaining, datapath pipelining, controller pipelining, and functional unit customization to improve the quality of the design. Most of the techniques are general and can be applied to other applications. The result of synthesizing our final custom datapath on a Xilinx FPGA shows 7.14 times performance improvement, 1.64 times power reduction, 12.5 times energy savings, and more than 3 times area reduction compared to a softcore MIPS implementation.
  • Our comments: No-Instruction-Set-Computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath.
Download

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

  • Authors: Jason Merrill
  • Year: 2003
  • Abstract: The tree SSA project requires a tree representation of functions for the optimizers to operate on. There was an existing functions-as-trees representation shared by the C and C++ front ends, and another used by the Java front end, but neither was adequate for use in optimization. In this paper, we will discuss the design of GENERIC, the new language-independent tree representation, and GIMPLE, the reduced subset used during optimization.
  • Our comments:
Download

An empirical comparison of ANSI-C to VHDL compilers:SPARK, ROCCC and DWARV

  • Authors: Arcilio J. Virginia, Yana D. Yankova, Koen L.M. Bertels
  • Year: 2007
  • Abstract: Custom computing machines combine the flexibility of the general purpose processor with the high performance of the application tailored hardware. This combination results in a computing platform that allows for performance improvements to a wide range of applications. Nevertheless in order to develop applications for such a platform, a software designer currently needs also hardware design skills. In order to hide the hardware details from the designers and to assist in the non-trivial development process sevreral tools that translate ANSI-C to VHDL are proposed. In this paper we compare three such tools, SPARK, ROCCC and DWARV. The comparison is based on the performance of designs generated by the compilers and the experiences gained during the acquisition of empirical data.
  • Our comments:
Download

Utilizing Horizontal and Vertical Parallelism with a No-Instruction-Set Compiler for Custom Datapaths

  • Authors: Mehrdad Reshadi, Bita Gorjiara, Daniel Gajski
  • This paper appears in: Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005. Proceedings. 2005 IEEE International Conference on
  • Year: 2005
  • ISBN: 0-7695-2451-6
  • Abstract: Performance of programs can be improved by utilizing their horizontal and vertical parallelism. In some processors (VLIW based), compiler can utilize horizontal parallelism by controlling the schedule of independent operations. Vertical parallelism is utilized through pipelining. However, in all processors, structure of pipeline is fixed and compiler has no control over it. In application-specific-instruction set-processors (ASIPs), pipeline structure can be customized and utilized in the program through custom instructions. Practical constraints on the instruction decoder limit the number and complexity of custom instructions in ASIPs. Detecting the frequent and beneficial custom instructions and incorporating them in the compiler are complex and sometimes very time consuming tasks. In this paper, we present an architecture that does not limit the number of custom functionalities that can be implemented on its datapath. Instead of using custom instructions and then relying on the decoder in hardware to generate the control signals, we generate the control signal values in compiler. Since there are no predefined instructions in this architecture, we call it no-instruction-set-computer (NISC). The NISC compiler maps the application directly on the datapath. It has complete fine grain control over datapath and hence can very well utilize resources in the hardware as well as horizontal and vertical parallelism in the program. We also explain the algorithm for mapping the CDFG of a program on a given datapath in NISC. Using our algorithm and a NISC architecture with the datapath of a MIPS, we achieved up to 70% speedup over the traditional MIPS compiler. In another experiment, we started from a base architecture and customized it by adding resources and interconnect to increase its horizontal and vertical parallelism. The algorithm achieved up to 15.5 times speedup by utilizing the available parallelism in the program and the datapath.
  • Our comments:
Download

FPGA-friendly Code Compression for Horizontal Microcoded Custom IPs

  • Authors: Bita Gorjiara, Daniel Gajski
  • This paper appears in: International Symposium on Field Programmable Gate Arrays Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
  • Year: 2007
  • ISBN: 978-1-59593-600-4
  • Abstract: Shrinking time-to-market and high demand for productivity has driven traditional hardware designers to use design methodologies that start from high-level languages. However, meeting timing constraints of automatically generated IPs is often a challenging and time-consuming task that must be repeated every time the specification is modified. To address this issue, a new generation of IP-design technologies that is capable of generating custom datapaths as well as programming an existing one is developed. These technologies are often based on Horizontal Microcoded Architectures. Large code size is a well-know problem in HMAs, and is referred to as "code bloating" problem.In this paper, we study the code size of one of the new HMA-based technologies called NISC. We show that NISC code size can be several times larger than a typical RISC processor, and we propose several low-overhead dictionary-based code compression techniques to reduce the code size. Our compression algorithm leverages the knowledge of "don't care" values in the control words to better compress the content of dictionary memories. Our experiments show that by selecting proper memory architectures the code size of NISC can be reduced by 70% (i.e. 3.3 times) at cost of only 9% performance degradation. We also show that some code compression techniques may increase number of utilized block RAMs in FPGA-based implementations. To address this issue, we propose combining dictionaries and implementing them using embedded dual-port memories.
  • Our comments:
Download

A Middleware Aided Robust and Fault Tolerant Dynamic Reconfigurable Architecture

  • Authors: Dadji Y., Osterloh B., Michalik H.
  • This paper appears in: Reconfigurable Mechanisms and Robots, 2009. ReMAR 2009. ASME/IFToMM International Conference on
  • Year: 2009
  • ISBN: 978-88-89007-37-2
  • Abstract: Dynamic reconfiguration enhances embedded system with at run-time adaptive functionality and is an improvement in terms of resource utilization and system adaptability. SRAM-based FPGAs provides a dynamic reconfigurable platform with high logic density. The requirements for such an embedded high flexible system based on FPGAs are robustness and reliability to prevent operation interrupts or even system failures. The complexity of a dynamic reconfigurable system with adaptive processing module demands high effort for the user. Therefore a high level abstraction of the communication issues is required to support application development by an appropriate middleware. To achieve such a flexible embedded system we present our network-on-chip (NoC) approach system-on-chip wire (SoCWire) and outline its performance and suitability for robust dynamic reconfigurable systems. Furthermore we introduce a suitable embedded middleware concept to support the system reconfiguration and the software application development process.
  • Our comments:
Download

A review of high-level synthesis for dynamically reconfigurable FPGAs

  • Authors: Xuejie Zhanga and Kam W. Nga
  • This paper appears in: Microprocessors and Microsystems Volume 24, Issue 4, 1 August 2000, Pages 199-211
  • Year: 2000
  • ISSN: 0141-9331
  • Abstract: Dynamically Reconfigurable Field Programmable Gate Arrays (DR FPGAs) change many of the basic assumptions of what hardware is. DR FPGA-based dynamically reconfigurable computing has become a powerful methodology for achieving high performance while minimizing the resource required in the implementation of many applications. The key to harnessing the power of DR FPGAs for most applications is to develop high-level synthesis tools for transforming automatically an algorithmic level behavioral specification into DR FPGA configurations. In this paper we survey the current state-of-the-art in high-level synthesis techniques for dynamically reconfigurable systems. The differences in high-level synthesis technology between classical systems and dynamically reconfigurable systems are discussed. Then, we describe the basic tasks in the high-level synthesis of dynamically reconfigurable systems. Finally, techniques that have been developed in the past few years for the high-level synthesis of dynamically reconfigurable systems are presented.
  • Our comments:
Download

Static and Dynamic Reconfigurable Designs for a 2D Shape-Adaptive DCT

  • Authors: Gause, Jorn and Cheung, Peter Y. K. and Luk, Wayne
  • This paper appears in: Lecture Notes In Computer Science; Vol. 1896 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications table of contents
  • Year: 2000
  • ISSN: 3-540-67899-9
  • Abstract: This paper presents two reconfigurable design approaches for a two dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT). The SA-DCT is an example of a new type of multimedia video processing algorithm where the computations performed are data dependent. A static design, where the configuration does not change during execution of the task, is presented. The use of a data dependence graph (DDG) is proposed which represents the computations and input signals required to calculate a particular output signal depending on a variable input parameter. By re-structuring the DDG and exploiting possible sharing of FPGA resources for different entities within the SA-DCT, it is demonstrated that the area required for an implementation can be significantly reduced. An alternative dynamic approach is also introduced where the FPGA’s configuration may change over time. This is well suited to using dynamically reconfigurable logic but suffers from long reconfiguration time if current FPGAs are used.
  • Our comments:
Download

Runtime Filesystem Support for Reconfigurable FPGA Hardware Processes in BORPH

  • Authors: Hayden Kwok-Hay So and Robert Brodersen
  • This paper appears in: Field-Programmable Custom Computing Machines, 2008. FCCM '08. 16th International Symposium on
  • Year: 2008
  • ISBN: 978-0-7695-3307-0
  • Abstract: This paper presents the design of BORPH's file system layer for FPGA-based reconfigurable computers. BORPH provides user FPGA designs that execute as hardware processes access to the general file system using familiar UNIX file I/O semantics. Such capability provides FPGA designers an intuitive interface not only for regular file I/O, but also for representing streaming hardware/software and hardware/hardware communication using UNIX pipes. Design trade-offs among system manageability, user usability and application performance are explored. A case of mixed hardware/software video processing is presented as a proof-of-concept.
  • Our comments:
Download

Optimization of Dynamic Hardware Reconfigurations

  • Authors: Jürgen Teich and Sándor P. Fekete and Jörg Schepers
  • This paper appears in: The Journal of Supercomputing Volume 19, Number 1 / May, 2001
  • Year: 2001
  • ISBN: 0920-8542 (Print) 1573-0484 (Online)
  • Abstract: Recent generations of Field Programmable Gate Arrays (FPGA) allow the dynamic reconfiguration of cells on the chip during run-time. For a given problem consisting of a set of tasks with computation requirements modeled by rectangles of cells, several optimization problems such as finding the array of minimal size to accomplish the tasks within a given time limit are considered. Existing approaches based on ILP formulations to solve these problems as multi-dimensional packing problems turn out not to be applicable for problem sizes of interest. Here, a breakthrough is achieved in solving these problems to optimality by using the new notion of packing classes. It allows a significant reduction of the search space such that problems of the above type may be solved exactly using a special branch-and-bound technique. We validate the usefulness of our method by providing computational results.
  • Our comments:
Download

Multitasking on FPGA Coprocessors

  • Authors: Simmler, H. and Levinson, L. and Manner, Reinhard
  • This paper appears in: FPL '00: Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
  • Year: 2000
  • ISBN: 3-540-67899-9
  • Abstract: Multitasking on an FPGA-based processor is one possibility to explore the eficacy of reconfigurable computing. Conventional computers and operating systems have demonstrated the many advantages of sharing computational hardware by several tasks over time. The ability to do run-time configuration and readback of FPGAs in a coprocessor architecture allows investigating the problems of implementing realistic multitasking. This paper explores the control software required to support task switching for an application split over the host processor { coprocessor boundary as well as the requirements and features of context saving and restoring in the FPGA coprocessor context. An FPGA coprocessor designed especially to support multitasking of such applications is described.
  • Our comments:
Download

Incremental reconfiguration for pipelined applications

  • Authors: Schmit, H.
  • This paper appears in: FPGAs for Custom Computing Machines, 1997. Proceedings., The 5th Annual IEEE Symposium on
  • Year: 1997
  • ISBN: 0-8186-8159-4
  • Abstract: This paper examines the implementation of pipelined applications using run-time reconfiguration. Throughput and latency of pipelined applications can be significantly improved when reconfiguration is performed at the level of individual pipeline stages, as opposed to configuration of the entire FPGA. If reconfiguration and execution can be performed simultaneously, the performance of a pipelined application approaches its theoretical maximum. This paper proposes a new FPGA configuration mechanism, called striping, that supports pipeline stage reconfiguration and simultaneous configuration and execution. Additionally, the use of the pipeline stage as the atomic unit of reconfiguration introduces a design abstraction that enables the development families of upwardly-compatible FPGAs and virtual hardware design.
  • Our comments:
Download

Hardware Supported Task Scheduling on Dynamically Reconfigurable SoC Architectures

  • Authors: Zexin Pan and Wells, B.E.
  • This paper appears in: Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Year: 2008
  • ISBN: 1063-8210
  • Abstract: Dynamically reconfigurable system-on-a-chip (RSoC) technology features embedded microprocessors that are dispersed on the same die with significant amounts of programmable logic fabric. In this paper, we present a strategy to solve the recently emerging problem of how to utilize the flexible but still limited RSoC resources in an effective manner for a multi-task application. The major contribution of this paper is the development of a dynamic task scheduling algorithm that can be implemented in fixed or reconfigurable hardware that will perform the online scheduling of task systems onto the RSoC type architecture. The results from extensive simulations demonstrate the benefits of the proposed dynamic scheduling approach as compared with that of other static scheduling techniques taken from the technical literature.
  • Our comments:
Download

A Hardware Resource Management System for Adaptive Computing on Dynamically Reconfigurable Devices

  • Authors: Toshiyuki Ito  ; Kazuya Mishou  ; Yuichi Okuyama  ; Kenichi Kuroda  ;
  • This paper appears in: Frontier of Computer Science and Technology, 2006. FCST '06. Japan-China Joint Workshop on
  • Year: 2006
  • ISBN: 0-7695-2721-3
  • Abstract: This paper proposes a realization method of the computer system with dynamical hardware-resource allocation on dynamically reconfigurable devices. The system consists of two or more parts and they can change the number of processing units according to each processing load. In the system, there is a competition problem between these parts. In order to solve this problem, we investigate required functions of resource management units on a simple processing model. This model is an adapted load balancing model consisting of an upper management unit, two management units and processing units shared by them.
  • Our comments:
Download

The Design and Implementation of Hardware Task Configuration Management Unit on Dynamically Reconfigurable SoC

  • Authors: Xiao-Wei Wang  ; Wei-Nan Chen  ; Ying Wang  ; Hong-Jun You  ; Cheng-Lian Peng  ;
  • This paper appears in: Embedded Software and Systems, 2009. ICESS '09. International Conference on
  • Year: 2009
  • ISBN: 978-1-4244-4359-8
  • Abstract: The advantages and the flexibility introduced into the hardware implementation by partial dynamic reconfiguration have rapidly changed the design flow of embedded systems. Configuration management is an important issue in operating system for dynamically reconfigurable system-on-chip. Reconfiguration overhead affects the performance of reconfigurable system. This paper presents a hardware implemented efficient configuration management unit. Wet define our operating system framework based on unified multitask programming model for reconfigurable system-on-chip at first. Then the detailed design and implementation of the configuration management unit are given. Finally a use case is presented, which shows the efficiency of the configuration.
  • Our comments:
Download

Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor

  • Authors: Papakonstantinou, A.  ; Deming Chen  ; Wen-Mei Hwu  ;
  • This paper appears in: Application Specific Processors, 2008. SASP 2008. Symposium on
  • Year: 2008
  • ISBN: 978-1-4244-2333-0
  • Abstract: Different approaches have been proposed over the years for automatically transforming high-level-languages (HLL) descriptions of applications into custom hardware implementations. Most of these approaches however are confined by basic block level parallelism described within the CDFGs (control-data flow graphs). In this work we propose a new high-level synthesis flow which can leverage instruction-level parallelism (ILP) beyond the boundary of the basic blocks. We extract statistical parallelism from the applications through the use of Superblocks and Hyperblocks formed by advanced front-end compilation techniques. The output of the front-end compilation is then used in our high-level synthesis in order to map the application onto a new domain-specific architecture named EPOS (explicitly parallel operations system). EPOS is a stylized micro-code driven processor equipped with novel architectural features that help take advantage of the instruction-level parallelism generated in the front-end compilation. A novel forwarding-path optimization engine is also employed during the high-level synthesis flow in order to minimize the long interconnection wires and the multiplexers in the processor. To evaluate the EPOS processor, we compare its performance with a previous domain-specific processor NISC on a common set of benchmarks. Experimental results show that significant performance gain (3.45times on average) is obtained compared to NISC.
  • Our comments:
Download

VHDL Modeling of Fast Dynamic Reconfiguration on Novel Multicontext RAM-based Field Programmable Devices

  • Authors: Julio Faura , Juan Manuel Moreno , Jordi Madrenas , Josep Maria Insenser
  • This paper appears in: n VHDL User's Forum in Europe
  • Year: 1997
  • Abstract: We describe in this paper how VHDL greatly helps in modeling the dynamic reconfiguration of a novel Field Programmable Device (FPD) and the applications especially suitable for it. This dynamic reconfiguration methodology is based on multicontext operation, that is, having several copies of the configuration memory controlling the programmable features of the device. Configuration dynamic management is carried out by an onchip microprocessor. We model all these interface operations - programmable hardware, configuration memory and internal microprocessor - with behavioral VHDL benches acting over the configuration signals on achitecturally described VHDL blocks. We also describe how a novel analysis and synthesis tool could be written to map general purpose applications on dynamically reconfigurable arrays like the one we present.
  • Our comments:
Download

Invited Paper: Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs

  • Authors: Lysaght, P.  ; Blodget, B.  ; Mason, J.  ; Young, J.  ; Bridgford, B.  ;
  • This paper appears in: Field Programmable Logic and Applications, 2006. FPL '06. International Conference on
  • Year: 2006
  • ISBN: 1-4244-0312-X
  • Abstract: The paper describes architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs. These are augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules. A new CAD tool flow to automate the methodology is also presented. The new tools initially target the Virtex-II, Virtex-II Pro and Virtex-4 families and are derived from Xilinx's commercial CAD tools.
  • Our comments:
Download

A Unified HW/SW Operating System for Partially Runtime Reconfigurable FPGA based Computer Systems

  • Authors: Qingxu Deng, Yi Zhang, Nan Guan1 and Zonghua Gu
  • Year: 2008
  • Abstract: Partially Runtime-Reconfigurable (PRTR) FPGAs allow hardware tasks to be placed and removed dynamically at runtime. We present an OS for hybrid computing systems consisting of both CPUs and PRTR FPGAs. The OS is based on Linux, and provides unified interfaces for both HW and SW processes to ease the design of such hybrid systems. The scheduler of HW processes is implemented on the hardware, to alleviate the performance penalty of the time-consuming HW task scheduling algorithms.
  • Our comments:
Download

PipeRench: A virtualized programmable datapath in 0.18 micron technology

  • Authors: Schmit, H.  ; Whelihan, D.  ; Tsai, A.  ; Moe, M.  ; Levine, B.  ; Reed Taylor, R.  ;
  • This paper appears in: Custom Integrated Circuits Conference, 2002. Proceedings of the IEEE 2002
  • Year: 2002
  • ISBN: 0-7803-7250-6
  • Abstract: PipeRench is a programmable datapath that can be used to accelerate numerically intensive applications. The unique aspect of PipeRench is its ability to virtualize hardware through self-managed dynamic reconfiguration. This capability provides application portability and scalability without redesign or recompilation. This paper describes the implementation of PipeRench in a 0.18 micron process. The implementation has 3.65 million transistors and runs at 120 MHz. Performance is competitive with high-end commercial DSP architectures and more than five times faster than a commercial microprocessor. Executing at 33 MHz, an FIR filter without virtualization consumes 519 mW. When virtualization is required, the implementation consumes approximately 675 mW.
  • Our comments:
Download

Asynchronous PipeRench: architecture and performance evaluations

  • Authors: Kagotani, H.  ; Schmit, H.  ;
  • This paper appears in: Field-Programmable Custom Computing Machines, 2003. FCCM 2003. 11th Annual IEEE Symposium on
  • Year: 2003
  • ISBN: 0-7695-1979-2
  • Abstract: PipeRench is a configurable architecture that has the unique ability to virtualize an application using dynamic reconfiguration. This paper investigates the potential benefits and costs of implementing this architecture using an asynchronous methodology. Since clock distribution and gating are relatively easy in the synchronous PipeRench, we focus on the benefit due to decreased timing pessimism in an asynchronous implementation. Two architectures for fully asynchronous implementation are considered. PE-based asynchronous implementation yields approximately 80% improvement in performance per stripe. This implementation, however, requires significant increases in configuration storage and wire count. A few particular features of the architecture, such as the crossbar interconnect structure within the stripe, are primarily responsible for this growth in configuration bits and wires. These features, however, are the primary aspects of the PipeRench architecture that make it a good compilation target.
  • Our comments:
Download

Exploiting Application Data-Parallelism on Dynamically Reconfigurable Architectures: Placement and Architectural Considerations

  • Authors: Banerjee, S.  ; Bozorgzadeh, E.  ; Dutt, N.  ;
  • This paper appears in: Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Year: 2009
  • ISSN : 1063-8210
  • Abstract: Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step to validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data-points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case-study confirms that blindly maximizing data-parallelism can result in schedules even worse than that generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very important, we demonstrate that our approach is well-suited for true on-demand computing with detailed execution time estimates on a typical embedded processor. Heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.
  • Our comments:
Download

Task scheduling for heterogeneous reconfigurable computers

  • Authors: Ahmadinia, A.  ; Bobda, C.  ; Koch, D.  ; Majer, M.  ; Teich, J.  ;
  • This paper appears in: Integrated Circuits and Systems Design, 2004. SBCCI 2004. 17th Symposium on
  • Year: 2004
  • ISSN : 1-58113-947-0
  • Abstract: We consider the problem of executing a dynamically changing set of tasks on a reconfigurable system, made upon a processor and a reconfigurable device. Task execution on such a platform is managed by a scheduler that can allocate tasks either to the processor or to the reconfigurable device. The scheduler can be seen as part of an operating system running on the software or as core in the reconfigurable device. For each tasks to be executed on reconfigurable device, an equivalent implementation exists as rectangular block in a database. This block has to be placed on the device at run-time. A placer is responsible for the placement of tasks received from the scheduler on the reconfigurable device. However, the placement of tasks on the reconfigurable device cannot be successful if enough space is not available on the device to hold the task. In this case, the scheduler receive an acknowledgment from the placer and decide either to preempt a running task or to run the task on software. We present in this work an implementation of a placer module as well as investigations on task preemption. The two modules are part of an operating system for reconfigurable system currently under development.
  • Our comments:
Download

Borph: an operating system for fpga-based reconfigurable computers

  • Authors: So, Hayden Kwok-Hay
  • Year: 2007
  • ISSN : 978-0-549-52957-6
  • Abstract: Reconfigurable computing is a promising technology to meet future computational demand by leveraging flexibilities and the high degree of parallelism found in reconfigurable hardware fabrics, such as field programmable gate arrays (FPGAs). However, despite their promising performance researchers have demonstrated, reconfigurable computers are yet to be widely adopted. One reason is the lack of a common and intuitive operating system for these platforms.This dissertation work explores the design and implementation trade-offs of an operating system for FPGA-based reconfigurable computers, BORPH, the Berkeley Operating system for ReProgrammable Hardware. The goal of this work is to explore and demonstrate the feasibility of providing a systematic and easy to understand view into reconfigurable computers through OS support without incurring significant performance penalties. BORPH provides kernel support for FPGA applications by extending a standard Linux operating system. It establishes the notion of hardware process for executing user FPGA applications. Users therefore compile and execute hardware designs on FPGA resources the same way they run software programs on conventional processor-based systems. BORPH offers run-time general file system support to hardware processes as if they were software. The unified file interface allows hardware and software processes to communicate via standard UNIX file pipes. Furthermore, a virtual file system is built to allow access to memories and registers defined in the FPGA, providing communication links between hardware and software. The functions of BORPH are demonstrated on a BEE2 compute module. Performances of BORPH are measured to identify bottlenecks of our system. The clean OS kernel/user separation of BORPH has allowed us to improve overall system performance without affecting existing user designs. Furthermore, BORPH's unified runtime environment has enabled designers to make fair and end-to-end comparisons among software/hardware implementations of the same application. Most importantly, since the introduction of BORPH to our FPGA-based platform, we have observed increased productivity among high-level application developers who have little experience in FPGA application design.
  • Our comments:
Download

Adapting and automating XILINX's partial reconfiguration flow for multiple module implementations

  • Authors: Rainer Scholz
  • This paper appears in: Lecture Notes In Computer Science Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
  • Year: 2007
  • ISSN : :0302-9743 , 978-3-540-71430-9
  • Abstract: In this paper, we present a modification of XILINX's Partial Reconfiguration Design Flow. Starting with either HDL-Design files or synthesised netlists, the presented flow generates all partial as well as the complete configuration bitstreams. In contrast to the established XILINX design flows, our flow is completely automated by a generator. By checking partial reconfiguration constraints it assists the user to avoid typical errors in module and bus macro placement. Compared with the PlanAhead partial reconfiguration flow, it is a single flow for generating multiple implementation for each reconfigurable area.
  • Our comments:
Download
Personal tools