Papers

From REuP Project

Revision as of 09:21, 14 July 2010; view current revision
←Older revision | Newer revision→
Jump to: navigation, search

Contents

Research Issues in Operating Systems for Reconfigurable Computing

  • Authors: Grant B. Wigley and David A. Kearney
  • This paper appears in: In Proceedings of the International Conference on Engineering of Reconfigurable System and Algorithms(ERSA)
  • Year: 2002
  • Abstract: As the number of system gates available on reconfigurable platforms increase beyond 20 million, the issue of the management of these resources and their sharing among may applications and users will become more of a concern. In this paper we describe the research issues for managing these resources in an operating system for a reconfigurable computer. We also detail a feasible set of components for the operating system and a feasible software architecture We show there is no current operating system implementation with these components. We propose a number of performance metrics which we believe are important measures of the quality of an operating system implementation. These include fragmentation of area, algorithm performance and application performance. We complete the paper with a status report on our implementation of an operating system for a reconfigurable computer.
  • Our comments:
Download

ReConfigME: A Detailed Implementation of an Operating System for Reconfigurable Computing

  • Authors: Grant Wigley, David Kearney and Mark Jasiunas
  • This paper appears in: Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International Issue Date : 25-29 April 2006 On page(s): 8 pp.
  • ISBN: 1-4244-0054-6
  • Year: 2006
  • Abstract: Reconfigurable computing applications have traditionally had the exclusive use of the field programmable gate array, primarily because the logic densities of the available devices have been relatively similar in size compared to the application. But with the modern FPGA expanding beyond 10 million system gates, and through the use of dynamic reconfiguration, it has become feasible for several applications to share a single high density device. However, developing applications that share a device is difficult as the current design flow assumes the exclusive use of the FPGA resources. As a consequence, the designer must ensure that resources have been allocated for all possible combinations of loaded applications at design time. If the sequence of application loading and unloading is not known in advance, all resource allocation cannot be performed at design time because the availability of resources changes dynamically. In this paper, we present an implementation of an operating system that has the ability to share its FPGA resources dynamically among multiple executing applications.
  • Our comments:
Download

Operating Systems for FPGA Based Computers and Their Memory Management

  • Authors: Klaus Danne
  • Year: 2004
  • Abstract: We introduce the concept of an operating system for platforms that consist beside memory and peripheral devices of FPGAs as the only computational resource. Applications can be developed independent from each other and due to device drivers with little dependency on the platform. The OS supports the multitasking execution of applications using static as well as dynamic resource assignment. A main focus of the paper is the management of the resource memory. Memory management as part of the OS is introduced which allows multiple tasks to access the same memory banks using virtual addressing and dynamic memory allocation. Access conflicts are solved by a priority based scheduling. Since no microprocessor is part of the system, the entire OS including its memory management is executed on the FPGAs.
  • Our comments:
Download

An Adaptable Task Manager for Reconfigurable Architecture Kernels

  • Authors: Yuriy Shiyanovskii, Francis Wolff, Chris Papachristou, Dan Weyer
  • This paper appears in: Adaptive Hardware and Systems, 2009. AHS 2009. NASA/ESA Conference on
  • Year: 2009
  • Abstract: Self-reconfigurable hardware is a new emerging technology which will enable adaptation of computing systems to changing environments.This paper deals with the design of architecture kernels for an autonomous on-board system and the development of an adaptation manager for real-time scheduling of the reconfigurable hardware fabric.Our approach employs a reconfigurable computer architecture with two key layers: the adaptation manager and the real time configuration kernel. This provides significant advantages in terms of flexibility, scalability, cost, and compatibility with embedded technology. Some preliminary results are presented.
  • Our comments:
Download

VEBoC: Variation and Error-Aware Design for Billions of Devices on a Chip

  • Authors: Shoaib Akram, Scott Cromar, Gregory Lucas, Alexandros Papakonstantinou, Deming Chen
  • This paper appears in: Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific
  • Year: 2008
  • ISBN: 978-1-4244-1921-0
  • Description: Zalaczony artykul przedstawia cos podobnego do tego, co chcemy zrealizowac, choc troche innego w koncepcji - application-specific multicores.
  • Abstract: Billions of devices on a chip is around the corner and the trend of deep submicron (DSM) technology scaling will continue for at least another decade. Meanwhile, designers also face severe on-chip parameter variations, soft/hard errors, and high leakage power. How to use these billions of devices to deliver power-efficient, high-performance, and yet error-resilient computation is a challenging task. In this paper, we attempt to demonstrate some of our perspectives to address these critical issues. We elaborate on variation-aware synthesis, holistic error modeling, reliable multicore, and synthesis for application-specific multicore. We also present some of our insights for future reliable computing.
  • Our comments:
Download

A Unified Hardware/Software Runtime Environment for FPGABased Reconfigurable Computers using BORPH

  • Authors: Hayden KwokHay So, Artem Tkachenko and Robert Brodersen
  • This paper appears in: Hardware/Software Codesign and System Synthesis, 2006. CODES+ISSS '06. Proceedings of the 4th International Conference
  • Year: 2006
  • Abstract: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse and rapid application development. Performance of our current implementation and our experience with developing a real-time wireless digital signal processing system based on BORPH will be presented.
  • Our comments:
Download

Generating the Communications Infrastructure for Module-based Dynamic Reconfiguration of FPGAs

  • Authors: Shannon Koh
  • Year: 2003
  • File: download
  • Abstract: I would like to thank my supervisor, Dr. Oliver Diessel, for his unwavering support in this project. His supervision was exemplary, he made the entire experience of getting a PhD rich and full, and he encouraged me to think critically in ways I would not have previously imagined. I would also like to thank my co-supervisor Prof. Sri Parameswaran for his excellent insights. I would like to thank my wife, Molly Hu, for all her support and understanding throughout the pursuit of my degree, especially at the most critical moments. I would also like to thank my mother who encouraged me to pursue my PhD and supported me throughout. I would like to thank all of my fellow PhD students on the 5th floor in the Architecture Group, especially Jorgen Peddersen, who, not only being the best friend one might have, also encouraged me to think. The rest of my fellow students Jeremy Chan, Krutartha Patel, Anjelo Ambrose, Carol He, Michael Chong all made my research experience in UNSW Sydney the best anyone could have. Last but not least I would like to thank the Australian Government for the Australian Postgraduate
  • Our comments:
Download

The MOLEN Polymorphic Processor

  • S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, E. Moscu Panainte
  • This paper appears in: Computers, IEEE Transactions on
  • Year: 2004
  • ISSN : 0018-9340
  • Abstract: In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program: In our proposal, for a given instruction set architecture, a onetime instruction set extension of eight instructions, is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations, SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64-3.18 and between 1.56-1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
  • Our comments:
Download

Custom Processor Design Using NISC: A Case-Study on DCT algorithm

  • Authors: Bita Gorjiara, Daniel Gajski
  • This paper appears in: Embedded Systems for Real-Time Multimedia, 2005. 3rd Workshop on
  • Year: 2005
  • Abstract: Designing application-specific instruction-set processors (ASIPs) usually requires designing a custom datapath, and modifying instruction-set, instruction decoder, and compiler. A new alternative to ASIPs is no-instruction-set-computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath. The compiler analyzes the datapath and extracts possible operations and data flows. The NISC approach simplifies and accelerates the task of custom processor design. In this paper, we present a case-study of designing a custom datapath for a 2D DCT algorithm. We applied several optimization techniques such as software transformations, operation chaining, datapath pipelining, controller pipelining, and functional unit customization to improve the quality of the design. Most of the techniques are general and can be applied to other applications. The result of synthesizing our final custom datapath on a Xilinx FPGA shows 7.14 times performance improvement, 1.64 times power reduction, 12.5 times energy savings, and more than 3 times area reduction compared to a softcore MIPS implementation.
  • Our comments: No-Instruction-Set-Computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath.
Download

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

  • Authors: Jason Merrill
  • Year: 2003
  • Abstract: The tree SSA project requires a tree representation of functions for the optimizers to operate on. There was an existing functions-as-trees representation shared by the C and C++ front ends, and another used by the Java front end, but neither was adequate for use in optimization. In this paper, we will discuss the design of GENERIC, the new language-independent tree representation, and GIMPLE, the reduced subset used during optimization.
  • Our comments:
Download

An empirical comparison of ANSI-C to VHDL compilers:SPARK, ROCCC and DWARV

  • Authors: Arcilio J. Virginia, Yana D. Yankova, Koen L.M. Bertels
  • Year: 2007
  • Abstract: Custom computing machines combine the flexibility of the general purpose processor with the high performance of the application tailored hardware. This combination results in a computing platform that allows for performance improvements to a wide range of applications. Nevertheless in order to develop applications for such a platform, a software designer currently needs also hardware design skills. In order to hide the hardware details from the designers and to assist in the non-trivial development process sevreral tools that translate ANSI-C to VHDL are proposed. In this paper we compare three such tools, SPARK, ROCCC and DWARV. The comparison is based on the performance of designs generated by the compilers and the experiences gained during the acquisition of empirical data.
  • Our comments:
Download

Utilizing Horizontal and Vertical Parallelism with a No-Instruction-Set Compiler for Custom Datapaths

  • Authors: Mehrdad Reshadi, Bita Gorjiara, Daniel Gajski
  • This paper appears in: Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005. Proceedings. 2005 IEEE International Conference on
  • Year: 2005
  • ISBN: 0-7695-2451-6
  • Abstract: Performance of programs can be improved by utilizing their horizontal and vertical parallelism. In some processors (VLIW based), compiler can utilize horizontal parallelism by controlling the schedule of independent operations. Vertical parallelism is utilized through pipelining. However, in all processors, structure of pipeline is fixed and compiler has no control over it. In application-specific-instruction set-processors (ASIPs), pipeline structure can be customized and utilized in the program through custom instructions. Practical constraints on the instruction decoder limit the number and complexity of custom instructions in ASIPs. Detecting the frequent and beneficial custom instructions and incorporating them in the compiler are complex and sometimes very time consuming tasks. In this paper, we present an architecture that does not limit the number of custom functionalities that can be implemented on its datapath. Instead of using custom instructions and then relying on the decoder in hardware to generate the control signals, we generate the control signal values in compiler. Since there are no predefined instructions in this architecture, we call it no-instruction-set-computer (NISC). The NISC compiler maps the application directly on the datapath. It has complete fine grain control over datapath and hence can very well utilize resources in the hardware as well as horizontal and vertical parallelism in the program. We also explain the algorithm for mapping the CDFG of a program on a given datapath in NISC. Using our algorithm and a NISC architecture with the datapath of a MIPS, we achieved up to 70% speedup over the traditional MIPS compiler. In another experiment, we started from a base architecture and customized it by adding resources and interconnect to increase its horizontal and vertical parallelism. The algorithm achieved up to 15.5 times speedup by utilizing the available parallelism in the program and the datapath.
  • Our comments:
Download

FPGA-friendly Code Compression for Horizontal Microcoded Custom IPs

  • Authors: Bita Gorjiara, Daniel Gajski
  • This paper appears in: International Symposium on Field Programmable Gate Arrays Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
  • Year: 2007
  • ISBN: 978-1-59593-600-4
  • Abstract: Shrinking time-to-market and high demand for productivity has driven traditional hardware designers to use design methodologies that start from high-level languages. However, meeting timing constraints of automatically generated IPs is often a challenging and time-consuming task that must be repeated every time the specification is modified. To address this issue, a new generation of IP-design technologies that is capable of generating custom datapaths as well as programming an existing one is developed. These technologies are often based on Horizontal Microcoded Architectures. Large code size is a well-know problem in HMAs, and is referred to as "code bloating" problem.In this paper, we study the code size of one of the new HMA-based technologies called NISC. We show that NISC code size can be several times larger than a typical RISC processor, and we propose several low-overhead dictionary-based code compression techniques to reduce the code size. Our compression algorithm leverages the knowledge of "don't care" values in the control words to better compress the content of dictionary memories. Our experiments show that by selecting proper memory architectures the code size of NISC can be reduced by 70% (i.e. 3.3 times) at cost of only 9% performance degradation. We also show that some code compression techniques may increase number of utilized block RAMs in FPGA-based implementations. To address this issue, we propose combining dictionaries and implementing them using embedded dual-port memories.
  • Our comments:
Download

A Middleware Aided Robust and Fault Tolerant Dynamic Reconfigurable Architecture

  • Authors: Dadji Y., Osterloh B., Michalik H.
  • This paper appears in: Reconfigurable Mechanisms and Robots, 2009. ReMAR 2009. ASME/IFToMM International Conference on
  • Year: 2009
  • ISBN: 978-88-89007-37-2
  • Abstract: Dynamic reconfiguration enhances embedded system with at run-time adaptive functionality and is an improvement in terms of resource utilization and system adaptability. SRAM-based FPGAs provides a dynamic reconfigurable platform with high logic density. The requirements for such an embedded high flexible system based on FPGAs are robustness and reliability to prevent operation interrupts or even system failures. The complexity of a dynamic reconfigurable system with adaptive processing module demands high effort for the user. Therefore a high level abstraction of the communication issues is required to support application development by an appropriate middleware. To achieve such a flexible embedded system we present our network-on-chip (NoC) approach system-on-chip wire (SoCWire) and outline its performance and suitability for robust dynamic reconfigurable systems. Furthermore we introduce a suitable embedded middleware concept to support the system reconfiguration and the software application development process.
  • Our comments:
Download

A review of high-level synthesis for dynamically reconfigurable FPGAs

  • Authors: Xuejie Zhanga and Kam W. Nga
  • This paper appears in: Microprocessors and Microsystems Volume 24, Issue 4, 1 August 2000, Pages 199-211
  • Year: 2000
  • ISSN: 0141-9331
  • Abstract: Dynamically Reconfigurable Field Programmable Gate Arrays (DR FPGAs) change many of the basic assumptions of what hardware is. DR FPGA-based dynamically reconfigurable computing has become a powerful methodology for achieving high performance while minimizing the resource required in the implementation of many applications. The key to harnessing the power of DR FPGAs for most applications is to develop high-level synthesis tools for transforming automatically an algorithmic level behavioral specification into DR FPGA configurations. In this paper we survey the current state-of-the-art in high-level synthesis techniques for dynamically reconfigurable systems. The differences in high-level synthesis technology between classical systems and dynamically reconfigurable systems are discussed. Then, we describe the basic tasks in the high-level synthesis of dynamically reconfigurable systems. Finally, techniques that have been developed in the past few years for the high-level synthesis of dynamically reconfigurable systems are presented.
  • Our comments:
Download

Static and Dynamic Reconfigurable Designs for a 2D Shape-Adaptive DCT

  • Authors: Gause, Jorn and Cheung, Peter Y. K. and Luk, Wayne
  • This paper appears in: Lecture Notes In Computer Science; Vol. 1896 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications table of contents
  • Year: 2000
  • ISSN: 3-540-67899-9
  • Abstract: This paper presents two reconfigurable design approaches for a two dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT). The SA-DCT is an example of a new type of multimedia video processing algorithm where the computations performed are data dependent. A static design, where the configuration does not change during execution of the task, is presented. The use of a data dependence graph (DDG) is proposed which represents the computations and input signals required to calculate a particular output signal depending on a variable input parameter. By re-structuring the DDG and exploiting possible sharing of FPGA resources for different entities within the SA-DCT, it is demonstrated that the area required for an implementation can be significantly reduced. An alternative dynamic approach is also introduced where the FPGA’s configuration may change over time. This is well suited to using dynamically reconfigurable logic but suffers from long reconfiguration time if current FPGAs are used.
  • Our comments:
Download

Runtime Filesystem Support for Reconfigurable FPGA Hardware Processes in BORPH

  • Authors: Hayden Kwok-Hay So and Robert Brodersen
  • This paper appears in: Field-Programmable Custom Computing Machines, 2008. FCCM '08. 16th International Symposium on
  • Year: 2008
  • ISBN: 978-0-7695-3307-0
  • Abstract: This paper presents the design of BORPH's file system layer for FPGA-based reconfigurable computers. BORPH provides user FPGA designs that execute as hardware processes access to the general file system using familiar UNIX file I/O semantics. Such capability provides FPGA designers an intuitive interface not only for regular file I/O, but also for representing streaming hardware/software and hardware/hardware communication using UNIX pipes. Design trade-offs among system manageability, user usability and application performance are explored. A case of mixed hardware/software video processing is presented as a proof-of-concept.
  • Our comments:
Download

Optimization of Dynamic Hardware Reconfigurations

  • Authors: Jürgen Teich and Sándor P. Fekete and Jörg Schepers
  • This paper appears in: The Journal of Supercomputing Volume 19, Number 1 / May, 2001
  • Year: 2001
  • ISBN: 0920-8542 (Print) 1573-0484 (Online)
  • Abstract: Recent generations of Field Programmable Gate Arrays (FPGA) allow the dynamic reconfiguration of cells on the chip during run-time. For a given problem consisting of a set of tasks with computation requirements modeled by rectangles of cells, several optimization problems such as finding the array of minimal size to accomplish the tasks within a given time limit are considered. Existing approaches based on ILP formulations to solve these problems as multi-dimensional packing problems turn out not to be applicable for problem sizes of interest. Here, a breakthrough is achieved in solving these problems to optimality by using the new notion of packing classes. It allows a significant reduction of the search space such that problems of the above type may be solved exactly using a special branch-and-bound technique. We validate the usefulness of our method by providing computational results.
  • Our comments:
Download

Optimization of Dynamic Hardware Reconfigurations

  • Authors: Jürgen Teich and Sándor P. Fekete and Jörg Schepers
  • This paper appears in: The Journal of Supercomputing Volume 19, Number 1 / May, 2001
  • Year: 2001
  • ISBN: 0920-8542 (Print) 1573-0484 (Online)
  • Abstract: Recent generations of Field Programmable Gate Arrays (FPGA) allow the dynamic reconfiguration of cells on the chip during run-time. For a given problem consisting of a set of tasks with computation requirements modeled by rectangles of cells, several optimization problems such as finding the array of minimal size to accomplish the tasks within a given time limit are considered. Existing approaches based on ILP formulations to solve these problems as multi-dimensional packing problems turn out not to be applicable for problem sizes of interest. Here, a breakthrough is achieved in solving these problems to optimality by using the new notion of packing classes. It allows a significant reduction of the search space such that problems of the above type may be solved exactly using a special branch-and-bound technique. We validate the usefulness of our method by providing computational results.
  • Our comments:
Download
Personal tools