High-Level Synthesis of Digital Systems
Written by Jalil   
Wednesday, 26 March 2008

ABSTRACT

This paper represents a report of an individual research project on the important aspects of High-Level (Behavioural) Synthesis of digital systems. The motivations behind automating the process of VLSI design are outlined and the challenges defined. A review of some of the significant works on the subtasks of behavioural synthesis is presented. A number of scheduling techniques are explained, and the interdependence between the HLS subtasks is stressed. The report also shows the importance of low power, reliability and testability considerations at high levels of abstraction in the design flow. The necessity of comparing and evaluating different high-level synthesis techniques is discussed.

1. INTRODUCTION

In the early days of digital systems design the silicon area was limited to a small number of transistors, and designers managed to manually map their systems directly into transistor level layout. The majority of those systems were relatively simple and the designer had the capability of exploring most of the different possible implementations in a limited time. However, technological advances dramatically reduced the size of transistors and thus increased their density in a single chip. This allowed integrated circuits to accommodate increasingly complex systems and paved the way to SoC implementations, therefore making the designer’s job increasingly difficult and time consuming.

The product design cycle and time-to-market is similarly if not more important than area or performance. The complexity of the development cycle and the pressure to meet market needs in the shortest possible times have lead to addressing the idea of automating the design process. As a result, the term ‘Silicon Compiler’ has gained increasing popularity as a research topic. We define a Silicon Compiler as an integrated system that translates the algorithmic or behavioural description of a system into the physical (silicon) layout representation. One of the significant early developments in silicon compilers is the ‘Bristle Blocks’ system by Dave Johannsen [12]. Although in his system parts of the chip had to be manually produced, the Bristle Blocks system managed to automatically generate data-path and control blocks from a high-level input description of the chip in a short time. New silicon compilers then emerged which produced complete circuit layouts automatically

Part of the silicon compilation process is the High-Level Synthesis, also known as the Behavioural Synthesis which is the process of transforming a behavioural description of a digital system into a Register Transfer Level (RTL) representation. The input representation to such a process describes the behaviour of a digital system in terms of variables, operations and conditional statements and may be written in any high level programming language (like C or C++) or any hardware description language (like VHDL or Verilog). The RTL representation describes a digital system in terms of functional units, storage units, and interconnection structures and the flow of signals between the components. This description is then transformed into a logic representation and into physical layout using RTL synthesis tools and Place-&-Route tools.

The aim of this paper is to cover the significant developments in high level synthesis during the past three decades and sum up the different approaches and techniques that were proposed. The rest of the paper is organized as follows. Section 2 defines the basic building blocks of a high-level synthesis system and discusses the different approaches undertaken to arrive at an optimal solution. Section 3 discusses the design for low power techniques at the behavioural level, whereas Section 4 outlines the importance of circuits testability and how it can be achieved at the high-level of abstraction. Section 5 looks at the current status of benchmarks used to evaluate or compare synthesis techniques and Section 6 identifies the present trends of digital systems synthesis before concluding in Section 7.

2. HIGH-LEVEL SYNTHESIS

A multitude of high-level synthesis systems have been developed in different universities and as part of research projects in industrial companies. Early synthesis systems were each centred round a specific application, be it digital signal processors, instruction set microprocessors or digital control systems. At one hand, this resulted in the diversity of the proposed methods and approaches to high-level synthesis. But in the other hand, it made it difficult to compare those methods and assess their effectiveness, since there was no single system that covers all the general characteristics of digital systems.

Different applications require different sets of constraints and requirements. High-end microprocessors for example require high performance and high reliability, whereas lowend products such as mobile phones require low area, low cost and low power consumption but not necessarily high speeds. This means that in addition to the behavioural description, the user needs to input some constraints to the synthesis system. The synthesis system would then shape the RTL structure towards meeting those constraints.

2.1 High-Level Synthesis design flow

Most of the high-level synthesis systems that were proposed partitioned the synthesis task into three very close and interdependent subtasks: Scheduling, Resource Allocation and Module Binding (see Figure 1). The starting point however, is always an input description written using one of the high-level programming languages or a Hardware Description Language (HDL). This description represents the behaviour of a system as a set of variables, operations and conditions. Since this description is tailored for human readability, no scheduling, allocation or binding can proceed before it is parsed into a machine friendly intermediate representation. The majority of synthesis systems use a graphbased internal representation derived from the flow of data in the description.

The Data Flow Graph (DFG) captures the movement of data derived from assignment statements where the vertices represent operations and the edges represent the data dependencies. As an example, Figure 2 shows a high-level description of a simple system and the corresponding data flow graph. The Control Graph captures the control portions of the description such as conditional branches and loops. Many systems have adopted a different name for the combination of the two graphs. The Control and Data Flow Graph (CDFG) is used in the HAL system [21], the Value Trace can be found in the CMUDA system [27] while others call it the Data Dependency Graph (DDG) or the Directed Acyclic Graph (DAC). IGR is a slightly different Internal Graph Representation used in the SCHOLAR silicon compiler [1], where nodes correspond to time steps instead and each node contains the operations to be executed in the same time step. Another intermediate representation was used in parallel controller synthesis called Petri Net which is a bipartite, weighted, directed graph which has two types of nodes called places and transitions [2]. Connections can only exist between a place and a transition, but never between two places. Petri Nets can be used to represent state machines. Figure 3 shows an example Petri Net representation where the circles correspond to places and bars correspond to transitions.

At this stage, a number of optimizations can be carried out such as: Dead code elimination, Constant propagation, Common sub-expression elimination, In-line expansion of procedures and Loop unrolling. These are common compilerlike optimizations of the internal representation in order to make it more suitable for direct translation into hardware.

Once the intermediate representation is prepared, the subtasks of scheduling, allocation and binding can proceed. Scheduling is the process where timing information (necessary for hardware implementation) is introduced to the data flow graph in the intermediate representation giving an overall order of operations execution. This is done by mapping the operations (vertices) onto time steps or control steps as commonly known. These control steps can be viewed as clock cycles. Resource Allocation is the process of determining the number and type of resources that will be assigned to the operations of the behaviour. This is not to be confused with the Module Binding task, which is the process of actual mapping of each hardware component to
one or more operations.

There is a strong link between the three subtasks and the order in which they are carried out varies between different systems. Some of the traditional behavioural synthesizers start the scheduling process first and then the allocation of functional units and the binding of hardware cells. Other systems carry out the subtasks concurrently or in a different order. The difficulty of deciding the order in which these tasks should proceed is due to the close interrelation between them since a slight change in the scheduling process may dramatically impact the results of the allocation and those of module binding and vice-versa.

 

Comments (0)add comment

Write comment

busy
Last Updated ( Monday, 31 March 2008 )