**Transform-Based Lossless Image Compression Algorithm for Electron Beam Direct Write Lithography Systems**

Jeehong Yang<sup>1</sup> and Serap A. Savari<sup>2</sup>
*<sup>1</sup>University of Michigan, Ann Arbor, USA*
*<sup>2</sup>Texas A&M University, USA*

#### **1. Introduction**


Conventional photolithography systems use physical masks, which are expensive and difficult to create and have limited lifetimes. Electron Beam Direct Write (EBDW) lithography systems are a noteworthy alternative which do not need physical masks [Chokshi et al. (1999)]. As shown in Figure 1, they rely on an array of lithography writers to directly write a mask image on a photoresist-coated wafer using electron beams. EBDW systems are attractive for several reasons: First, their flexibility is advantageous in processes requiring the rapid prototyping of chips. Second, they are known to reduce fabrication costs [Lin (2009)]. Third, they are well suited for Next-Generation Lithography (NGL) because they can produce circuits with smaller features than state-of-the-art photolithography systems. Finally, since the mask images are electronically controlled, EBDW systems can be improved through software. Our focus here is on this last point.

EBDW is not at this time used in many circuit fabrication processes because it is much slower than physical mask lithography systems. One current focus of research to address the throughput problem is massively-parallel electron beam lithography. Some of the research groups/companies which are developing such systems include KLA-Tencor [Petric et al. (2009)], IMS [Klein et al. (2009)], and MAPPER [Wieland et al. (2009)].

Chokshi et al. (1999) proposed a maskless lithography system using a bank of 80,000 lithography writers running in parallel at 24 MHz. Dai & Zakhor (2006) pointed out that this lithography system can achieve the conventional photolithography throughput of one wafer layer per minute, but layout image data is often several hundred terabits per wafer and therefore data delivery becomes an important issue. Dai & Zakhor (2006) proposed using a data delivery system with a lossless image compression component which is illustrated in Figure 2. They hold compressed layout images in storage disks and transmit the compressed data to the processor memory board. This kind of EBDW lithography system can achieve higher throughput if the decoder embedded within the lithography writer can sufficiently rapidly recover the original images from the compressed files.

Dai (2008) discussed two constraints on this type of system: 1) the compression ratio should be at least (Transfer rate of Decoder to Writer / Transfer rate of Memory to Decoder), and 2) the decoding algorithm has to be simple enough to be implemented as a small add-on within the maskless lithography writer. Therefore the decoder must operate with little memory.
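The first constraint is a simple rate calculation. With purely illustrative transfer rates (the chapter supplies no numbers), it can be sketched as:

```python
# Hypothetical transfer rates, for illustration only (not from the chapter):
writer_rate_gbps = 12.0   # transfer rate of decoder -> writer
memory_rate_gbps = 1.5    # transfer rate of memory -> decoder

# Constraint 1: the compression ratio must be at least the ratio of the two
# rates, so that the slower memory link can still keep the writer fed.
required_ratio = writer_rate_gbps / memory_rate_gbps
print(required_ratio)  # -> 8.0, i.e. at least 8:1 compression is needed
```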


Fig. 2. Data Delivery for an EBDW Lithography System introduced in Dai & Zakhor (2006)

Dai & Zakhor (2006) reported that the layout images of control logic circuits are often irregular while the layout images of memory cells frequently contain repeated patterns. Their first algorithm *C4* attempts to handle the varying characteristics of layout images by using context prediction and finding repeated regions within an image. Liu et al. (2007) later proposed *Block C4*, which significantly reduces the encoding complexity.

Based on the framework of Dai & Zakhor (2006) and Liu et al. (2007), Yang & Savari (2010) improved the compression algorithm via a corner-based representation of the Manhattan polygons. Their initial algorithm *Corner* achieves higher compression rates than *Block C4* on an irregular circuit. Yang & Savari (2011) recently proposed an improvement *Corner2* which simplifies the corner transformation to deal with the irregular parts of the layout images and also uses a frequent pattern replacement scheme to deal with the repeated parts. Their experimental results indicate that their approach is often more efficient than the context prediction method used in *C4* and *Block C4*.

In this paper we extend the work of Yang & Savari (2011) to gray-level images to better address the issue of handling proximity correction for EBDW systems and show that we obtain better compression performance and faster encoding/decoding than *C4* and *Block C4*. Hence our work can be used to solve the data delivery problem of EBDW lithography systems with smaller features. Moreover, since our decoding speed is faster than *C4* and *Block C4* we can improve the throughput of the EBDW lithography system.

### **2. The compression algorithm**

#### **2.1 Overview**

Layout image data is commonly cached in GDSII [Rubin (1987)] or OASIS [Chen et al. (2004)] formats. GDSII and OASIS describe circuit features such as polygons and lines by their corner points [see Rubin (1987) and Reich et al. (2003)]. GDSII and OASIS formatted data are far more compact than the uncompressed image of a circuit layer. Therefore GDSII and OASIS initially seem to be well-suited for this application, but the problem is that EBDW writers operate directly on pixel bit streams, and GDSII and OASIS layout representations must therefore be converted into layout images before the process begins.


Fig. 1. EBDW Lithography


Fig. 3. Preparing Layout Images from a Circuit Layout - Rasterizing Process

The conversion process involves 1) removing hierarchical structures by replacing all of the copied parts with actual features, 2) placing the circuit features such as polygons and lines into the correct layers of the circuit, and 3) rasterizing (see Figure 3). This conversion process often lasts hours or even days on a complex computer system with large memory and cannot be executed by the decoder chip. The final rasterizing step consists of two parts: a) it produces a binary image on a finer grid, and b) the binary image is processed in blocks to generate a gray-level image. In the second part the input binary string is partitioned into *m* × *m* pixel blocks. For each block the number of filled pixels is computed and normalized/quantized to the corresponding gray level. When this gray-level image is transmitted to the EBDW lithography system the lithography writer interprets the gray level (or pixel intensity) as an exposure dose, which is controlled by exposing the corresponding region multiple times with an electron beam. Through this process the printed layout pattern becomes more robust to the electron beam proximity effect, making better quality circuits.
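The block-counting part b) of rasterizing can be sketched as follows. The function and parameter names (`rasterize_gray`, `n_levels`) are ours, and we assume image dimensions divisible by *m*:

```python
def rasterize_gray(binary, m=4, n_levels=17):
    """Sketch of rasterizing part b): partition a binary image into m x m
    blocks, count the filled pixels in each block, and quantize the count
    to one of n_levels gray levels. Assumes dimensions divisible by m;
    typically n_levels = m*m + 1 so each count maps to a distinct level."""
    R, C = len(binary), len(binary[0])
    gray = []
    for by in range(0, R, m):
        row = []
        for bx in range(0, C, m):
            filled = sum(binary[y][x]
                         for y in range(by, by + m)
                         for x in range(bx, bx + m))
            # normalize the count (0..m*m) to a gray level (0..n_levels-1)
            row.append(round(filled * (n_levels - 1) / (m * m)))
        gray.append(row)
    return gray
```

For example, a fully filled 4 × 4 block maps to the top gray level 16, and a half-filled one to level 8.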

Our approach is motivated by the compactness of the GDSII/OASIS format and uses corner representation. However, we bypass the complex flattening and rasterizing processes and instead work with a simple decoding process. Yang & Savari (2011) considered some of these ideas for binary images which handle the proximity correction by rasterizing the input binary image on a fine enough grid. Here we will extend these ideas to gray-level images on a coarser grid.

Figure 4 summarizes the components of the compression algorithm. We begin by applying a corner transformation to the image like the one in *Corner2* [Yang & Savari (2011)]. However, unlike *Corner2* this transformation outputs two streams: a "corner stream" and an "intensity stream". The corner stream is a binary stream which locates the polygon corners<sup>1</sup> and the intensity stream is a stream of pixel (corner/edge) intensities. Each stream is input to a separate entropy coding scheme which outputs a compressed bit stream. The corner stream is compressed using a combination of run length encoding [Golomb (1966)], end-of-block coding, and arithmetic coding [Moffat et al. (1998)]. The intensity stream is compressed using end-of-block coding and then compressed by LZ77 [Ziv & Lempel (1977)] and Huffman coding [Huffman (1952)].

In Section 2.2 we will first describe the corner transform process which outputs the corner stream and the intensity stream. In Section 2.3 we will describe the final entropy coding process of the corner stream, and in Section 2.4 we will describe how the intensity value is compressed.

<sup>1</sup> This is not actually a corner, but a horizontal/vertical transition point as explained in Subsection 2.2.


Fig. 6. 2-Symbol Corner Transformation: (a) Original Image; (b) Horizontal coding of (a); (c) Vertical coding of (b)


Fig. 7. Handling width-1 lines


Fig. 4. Compression Process Overview


Fig. 5. Required decoder memory (red) to reconstruct a line (blue) from (*x*<sub>1</sub>, *y*<sub>1</sub>) to (*x*<sub>2</sub>, *y*<sub>2</sub>).

#### **2.2 Corner transformation**

The GDSII/OASIS representation of a structurally flattened single layer describes the layout polygons by their corner points. This representation requires large decoder memory since the decoder needs to access a memory block of size (|*x*<sub>1</sub> − *x*<sub>2</sub>| + 1) × (|*y*<sub>1</sub> − *y*<sub>2</sub>| + 1) for the encoder to connect an arbitrary pair of points (*x*<sub>1</sub>, *y*<sub>1</sub>) and (*x*<sub>2</sub>, *y*<sub>2</sub>) as in Figure 5. Therefore this representation is infeasible for our application.

However, the rasterizing process becomes much less complex if the angle of a contour line is constrained to a small set. Yang & Savari (2010) took advantage of horizontal and vertical contour lines and decomposed an arbitrary polygon into a collection of Manhattan polygons, i.e., polygons with right angle corners. This approach is effective because most components of circuit layouts are produced using CAD tools which design the circuit in a rectilinear space, and the non-Manhattan parts can also be described by Manhattan components.

In this framework, the decoder scans the image in raster order, i.e., each row in order from left to right. When the decoder processes a corner it must determine whether it should reconstruct a horizontal and/or a vertical line. Observe that a corner is either the beginning of a line going to the right and/or down or the end of a line. Yang & Savari (2010) assigned each pixel one of five possible values – 'not corner,' 'right,' 'right and down,' 'down,' and 'stop.'


Yang & Savari (2011) more recently observed that a row (or a column) of the original binary layout image consists of alternating runs of 1s (fill) and runs of 0s (empty). Therefore it is more efficient to encode pixels where there are transitions from 0 to 1 (or 1 to 0) using symbol "1" and to encode the other places using symbol "0." Observe, as in Figure 6(b), that after applying this encoding in the horizontal direction to a collection of Manhattan polygons that the output consists of alternating runs of 1s and 0s in the vertical direction. To increase the compression we repeat this encoding in the other direction to obtain the final corner image. In the binary corner transformation the final encoded image is binary and the "1"-pixels give information about the corners of the polygons. To describe the algorithm we begin with a two-step transformation process and then shorten it to a one-step procedure which requires less memory during the encoding process and is faster than the two-step transformation process.

The two-step transformation process begins with a horizontal encoding step in which we process each row from left to right. For each row, the encoder sets the (imaginary) pixel value to the left of the leftmost pixel to 0 (not filled). If the value of the current pixel differs from the preceding one we represent it with a "1" and otherwise with a "0." The second step inputs the intermediate encoded result to the vertical encoding process in which each column is processed from top to bottom. In the specification of the algorithms *x* denotes the column index [1, ··· , *C*] of the image and *y* represents the row index [1, ··· , *R*]. Algorithm 1 illustrates this process.

Line 13 of Algorithm 1 constructs OUT(*x*, *y*) = 1 only if TEMP(*x*, *y*) ≠ TEMP(*x*, *y* − 1). That is, OUT(*x*, *y*) = 1 only if TEMP(*x*, *y*) = 1 and TEMP(*x*, *y* − 1) = 0 or if TEMP(*x*, *y*) = 0 and TEMP(*x*, *y* − 1) = 1. Since TEMP(*x*, *y*) = 1 only if IN(*x* − 1, *y*) ≠ IN(*x*, *y*) as in Line 5, we can shorten the corner transform process to Algorithm 2, which has no need for intermediate memory since pixel (*x*, *y*) is processed in terms of the input pixels (*x* − 1, *y*), (*x*, *y* − 1), and (*x* − 1, *y* − 1). Algorithm 2 is much faster than Algorithm 1. Finally, Figure 7 illustrates how the transformation handles width-1 lines.






#### **Algorithm 1** Transformation : Two-Step Algorithm

**Input:** Binary layer image IN ∈ {0, 1}<sup>*C*·*R*</sup>
**Output:** Corner image OUT ∈ {0, 1}<sup>*C*·*R*</sup>
**Intermediate:** Temporary image TEMP ∈ {0, 1}<sup>*C*·*R*</sup>

{**Horizontal Encoding**}
1: Initialize TEMP(*x*, *y*) = 0, ∀*x*, *y*.
2: **for** *y* = 1 **to** *R* **do**
3: **for** *x* = 1 **to** *C* **do**
4: **if** IN(*x*, *y*) ≠ IN(*x* − 1, *y*) **then**
5: TEMP(*x*, *y*) = 1.
6: **end if**
7: **end for**
8: **end for**

{**Vertical Encoding**}
9: Initialize OUT(*x*, *y*) = 0, ∀*x*, *y*.
10: **for** *x* = 1 **to** *C* **do**
11: **for** *y* = 1 **to** *R* **do**
12: **if** TEMP(*x*, *y*) ≠ TEMP(*x*, *y* − 1) **then**
13: OUT(*x*, *y*) = 1.
14: **end if**
15: **end for**
16: **end for**

#### **Algorithm 2** Transformation : One-Step Algorithm

**Input:** Binary layer image IN ∈ {0, 1}<sup>*C*·*R*</sup>
**Output:** Corner image OUT ∈ {0, 1}<sup>*C*·*R*</sup>

1: Initialize OUT(*x*, *y*) = 0, ∀*x*, *y*.
2: **for** *y* = 1 **to** *R* **do**
3: **for** *x* = 1 **to** *C* **do**
4: **if** IN(*x* − 1, *y* − 1) = IN(*x*, *y* − 1) and IN(*x* − 1, *y*) ≠ IN(*x*, *y*) **then**
5: OUT(*x*, *y*) = 1.
6: **end if**
7: **if** IN(*x* − 1, *y* − 1) ≠ IN(*x*, *y* − 1) and IN(*x* − 1, *y*) = IN(*x*, *y*) **then**
8: OUT(*x*, *y*) = 1.
9: **end if**
10: **end for**
11: **end for**
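A runnable Python sketch of the one-step transform (Algorithm 2); the function name is ours, and out-of-range pixels are read as 0, matching the imaginary background pixels in the text:

```python
def corner_transform(img):
    """One-step corner transform (Algorithm 2): OUT(x, y) = 1 exactly when
    the horizontal-transition test differs between row y and row y-1 at
    column x (the exclusive-or of lines 4 and 7 of Algorithm 2).
    img is a list of rows of 0/1 values; out-of-range pixels read as 0."""
    R, C = len(img), len(img[0])

    def px(x, y):  # background (0) outside the image boundary
        return img[y][x] if 0 <= x < C and 0 <= y < R else 0

    out = [[0] * C for _ in range(R)]
    for y in range(R):
        for x in range(C):
            h_here = px(x - 1, y) != px(x, y)           # transition in row y
            h_above = px(x - 1, y - 1) != px(x, y - 1)  # same test one row up
            if h_here != h_above:
                out[y][x] = 1
    return out

# A 2x2 filled square: the transform marks the four transition points
# associated with the square's corners.
square = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
```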

Based on the experimental success in Yang & Savari (2010) and Yang & Savari (2011) for binary layout images it is natural to expect that a combination of the corner transformation for the outline of gray-level polygons and a separate representation for the intensity stream would outperform *Block C4*. Note that nLv-level gray images for this application have pixel intensity 0 (empty) outside the polygon outline, nLv − 1 (fully filled) inside the polygon outline, and an element of (0,nLv) along the polygon outline. Therefore we need only consider intensities along polygon corners and edges. Finally, in order to obtain the polygon outline using the corner transformation, we first have to map the gray-level image to a binary image. This is easily done by mapping all of the nonzero intensities to 1 (fill) and leaving the zero intensities (not fill) unchanged.
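The nonzero-to-1 mapping is a one-liner; the function name `binarize` is ours:

```python
def binarize(gray):
    """Map a gray-level image to the binary image used by the corner
    transform: nonzero intensity -> 1 (fill), zero stays 0 (not fill)."""
    return [[1 if v > 0 else 0 for v in row] for row in gray]
```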

#### **2.3 Entropy coding - corner stream**


The corner stream typically contains long runs of zeroes and is therefore well-suited to compression algorithms like run length encoding [Golomb (1966)] and end-of-block (EOB) coding. Because the corner transformed image is a sparse binary image, when read in raster order (as we read) the string consists of ones and runs of zeroes. During the compression process, the transitional corners (ones) of the transformed image are written unchanged, but each run of zeroes is described by its run length via an *M*-ary representation which we next describe. Define the new symbols "2", "3", ··· , "*M*+1" to respectively represent the base-*M* digits "0<sub>*M*</sub>", "1<sub>*M*</sub>", ··· , "(*M* − 1)<sub>*M*</sub>". For example, if the transformed stream was "1 00000 00000 1 00000 0000 1 00000 00000 000" and *M* = 3, then the encoding of the stream is "1 323 1 322 1 333" because the run lengths are 10 (=101<sub>3</sub>), 9 (=100<sub>3</sub>), and 13 (=111<sub>3</sub>), and 2/3/4 respectively represent 0<sub>3</sub>/1<sub>3</sub>/2<sub>3</sub>.
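A sketch of this *M*-ary run-length step (the function name and the sentinel trick are ours); it reproduces the *M* = 3 example above:

```python
def encode_corner_stream(bits, M=3):
    """RLE for the corner stream: '1's pass through unchanged; each run of
    zeroes is written in base M, with digits 0..M-1 mapped to the new
    symbols '2'..str(M+1)."""
    out, run = [], 0
    for b in bits + "1":               # trailing sentinel flushes the last run
        if b == "0":
            run += 1
            continue
        if run:                        # emit the pending zero-run in base M
            digits = []
            while run:
                digits.append(run % M)
                run //= M
            out.extend(str(d + 2) for d in reversed(digits))
        out.append("1")
    out.pop()                          # remove the sentinel '1'
    return "".join(out)
```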

We find that the addition of EOB coding helps represent the corner stream more efficiently. When the polygons are aligned and start/end at the same rows of the image, the resulting runs of zeroes can span multiple rows. Although this could be handled by choosing *M* sufficiently large, the memory needed to encode and decode the final *M*-ary representation via arithmetic coding [Moffat et al. (1998)] grows with the alphabet size, so *M* must be kept as small as possible in our restricted decoder memory setting.

We observe that it is effective to divide each line into *k* blocks of length *L*, and we define a new EOB symbol "X". If a run of zeroes extends to the end of a block, we represent that run using the end-of-block symbol X instead of an *M*-ary representation. Hence the encoding for a line of zeroes is *k* X's instead of approximately log*M*(*kL*) symbols. For the previous example, if *M* = 2, *k* = 5, and *L* = 7, then the transformed stream "1000000 0000100 0000000 1000000 0000000" is described as "1X 3221X X 1X X," where 2/3 represent the binary digits 0/1 in runs of zeroes.
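The block-wise EOB step can be sketched as follows; this is our illustrative reading of the scheme (runs of zeroes are coded per block, and a run reaching the block boundary collapses to a single "X"), with helper names of our choosing.

```python
def digits_base_m(n, M):
    # Most-significant-first base-M digits of n (n >= 1).
    digits = []
    while n > 0:
        digits.append(n % M)
        n //= M
    return digits[::-1]

def eob_encode(bits, M, L):
    # Split the stream into blocks of length L.  Inside a block, '1's pass
    # through and interior runs of zeroes get the base-M digit coding
    # (digit d -> symbol d + 2); a run of zeroes that reaches the end of
    # its block is replaced by the single end-of-block symbol 'X'.
    out = []
    for i in range(0, len(bits), L):
        run = 0
        for b in bits[i:i + L]:
            if b == 0:
                run += 1
                continue
            if run > 0:
                out.extend(d + 2 for d in digits_base_m(run, M))
                run = 0
            out.append(1)
        if run > 0:
            out.append('X')
    return out
```

Applied to the same 35-symbol example stream with *M* = 2 and *L* = 7, this produces 1 X 3 2 2 1 X X 1 X X, i.e. "1X 3221X X 1X X" as in the text.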

We find that EOB coding results in long runs of "X"s, and it is useful to apply an *N*-ary run length encoding to these runs. For the previous example, if *M* = *N* = 2, *k* = 5, and *L* = 7, then the next description of the string is "1 4 3221 5 1 5," where 2/3 (or 4/5) handles the binary representation of runs of zeroes (or "X"s).

Finally, we compress the preceding stream using the version of arithmetic coding offered by Witten et al. (1987), whose decoder requires four bytes per alphabet symbol. Since we use *M* + *N* + 1 symbols<sup>2</sup>, 4(*M* + *N* + 1) bytes are needed for arithmetic decoding.

#### **2.4 Entropy coding - intensity stream**

The corner stream contains no intensity information. Since we are applying row-by-row decompression (from left to right), the intensity values have to be given in that order. The intensity values that we require are for corner pixels and pixels on the edges. As we have mentioned earlier in Section 2.2, the pixels outside the polygons will have 0 intensity (empty) and pixels inside the polygon boundaries will have nLv − 1 intensity (fully filled).

To obtain better prediction we could apply linear prediction along the neighboring pixels as is done in *Block C4*. However, this approach requires the full information of the previous row, which translates to decoder memory. Therefore we instead apply EOB encoding to the pixels corresponding to horizontal/vertical edges, because the pixel intensity along an edge seldom changes unless oblique lines are used. We encode the intensity stream as in Algorithm 3. Note that in the algorithm *ρ* is the length of the intensity stream, which is determined at the end of the encoding process.

<sup>2</sup> *M* symbols are used for runs of zeroes, *N* symbols are used for runs of "X"s, and 1 is used for the transitional corners.

**Algorithm 3** Intensity Stream Encoding

```
Input: Gray layer image IN ∈ {0, ··· , nLv − 1}C·R
Input: Binary layer image BIN ∈ {0, 1}C·R
Output: Intensity stream OUT ∈ {0, ··· , nLv − 1}ρ
 1: Initialize ρ = 0.
 2: for y = 1 to R do
 3: for x = 1 to C do
 4: if (x, y) is a corner pixel then
 5: ρ = ρ + 1, OUT(ρ) = IN(x, y).
 6: else if (x, y) is a horizontal edge pixel with corners at (x − 1, y) and (x + α, y) then
 7: if IN(i, y) has the same value for i ∈ [x, x + α) then
 8: ρ = ρ + 1, OUT(ρ) = IN(x, y).
 9: ρ = ρ + 1, OUT(ρ) = �.
10: x = x + α − 1.
11: else
12: for i = x to x + α − 1 do
13: ρ = ρ + 1, OUT(ρ) = IN(i, y)
14: end for
15: x = i.
16: end if
17: else if (x, y) is a vertical edge pixel with corners at (x, y − 1) and (x, y + β) then
18: if IN(x, j) has the same value for j ∈ [y, y + β) then
19: ρ = ρ + 1, OUT(ρ) = IN(x, y).
20: ρ = ρ + 1, OUT(ρ) = �.
21: for j = y to y + β − 1 do
22: IN(x, j) = 0
23: end for
24: else
25: ρ = ρ + 1, OUT(ρ) = IN(x, y)
26: end if
27: else if IN(x, y) > 0 and (x, y) is a vertical edge pixel then
28: ρ = ρ + 1, OUT(ρ) = IN(x, y).
29: end if
30: end for
31: end for
```
If the current pixel corresponds to a corner (Lines 4-5), the intensity is represented as is. If the current pixel corresponds to a horizontal edge pixel (Lines 6-16) which starts from the left pixel, check the run of that intensity. If the horizontal edge pixel has constant pixel intensity throughout the entire edge, represent the intensity value followed by an end symbol *�* and skip to the ending corner pixel (Lines 7-10). Otherwise, write the entire edge intensity as is (Lines 11-15). Similarly, if the current pixel corresponds to a vertical edge (Lines 17-29) which starts from the upper pixel, determine whether or not the pixel intensities are fixed throughout the vertical edge. If they are constant, then represent the intensity value followed by the end symbol (Lines 18-20) and reset the intensity values for the following rows (Lines 21-23) so that they are not processed in Lines 27-29. Otherwise, write the intensity value as is and proceed (Lines 24-26). Finally, the remaining vertical edge pixel intensities are written in Lines 27-29.

After the entire intensity stream has been processed, compress the output stream using LZ77 and Huffman coding. The LZ77 algorithm by Ziv & Lempel (1977) compresses the stream by finding matches from the previously processed data. When a pattern is repeated within the search region, it can be encoded using a short codeword. Huffman coding is applied after LZ77 to represent the LZ77 stream more efficiently. The combination of LZ77 and Huffman coding is widely used in a number of compression algorithms such as *gzip*. We used *zlib* [zlib (2010)] to implement it. The compression rates depend on the size of the LZ77 search region and the dictionary for the Huffman code. Because of the decoder memory restrictions we chose an encoder needing only 2,048 bytes of memory for the dictionary; 2,048 bytes is slightly less than the memory used to describe an entire row of our benchmark circuit. However, since we were applying this only to the intensity stream, we were able to match more rows than *Block C4*.

#### **3. Decoder**

The decoder consists of an intensity stream decoder and a corner stream decoder as in Figure 8. The intensity stream decoder is an entropy decoder which can be decomposed into a Huffman decoder and an LZ77 decoder. The corner stream decoder consists of an entropy decoder, comprising an arithmetic decoder, a run length decoder, and an end-of-block decoder, together with a corner transform decoder which reconstructs the polygons from the entropy decoder output. The corner transform decoder utilizes the output of the corner stream entropy decoder to reconstruct the polygon outlines and uses the output of the intensity stream decoder to reconstruct the polygon pixel intensity.

The entire process works in a row-by-row fashion. Since each part of the decoding procedure (arithmetic decoding, run length decoding, end-of-block decoding, inverse corner transformation, LZ77 decoding, and Huffman decoding) is simple and works with restricted decoder memory, the entire decoder can be implemented in hardware. Note that the most complex part will be the arithmetic decoder, which is widely implemented in microcircuits [Peon et al. (1997)]; the other parts are comprised of simple branch, copy, and computation operations as we will see in the following subsection.

#### **3.1 Intensity stream decoder**

Decompressing the intensity stream is straightforward. We apply LZ77 and Huffman decoding to obtain the intensity stream produced by Algorithm 3. As we have mentioned in Section 2.4, the decoder requires 2,048 bytes of memory to decode the LZ77 and Huffman codes. The resulting intensity stream is passed on to the corner transform decoder for the final reconstruction. Note that the decoder does not decompress the entire compressed intensity stream at once but rather decompresses some number of intensity symbols at the request of the corner transform decoder. The detailed decompression of the intensity stream will be discussed at the end of the next subsection.

#### **3.2 Corner stream decoder - corner transform decoder**

As we have mentioned earlier, the corner stream decoder consists of an entropy decoder and a corner transform decoder. The entropy decoder reverses the procedure of the entropy encoder of Section 2.3. It first reconstructs the run length and end-of-block encoded stream using the
