Design of Storage Optimization Scheme for Bit Plane Coding in JPEG2000

1. Introduction

The two core modules of JPEG2000 (see Figure 1), the wavelet transform and EBCOT [2] (Embedded Block Coding with Optimized Truncation), are computationally expensive and together account for more than half of the encoder's total processing time, so an efficient implementation of them deserves careful study. Software implementations, such as the JPEG2000 reference code JasPer [3], are relatively simple, but their real-time performance is poor. Even on embedded platforms, such as DSPs or general-purpose processors like ARM, the coder is still essentially implemented in software, and the speed gain is small. An efficient hardware unit must therefore be designed around the characteristics of block coding itself; only then can JPEG2000 be used in real-time applications.

Figure 1. JPEG2000 encoder block diagram

2. Storage optimization implementation

The embedded block coding of JPEG2000 is based on bit-plane coding. It operates on relatively small code blocks of wavelet-domain coefficients, typically 32×32 or 64×64 in size. Each coefficient in a code block carries sign information and magnitude information spread over several bit planes of different weights. The idea of bit-plane coding is to encode the most important information first, that is, the magnitude bits with the largest weights, so that, together with the subsequent code-stream organization (see Figure 1), the final code stream supports progressive transmission.
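As a minimal illustration of the bit-plane decomposition (the symbols P for the number of magnitude bit planes, s for the sign bit and b_p(x) for the magnitude bits are our own notation, not the paper's), each coefficient x of a code block can be written in sign-magnitude form:

$$
x = (-1)^{s} \sum_{p=0}^{P-1} b_p(x)\, 2^{p}, \qquad s,\; b_p(x) \in \{0, 1\}
$$

Bit-plane coding visits p = P-1, P-2, ..., 0 in turn, so the bits with weight 2^{P-1}, the most important information, enter the code stream first.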

According to the standard [1], in addition to the sign and magnitude information, the coding process also needs the significance, refinement, and visited state of each bit. Therefore, coding one bit plane of a 32×32 code block requires storing a total of 5×1024 bits. In addition, because coding proceeds column by column in units of 4 bits, each kind of information is usually stored as a 256×4 array (see Table 1).
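The storage figures follow from simple arithmetic, shown here as a quick check (five kinds of information: sign, magnitude, significance, refinement, and visited):

$$
5 \times (32 \times 32) = 5120 \ \text{bits}, \qquad \frac{32 \times 32}{4} = 256 \ \text{words of 4 bits for each kind}
$$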

In practice, however, this storage structure is inefficient. According to the standard [1], coding a bit takes two steps: determining which coding pass the bit belongs to, and performing the coding primitive. These two steps access the significance, sign, magnitude, refinement, and visited information of the current bit, as well as the significance and sign information of its 8 neighbors. For column-based coding with the above scheme, where significance and sign are stored with a word length of 4, coding each column requires reading the words of the previous stripe (every 4 rows form a stripe), the current stripe, and the next stripe: 12 significance and sign bits in total, of which only 6 are useful, while the remaining 6 are redundant. Because coding operates on individual bits, the storage is accessed very frequently: every time a column is coded, the corresponding information bits must be read, and after the column is finished, the updated coding information must be written back. The 4-bit word-length scheme is therefore very inefficient.
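To make the redundancy concrete, the hypothetical module below (its name, ports, and bit ordering are our assumptions, not the paper's: bit 0 of each 4-bit word is the top row of its stripe) keeps only the six bits actually needed when one column is coded: the four bits of the current stripe column, the bottom bit of the stripe above, and the top bit of the stripe below. The other six fetched bits are simply discarded.

```verilog
// Hypothetical illustration: with significance/sign stored as 4-bit
// stripe-column words, coding one column reads three words (12 bits)
// but uses only 6 of them.
module column_window (
    input  wire [3:0] prev_word, // stripe above; bit 0 = top row of that stripe
    input  wire [3:0] cur_word,  // current stripe column; bit 0 = top row
    input  wire [3:0] next_word, // stripe below; bit 0 = top row of that stripe
    output wire [5:0] window     // bit 0 = row just above the stripe, bit 5 = row just below
);
    // Only the bottom row of the previous stripe and the top row of the
    // next stripe are neighbors of the current column; the rest is dropped.
    assign window = {next_word[0], cur_word, prev_word[3]};
endmodule
```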

Therefore, this paper designs a more efficient storage scheme. An all-zero row is added above the top row and below the bottom row of the code block (for the significance and sign planes only), forming a 34×32 block. The rows are then grouped in pairs, and the groups are distributed in the staggered order A, B, C, B, A, ..., C, B, A across three memories, MEMA, MEMB, and MEMC (see Table 2).
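The staggered distribution can be modeled as a small combinational lookup. This is a sketch under our own assumptions: rows of the padded 34×32 plane are numbered 0 to 33 from the top (rows 0 and 33 being the zero rows), consecutive row pairs form groups 0 to 16, and the module name `row_to_bank` and the per-bank slot numbering are hypothetical. The paper does not specify the address layout inside each memory, so column addressing is omitted.

```verilog
// Hypothetical sketch: maps a padded row index (0..33) to its memory bank
// and to the two-row slot inside that bank, for the pattern A,B,C,B,A,...
module row_to_bank (
    input  wire [5:0] row,   // padded row index; rows 0 and 33 are the zero rows
    output reg  [1:0] bank,  // 0 = MEMA, 1 = MEMB, 2 = MEMC
    output reg  [3:0] slot   // index of the two-row group within the selected bank
);
    wire [4:0] pair_idx = row[5:1];  // rows are handled in pairs: pair_idx = row / 2

    always @* begin
        case (pair_idx[1:0])  // the pattern repeats every 4 groups: A, B, C, B
            2'd0:    begin bank = 2'd0; slot = pair_idx >> 2; end // MEMA: groups 0, 4, ..., 16
            2'd2:    begin bank = 2'd2; slot = pair_idx >> 2; end // MEMC: groups 2, 6, 10, 14
            default: begin bank = 2'd1; slot = pair_idx >> 1; end // MEMB: odd groups 1..15
        endcase
    end
endmodule
```

Under this mapping MEMA holds 5 groups, MEMB 8, and MEMC 4, for 17 groups in total.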

In addition, when data are loaded from the significance and sign buffers into the corresponding 6×3-bit register, the bank order is switched according to the stripe. Numbering the stripes from zero, the order is A, B, C for even stripes and C, B, A for odd stripes (see Table 3). Table 3 also shows that the address signals for MEMA, MEMB, and MEMC change differently: the MEMB address increments with every stripe; the MEMA address stays the same when moving from an odd stripe to an even one and increments when moving from an even stripe to an odd one; MEMC behaves in exactly the opposite way to MEMA.

Therefore, the corresponding control circuit and address generation circuit must be designed to match this storage scheme.
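A hypothetical address generator for the three memories can be sketched as follows. We assume that the rows of the padded 34×32 plane are numbered 0 to 33, that consecutive row pairs form groups 0 to 16 assigned to A, B, C, B, A, ..., and that stripes are numbered from zero. Stripe k then needs padded rows 4k to 4k+5, that is, groups 2k, 2k+1 and 2k+2, exactly one from each memory. The module name, ports, and slot numbering are ours. The sketch reproduces the address behavior described in the text (MEMB incrementing every stripe, MEMA and MEMC incrementing on alternate stripes) and gives the order A, B, C for stripe 0 and C, B, A for stripe 1.

```verilog
// Hypothetical sketch: per-stripe slot addresses for MEMA/MEMB/MEMC and the
// order in which their two-row groups are loaded into the 6x3 register.
module stripe_addr (
    input  wire [2:0] stripe,    // stripe index k, 0..7 for a 32-row code block
    output wire [3:0] addr_a,    // MEMA slot: group 2k (k even) or 2k+2 (k odd)
    output wire [3:0] addr_b,    // MEMB slot: always group 2k+1
    output wire [3:0] addr_c,    // MEMC slot: group 2k+2 (k even) or 2k (k odd)
    output wire       order_cba  // 0: load A,B,C top to bottom; 1: load C,B,A
);
    assign addr_b    = {1'b0, stripe};       // slot k: increments on every stripe
    assign addr_a    = (stripe + 4'd1) >> 1; // ceil(k/2): increments from even k to odd k
    assign addr_c    = stripe >> 1;          // floor(k/2): increments from odd k to even k
    assign order_cba = stripe[0];            // odd stripes take their top rows from MEMC
endmodule
```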

3. Hardware architecture

Based on the above analysis, we propose a hardware architecture for the bit-plane encoder, shown in Figure 2. The architecture targets 32×32 code blocks.


Figure 2. Hardware architecture of the bit-plane encoder based on the memory optimization scheme

The bit-plane encoder in Figure 2 consists mainly of an internal buffer, a register group, an address generation module, a pass judgment module, a coding primitive module, a state machine module, and a counter module.

There are two address generation modules. Address generation module 1 generates the addresses for reading the external DWT coefficient buffer; address generation module 2 generates the addresses for reading the five internal buffers.

The pass judgment module uses the coding information in the current register group to determine whether a bit belongs to the current coding pass. If it does, the corresponding coding primitive is performed; otherwise, the bit is skipped and coding continues with the next bit.
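The per-bit membership test can be sketched as combinational logic. In JPEG2000, a bit joins the significance propagation pass if it is not yet significant and at least one of its 8 neighbors is significant; the magnitude refinement pass takes bits that became significant in an earlier bit plane; the cleanup pass takes every bit not yet coded in the current plane. The module name `pass_judge_bit` and its ports are hypothetical, and we assume that the visited flag is set whenever a bit is coded in the significance propagation or magnitude refinement pass of the current bit plane.

```verilog
// Hypothetical sketch of the per-bit pass membership test.
// Assumption: 'visited' is set when the bit is coded in the significance
// propagation or magnitude refinement pass of the current bit plane.
module pass_judge_bit (
    input  wire [1:0] pass,     // 0: significance propagation, 1: magnitude refinement, 2: cleanup
    input  wire       sig,      // current bit is already significant
    input  wire       visited,  // already coded in an earlier pass of this bit plane
    input  wire [7:0] nbr_sig,  // significance of the 8 neighbors
    output reg        in_pass   // 1: code this bit in the current pass
);
    always @* begin
        case (pass)
            2'd0:    in_pass = !sig && (|nbr_sig); // insignificant with a significant neighbor
            2'd1:    in_pass = sig && !visited;    // significant before this plane, not yet coded
            default: in_pass = !visited;           // cleanup: everything still uncoded
        endcase
    end
endmodule
```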

The coding primitive module consists of four parts: zero coding, sign coding, magnitude refinement coding, and run-length coding. Typical implementations realize the coding primitives with lookup tables; this design instead implements them as combinational logic, which speeds up the generation of CX (context) and D (decision bit).
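As an example of a coding primitive built from combinational logic rather than a lookup table, the sketch below computes the zero-coding context label from the number of significant horizontal (h), vertical (v) and diagonal (d) neighbors, following the zero-coding context table of ISO/IEC 15444-1 for the LL and LH subbands. The module name and port widths are our assumptions; for the HL subband the roles of h and v are swapped, and the HH subband uses a different rule, neither of which is shown.

```verilog
// Hypothetical sketch: zero-coding context label (0..8) for LL/LH subbands,
// written as priority logic instead of a lookup table.
module zc_context_ll (
    input  wire [1:0] h,  // significant horizontal neighbors, 0..2
    input  wire [1:0] v,  // significant vertical neighbors, 0..2
    input  wire [2:0] d,  // significant diagonal neighbors, 0..4
    output reg  [3:0] cx  // zero-coding context label
);
    always @* begin
        if (h == 2)                cx = 4'd8;
        else if (h == 1 && v != 0) cx = 4'd7;
        else if (h == 1 && d != 0) cx = 4'd6;
        else if (h == 1)           cx = 4'd5;
        else if (v == 2)           cx = 4'd4;
        else if (v == 1)           cx = 4'd3;
        else if (d >= 2)           cx = 4'd2;
        else if (d == 1)           cx = 4'd1;
        else                       cx = 4'd0;
    end
endmodule
```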

The state machine module controls the coding process of the entire encoder, which is divided into two phases: the preprocessing phase and the context generation phase. The preprocessing phase initializes the contents of the five buffers. The context generation phase codes the significance propagation, magnitude refinement, and cleanup passes in order and outputs the resulting CX/D pairs to the subsequent arithmetic coding module. The state machine also uses the counter outputs to determine its current state. After each bit plane is coded, the encoder must return to the preprocessing phase to update the magnitude information for the next bit plane and clear the visited buffer.

4. Verilog design

The proposed hardware architecture is described in Verilog [4]. The top-level module is bpc.v, which contains ram_block.v, addr_generator.v, fill_ram.v, pass_judge.v, coding_primitive.v, and state_machine.v. The main state machine drives the coding process by generating an enable signal that activates the current module. When that module finishes, it returns a completion signal to the main state machine, which then advances to the next step. Part of the inter-module handshake code is listed below; the ellipses stand for other control signals and states.

```verilog
case (cstate)
  // ...
  gene_layer: begin
    // ... other control signals
    gene_layer_en = 1'b1; fill_ram_en = 1'b0; pass_judge_en = 1'b0;
    if (gene_layer_fin) nstate = fill_ram; else nstate = gene_layer;
  end
  fill_ram: begin
    // ... other control signals
    gene_layer_en = 1'b0; fill_ram_en = 1'b1; pass_judge_en = 1'b0;
    if (fill_ram_fin) nstate = pass_judge; else nstate = fill_ram;
  end
  // ... other states
endcase
```


5. Experimental results

The design was functionally simulated with ModelSim and synthesized with Quartus [6]. The synthesis results are shown in Table 3.

The encoding times of several standard test images (512×512 grayscale) using the JasPer software and this hardware are compared below.

6. Conclusion

This paper analyzes the storage scheme of the bit-plane encoder in JPEG2000 and designs an efficient storage structure together with the corresponding control circuit. The design is described in Verilog [4] and can be synthesized with Quartus [6]. It encodes a 512×512 grayscale image in 0.1 s, which is only about 30% of the encoding time of the JasPer [3] software implementation. Because each code block is coded independently, code blocks can be processed in parallel. According to the synthesis results, several bit-plane encoder IP cores can be integrated in a single EP1C12Q240C8 device and run in parallel, further increasing the speed of the encoder and making real-time image processing possible. The design can also be implemented as an ASIC for products such as digital cameras and image surveillance systems, which have broad market prospects.
