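Research on Spectral Estimation Based on the Burg Algorithm and Its MATLAB Implementation, Graduation Thesis (Edited Revision). Content summary: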

… clock frequency approximately m times higher, and a communication bottleneck with the divider. The clock rate of the divider can be raised to a maximum similar to that of the multiplier by pipelining the carries in each individual CAS stage. However, each output bit is then available only once in every m clock cycles. There is also the problem that data streams must be reversed between multipliers and dividers. One possibility is to use registers and extra control logic to reorder the bit stream from the divider, but the operation time is still limited. The efficiency of the divider with the pipelined carry can be greatly improved by using the idle slots between successive output bits to perform other, independent divisions. The bit-serial/word-parallel divider shown in [3] allows m+1 individual divisions to be performed simultaneously (interleaved). This brings the mean division time close to that of a bit-serial multiplier, but the data streams must still be matched when such devices are interfaced. One way to tackle this problem is to redesign the multiplier so that it works on an MSB-first data stream, rather than storing and reordering the divider outputs, which increases latency and control requirements [10]. MSB-first multiplication, first demonstrated by McCanny et al. [11], shows that it is possible to multiply positive numbers by summing the partial products (PPs) in the reverse of the usual order. This also requires an MSB-first addition unit to ensure that the output carries from the PPs are added into the final product. Larsson-Edefors and Marnane [12] extend the concept of MSB-first multiplication to the two's complement number system and show bit-serial architectures for this application.
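The point of MSB-first multiplication is that a divider, which produces its quotient bits MSB first, can feed a multiplier without any bit reversal. The Python sketch below is only a word-level behavioural model of that idea, not the bit-serial CAS/FA hardware described here: it assumes unsigned integers (no two's complement handling or one's complement correction), models the CAS-array divider as non-restoring division, and the function names `divide_msb_first` and `multiply_msb_first` are hypothetical.

```python
from typing import Iterable, List


def divide_msb_first(dividend: int, divisor: int, m: int) -> List[int]:
    """Return the m quotient bits of dividend // divisor in MSB-first order.

    Behavioural model (assumed) of non-restoring division as performed by a
    chain of controlled add/subtract (CAS) stages: each step adds or subtracts
    the shifted divisor according to the sign of the partial remainder, and the
    sign of the result gives the next quotient bit.
    Unsigned operands only; requires 0 <= dividend < divisor * 2**m.
    """
    if divisor <= 0 or not 0 <= dividend < divisor << m:
        raise ValueError("need divisor > 0 and 0 <= dividend < divisor * 2**m")
    r = dividend
    bits = []
    for i in range(m - 1, -1, -1):
        if r >= 0:
            r -= divisor << i  # subtract the divisor aligned to bit i
        else:
            r += divisor << i  # non-restoring: add back instead of restoring
        bits.append(1 if r >= 0 else 0)  # quotient bit i, emitted MSB first
    return bits


def multiply_msb_first(bits: Iterable[int], a: int) -> int:
    """Multiply a by a multiplier whose bits arrive MSB first.

    Each bit contributes one partial product (bit * a). Doubling the running
    sum before adding the next partial product accumulates the PPs in reverse
    order (MSB first), so the divider output needs no reversal.
    """
    acc = 0
    for b in bits:
        acc = 2 * acc + b * a
    return acc


# The quotient stream feeds the multiplier directly: (7 // 2) * 5
q_bits = divide_msb_first(7, 2, m=3)
print(q_bits, multiply_msb_first(q_bits, 5))  # [0, 1, 1] 15
```

Because both functions consume and produce bits in the same (MSB-first) order, the quotient list can be handed straight to the multiplier; with an LSB-first multiplier the list would first have to be reversed, which is the buffering and control overhead the text describes.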
In order to match the divider bit streams exactly to the multiplier bit streams, it is then simply a matter of inserting extra delays along the FA sum pipeline so that the PPs from a number of different multiplications can be added simultaneously, as shown by Bellis et al. [13]. A study of the bit-serial interleaved divider and multiplier reveals that the two architectures are very similar. Both work in load and operational phases: the loading stages for the divisor and the multiplier both consist of m+1 delay feedback SISO registers, and the FA sum/carry pipelines are alike. Both designs also require MSB-first addition stages built from half-adder (HA) cells: the divider requires m PEs for the one's complement error correction that occurs for negative dividends, and the multiplier requires m−1 HA PEs to add the output carries from the PPs. It is therefore possible to combine the two designs into a programmable bit-serial device that allows m+1 computations to be interleaved simultaneously, as shown in Figure 1. The processor has two mode selection inputs, DIVi and SUBi, which control four modes of operation, $X_0 = \pm Z_i / Y_i$ or $Z_0 = Z_i \pm X_i Y_i$, where $Z_i$ and $Z_0$ are both double precision. Ldi is the load/operational mode select signal: it selects storage of $Y_i$ and $Z_i$ over the first m(m+1) clock cycles and then switches to operational mode for the next m(m+1) clock cycles, during which the remaining data is input and the bulk of the computation is performed in the FA array. All control signals are fully pipelined in the same way as the data, giving the shortest possible block pipeline period of 2m(m+1) clock cycles and continuous input/output of data (i.e., while one block set of m+1 computations is being output, the next block set may be loaded). The pipeline also allows each interleave to operate independently, and on the same interleave a division may immediately follow an inner-product-step computation and vice versa.

4. INTERLEAVED PROCESSOR BASED MODIFIED COVARIANCE SYSTEM

Cost-benefit analysis of a systolic array implementation of the CMR and Cholesky sections of the MC spectral estimator shows that a 12-bit fixed-point wordlength is sufficient for these computations [7]. Using the bit-serial processor with a 12-bit wordlength (m = 12) gives the capacity to interleave 13 computations. On interleaves 0 to 4 the CMR multiplications are performed over N consecutive block sets, such that the products $x_n x_{n-i}$ are produced on interleave $i$ $(0 \le i \le 4)$ and block set $n$ $(0 \le n \le N-1)$. A bit-serial systolic array provides the correct input data sequencing from consecutive Doppler signal samples, and a separate MSB-first double-precision accumulator, whose architecture is similar to that of the HA section in Figure 1, computes the covariance matrix elements, which are then stored in RAM. The system for the CMR calculation is shown in Figure 2. The entire Cholesky decomposition, forward elimination, back substitution and WNV computations are performed on interleave 5 on the system shown in Figure 3; these require both division and inner-product-step computations. Once the covariance matrix elements have been stored in the dual-port RAM after block set N, the Cholesky decomposition can commence on interleave 5 while, in parallel, the CMR computation on the next set of data is processed on interleaves 0 to 4. A ROM block controls the addressing of the dual-port RAM, both for retrieving stored data for the processor inputs and for storing the processor results. To achieve good dynamic resolution with the short wordlength used, a systolic array scaling module is included between the RAM and the processor; its scaling factors are also produced by the ROM controller, along with the mode control. Overall timing in the system is controlled by three counters, qi (range 0 to 12), qb (range 0 to 23) and qw (range 0 to N), corresponding to the interleave, bit position and input word, respectively. A zero-padded point DFT is computed on interleaves 6, 7, 8 and 9.
This is essentially a matrix-vector multiplication and is computed using the processor in inner-product-step mode. The system for this section …
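The back end described above (Cholesky solve and WNV on interleave 5, then the zero-padded DFT evaluated as a matrix-vector product) can be sketched in floating-point Python using only two helpers modelled on the processor's division and inner-product-step modes. Several assumptions are made that the excerpt does not state: the modes are taken to be X_0 = Z_i / Y_i and Z_0 = Z_i ± X_i Y_i; the Cholesky stage is taken to be the square-root-free LDL^T variant, since an ordinary Cholesky factor needs square roots and the text lists only division and inner-product steps; the coefficients and WNV follow the usual covariance-method normal equations; and the 12-bit fixed-point arithmetic, scaling module and interleave timing are not modelled. The names `div`, `ips`, `ldl_solve`, `ar_spectrum` and `nfft` are hypothetical, and `nfft` stands in for the DFT length, which is missing from the text.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def div(z: float, y: float) -> float:
    """Division mode (assumed form): X_0 = Z_i / Y_i."""
    return z / y


def ips(z: float, x: float, y: float, sub: bool = False) -> float:
    """Inner-product-step mode (assumed form): Z_0 = Z_i + X_i*Y_i, or minus if sub."""
    return z - x * y if sub else z + x * y


def ldl_solve(r: Matrix, b: Vector) -> Vector:
    """Solve R a = b (R symmetric positive definite) using only div() and ips():
    square-root-free Cholesky (LDL^T), forward elimination, back substitution."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]  # unit lower-triangular factor L
    v = [[0.0] * n for _ in range(n)]    # v[i][k] = L[i][k] * d[k]
    d = [0.0] * n                        # diagonal of D
    for j in range(n):
        for i in range(j, n):
            t = r[i][j]
            for k in range(j):
                t = ips(t, v[i][k], low[j][k], sub=True)
            v[i][j] = t
        d[j] = v[j][j]
        low[j][j] = 1.0
        for i in range(j + 1, n):
            low[i][j] = div(v[i][j], d[j])
    y = [0.0] * n                        # forward elimination: L y = b
    for i in range(n):
        t = b[i]
        for k in range(i):
            t = ips(t, low[i][k], y[k], sub=True)
        y[i] = t
    a = [0.0] * n                        # back substitution: L^T a = D^-1 y
    for i in reversed(range(n)):
        t = div(y[i], d[i])
        for k in range(i + 1, n):
            t = ips(t, low[k][i], a[k], sub=True)
        a[i] = t
    return a


def ar_spectrum(c: Matrix, nfft: int) -> Vector:
    """AR spectrum WNV / |A(k)|^2 from a (p+1)x(p+1) covariance matrix c."""
    p = len(c) - 1
    # Assumed normal equations: sum_k a_k c[j][k] = -c[j][0], j = 1..p
    coeffs = ldl_solve([row[1:] for row in c[1:]],
                       [-c[j][0] for j in range(1, p + 1)])
    wnv = c[0][0]                        # WNV = c[0][0] + sum_k a_k c[0][k]
    for k in range(1, p + 1):
        wnv = ips(wnv, coeffs[k - 1], c[0][k])
    a = [1.0] + coeffs                   # implicitly zero-padded to nfft points
    psd = []
    for k in range(nfft):                # one row of the DFT matrix per bin
        re = im = 0.0
        for n, x in enumerate(a):        # padded zeros contribute nothing
            theta = 2.0 * math.pi * k * n / nfft
            re = ips(re, x, math.cos(theta))
            im = ips(im, x, math.sin(theta), sub=True)
        psd.append(div(wnv, ips(ips(0.0, re, re), im, im)))
    return psd


# Hypothetical first-order example: c(0,0) = c(1,1) = 1, c(0,1) = c(1,0) = 0.9
psd = ar_spectrum([[1.0, 0.9], [0.9, 1.0]], nfft=8)
print(round(psd[0], 2), round(psd[4], 4))  # 19.0 0.0526
```

Every arithmetic step in the solver and the DFT is a call to `div` or `ips`, which mirrors the claim that these two modes suffice; because only the p+1 non-zero coefficients enter each DFT row, the zero padding costs no extra inner-product steps.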