Multi-core DSP architecture

picoArray for demanding signal processing applications


The picoArray is a massively parallel, multiple instruction multiple data (MIMD) architecture designed for demanding signal processing applications.

Field-proven in wireless infrastructure deployments around the world, picoArray devices provide ten times the performance of a legacy DSP, at a cost/performance ratio below $1/GMAC in volume.

A picoArray device is composed of processing elements of various types linked together by the patented picoBus interconnect. The basic building blocks of the array are 16-bit Harvard architecture processors, each with three-way long instruction word (LIW) and local memory. To these are added a variety of on-chip functional acceleration units (FAUs) and hardwired execution blocks designed to speed up specific tasks. Devices may also be equipped with a complete ARM9 processor subsystem.

The signal processing elements are supported by a rich variety of I/O and communication facilities, allowing the device to interface with a host processor, other picoArray chips, or real-world inputs. Internally, each device is organized as a two-dimensional grid, with communication via a network of 32-bit buses (the picoBus) and programmable bus switches.

picoArray devices are available in a number of configurations:

  • The PC202 cost-optimized baseband processor integrates an ARM9 subsystem
  • The PC203 PHY processor provides high levels of processing power and scalability for compute-intensive applications
  • The PC205 combines scalability and an ARM9 processor with industrial environmental specifications

Despite deploying a rich diversity of processing elements, the picoArray is exceptionally easy to program. Most importantly, code is written in ANSI C, providing a familiar, industry-standard design methodology via the robust, mature picoTools design suite.

DSP designers who have experienced programming with the picoArray are consistently impressed with its ease of use, finding that programs written in C yield high performance, without the need for hand-tweaking or the intervention of a “DSP expert”. For an independent discussion of this, see this presentation from Cambridge Consultants.

Engineers used to designing with FPGAs also benefit. As with an FPGA, the picoArray structure is defined at design time (not run time); tasks are distributed “physically” in space; and deterministic, cycle-accurate simulations are possible. Unlike an FPGA, however, timing closure is not an issue; design and build times are measured in minutes and seconds, not hours; development is in C or assembler; and task granularity is at the word (or sample) level, so implementation is more efficient and programming is inherently easier.

One of the keys to this ease of use and programming is the efficiency and flexibility of the picoBus interconnect. Providing terabit internal bus bandwidths, the picoBus allows processors to communicate totally deterministically while executing without interdependencies – so that the overall task can be partitioned into independent sub-processes, each of which runs on an individual execution unit.

Array elements interface with the bus via put and get instructions within the instruction set. The inter-processor communication protocol is based on a time division multiplexing (TDM) scheme, in which data transfers between processor ports occur during time slots, scheduled in software, and controlled using the bus switches. Communication paths are configured at compile time, so there is no run-time arbitration required, eliminating one of the greatest obstacles to delivering the potential benefits of multi-core systems.
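The compile-time TDM scheme described above can be illustrated with a small host-side model in C. This is only a sketch under stated assumptions: the slot table, the port numbering, the frame length and the `model_put`/`model_step`/`model_get` helpers are all hypothetical names invented for illustration, and they are not picoArray instructions or picoTools APIs. The point it tries to show is that the schedule is a constant table, so no arbitration logic runs at run time.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 4   /* hypothetical number of processor ports */
#define FRAME_LEN 4   /* hypothetical number of time slots per TDM frame */

/* One slot of the static schedule: which port drives the bus, which reads it. */
typedef struct {
    int src_port;
    int dst_port;
} slot_t;

/* Fixed when the design is built; nothing in the model ever modifies it. */
static const slot_t schedule[FRAME_LEN] = {
    {0, 1}, {1, 2}, {2, 3}, {0, 3}
};

/* Single-word output and input registers per port (model only). */
typedef struct {
    uint32_t out[NUM_PORTS];
    uint32_t in[NUM_PORTS];
    int      in_valid[NUM_PORTS];
} bus_model_t;

/* Model of a "put": a processor places a 32-bit word on its output port. */
static void model_put(bus_model_t *bus, int port, uint32_t word)
{
    assert(port >= 0 && port < NUM_PORTS);
    bus->out[port] = word;
}

/* Advance the bus by one cycle: the slot owning this cycle moves one word. */
static void model_step(bus_model_t *bus, unsigned cycle)
{
    const slot_t *s = &schedule[cycle % FRAME_LEN];
    bus->in[s->dst_port] = bus->out[s->src_port];
    bus->in_valid[s->dst_port] = 1;
}

/* Model of a "get": returns 1 and stores the word if one has arrived. */
static int model_get(bus_model_t *bus, int port, uint32_t *word)
{
    assert(port >= 0 && port < NUM_PORTS);
    if (!bus->in_valid[port]) {
        return 0;
    }
    *word = bus->in[port];
    bus->in_valid[port] = 0;
    return 1;
}
```

In this model, a word written with `model_put` on port 0 becomes visible to `model_get` on port 1 only after `model_step` has run for cycle 0, the slot that the table assigns to that path. Because the table is constant, the cycle on which each transfer happens is known before the program runs.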

The picoArray beats the processing power of any of today’s leading DSPs by more than a factor of ten. But just as importantly, the efficiency of the picoBus and picoTools means that more than 90% of that theoretical computing power can be used in a real system: even with the complex mix of control and datapath processing that is typical of today’s advanced wireless systems.

An analogy to the multi-core approach is the factory assembly line: each of the processors in the picoArray operates efficiently on one simple task, while the system as a whole delivers high overall throughput. Because each processor is very straightforward, it is easy to program. The innovation comes from the interconnection network (a true any:any mesh) and the programming model, which lets developers use all of the available performance from a familiar environment and standard languages. The deterministic programming style also eases debugging, integration, verification and test, significantly accelerating development.
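The assembly-line analogy can be sketched as a toy three-stage pipeline in C. Everything here is an assumption made for illustration: the stage functions, the `pipeline_t` type and `pipeline_step` are hypothetical, and the model runs on a host rather than on picoArray hardware. Each stage does one trivial job on the sample its upstream neighbour produced on the previous step.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STAGES 3

typedef int16_t (*stage_fn)(int16_t);

/* Three deliberately simple "workers" on the line (hypothetical tasks). */
static int16_t stage_scale(int16_t x)  { return (int16_t)(x * 2); }
static int16_t stage_offset(int16_t x) { return (int16_t)(x + 3); }
static int16_t stage_negate(int16_t x) { return (int16_t)(-x); }

static const stage_fn stages[NUM_STAGES] = {
    stage_scale, stage_offset, stage_negate
};

/* Output register and valid flag for each stage. */
typedef struct {
    int16_t reg[NUM_STAGES];
    int     valid[NUM_STAGES];
} pipeline_t;

/*
 * Advance the whole line by one step. Stages are visited last to first so
 * that each one reads the value its upstream neighbour held before this step.
 * Returns 1 and writes *out when a finished sample leaves the last stage.
 */
static int pipeline_step(pipeline_t *p, int16_t in, int in_valid, int16_t *out)
{
    int i;
    for (i = NUM_STAGES - 1; i >= 0; --i) {
        int16_t src = (i == 0) ? in : p->reg[i - 1];
        int     v   = (i == 0) ? in_valid : p->valid[i - 1];
        p->valid[i] = v;
        if (v) {
            p->reg[i] = stages[i](src);
        }
    }
    if (p->valid[NUM_STAGES - 1]) {
        *out = p->reg[NUM_STAGES - 1];
        return 1;
    }
    return 0;
}
```

Feeding the samples 1, 2, 3, 4 into a zero-initialized `pipeline_t` produces no output for the first two steps, then -5 and -7 on the third and fourth: a latency of three steps, followed by one result per step.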

Heterogeneous multi-core computing

Each picoArray device contains a blend of processor units that allows it to deliver optimum performance for its target applications. These include standard, memory and control units, an industry standard ARM9 core, and various function accelerators:

| Type | Description | Memory (bytes) |
| --- | --- | --- |
| 16-bit Harvard Architecture processor: three-way VLIW | | |
| Standard | For datapath operations. Includes dedicated MAC unit and application specific instructions for CDMA such as complex spread and despread | 768 |
| Memory | For local control and buffering. Includes larger data memory for buffering and a multiply unit | 8704 |
| Control | For global control and buffering. Includes large data and instruction memory and multiply unit | 65536 |
| Supporting hardware | | |
| Functional accelerator unit (FAU) | Flexible hardware engine for correlation and path metric calculation. Performs up to 20 billion correlations per second. | N/A |
| Convolutional turbo code (CTC) block | Allows bit error rate enhancements. Supports IEEE 802.16e, TTA WiBRO and all 3GPP turbo code requirements | N/A |
| FFT/IFFT accelerator | Supports complex FFT sizes up to 1024 points (up to 2048 points in software). On-the-fly FFT/IFFT, 16-bit input and output with scaling, bit reversal, self-flushing | N/A |
| Viterbi accelerator | Optimized for WiMAX and HSDPA. Configurable constraint lengths of 2 to 9; rates of 1/2, 1/3, 1/4, 1/5, 1/6, 1/7 or 1/8. Multiple puncturing rates. Block sizes from 16 to 1024 | N/A |
| Reed Solomon engine | 8Mbit/s Reed-Solomon block coding accelerator | N/A |
| Cryptographic engine | Supports AES, DES, 3DES; NIST FIPS PUB 197, PUB 46-3, 800-38C, SP 800-38A | N/A |
| ARM 9 processor | Industry-standard computing core with memory architecture and peripherals | N/A |
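The Standard processor entry above mentions a MAC unit and CDMA-oriented complex spread and despread instructions. As a rough illustration of the arithmetic involved, the following portable C sketch spreads one complex symbol over a chip sequence and despreads it again with a multiply-accumulate loop. It is an assumption-laden example: the type names and functions are hypothetical, it uses plain C rather than any picoArray intrinsic, and it says nothing about how the hardware instructions are actually defined.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16_t;   /* 16-bit complex sample */
typedef struct { int32_t re, im; } cplx32_t;   /* wider accumulator */

/* Spread: multiply one symbol by each chip of a complex code. */
static void spread_symbol(cplx16_t sym, const cplx16_t *code, int sf,
                          cplx16_t *tx)
{
    int i;
    for (i = 0; i < sf; ++i) {
        /* (a + jb)(c + jd) = (ac - bd) + j(ad + bc) */
        tx[i].re = (int16_t)(sym.re * code[i].re - sym.im * code[i].im);
        tx[i].im = (int16_t)(sym.re * code[i].im + sym.im * code[i].re);
    }
}

/* Despread: accumulate rx[i] * conj(code[i]) over one symbol period. */
static cplx32_t despread_symbol(const cplx16_t *rx, const cplx16_t *code,
                                int sf)
{
    cplx32_t acc = {0, 0};
    int i;
    for (i = 0; i < sf; ++i) {
        /* (a + jb)(c - jd) = (ac + bd) + j(bc - ad) */
        acc.re += (int32_t)rx[i].re * code[i].re
                + (int32_t)rx[i].im * code[i].im;
        acc.im += (int32_t)rx[i].im * code[i].re
                - (int32_t)rx[i].re * code[i].im;
    }
    return acc;
}
```

With chips drawn from ±1 ± j, each chip has squared magnitude 2, so despreading a noiseless spread symbol over a spreading factor `sf` returns the original symbol scaled by `2 * sf`.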