The picoArray is a massively parallel, multiple instruction multiple data (MIMD) architecture designed for demanding signal processing applications.
Field-proven in wireless infrastructure deployments around the world, picoArray devices provide ten times the performance of a legacy DSP, at a cost/performance ratio below $1/GMAC in volume.
A picoArray device is composed of processing elements of various types linked together by the patented picoBus interconnect. The basic building blocks of the array are 16-bit Harvard architecture processors, each with three-way long instruction word (LIW) and local memory. To these are added a variety of on-chip functional acceleration units (FAUs) and hardwired execution blocks designed to speed up specific tasks. Devices may also be equipped with a complete ARM9 processor subsystem.
The signal processing elements are supported by a rich variety of I/O and communication facilities, allowing the device to interface with a host processor, other picoArray chips, or real world inputs. Internally, each device is organized in a two-dimensional grid, with communication via a network of 32-bit buses (the picoBus) and programmable bus switches.
picoArray devices are available in a number of configurations:
- The PC202 cost-optimized baseband processor integrates an ARM9 subsystem
- The PC203 PHY processor provides high levels of processing power and scalability for compute-intensive applications
- The PC205 combines scalability and an ARM9 processor with industrial environmental specifications
Despite deploying a rich diversity of processing elements, the picoArray is exceptionally easy to program. Most importantly, code is written in ANSI C, providing a familiar, industry-standard design methodology via the robust, mature picoTools design suite.
DSP designers who have programmed the picoArray are consistently impressed with its ease of use, finding that programs written in C yield high performance without the need for hand-tweaking or the intervention of a “DSP expert”. For an independent discussion, see the presentation on this topic from Cambridge Consultants.
Meanwhile, engineers used to designing with FPGAs also reap benefits. Like an FPGA, the picoArray structure is defined at design-time (not run-time); tasks are distributed “physically” in space; and deterministic, cycle-accurate simulations are possible. But, unlike an FPGA, timing closure is not an issue; design and build time is measured in minutes and seconds, not hours; development is in C or assembler; and task granularity is at the word (or sample) level, so implementation is more efficient and programming is inherently easier.
One of the keys to this ease of use and programming is the efficiency and flexibility of the picoBus interconnect. Providing terabit internal bus bandwidths, the picoBus allows processors to communicate fully deterministically while executing without interdependencies, so that the overall task can be partitioned into independent sub-processes, each of which runs on an individual execution unit.
Array elements interface with the bus via put and get instructions within the instruction set. The inter-processor communication protocol is based on a time division multiplexing (TDM) scheme, in which data transfers between processor ports occur during time slots, scheduled in software, and controlled using the bus switches. Communication paths are configured at compile time, so there is no run-time arbitration required, eliminating one of the greatest obstacles to delivering the potential benefits of multi-core systems.
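The compile-time scheduling described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the slot table, the `signal_t` type and the `sig_put`, `sig_get` and `bus_tick` functions are hypothetical names invented for illustration, not picoTools APIs or the actual picoArray instruction semantics. It shows the one property that matters here: which transfer happens in a given cycle depends only on a table fixed before the program runs, so no arbitration logic is needed at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of a statically scheduled TDM bus.
 * Every name below is hypothetical; none is a picoTools or picoArray API. */

enum { NUM_SIGNALS = 3, NUM_SLOTS = 4 };

/* Slot table fixed before the program runs: slot i carries this signal.
 * Signal 0 owns two of the four slots, so it gets twice the bandwidth. */
static const int slot_to_signal[NUM_SLOTS] = { 0, 1, 0, 2 };

/* One point-to-point signal with a one-word buffer at each end. */
typedef struct {
    uint32_t tx, rx;
    int tx_full, rx_full;
} signal_t;

/* Model of "put": returns 1 if the word was accepted,
 * 0 if the previous word has not yet been transferred. */
static int sig_put(signal_t *s, uint32_t word)
{
    if (s->tx_full)
        return 0;
    s->tx = word;
    s->tx_full = 1;
    return 1;
}

/* Model of "get": returns 1 and stores the word if one has arrived,
 * 0 if the receive buffer is still empty. */
static int sig_get(signal_t *s, uint32_t *word)
{
    if (!s->rx_full)
        return 0;
    *word = s->rx;
    s->rx_full = 0;
    return 1;
}

/* Advance the bus by one cycle. The signal served is a pure function of
 * the cycle count, so the transfer pattern is fully deterministic. */
static void bus_tick(signal_t sigs[NUM_SIGNALS], unsigned cycle)
{
    int id = slot_to_signal[cycle % NUM_SLOTS];
    signal_t *s;

    assert(id >= 0 && id < NUM_SIGNALS);
    s = &sigs[id];
    if (s->tx_full && !s->rx_full) {
        s->rx = s->tx;
        s->rx_full = 1;
        s->tx_full = 0;
    }
}
```

In this model, a word put on signal 1 during cycle 0 becomes available to the receiver only after the bus has passed through slot 1, and that latency is the same on every run.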
The picoArray delivers more than ten times the processing power of any of today’s leading DSPs. Just as importantly, the efficiency of the picoBus and picoTools means that more than 90% of that theoretical computing power can be used in a real system, even with the complex mix of control and datapath processing that is typical of today’s advanced wireless systems.
An analogy for the multi-core approach is the factory assembly line: each processor in the picoArray works efficiently on one simple task, while the system as a whole delivers high throughput. Because each processor is simple, it is easy to program. The innovation lies in the interconnection network (a true any-to-any mesh) and in the programming model, which gives developers access to the full performance through a familiar environment and standard languages. The deterministic programming style also eases debugging, integration, verification and test, significantly accelerating development.
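The assembly-line idea can be sketched in plain C. The stage names and the processing each stage performs are assumptions chosen purely for illustration; on a real device each stage would be mapped to its own processor and linked over the picoBus, whereas here the stages are chained by ordinary function calls on a host. What the sketch does show is the partitioning style: each stage does one small job, keeps its state private, and sees only the word handed to it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-stage line; each stage owns its state and
 * communicates only through the single word passed to it. */

typedef struct { int16_t prev; } diff_state_t;  /* private to stage 2 */
typedef struct { int32_t acc;  } sum_state_t;   /* private to stage 3 */

/* Stage 1: scale each sample by two (stateless). */
static int16_t stage_scale(int16_t x)
{
    return (int16_t)(x * 2);
}

/* Stage 2: first difference of the scaled stream. */
static int16_t stage_diff(diff_state_t *s, int16_t x)
{
    int16_t y = (int16_t)(x - s->prev);
    s->prev = x;
    return y;
}

/* Stage 3: running sum of the differences. */
static int32_t stage_sum(sum_state_t *s, int16_t x)
{
    s->acc += x;
    return s->acc;
}

/* Push n samples through the line, one sample at a time, and return
 * the final output of the last stage. */
static int32_t run_line(const int16_t *in, size_t n)
{
    diff_state_t d = { 0 };
    sum_state_t  a = { 0 };
    int32_t out = 0;
    size_t i;

    for (i = 0; i < n; i++)
        out = stage_sum(&a, stage_diff(&d, stage_scale(in[i])));
    return out;
}
```

Because no stage reads another stage's state, each one can be written, simulated and tested on its own, which is the property the analogy relies on.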
Heterogeneous multi-core computing
Each picoArray device contains a blend of processor units that allows it to deliver optimum performance for its target applications. These include standard, memory and control units, an industry standard ARM9 core, and various function accelerators:
| Type | Description | Memory (bytes) |
| --- | --- | --- |
| 16-bit Harvard architecture processor: three-way VLIW | | |
| Standard | For datapath operations. Includes dedicated MAC unit and application-specific instructions for CDMA, such as complex spread and despread | 768 |
| Memory | For local control and buffering. Includes larger data memory for buffering and a multiply unit | 8704 |
| Control | For global control and buffering. Includes large data and instruction memory and a multiply unit | 65536 |
| Supporting hardware | | |
| Functional accelerator unit (FAU) | Flexible hardware engine for correlation and path metric calculation. Performs up to 20 billion correlations per second | N/A |
| Convolutional turbo code (CTC) block | Allows bit error rate enhancements. Supports IEEE 802.16e, TTA WiBro and all 3GPP turbo code requirements | N/A |
| FFT/IFFT accelerator | Supports complex FFT sizes up to 1024 points (up to 2048 points in software). On-the-fly FFT/IFFT, 16-bit input and output with scaling, bit reversal, self-flushing | N/A |
| Viterbi accelerator | Optimized for WiMAX and HSDPA. Configurable constraint lengths of 2 to 9; rates of 1/2, 1/3, 1/4, 1/5, 1/6, 1/7 or 1/8. Multiple puncturing rates. Block sizes from 16 to 1024 | N/A |
| Reed-Solomon engine | 8 Mbit/s Reed-Solomon block coding accelerator | N/A |
| Cryptographic engine | Supports AES, DES, 3DES; NIST FIPS PUB 197, FIPS PUB 46-3, SP 800-38C, SP 800-38A | N/A |
| ARM9 processor | Industry-standard computing core with memory architecture and peripherals | N/A |
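For readers unfamiliar with the "complex spread and despread" operation mentioned for the Standard processor, the following is a minimal sketch of the underlying arithmetic in portable C. It is a generic textbook formulation, not the semantics of the picoArray instructions; the type and function names are hypothetical, and the spreading code is assumed to consist of chips with components of +1 or -1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and helpers for illustration only. */
typedef struct { int16_t re, im; } cplx16_t;
typedef struct { int32_t re, im; } cplx32_t;

/* Spread: multiply one symbol by each complex chip of the code.
 * Assumes chip components of +1/-1 and a symbol small enough that
 * the products fit in 16 bits. */
static void spread(cplx16_t sym, const cplx16_t *code, size_t n,
                   cplx16_t *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[i].re = (int16_t)(sym.re * code[i].re - sym.im * code[i].im);
        out[i].im = (int16_t)(sym.re * code[i].im + sym.im * code[i].re);
    }
}

/* Despread: multiply each received chip by the conjugate of the code
 * chip and accumulate. With +1/-1 chip components, the result is the
 * original symbol scaled by 2 * n. */
static cplx32_t despread(const cplx16_t *rx, const cplx16_t *code, size_t n)
{
    cplx32_t acc = { 0, 0 };
    size_t i;
    for (i = 0; i < n; i++) {
        acc.re += (int32_t)rx[i].re * code[i].re
                + (int32_t)rx[i].im * code[i].im;
        acc.im += (int32_t)rx[i].im * code[i].re
                - (int32_t)rx[i].re * code[i].im;
    }
    return acc;
}
```

The inner loop of `despread` is a chain of multiply-accumulate steps, which is the kind of work a dedicated MAC unit and application-specific instructions are intended to speed up.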
