Furlong J, Felch A, Nageswaran J, Dutt N, Nicolau A, Veidenbaum A, Chandrashekar A, Granger R (2007). Novel brain-derived algorithms scale linearly with number of processing elements. PARCO, 767-776.



Algorithms are often sought whose speed increases as processing elements are added, yet attempts at such parallelization typically yield little speedup, owing to serial dependencies intrinsic to many algorithms. A novel class of algorithms has been developed that exhibits intrinsic parallelism, so that adding processing elements to increase speed produces little or no diminishing returns, enabling linear scaling under appropriate conditions, such as when flexible or custom hardware is added. The algorithms are derived from the brain circuitry of visual processing [10, 17, 8, 9, 7]. Given the brain's ability to outperform computers on a range of visual and auditory tasks, these algorithms have been studied in attempts to imitate the successes of real brain circuits. They are slow on serial architectures, but, as might be expected of algorithms derived from highly parallel brain architectures, their lack of internal serial dependencies makes them highly suitable for efficient implementation across multiple processing elements. Here we describe a specific instance of an algorithm derived from brain circuitry, and its implementation on FPGAs. We show that the use of FPGAs instead of general-purpose processing elements enables significant improvements in speed and power. A single high-end Xilinx Virtex-4 FPGA using parallel resources attains more than a 62x performance improvement and a 2500x performance-per-watt improvement. Multiple FPGAs in parallel achieve significantly higher performance improvements, and in particular these improvements exhibit desirable scaling properties, increasing linearly with the number of processing elements added. Since linear scaling renders these solutions applicable to arbitrarily large applications, the findings may provide a new class of approaches for many domains, such as embedded computing and robotics, that require compact, low-power, fast processing elements.
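The scaling argument above can be made concrete with Amdahl's law. The following Python sketch (illustrative only, not from the paper; the 10% serial fraction is an assumed figure for a conventionally parallelized algorithm) contrasts the capped speedup imposed by serial dependencies with the linear speedup of an algorithm whose serial fraction is effectively zero:

```python
# Illustrative sketch: Amdahl's law shows why serial dependencies cap the
# speedup of most parallelized algorithms, while an algorithm with no
# internal serial dependencies (serial fraction ~ 0) scales linearly with
# the number of processing elements.

def amdahl_speedup(n_elements: int, serial_fraction: float) -> float:
    """Ideal speedup on n_elements processors for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_elements)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        conventional = amdahl_speedup(n, serial_fraction=0.10)  # 10% serial (assumed)
        intrinsic = amdahl_speedup(n, serial_fraction=0.0)      # no serial deps
        print(f"{n:3d} elements: conventional {conventional:6.2f}x, "
              f"intrinsically parallel {intrinsic:6.2f}x")
```

With a 10% serial fraction, 64 processing elements deliver under 9x speedup; with no serial fraction, speedup grows linearly with the element count, which is the property the abstract attributes to these brain-derived algorithms.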
