FPGA-based CNN Inference Accelerator Synthesized From Multi-threaded C Software | Awesome LLM Papers

FPGA-based CNN Inference Accelerator Synthesized From Multi-threaded C Software

Jin Hee Kim, Brett Grady, Ruolong Lian, John Brothers, Jason H. Anderson. 2017 30th IEEE International System-on-Chip Conference (SOCC), 2018 – 50 citations

Scalability

A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model, with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes the threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated in which convolution, pooling, and padding are realized in the synthesized accelerator, while the remaining tasks execute on an embedded ARM processor. The accelerator incorporates reduced-precision arithmetic and a novel approach to zero-weight skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.

Similar Work