
TODO AFTER FIRST CLASS

  • Form a group (3-4 students) and pick a topic for your presentation (topics are listed below under Group Presentation Topics).
  • Each student will be expected to present for 15-20 minutes.

Syllabus (Google Colaboratory)

  • Familiarize yourself with TensorFlow and how to create DNNs (see the sketch after this list).
  • TensorFlow tutorials
  • Clone the notebooks to your Drive before running them.
  • To run a notebook:
    • Reset all runtimes.
    • Run all.
    • Select TPU under Runtime -> Change runtime type. Do you notice an improvement in speed?
    • What is a TPU? (You will learn about it in class.)
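
As a quick sanity check of the Colab setup, the sketch below builds a small Keras DNN and runs it under a TPU distribution strategy when one is attached, falling back to CPU/GPU otherwise. This is a minimal example assuming TensorFlow 2 on Colab; the MNIST model and its hyperparameters are illustrative and are not taken from the course notebooks.

    import tensorflow as tf

    # Use the TPU if the Colab runtime has one attached
    # (Runtime -> Change runtime type -> TPU); otherwise fall
    # back to the default CPU/GPU strategy.
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        strategy = tf.distribute.TPUStrategy(resolver)
        print("Running on TPU")
    except (ValueError, tf.errors.NotFoundError):
        strategy = tf.distribute.get_strategy()
        print("No TPU found; running on CPU/GPU")

    # A toy fully connected DNN on MNIST; the architecture is an
    # arbitrary example, not the one used in the course notebooks.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=2, batch_size=128)
    model.evaluate(x_test, y_test)

Timing model.fit under both the TPU runtime and the default runtime is one way to answer the speed question above.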

Syllabus (Slides)

Lecture notes

Links

  • Deep Learning for Computer Architects (chapters 1-3) Link
  • Efficient Processing of Deep Neural Networks: A Tutorial and Survey Link

Lecture notes

Links

  • Google Colaboratory Slides (Linear Model and Convolution layer) Link

Lecture notes

Links

  • Google Colaboratory Slides (Keras Model and Inception layers) Link

Lecture notes

Links

  • FFT Convolutions are Faster than Winograd on Modern CPUs, Here’s Why Link
  • Optimizing CNN Model Inference on CPUs Link
  • Optimizing NN on ARM CPUs

Lecture notes

Links

  • Introduction to DNN accelerators Link
  • A primer on DNN Dataflows Link
  • MAESTRO Analytical Model Link

Lecture notes

Links

  • DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning Link
  • Google TPU Link
  • Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Link
  • MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects Link

Lecture notes

Links

  • Cambricon: An Instruction Set Architecture for Neural Networks Link
  • Learning to Optimize Tensor Programs (AutoTVM) Link
  • Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator Link
  • VTA: An Open Hardware-Software Stack for Deep Learning Link
  • From High-Level Deep Neural Models to FPGAs (DNN weaver) Link

Group Presentation Topics

Lecture notes

Links

  • UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition Link
  • EIE: Efficient Inference Engine on Compressed Deep Neural Network Link
  • Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing Link
  • SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks Link
  • Fine-grained accelerators for sparse machine learning workloads Link
  • A Sparse Matrix Vector Multiply Accelerator for Support Vector Machine Link

Lecture notes

Links

  • Timeloop: A Systematic Approach to DNN Accelerator Evaluation Link
  • Tangram: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators Link
  • Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Link

Lecture notes

Links

  • Stripes: Bit-Serial Deep Neural Network Computing Link
  • Mixed Precision Training Link
  • Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent Link
  • Gist: Efficient Data Encoding for Deep Neural Network Training Link

Lecture notes

Links

  • Survey Link
  • Fast-Classifying, High-Accuracy Spiking Deep Networks Through Weight and Threshold Balancing Link
  • A Case for Neuromorphic ISAs Link
  • IBM TrueNorth Link
  • Demonstrating Advantages of Neuromorphic Computation: A Pilot Study Link

Lecture notes

Links

  • DRISA: A DRAM-based Reconfigurable In-Situ Accelerator Link
  • ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars Link
  • Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Link