Computer Vision

Giving computers the ability to interpret images and videos as well as humans do.

Introduction

Computer Vision is the study of algorithms and methods that give computers the ability to interpret images and videos as well as humans do. I see Computer Vision as branching into two main categories: Traditional Computer Vision and Deep Learning.

Learning

While most of my Computer Vision knowledge came from university courses, I found a few resources especially useful for reinforcing what I learned:

  • OpenCV documentation and tutorials cover the majority of popular traditional Computer Vision techniques.

  • For Deep Learning, Stanford's CS231n course on Convolutional Neural Networks is excellent for understanding deep learning and CNNs. Some basic machine learning background is recommended beforehand. I watched the Winter 2016 term, which is available on YouTube, because it was taught by world-class AI researcher Andrej Karpathy. He also writes blog posts, and he teaches in a very intuitive and understandable way.

Tools

  • I have primarily used the Python programming language for Computer Vision. C++ is a popular alternative that has a steeper learning curve but can produce faster implementations.

  • For Traditional Computer Vision, OpenCV is the most popular library. The scikit-image library also offers a collection of image processing algorithms if you cannot find the one you need in OpenCV. NumPy is a necessity for fast and efficient matrix operations.

  • For Deep Learning, TensorFlow is the main low-level library for constructing and training neural networks. Keras, which I mainly use, is a higher-level API that runs on top of TensorFlow and is aimed at quick productivity. PyTorch is another option, but it is a standalone framework rather than a layer on top of TensorFlow. Other helpful libraries include Theano and CNTK (older deep learning frameworks) and scikit-learn (general machine learning).
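
To give a feel for the kind of matrix operation that Traditional Computer Vision relies on, here is a minimal sketch of Sobel edge detection written with NumPy only. The function name sobel_edges and the synthetic test image are made up for illustration. In practice you would more likely call OpenCV's cv2.Sobel than write this by hand.

```python
import numpy as np


def sobel_edges(gray):
    """Return the gradient magnitude of a 2-D grayscale image (3x3 Sobel kernels)."""
    gray = np.asarray(gray, dtype=np.float64)

    # Horizontal-gradient kernel; its transpose gives the vertical-gradient kernel.
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float64)
    ky = kx.T

    # Replicate the border pixels so the output has the same shape as the input.
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape

    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)

    # Accumulate each kernel tap over a shifted view of the whole image,
    # which avoids looping over individual pixels.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window

    return np.hypot(gx, gy)


# Hypothetical test image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

edges = sobel_edges(image)
print(edges)
```

The response is non-zero only in the two columns on either side of the dark-to-bright boundary, which is what an edge detector should report for a vertical step.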
