Perception & Scene Understanding
Seeing and understanding the world.
Author
Pantelis Monogioudis
CNN Example Architectures
This is a very high-level view of practical CNN structures before the advent of more innovative architectures such as ResNets.
CNN Layers
In the convolutional layer, the first operation a 3D image, with its two spatial dimensions and its third dimension due to the primary colors (typically red, green and blue), is…
Fast and Faster RCNN Object Detection
Fast-RCNN is the second generation RCNN that aimed to accelerate RCNN. Apart from the complex training of RCNN, its inference involved a forward pass for each of the 2000…
Feature Extraction via Residual Networks
In the figure below we plot the evolution of depth in CNN architectures. Notice the big jump due to the introduction of the ResNet architecture.
Introduction to Convolutional Neural Networks
Introduction to Scene Understanding
In the previous chapters we have treated the perception subsystem mainly starting from the first principles that govern supervised learning to the deep learning…
Introduction to Transfer Learning
Transfer Learning is a foundational approach to learning.
Localization and Tracking
In the recursive state estimation section we have seen the formulation of the Bayes filter and its application in a simple problem of trying to maintain a latent (internal…
Mask R-CNN Semantic Segmentation
The semantic segmentation approach described in this section is Mask R-CNN, based on this paper. Mask R-CNN is an extension of Faster R-CNN that adds a mask head to the…
Object Detection
In the introductory section, we have seen examples of what object detection is. In this section we will treat the detection pipeline itself, summarized below:
Object Detection & Semantic Segmentation Workshop
Object Detection and Semantic Segmentation Metrics
NOTE: The following example is based on here and its corresponding implementation
Recursive State Estimation
In the scene understanding chapter we started putting together the perception pipelines that resulted in us knowing where the objects of interest are in the image coordinate…
Region-CNN (RCNN) Object Detection
We can think about the detection problem as a classification problem of all possible portions (windows/masks) of the input image since an object can be located at any…