Model-free control