Virtual classroom.
Course staff.
Instructor: Matus Telgarsky.
TA: Ziwei Ji.
Please do not email/DM us technical questions; instead, use the public Discord channels or come to office hours.
Evaluation.
80% is across 4 homeworks. (See below for policies.)
20% is a paper reading project. (See below for details.)
All work must be typed (however you wish, but typed) and submitted on Gradescope.
No late work.
Do not cheat in this class; just drop it and take an easier one. No one cares whether you did or did not take this class.
Readings and prerequisites.
Basic math and proof writing; use the lecture notes and hw1 (out 8/31) to help determine readiness.
Basic machine learning, for instance the material in my CS 446 course.
"Understanding Machine Learning, by Shai ShalevShwartz and Shai BenDavid can be downloaded from that page, is free for personal use. I think this book is a wonderful resource, I find its presentation very clear, direct, and minimal.
Schedule will be continuously updated.
Lecture notes (html, pdf) are also continuously updated: all material we have not yet covered is subject to major revision!
Date | Topic | Assignments
8/24 | Course introduction. Approximation: intro. Notes section 1, tablet notes 1. |
8/26 | Folklore multivariate approximation; classical universal approximation. Notes section 2, tablet notes 2. |
8/31 | Classical universal approximation; infinite-width approximation and Barron norms. Notes sections 2-3, tablet notes 3. | (8/31) hw1 tex, pdf.
9/2 | Infinite-width approximation and Barron norms. Notes section 3, tablet notes 4. |
9/7, 9/9, 9/14 | Approximation near initialization and the NTK. Notes section 4, tablet notes 5, tablet notes 6, tablet notes 7. |
9/14, 9/16, 9/21 | Benefits of depth: approximation lower bounds; Sobolev ball approximation. Notes section 5, tablet notes 7, tablet notes 8, tablet notes 9. | (9/22) hw1 due.
9/23, 9/28, 9/30 | Optimization: intro; semi-classical convex optimization. Notes sections 6-7, tablet notes 10, tablet notes 11, tablet notes 12. | (9/22) hw2 tex, pdf.
9/30, 10/5, 10/7 | Strong-convexity-style NTK optimization proof. Notes section 8.1, tablet notes 12, tablet notes 13, tablet notes 14. |
10/12, 10/14, 10/19, 10/21 | Nonsmoothness, positive homogeneity, and margin maximization. Notes sections 9-10, tablet notes 15, tablet notes 16. | (10/20) hw2 due / hw3 out.
10/26, 10/28, 11/2, 11/4, 11/9, 11/11, 11/16, 11/18 | Generalization: VC, Rademacher, and covering number bounds for deep networks. (Interpolation?) | (?/?) project info released? (11/17) hw3 due / hw4 out.
11/30, 12/2, 12/7 | Extra topics? Interpolation? Kolmogorov-Arnold? Smoothness-based NTK analysis? Requests? | (?/?) project due? (12/15) hw4 due.
Four homeworks, each 20% of course grade.
Homework must be typeset (albeit however you wish: LaTeX, Markdown, etc.) and submitted on Gradescope.
No late homework.
Academic integrity.
Discord discussions should be high level.
You may discuss in more detail with at most three other students; list their ID numbers on the first page of your homework.
All submitted homework must be in your own words; keep your discussions sufficiently high level to prevent claims of academic integrity violations.
I don’t expect you to be able to find homework solutions online (I don’t use online resources to come up with problems); if you do rely upon external resources, cite them properly, and still write your solutions in your own words. Please see Jeff Erickson’s discussion of academic integrity.
When integrity violations are found, they will be submitted to the department’s evaluation board.
This will be a paper-reading project, due near the end of the semester, also worth 20% of the total course grade. You’ll probably have two hand-ins: something preliminary for us to discuss which paper you cover, and then at the end a succinct (2-3 page) writeup. Details will be posted before the end of October.
(These are out of date, please feel free to ping me to update them.)
Learning theory classes (not specifically deep learning).
Lieven Vandenberghe @ UCLA. This is not a learning theory course; it’s part 3 of a long optimization course, covering material not in the standard Boyd-Vandenberghe book. The lecture links are to slides; the proofs there are incredibly clean, and indeed this is my favorite resource for many of these methods.
Textbooks and surveys. Again, there are many others, but here are a key few.
Boucheron, Bousquet, and Lugosi have two excellent surveys: one is a gentler start, whereas the other even gets to some advanced topics. I read the easier one immediately upon entering grad school (despite having taken a good learning theory course in undergrad) and really liked it.
Shai Shalev-Shwartz and Shai Ben-David have an excellent textbook from 2014: “Understanding Machine Learning: From Theory to Algorithms”.
Shai Shalev-Shwartz also has an excellent monograph covering online learning.
Devroye, Györfi, and Lugosi have a wonderful book from 1996, “A Probabilistic Theory of Pattern Recognition”, which, in addition to covering standard topics in statistical learning theory, has many nice results that rarely enter courses, for instance Stone’s original argument for the consistency of k-NN in finite-dimensional Euclidean space.
Martin Anthony and Peter Bartlett have a wonderful 1999 book, “Neural Network Learning: Theoretical Foundations”, which is again in the setting of statistical learning theory, and is the only listed reference with extensive VC dimension