Discussions: Fridays, 4:00-5:50pm, Kinsey Pavilion 1200B

### Recent Announcements

**May 26:**Homework 5 has been posted. It is due at the start of the last discussion section, which will serve as a review session for the final.**May 23:**Reading assignments for the rest of the quarter have been posted. The final exam will cover material from Bertsekas & Tsitsiklis Chapters 1, 2, 3, 4.2, 5.1, 5.2, 5.4, 7.1-7.4, 8.1, 8.2, and 9.1 up to page 468. You are responsible for this material, as well as the additional material covered in class (Naive Bayes, PageRank, and so on).**May 19:**Alternative grading policy: If it is in your best interest, we will count the final exam for 45% of your final grade and the midterm for 15% instead of the standard 30% and 30%. Remember, the final will be a cumulative exam.**May 17:**We've added a new script, getTweets.rb, to the homework 4 files on courseweb. You can use this script to download additional Twitter data for your classifier.**May 12:**Homework 4 has been posted. The C++ skeleton files you will need for this assignment are available under "Assignments" on courseweb.**April 27:**Homework 3 has been posted below. The problems on this assignment are good practice for the midterm.**April 25:**Next week's midterm will cover Chapters 1-3 of the texbook, except for 3.3. Catch up on your reading now! You may bring one double-sided**hand-written**sheet of notes to the exam, but no other notes, books, laptops, cell phones, calculators, etc. are allowed.**April 14:**Homework 2 has been posted below.**April 13:**Ethan will be traveling next week, so there will be no Monday office hours on April 18 and no discussion section on April 22. Ethan will go over homework 2 in section this Friday.**April 7:**Solutions to today's quiz are up on courseweb. In the future, this is where you should look for any solutions.

### Course Overview

Don't be confused by the title! This course is designed to help
students develop the mathematical reasoning skills necessary to solve
problems that involve *uncertainty*. The foundational problem solving
skills you will learn in this class are crucial for many exciting
areas of computer science that inherently involve uncertainty,
including artificial intelligence and machine learning, data mining,
financial modeling, natural language processing, bioinformatics, web
search, algorithm design, cryptography, system design, network
analysis, and more. These skills will also help you analyze the
uncertainty in your day-to-day life.

The first half of the course will cover the basics of probability, including probabilistic models, conditional probability, discrete and continuous random variables, expectation, mean and variance, the Central Limit Theorem, and the Law of Large Numbers. The second half of the course will focus on Markov chains and statistical inference.

Statistics 100A or 110A is required for this course. Although we will review all of the basics of probability in class, we will go through some of this material very quickly. If you are not familiar with basic concepts like random variables and expectation, the first half of the course will be more challenging and require extra effort from you.

### Staff and Office Hours

**Instructor:** Prof. Jenn Wortman Vaughan

Office Hours: Wednesdays, 1:00-3:00, 4532H Boelter Hall

Contact: jenn at cs

**TA:** Ethan Schreiber

Office Hours: Mondays, 11:30-1:30, 2432 Boelter Hall

Contact: ethan at cs

**Grader:** Brian Geffon

Contact: briangeffon at gmail

### Breakdown of Grades

Grades will be based on the following components:

**Homework Assignments (30%):**There will be 5 homework assignments, typically due in class on Tuesdays.**No late homeworks will be accepted.**Most problems will be of the pencil-and-paper variety, though there will be a few small programming components too. For each assignment, a subset of the problems will be graded in detail; you may or may not know in advance which problems will be graded. Your solutions will be graded on both**correctness and clarity**. If you cannot solve a problem completely, you will get more partial credit if you identify the gaps in your argument than you will if you try to cover them up.**In-class Quizzes (10%):**Occasional quizzes will be given at the start of class. Missed quizzes cannot be made up, but your lowest quiz grade will be dropped.**Midterm (30%):**An in-class midterm will be given on Tuesday, May 3. The midterm will be closed book, but one page of double-sided**hand-written**notes is allowed. Calculators and cell phones may not be used during the exam.**Final Exam (30%):**A cumulative final exam will be held Thursday, June 09, 3:00-6:00pm. The same rules apply as for the midterm.

*UPDATE:* If it is in your best interest, we will count the
final exam for 45% of your final grade and the midterm for 15% instead
of the standard 30% and 30%. Remember, the final will be a
*cumulative* exam.

### Schedule & Readings

The required textbook for this course is Introduction to
Probability (2nd Edition) by Dimitri P. Bertsekas and John
N. Tsitsiklis. We will cover Chapters 1-3, parts of Chapters 4 and 5,
and parts of Chapters 7-9. Assigned readings will be posted here
throughout the quarter. To get the most out of class, you should
complete the required reading **before** each lecture.

Slides will be posted here after each lecture for your convenience, but reading the slides is not a good substitute for coming to class. In particular, the slides generally do not contain the details of the proofs and examples that we will go over in class.

**Tuesday, March 29**(slides)

Overview of course logistics. Review of some basic concepts of probability theory, including sample spaces and events, the axioms of probability, and conditional probability.

Reading: B&T pages 1-28

Handouts: course overview and logistics, academic honesty policy**Thursday, March 31**(slides)

Examples of conditional probability, independence, total probability, Bayes' rule.

Reading: B&T pages 28-43**Tuesday, April 5**(slides)

Counting problems, permutations, combinations, the binomial distribution.

Reading: B&T pages 44-52

See also: this discussion of the Waffle House problem and this investigation into burger options**Thursday, April 7**(slides)

Discrete random variables, probability mass functions, examples of common random variables, expectation.

Reading: B&T pages 72-81 (actually went through page 83)

**Tuesday, April 12**(slides)

Expectation and variance. Using expectation to make decisions. Power law distributions.

Reading: B&T pages 81-97 (only got up to page 92)

Optional reading: this tutorial on Zipf's law and power law distributions, if you're curious**Thursday, April 14**(slides)

Joint PMFs, conditional PMFs, independence of random variables.

Reading: B&T pages 97-118 and 217-222 (will do 217-222 next time)**Tuesday, April 19**(slides)

Covariance and correlation. Continuous random variables, PDFs and CDFs, exponential random variables.

Reading: B&T pages 140-158 (only went through 153)**Thursday, April 21**(slides)

Comparison of exponential and geometric random variables, joint PDFs.

Reading: B&T pages 159-183**Tuesday, April 26**(slides)

More discussion of continuous random variables and joint PDFs. Review of the Total Expectation Theorem and an application to searching sorted linked lists.

Catch up on your reading before the midterm next week! By now you should have read all of Chapters 1-3 (though we have not yet discussed 3.3 in class).**Thursday, April 28**(slides)

Bounds on probabilities, the Law of Large Numbers.

Reading: B&T pages 264-273**Tuesday, May 3**

*In class midterm!!***Thursday, May 5**(slides)

Normal random variables, the Central Limit Theorem. Start of statistical inference: maximum likelihood and MAP.

Reading: B&T pages 153-158 and 273-279

The reading on inference is listed under the next few lectures.**Tuesday, May 10**(slides)

Continuation of intro to inference. Maximum likelihood, parameter estimation.

Reading: B&T pages 460-468**Thursday, May 12**(slides)

Naive Bayes classifiers.

Reading: Pages 1-6 (not 2.4) of Tom Mitchell's chapter on Generative and Discriminative Classifiers**Tuesday, May 17**(slides)

Finishing up ML parameter estimation. Bayesian inference, priors and posteriors, and MAP parameter estimation.

Reading: B&T pages 412-429 (though our focus will be a little different from the book...)

*Please note that the schedule and readings have changed a bit. We will skip the material from pages 468-485.***Thursday, May 19**(slides)

Graphical models, Bayesian networks.

Reading: Sections 1-2 only of Adnan Darwiche's overview of Bayesian networks**Tuesday, May 24**(slides)

More examples of Bayesian networks, d-separation.

Reading: Judea Pearl's d-Separation Without Tears**Thursday, May 26**(slides)

Discrete time Markov chains, n-step transitions, characterizations of states.

Reading: B&T pages 340-351**Tuesday, May 31**(slides)

Long term behavior of Markov chains, balance equations, Google's PageRank algorithm.

Reading: B&T pages 352-361

Optional reading: How Google Finds Your Needle in the Web's Haystack, though it is not necessary to understand the details of the power method and the final method of approximating I**Thursday, June 2**(slides)

Absorption probabilities in Markov chains, the gambler's ruin.

Reading: B&T pages 362-369**Thursday, June 9**

*Final exam, 3:00-6:00pm!!*

### Homework Assignments

Homework assignments will be posted here throughout the quarter.

- Homework 1: Due Tuesday, April 12
- Homework 2: Due Tuesday, April 26
- Homework 3: Due Tuesday, May 10
- Homework 4: Due Tuesday, May 24
- Homework 5: Due Friday, June 3

(Clarifications: In problem 1, the*bias*is the probability of heads. In problem 5, assume the states are labeled 1, 2, ..., 6.)

### Academic Honesty Policy

Collaboration on the homework assignments is encouraged! Discussing the problems with others can help you learn. Students are free to discuss the homework problems with anyone in the class under the following conditions:

- Each student must write down his or her solutions independently, and must understand the solutions he or she writes down. Talking over solutions is fine, but reading or copying another student's answers is not acceptable!
- Each student must write a list of all of his or her collaborators at the top of each assignment. This list should include anyone with whom the assignment was discussed.

These policies are described in the Academic Honesty Policy that must be signed by every student in the class. The Dean of Students also has a a guide to Academic Integrity.