Foundations of Machine Learning
Understand the Concepts, Techniques and Mathematical Frameworks Used by Experts in Machine Learning
About This Course
Bloomberg presents "Foundations of Machine Learning," a training course that was initially delivered internally to the company's software engineers as part of its "Machine Learning EDU" initiative. This course covers a wide variety of topics in machine learning and statistical modeling. The primary goal of the class is to help participants gain a deep understanding of the concepts, techniques and mathematical frameworks used by experts in machine learning. It is designed to make valuable machine learning skills more accessible to individuals with a strong math background, including software developers, experimental scientists, engineers and financial professionals.
The 30 lectures in the course are embedded below, but may also be viewed in this YouTube playlist. The course includes a complete set of homework assignments, each containing a theoretical element and implementation challenge with support code in Python, which is rapidly becoming the prevailing programming language for data science and machine learning in both academia and industry. This course also serves as a foundation on which more specialized courses and further independent study can build.
Please fill out this short online form to register for access to our course's Piazza discussion board. Applications are processed manually, so please be patient. You should receive an email directly from Piazza when you are registered. Common questions from this and previous editions of the course are posted in our FAQ.
The first lecture, Black Box Machine Learning, gives a quick-start introduction to practical machine learning and only requires familiarity with basic programming concepts.
Highlights and Distinctive Features of the Course Lectures, Notes, and Assignments
- Geometric explanation for what happens with ridge, lasso, and elastic net regression in the case of correlated random variables.
- Investigation of when the penalty (Tikhonov) and constraint (Ivanov) forms of regularization are equivalent.
- Concise summary of what we really learn about SVMs from Lagrangian duality.
- Proof of representer theorem with simple linear algebra, emphasizing it as a way to reparametrize certain objective functions.
- Guided derivation of the math behind the classic diamond/circle/ellipsoids picture that "explains" why L1 regularization gives sparsity (Homework 2, Problem 5)
- From-scratch (in numpy) implementation of almost all major ML algorithms we discuss: ridge regression with SGD and GD (Homework 1, Problems 2.5, 2.6, page 4), lasso regression with the shooting algorithm (Homework 2, Problem 3, page 4), kernel ridge regression (Homework 4, Problem 3, page 2), kernelized SVM with Kernelized Pegasos (Homework 4, 6.4, page 9), L2-regularized logistic regression (Homework 5, Problem 3.3, page 4), Bayesian linear regression (Homework 5, Problem 5, page 6), multiclass SVM (Homework 6, Problem 4.2, p. 3), classification and regression trees (without pruning) (Homework 6, Problem 6), gradient boosting with trees for classification and regression (Homework 6, Problem 8), and a multilayer perceptron for regression (Homework 7, Problem 4, page 3)
- Repeated use of a simple 1-dimensional regression dataset, so it's easy to visualize the effect of various hypothesis spaces and regularizations that we investigate throughout the course.
- Investigation of how to derive a conditional probability estimate from a predicted score for various loss functions, and why it's not so straightforward for the hinge loss (i.e. the SVM) (Homework 5, Problem 2, page 1)
- Discussion of numerical overflow issues and the log-sum-exp trick (Homework 5, Problem 3.2)
- Self-contained introduction to the expectation maximization (EM) algorithm for latent variable models.
- Develop a general computation graph framework from scratch, using numpy, and implement your neural networks in it.
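The log-sum-exp trick mentioned in the highlights is easy to sketch in numpy (a minimal illustration, not the homework's actual support code):

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log(sum(exp(v))): subtract the max so the
    largest exponentiated value is exp(0) = 1, avoiding overflow."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

# Naive np.log(np.sum(np.exp(scores))) overflows here; the shifted form does not.
scores = np.array([1000.0, 1001.0, 1002.0])
result = logsumexp(scores)
```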
The quickest way to see if the mathematics level of the course is for you is to take a look at this mathematics assessment, which is a preview of some of the math concepts that show up in the first part of the course.
- Solid mathematical background , equivalent to a 1-semester undergraduate course in each of the following: linear algebra, multivariate differential calculus, probability theory, and statistics. The content of NYU's DS-GA-1002: Statistical and Mathematical Methods would be more than sufficient, for example.
- Python programming required for most homework assignments.
- Recommended: At least one advanced, proof-based mathematics course
- Recommended: Computer science background up to a "data structures and algorithms" course
- (HTF) refers to Hastie, Tibshirani, and Friedman's book The Elements of Statistical Learning
- (SSBD) refers to Shalev-Shwartz and Ben-David's book Understanding Machine Learning: From Theory to Algorithms
- (JWHT) refers to James, Witten, Hastie, and Tibshirani's book An Introduction to Statistical Learning
GD, SGD, and Ridge Regression
SVM and Sentiment Analysis
Multiclass, Trees, and Gradient Boosting
Computation Graphs, Backpropagation, and Neural Networks
Other tutorials and references
- Carlos Fernandez-Granda's lecture notes provide a comprehensive review of the prerequisite material in linear algebra, probability, statistics, and optimization.
- Brian Dalessandro's iPython notebooks from DS-GA-1001: Intro to Data Science
- The Matrix Cookbook has lots of facts and identities about matrices and certain probability distributions.
- Stanford CS229: "Review of Probability Theory"
- Stanford CS229: "Linear Algebra Review and Reference"
- Math for Machine Learning by Hal Daumé III
David S. Rosenberg
MGMT 4190/6560 Introduction to Machine Learning Applications @Rensselaer
Assignment 3
Save your working file in Google drive so that all changes will be saved as you work. MAKE SURE that your final version is saved to GitHub.
Before you turn this in, make sure everything runs as expected. First, restart the kernel (in the menu, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All). You can speak with others regarding the assignment but all work must be your own.
This is a 30-point assignment.
You may find it useful to go through the notebooks from the course materials when doing these exercises.
If you attempt to fake passing the tests you will receive a 0 on the assignment and it will be considered an ethical violation.
Exercises - For and If and Functions
(1). Create a function list_step that accepts 3 variables (start, stop, step). The function returns a list starting at start, ending before stop, and with a step of step.
list_step(5, 19, 2)
[5, 7, 9, 11, 13, 15, 17]
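A reference sketch of one way list_step could be written (using range, whose half-open semantics match the example above: 19 itself is excluded):

```python
def list_step(start, stop, step):
    # range() is half-open, which matches the example output: stop itself is excluded.
    return list(range(start, stop, step))
```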
(2). Create a function list_divisible that accepts 3 variables (start, stop, divisible). Use a for loop to create a list of all numbers from start to stop which are divisible by divisible.
list_divisible(1, 50, 7)
[7, 14, 21, 28, 35, 42, 49]
(3). Create a function list_divisible_not that accepts 4 variables (start, stop, divisible, not_divisible). Use a for loop to create a list of all numbers from start to stop which are divisible by divisible but not divisible by not_divisible.
list_divisible_not(1, 100, 4, 3)
[4, 8, 16, 20, 28, 32, 40, 44, 52, 56, 64, 68, 76, 80, 88, 92]
The following exercises will use the titanic data from Kaggle. I’ve included it in the input folder just like Kaggle.
(4) What is the key difference between the train and the test?
(5) Create a new column family that is equal to SibSp * Parch for both the train and the test dataframes. DON'T use a for loop.
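A vectorized sketch of what this exercise is asking for (toy data below, standing in for the real Kaggle frames):

```python
import pandas as pd

# Toy stand-in for the Kaggle train dataframe (the real one has more columns).
train = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 2, 1]})

# Elementwise multiplication of two Series is vectorized -- no for loop needed.
train["family"] = train["SibSp"] * train["Parch"]
```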
(6). While we can submit our answer to Kaggle to see how it will perform, we can also utilize our training data to assess accuracy. Accuracy is the percentage of predictions made correctly, i.e., the percentage of people for whom our prediction regarding their survival is correct. In other words, accuracy = (# correct predictions)/(total # of predictions). Create a function generate_accuracy which accepts two Pandas Series objects (predicted, actual) and returns the accuracy.
For example, when predicted and actual are two Pandas Series: generate_accuracy(predicted, actual)
For the sample data below, the function should return 50.0 (i.e., a percentage).
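One possible implementation of generate_accuracy, with a toy sample matching the 50.0 description (the notebook's actual sample data is not shown here):

```python
import pandas as pd

def generate_accuracy(predicted, actual):
    """Accuracy as a percentage: (# correct predictions) / (total # of predictions) * 100."""
    return float((predicted == actual).mean() * 100)

# Toy example: 2 of 4 predictions are correct, so the accuracy is 50.0.
predicted = pd.Series([0, 1, 0, 1])
actual = pd.Series([0, 0, 1, 1])
acc = generate_accuracy(predicted, actual)
```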
(7) Create a column PredEveryoneDies which is equal to 0 for everyone in both training and testing datasets.
(8) Find the accuracy of PredEveryoneDies in predicting Survived using the function generate_accuracy that you created earlier and assign it to the AccEveryoneDies variable.
(9) In both the training and testing datasets, create the column PredGender that is 1 if the person is a woman and 0 if the person is a man. (This is based on the "women and children first" protocol of shipwrecks.) Then set AccGender to the accuracy of PredGender in the train dataset.
(10). Create a generate_submission function that accepts a DataFrame, a target column, and a filename (df, target, filename) and writes out the submission file with just the passengerID and the Survived columns, where the Survived column is equal to the target column.
For example: submitdie = generate_submission(train, 'PredEveryoneDies', 'submiteveryonedies.csv')
Should return a dataframe with just passengerID and the Survived column.
Make sure your submission file prediction for Survived is an integer and not a float. If you submit a float it may not work.
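A sketch of generate_submission, assuming the id column is named PassengerId as in Kaggle's titanic data; the cast to int addresses the float warning above:

```python
import pandas as pd

def generate_submission(df, target, filename):
    """Write a submission CSV containing only PassengerId and an integer Survived column."""
    submission = df[["PassengerId"]].copy()
    submission["Survived"] = df[target].astype(int)  # integer cast: float predictions may be rejected
    submission.to_csv(filename, index=False)
    return submission

# Toy stand-in for the train dataframe.
train = pd.DataFrame({"PassengerId": [1, 2], "PredEveryoneDies": [0.0, 0.0]})
submitdie = generate_submission(train, "PredEveryoneDies", "submiteveryonedies.csv")
```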
(11). To use the women and children first protocol, we will need to use the age field. This has some missing values. We are going to replace null values in the train and test set with the median value for each.
For this particular question:
Set the variables AgeMissingTrain and AgeMissingTest using the count of the number of missing values in the train and test sets, respectively.
Set the variable AgeMedianTrain and AgeMedianTest using the median age of the train and test sets, respectively.
(12) For rows in which the age value is missing, set the age to the appropriate median value for the train/test set.
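The missing-value steps in (11) and (12) can be sketched as follows (toy Age column; the same pattern applies to the test set):

```python
import numpy as np
import pandas as pd

# Toy stand-in: one missing Age value.
train = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0]})

AgeMissingTrain = int(train["Age"].isnull().sum())  # count of missing values
AgeMedianTrain = train["Age"].median()              # median skips NaN by default

# (12) Replace missing ages with the median.
train["Age"] = train["Age"].fillna(AgeMedianTrain)
```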
(13). In our initial calculation of the PredGender column, we made our prediction based on whether the individual was male or female. In accordance with the women and children first protocol, we hypothesize that our model could be improved by including whether the individual was a child in addition to gender. We also have a question: what age should we use to determine "child"? (People weren't likely to check for IDs.) We will check 2 ages, <13 and <18 (somewhat arbitrary, but we have to start somewhere), and see which yields a better accuracy.
Specifically, create 2 predictions as follows:
- train['PredGenderAge13'] should be the prediction incorporating both Gender (women survive) and Age (children with Age<13 survived while those with Age>=13 died)
- train['PredGenderAge18'] should be the prediction incorporating both Gender (women survive) and Age (children with Age<18 survived while those with Age>=18 died)
The analysis assumes that you have addressed missing values in the earlier step; you should do this for both the train and test dataframes.
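The combined gender-and-age predictions can be built with vectorized boolean logic (toy rows below; the real frames come from the Kaggle data):

```python
import pandas as pd

# Toy stand-in rows.
train = pd.DataFrame({"Sex": ["female", "male", "male"], "Age": [30.0, 8.0, 40.0]})

# Survive (1) if female OR a child below the age cutoff; cast the boolean mask to 0/1.
train["PredGenderAge13"] = ((train["Sex"] == "female") | (train["Age"] < 13)).astype(int)
train["PredGenderAge18"] = ((train["Sex"] == "female") | (train["Age"] < 18)).astype(int)
```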
(14). Calculate the accuracy for your new predictions. Use PredGenderAge13 in the training set to calculate AccGenderAge13 (you can use your function again!) and PredGenderAge18 to calculate AccGenderAge18 .
(15). You should find that the accuracy is higher when using 13 as a designation for a child than 18. What does this tell you about the role of age in surviving a shipwreck?
(16) Create a prediction file for the "women and children first" model using the test dataset and upload it to Kaggle. Include your Kaggle username so we can verify your submission occurred.
Make sure your submission file prediction is an integer and not a float. If you submit a float it may not work.
Coursera Machine Learning
Machine Learning by Prof. Andrew Ng
Table of Contents
- Brief intro
- Video lectures index
- Programming exercise tutorials
- Programming exercise test cases
- Useful resources
- Extra information
- Online E-Books
Most of the course is about hypothesis functions and minimizing cost functions.
A hypothesis is a function that we believe (or hope) is similar to the true function, the target function that we want to model. In the context of email spam classification, it would be the rule we came up with that allows us to separate spam from non-spam emails.
The cost function or Sum of Squared Errors (SSE) is a measure of how far our hypothesis is from the optimal hypothesis. The closer our hypothesis matches the training examples, the smaller the value of the cost function. Theoretically, we would like J(θ) = 0.
Gradient descent is an iterative minimization method. The gradient of the error function always points in the direction of steepest ascent of the error function. Thus, we can start with a random weight vector and repeatedly follow the negative gradient (scaled by a learning rate alpha).
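The description above can be sketched in a few lines of numpy (synthetic noiseless data; alpha and the iteration count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=20)]  # design matrix with an intercept column
true_theta = np.array([2.0, -3.0])
y = X @ true_theta                           # noiseless targets

theta = rng.normal(size=2)  # random starting weight vector
alpha = 0.1                 # learning rate

for _ in range(2000):
    grad = X.T @ (X @ theta - y) / len(y)  # gradient of the mean squared error cost
    theta -= alpha * grad                  # step along the negative gradient
```

On this noiseless problem the iterates converge to the true weights.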
Difference between the cost function and gradient descent
Bias and Variance
When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to “bias” and error due to “variance”. There is a tradeoff between a model’s ability to minimize bias and variance. Understanding these two types of error can help us diagnose model results and avoid the mistake of over- or under-fitting.
Hypothesis and Cost Function Table
Regression with Pictures
- Linear Regression
- Logistic Regression
Week 1 - Due 07/16/17:
- Welcome - pdf - ppt
- Linear regression with one variable - pdf - ppt
- Linear Algebra review (Optional) - pdf - ppt
- Lecture Notes
Week 2 - Due 07/23/17:
- Linear regression with multiple variables - pdf - ppt
- Octave tutorial pdf
- Programming Exercise 1: Linear Regression - pdf - Problem - Solution
- Program Exercise Notes
Week 3 - Due 07/30/17:
- Logistic regression - pdf - ppt
- Regularization - pdf - ppt
- Programming Exercise 2: Logistic Regression - pdf - Problem - Solution
Week 4 - Due 08/06/17:
- Neural Networks: Representation - pdf - ppt
- Programming Exercise 3: Multi-class Classification and Neural Networks - pdf - Problem - Solution
Week 5 - Due 08/13/17:
- Neural Networks: Learning - pdf - ppt
- Programming Exercise 4: Neural Networks Learning - pdf - Problem - Solution
Week 6 - Due 08/20/17:
- Advice for applying machine learning - pdf - ppt
- Machine learning system design - pdf - ppt
- Programming Exercise 5: Regularized Linear Regression and Bias v.s. Variance - pdf - Problem - Solution
Week 7 - Due 08/27/17:
- Support vector machines - pdf - ppt
- Programming Exercise 6: Support Vector Machines - pdf - Problem - Solution
Week 8 - Due 09/03/17:
- Clustering - pdf - ppt
- Dimensionality reduction - pdf - ppt
- Programming Exercise 7: K-means Clustering and Principal Component Analysis - pdf - Problems - Solution
Week 9 - Due 09/10/17:
- Anomaly Detection - pdf - ppt
- Recommender Systems - pdf - ppt
- Programming Exercise 8: Anomaly Detection and Recommender Systems - pdf - Problems - Solution
Week 10 - Due 09/17/17:
- Large scale machine learning - pdf - ppt
Week 11 - Due 09/24/17:
- Application example: Photo OCR - pdf - ppt
- Linear Algebra Review and Reference Zico Kolter
- CS229 Lecture notes
- CS229 Problems
- Financial time series forecasting with machine learning techniques
- Octave Examples
Online E Books
- Introduction to Machine Learning by Nils J. Nilsson
- Introduction to Machine Learning by Alex Smola and S.V.N. Vishwanathan
- Introduction to Data Science by Jeffrey Stanton
- Bayesian Reasoning and Machine Learning by David Barber
- Understanding Machine Learning, © 2014 by Shai Shalev-Shwartz and Shai Ben-David
- Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman
- Pattern Recognition and Machine Learning, by Christopher M. Bishop
- What are the top 10 problems in deep learning for 2017?
- When will the deep learning bubble burst?
- HMM - Hidden Markov Model
- CRFs - Conditional Random Fields
- LSI - Latent Semantic Indexing
- MRF - Markov Random Fields
- SIGIR - Special Interest Group on Information Retrieval
- ACL - Association for Computational Linguistics
- NAACL - The North American Chapter of the Association for Computational Linguistics
- EMNLP - Empirical Methods in Natural Language Processing
- NIPS - Neural Information Processing Systems
CS 335: Machine Learning
Lectures: Tues, Thurs 11:30am-12:45pm
Fourth Hour: Fri 8:30am-9:20am
Room: Clapp Laboratory 206
Office hours: Tues 1-3pm, Thurs 9:15-11:15am, Clapp 200
Piazza: https://www.piazza.com/mtholyoke/spring2020/cs335/home
Gradescope: https://www.gradescope.com/courses/76996
Moodle: https://moodle.mtholyoke.edu/course/view.php?id=17913
- Understand the general mathematical and statistical principles that allow one to design machine learning algorithms.
- Identify, understand, and implement specific, widely-used machine learning algorithms.
- Learn how to apply and evaluate the performance of machine learning algorithms.
- Derive analytical solutions for mathematical fundamentals of ML (probability, matrix and vector manipulation, partial derivatives, basic optimization, etc.).
- Derive and implement learning algorithms.
- Identify and evaluate when an algorithm is overfitting and the relationships between regularization, training size, training accuracy, and test accuracy.
- Identify real-world problems where machine learning can have impact.
- Implement machine learning tools on real data and evaluate performance.
- Produce proficient oral and written communication of technical ideas and procedures.
- Homeworks (4) — 40%
- "Celebrations of learning" (2) — 20%
- Project — 30%
- Class engagement — 10%
- Idea proposal — 2%
- Paper and group selection — 2%
- Literature review — 5%
- Weekly reports (4) — 8%
- Final report — 13%
- An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani: an accessible undergraduate machine learning textbook with statistics focus.
- Course handouts from Stanford CS 229 by Andrew Ng
- Google's Python class
- Norm Matloff’s Fast Lane to Python
- Stanford CS 231 Python Numpy Tutorial
- Stanford CS 231 IPython tutorial
- Organize study groups.
- Clarify ambiguities or vague points in class handouts, textbooks, assignments, and labs.
- Discuss assignments at a high level to understand what is being asked for, and to discuss related concepts and the high-level approach.
- Refine high-level ideas/concepts for projects (i.e. brainstorming).
- Outline solutions to assignments with others using diagrams or pseudocode, but not actual code.
- Walk away from the computer or write-up to discuss conceptual issues if you get stuck.
- Get or give help on how to operate the computer, terminal, or course software.
- Get or give limited debugging help. Debugging includes identifying a syntax or logical error but not helping to write or rewrite code.
- Submit the result of collaborative coding work if and only if group work is explicitly permitted (or required).
- Look at another student’s solutions.
- Use solutions to same or similar problems found online or elsewhere.
- Search for homework solutions online.
- Turn in any part of someone else's work as your own (with or without their knowledge).
- Share your code or written solutions with another student.
- Share your code or snippets of your own code online.
- Save your work in a public place, such as a public github repository.
- Allow someone else to turn in your work as their own. (Be sure to disconnect your network drive when you logout and remove any printouts promptly from printers.)
- Collaborate while writing programs or solutions to written problems. (But see above about specific ways to give or get debugging help.)
- Write homework assignments together unless it is specified as a group assignment.
- Collaborate with anyone outside your group for a group assignment.
- Use resources during a quiz or exam beyond those explicitly allowed in the quiz/exam instructions. (If it is not listed, don’t use it. Ask if you are unsure.)
- Submit the same or similar work in more than one course. (Always ask the instructor if it is OK to reuse any part of a different project in their course.).
Inclusion and Equity
Accommodations, communication policy, acknowledgments.
gatoytoro / Assignment: Machine Learning Prediction
19Fall CS6316 - Machine Learning
MoWe 2:00PM - 3:15PM @ MEC 205
Information of Assignments and Final Project for 2019 Fall UVa CS 6316 Machine Learning
Six assignments (60%).
- Post in collab
You will receive grades for each HW within 10 days of its due time. A release email will be sent to you about the grading. (If you don't receive such emails in time, please email [email protected].)
- Please submit all extension requests, questions, and late assignments to [email protected].
About ten in-class quizzes (20%)
- Quiz dates will be shown on the schedule page
- Quizzes will mostly be on Mondays
- Each quiz covers content from the previous week
We will use your top-10 quizzes to calculate the 20%.
- Here is the potential paper list:
About Final Project (20%)
- Each team includes up to four students
To understand, reproduce and present one cutting-edge machine learning paper
- 5 Points: A powerpoint file (Due in Collab on Oct 20th) summarizing the paper via a template
- 3 Points: The updated powerpoint file (Due in Collab on Nov 30th) summarizing the paper via a template and describing the results you reproduce
- 7 Points: An iPython Jupyter notebook (Due in Collab on Dec 7th) presenting the code, data visualization, and the results and analysis obtained through step-by-step code cell runs. Your team will walk through the notebook at the final project presentation meeting with the instructors.
- 5 Points: A formal presentation to the instructors (in the last week of the semester), presenting your slides and your iPython notebook.
- A Jupyter iPython template is shared to help you structure the project code.
- Please read the following paper before making your iPython Jupyter notebook: Ten Simple Rules for Reproducible Research in Jupyter Notebooks
- By Week 3, we will use a Google spreadsheet to coordinate team formation and paper bidding.
- Please discuss with your fellow classmates and form potential teams ASAP.
- We allow self-selected papers.
Please send questions and concerns to [email protected].
Teach with GitHub Classroom
Learn how to set up your classroom and assignments.
You can create and manage a classroom for each course that you teach using GitHub Classroom.
You can use individual or group assignments in GitHub Classroom to teach students and grade their work.
Use the Git and GitHub starter assignment
You can use the Git & GitHub starter assignment to give students an overview of Git and GitHub fundamentals.
Create an individual assignment
You can create an assignment for students in your course to complete individually.
Create a group assignment
You can create a collaborative assignment for teams of students who participate in your course.
Editing an assignment
You can edit existing assignments in your course.
Extending an assignment's deadline for an individual or group
You can grant individual students and groups extensions to allow them more time to submit an assignment.
Monitor students' progress with the assignment overview page
You can use the assignment overview page to track the progress of each student or team on an assignment.
Reuse an assignment
You can reuse existing assignments in more than one classroom, including classrooms in a different organization.
Create an assignment from a template repository
You can create an assignment from a template repository to provide starter code, documentation, and other resources to your students.
Leave feedback with pull requests
You can leave feedback for your students in a special pull request within the repository for each assignment.
You can automatically provide feedback on code submissions from your students by configuring tests to run in the assignment repository.
Using GitHub Classroom with GitHub CLI
You can use gh , the GitHub command line interface, to work with GitHub Classroom directly from your command line.
Register a learning management system with GitHub Classroom
You can configure an LTI-compliant learning management system (LMS) with GitHub Classroom.
Connect a learning management system course to a classroom
You can configure an LTI-compliant learning management system (LMS) course to connect to GitHub Classroom so that you can import a roster for your classroom.
Each task should have its own report and IPython notebook. Once again, we emphasize the report; it should contain all the questions and your proper statistical answers. Use figures, pictures, and tables. DO NOT PUT ANY CODE IN THE REPORT.
Table of contents
- Assignment 1
- Assignment 2
- Assignment 3
- Final Project