# How to Do Mathematics Easily – Intro to Deep Learning #4

Hey, are you okay? Siraj, show them the math behind deep learning. Totally. Hello world, it's Siraj, and let's learn about the math needed to do deep learning.

Math is in everything, not just every field of engineering and science. It's between every note in a piece of music and hidden in the textures of a painting. Deep learning is no different. Math helps us define rules for our neural network so we can learn from our data. If you wanted to, you could use deep learning without ever knowing anything about math. There are a bunch of readily available APIs for tasks like computer vision and language translation. But if you want to use a library like TensorFlow to make a custom model to solve a problem, knowing what math terms mean when you see them pop up is helpful. And if you want to advance the field through research, don't even trip! You definitely need to know the math.

Deep learning mainly pulls from three branches of math: linear algebra, statistics, and calculus. If you don't know any of these topics, I recommend a cheat sheet of the important concepts, and I've linked to one for each in the description. So let's go over the four-step process of building a deep learning pipeline and talk about how math is used at each step. Once we've got a dataset that we want to use, we want to process it. We can clean the data of any empty values and remove features that are not necessary, but these steps don't require math. A step that does, though, is called normalization. This is an optional step that can help our model reach convergence faster, where convergence is that point when our prediction gives us the lowest error possible, since all the values operate on the same scale. This idea comes from statistics. You have a 17.4 percent chance of making a straight. There are several strategies to normalize data, although a popular one is called min-max scaling. If we have some given data, we can use the following equation to normalize it.
We take each value in the list and subtract the minimum value from it, then divide that result by the maximum value minus the minimum value. We then have a new list of data within the range of 0 to 1, and we do this for every feature we have so they're all on the same scale.

After normalizing our data, we have to ensure that it's in a format that our neural network will accept. This is where linear algebra comes in. There are four terms in linear algebra that show up consistently: scalars, vectors, matrices, and tensors. A scalar is just a single number. A vector is a one-dimensional array of numbers. A matrix is a two-dimensional array of numbers. And a tensor is an n-dimensional array of numbers. So a matrix, scalar, vector, and spectre, wait, not spectre, can all be represented as a tensor. We want to convert our data, whatever form it's in, be that images, words, or videos, into tensors, where n, the number of features our data has, defines the dimensionality of our tensor. Let's use a three-layer feed-forward neural network capable of predicting a binary output given an input as our base example to illustrate some more math concepts going forward. When do we use math in deep learning?
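The min-max scaling step just described can be sketched in NumPy (a minimal sketch; the sample values are made up for illustration):

```python
import numpy as np

def min_max_scale(x):
    # (x - min) / (max - min) maps every value into the range [0, 1]
    return (x - x.min()) / (x.max() - x.min())

data = np.array([2.0, 4.0, 6.0, 10.0])
print(min_max_scale(data))  # each feature would be scaled this way
```

In practice you would apply this per feature (per column), so every feature ends up on the same 0-to-1 scale.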

When we normalize during processing, learn a model's parameters by searching.

And random weights be initializing. Tensors flow… from input to out.

Then measure the error to measure the doubt. It gives us what's real and what's expected.

Back propagate to get cost corrected.

We'll import our only dependency, NumPy, then initialize our input data and output data as matrices. Once our data is in the right format, we want to build our deep neural network. Deep nets have what are called hyperparameters. These are the high-level tuning knobs of the network that we define, and they help decide things like how fast our model runs, how many neurons per layer, and how many hidden layers. Basically, the more complex your neural network gets, the more hyperparameters you'll have. You can tune these manually, using knowledge you have about the problem you're solving to guess probable values and observe the result. Based on the result, you can tweak them accordingly and repeat that process iteratively. But another strategy you could use is random search. You can identify ranges for each, then create a search algorithm that picks values from those ranges at random from a uniform distribution of possibilities, which means all possible values have the same probability of being chosen. This process repeats until it finds the optimal hyperparameters. Yay for statistics! We only have the number of epochs as our hyperparameter, since we have a very simple neural network.

We'll use probability to decide our weight values, too. One common method is randomly initializing samples of each weight from a normal distribution with a low standard deviation, meaning values are pretty close together. We'll use it to create a weight matrix with a dimension of 3 by 4, since that's the size of our input, so every node in the input layer is connected to every node in the next layer. The weight values will be in the range from -1 to 1. Since we have three layers, we'll initialize two weight matrices. The next set of weights has a dimension of 4 by 1, which is the size of our output.

As data propagates forward in a neural network, each layer applies its own respective operation to it, transforming it in some way, until it eventually outputs a prediction. This is all linear algebra. It's all tensor math. We'll initialize a for loop to train our network for 60,000 iterations. Then we'll want to initialize our layers. The first layer, our input, gets the input data. The next layer computes the dot product of the first layer and the first weight matrix. When we multiply two matrices together, like in the case of applying weight values to input data, we call that the dot product. Then it applies a non-linearity to the result, which we've decided is going to be a sigmoid. It takes a real-valued number and squashes it into a range between 0 and 1. So that's the operation that occurs in layer 1, and the same occurs in the next layer. We'll take that value from layer 1 and propagate it forward to layer 2, computing the dot product of it and the next weight matrix, then squashing it into output probabilities with our non-linearity. Since we only have three layers, this output value is our prediction. The way we improve this prediction, the way our network learns, is by optimizing our network over time. So how do we optimize it? Enter calculus. The first prediction our model makes will be inaccurate. To improve it, we first need to quantify

exactly how wrong our prediction is. We'll do this by measuring the error, or cost. The error specifies how far off the predicted output is from the expected output. Once we have the error value, we want to minimize it, because the smaller the error, the better our prediction. Training a neural network means minimizing the error over time. We don't want to change our input data, but we can change our weights to help minimize this error. If we just brute-forced all the possible weights to see what gave us the most accurate prediction, it would take a very long time to compute. Instead, we want some sense of direction for how we can update our weights such that in the next round of training our output is more accurate. To get this direction, we'll want to calculate the gradient of our error with respect to our weight values. We can calculate this by using what's called the derivative in calculus. When we set deriv to True for our nonlin function, it'll calculate the derivative of a sigmoid. That means the slope of the sigmoid at a given point, which is the prediction values we give it from l2.

We want to minimize our error as much as possible, and we can intuitively think of this process as dropping a ball into a bowl, where the smallest error value is at the bottom of the bowl. Once we drop the ball in, we'll calculate the gradient at each of those positions, and if the gradient is negative, we'll move the ball to the right. If it's positive, we'll move the ball to the left. And we're using the gradient to update our weights accordingly each time. We'll keep repeating this process until eventually the gradient is zero, which will give us the smallest error value. This process is called gradient descent, because we are descending our gradient to approach zero and using it to update our weight values iteratively.

I understand everything now. Still understand everything. So to do this programmatically, we'll multiply the derivative we calculated for our prediction by the error. This gives us our error-weighted derivative, which we'll call l2_delta. This is a matrix of values, one for each predicted output, and it gives us a direction. We'll later use this direction to update this layer's associated weight values. This process of calculating the error at a given layer and using it to help calculate the error-weighted gradient, so that we can update our weights in the right direction, will be done recursively for every layer, starting from the last back to the first. We are propagating our error backwards after we've computed our prediction by propagating forward. This is called back propagation. So we'll multiply the l2_delta values

by the transpose of its associated weight matrix to get the previous layer's error, then use that error to do the same operation as before, to get direction values to update the associated layer's weights so the error is minimized. Lastly, we'll update the weight matrices for each associated layer by multiplying them by their respective deltas. When we run our code, we can see that the error values decrease over time, and our prediction eventually becomes very accurate.

So, to break it down: deep learning borrows from three branches of math, linear algebra, statistics, and calculus. A neural net performs a series of operations on an input tensor to compute a prediction, and we can optimize the prediction by using gradient descent to back propagate our errors recursively, updating our weight values for every layer during training.

The coding challenge winner from the last video is Jovian Lin. Jovian tried out a bunch of different models to predict sentiment from a dataset of video game reviews.
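The whole walkthrough above can be sketched end to end in NumPy (a minimal sketch; the toy input and output matrices here are stand-ins, not necessarily the exact dataset from the video):

```python
import numpy as np

def nonlin(x, deriv=False):
    # sigmoid: squashes a real-valued number into the range (0, 1);
    # with deriv=True it returns the slope, given already-squashed output x
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# toy input (4 examples, 3 features) and binary output matrices
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)
# two random weight matrices in the range -1 to 1:
# 3x4 connects input to hidden layer, 4x1 connects hidden to output
syn0 = 2 * np.random.random((3, 4)) - 1
syn1 = 2 * np.random.random((4, 1)) - 1

for _ in range(60000):
    # forward propagation: dot product with the weights, then the sigmoid
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))
    l2 = nonlin(np.dot(l1, syn1))

    # error: how far the prediction is from the expected output
    l2_error = y - l2

    # error-weighted derivative: the direction to update the last weights
    l2_delta = l2_error * nonlin(l2, deriv=True)

    # back propagation: previous layer's error via the transposed weights
    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin(l1, deriv=True)

    # update each weight matrix by its layer's delta
    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

print("final mean error:", float(np.mean(np.abs(l2_error))))
```

After training, the error printed at the end should be small and rounding the l2 predictions should recover the expected outputs.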

Wizard of the week! And the runner-up is Vishal Batchu. He tested out several different recurrent nets and eloquently recorded his experiment in his ReadMe. The coding challenge for this video is to train a deep neural net to predict the magnitude of an earthquake and use a strategy to learn the optimal hyperparameters. Details are in the ReadMe. Post your GitHub link in the comments, and I’ll announce the winner next video. Please subscribe if you wanna see more videos like this. Check out this related video, and for now I got to get my math turned up to a million. So thanks for watching!
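For the challenge, the random hyperparameter search described earlier can be sketched like this (a minimal sketch; the parameter ranges and the score_model stand-in are hypothetical, not from the video):

```python
import random

def score_model(learning_rate, hidden_units):
    # hypothetical stand-in for "train a network with these
    # hyperparameters and return its validation accuracy"
    return 1.0 - abs(learning_rate - 0.01) - abs(hidden_units - 32) / 100.0

random.seed(0)
best_score, best_params = float("-inf"), None
for _ in range(50):
    # sample each hyperparameter uniformly from its range,
    # so every possible value has the same chance of being picked
    params = {
        "learning_rate": random.uniform(0.0001, 0.1),
        "hidden_units": random.randint(4, 64),
    }
    current = score_model(**params)
    if current > best_score:
        best_score, best_params = current, params

print("best hyperparameters found:", best_params)
```

Each trial is independent, so you can run as many as your compute budget allows and keep whichever setting scored best.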

Great video!

I have to meet you, brother…

Tell me, in math problems, which is appropriate: 6+10=66 or 6#10=66?

Damn, am I the only one who understood all of that until 8:10?

By the way, if you wanna test this neural net, use this function:

```python
def think(input, syn0, syn1):
    # apply the nonlinearity after each layer, just like in training
    return nonlin(np.dot(nonlin(np.dot(input, syn0)), syn1))

print(int(think(np.array([<UR VALUES>]), syn0, syn1)))
```

Back propagate to get cost corrected!!!!

u r boss siraj…

gradient descent = PID loop? then error is the feedback, as well as minimizing noise?

Mozart no.40 <3 Dope music choice!

Bill Nye of the new age

great links in description for learning deep learning math from basic level! thank you

Siraj – the work you do is very meaningful to me – you help me as a math teacher make it clear to my students why math is awesome and abolish questions like "why should we learn math?".

Keep it up!

Linear algebra, statistics, and calculus? My three favorites lol – although I am only just learning lin now, and only took lower division calculus – I suppose I should look up the proofs that said "the proof for this is outside the scope of this course", because advanced calculus basically just does all those proofs.

I don't have GitHub

Why not:

```python
def nonlin(x, deriv=False):
    if deriv:
        return nonlin(x) * (1 - nonlin(x))
    return 1 / (1 + np.exp(-x))
```

???

I get all the maths! But how do you actually program backpropagation into a CNN??!?! I want to program my own CNN in Processing (Java) and I cannot get the convolution layer to work… I figured out how to backpropagate to that layer, but it is only ever giving me the same 2 outputs, both near 1. Sometimes one drops to -.999 but I never get any right results… What possible things could I have done wrong? I might be rushing through all this, because I only started 5 days ago with neural networks, but I really have the motivation to at least make a working CNN…

Man. You're good.

Won't you need to know geometric and algebraic topology, if you are doing visual machine learning.

Hey Siraj! When I tried your demo code in Python 3 it always shows me an error, can you tell me why??

Starting music link plz ?

Sir, you should provide links of ur memes 😀

Do we need to find l2_delta and l1_delta first and then update syn1, syn0, or can we update syn1 right after finding l2_delta @ 8:50??

I have a question. After normalizing the data, wouldn't the output be in normalized format as well? How do we turn it back?

I finally understand math behind neural networks…Thanks siraj

where do you get all those memes, Siraj?

How to acquire fluent communicating skills like you do?

I like how fast and to the point you are, not wasting time, but I wish I understood what all of this is

Brain Explosion ………….literally 😉

Thank you for this nice introduction! makes things much clearer.

One note though: an integer is not a proper Python variable name. So

```python
11 = 'something'
```

will most likely produce a syntax error! You can use `_11` if you like. As I see, you corrected this in the GitHub link ;)

Like Immortal Technique

Amazing videos! You're a great teacher! 😀

Those memes make me feel connected to your videos 🙂

Not for dummies. I was lost in first 5 seconds. I feel dumb.

Please send me back to my planet. Feel like I'm not from here:(.

Siraj, you bloody rock.

Do I need to know this when I'm just importing libraries?

Ohhh

absolutely amazing video , thanks and keep making more

Why is my error ~0.25? I used Python 3 (range instead of xrange). Thanks for the very clear tutorial.

Have you a link to the source code of the tutorial?

I still didn't know why we use the math the way we use it, but I understand now how to use it 😀 So my next step will be to understand why we multiply specific things, like using a transposed matrix. Thanks for that video!

Maths is the godfather of all sciences

WolframAlpha.

This is better than expected, you should rename it "what math you need to know for neural networks"

Are you ADHD?

Great

1:46 database normalization != data normalization

damn that linear algebra cheat sheet was complicated :/// Not giving up on the three month challenge though !!

Hey Siraj Raval, I am interested in deep learning, but I am only in the 8th (going to 9th) grade, therefore I do not currently have the mathematics abilities to pursue this passion. Moreover, I was wondering how I can learn the necessary math to pursue this passion at an accelerated pace (e.g. 3-6 months). Are there any good and easy-to-understand resources to look at, and if so, can you please reply. I also found that I can learn new concepts more quickly than most, as over the past 7 months I have been learning about computer science and I have taught myself multiple languages such as C, C++, Java, JavaScript, Python, and more. I have also learned a lot about algorithms and algorithm design, as well as computer architecture and low-level manipulation, such as dynamic memory allocation.

P.S. I am currently only doing honors algebra and geometry, so I need to learn A LOT of math in a short period of time

You have no idea how I'm enjoying your videos. Thanks man, you're amazing

Dude… The humor and memes in your videos are on point and SO educational. Please keep making these!

I’m phenomenally bad at maths, no content has ever explained how this works, as well as broken down the areas of study I need. Thank you!

@4:32, how is that a normal distribution?

Watch it for the 3rd time, You Rock Siraj!!

U r awesome …

Can you make a video on Kriging (KG) Method for parameter estimation?

Are you basically Indian? ?

Yo Siraj, I get an error "ValueError: shapes (4,1) and (4,3) not aligned: 1 (dim 1) != 4 (dim 0)". I don't get what's really the problem. I did it the same as you did, but still! And btw I use Python 3.

"back propagate to get cost corrected"… loved it

Don't get it. But one day I will

Do you use fl studio for the beats

lol song love it..

OMG! You got a rap song for backpropagation, amazing…

Let the video play in the background while surfing the description links… as always. This video is so funny

You rap better than Eminem

Man, those memes are so fkn distracting! XD

that thumbnail wasnt a click bait.

THE FIRST EVER VIDEO where I needed to play it at half speed to get the smallest grip. Damn MATHS!

Of course you're right that math is important in AI, but how much time will it consume? I think only a mathematician can do this and other talent is wasted. You just discouraged me

Good intro to neural networks, keep it up brother.

The memes are distracting.

Your videos are a great summary of the course Machine Learning from Prof. Ng, which I'm currently taking. They give me back the quick overview, when I need it.

1. Buy a new mic

2. Slow down in your videos

THIS IS SO FUN! 😀

"Hello world! It's siraj", best part of your vids xD

I feel as though I need a Ph.D in mathematics in order to understand the concepts that you presented in this video!

Thanks for all the resources you are providing. I find them useful. Can you please add me to your GitHub network, because I am really interested to know how these guys predicted the earthquake using neural networks.

Thanks in advance

Learner

Do you ever hold Machine Learning Meetups in SF? Also, I haven't checked yet but do you have any recommendations on where to look for the latest in content/CF recommendation engines? The best paper I've found is the "Collaborative filtering for implicit feedback data sets" paper written by Koren. I'm very interested in a paper which factors in negative implicit interactions. Great videos!

Siraj you are brilliant! You have truly found your calling. You explain things clearly in an amusing and dynamic way – and your singing is 😂 I've not met anyone make maths so engaging. Thank you!

When you sung the mechanism of a neural network in a few verses I lost my shit. Seriously funny and informative

😵 what…😲😬

Whats the intro music?

Is anyone else thinking: "Man…this is heavy"

This really helped me understand neural networks. Thanks!

The cheat sheet redirects are not working in Firefox, but they work in Chrome — so it must be browser-specific and not my OS.

Play at .75, thank me later 😉 But you'll lose the fun 🙁

I UNDERSTAND EVERYTHING NOW

sorry I think the title is really off from the content.

I don't know what you were thinking with that

I have completed a course in machine learning and am looking forward to deep learning. Can you provide me a good source for it that will give me the entire knowledge?

This is the one video on you tube that changes everything!

I once wrote and performed a song poking fun at a number of Australian politicians, using Richard Clayderman's "Les Premiers Sourires De Vanessa" as Karaoke-style backing, but Mozart's Symphony in G minor K550 is taking it to the next level 😀 Siraj literally radiates awesomeness and makes me look like a beginner

Please stop rapping


I just love this guy's videos

If you were smart you would use these short songs and post them somewhere like Spotify or somewhere people can listen to them, 'cause I would. Recall is important.

Why are you using Jupyter notebook? In the first video you said Sublime Text editor

I learn so much from you. probably going to change my life watching your channel.

2+2 = plagiarism

backpropagate to get cost corrected 😀