# Lecture 6: Search 2 – A* | Stanford CS221: AI (Autumn 2019)

Okay. So, Hi, everyone. So, uh, our plan for today is to continue talking about search. So, so that’s, uh, what we’re going to start doing, finish off some of the stuff we started talking about last time, and then after that, uh, switch to some of the more interesting topics like learning. So a few announcements. Um, so the solutions to the old exams are online now. So if you guys wanna start studying for the exam, you can do that. So, so start looking at some of those problems, I think, that would be useful. Um, actually, let me start with the Search 2 lecture because I think that might be, like, that has a, a review of some of the topics we’ve talked about. So it might be easier to do that. Also, I’m not connected to the network, so we’re not gonna do the questions, uh, or show the videos because I have, I have a hard time connecting to the network in this room. Okay. All right. So, so let’s start- continue talking about search. Uh, so if you guys remember, uh, we had this, this city block problem. So let’s go back to that problem and let’s just try to do a review of some of the, some of the search, search algorithms we talked about last time. So, uh, so suppose you want to travel from City 1 to City n only going forward, and then from City n you wanna go backwards, so and back to City 1 going only backwards, okay? So, so you- so the problem statement is kind of like this. You’re starting in City 1, you’re going- you’re going forward and you’re getting to some City n. So maybe we’re doing that on this. And then after that, you wanna go backwards and get to, get to City 1 again. So you go into some of these cities, okay? So, so that’s the goal, and then the cost of going to- from any city i to city j is equal to cij, okay? So, so that’s it. So, the question is: What- which one of these following algorithms could you use to solve this problem? And it could be multiple of them. 
So- so we have depth-first search, breadth-first search, dynamic programming, and uniform cost search. And these were the algorithms we talked about last time. So, uh, maybe just talk to your neighbors for a minute and then we can do votes on each one of these. Yes, question? Just needed to ask [inaudible] The [OVERLAPPING]? Okay. Let me check that again. Thank you. Thank you for. [BACKGROUND] All right, so let’s maybe start talking about this. So how about depth-first search like how many people think we can use depth-first search? How many people think we can’t use depth-first search? There’s -a very like good split. [LAUGHTER] So, some of the people think we can’t use depth-first search, what, what are some reasons maybe just like call it out. The depth first-search, the assumption was that based upon the cost is zero. Yes, that’s right. Yeah, so here we are basically going from City 1 to city n. Each one of these edges had a cost of cij. I’m just saying cij is greater than or equal to 0. That’s the only thing I’m saying about cij. But if you remember depth-first search, you really wanted the cost to just be equal to 0 because if you remember that whole tree, like the whole point of depth-first search was I could just stop whenever I could find a solution. And we were assuming that the costs of all the edges is just equal to zero. So we can’t really use depth first search here, because, because our cost is not 0. So assuming, like now that you know that reasoning, how about breadth first-search? Can we use breadth-first search? Yes? All of that moving from one city to city n that is not the city n. So that’s a good point. So, so what suggesting is can we think about the problem as going from City 1 to City n? And then after that, like introduce like a whole new problem that continues that and starts from City n and goes to City 1. 
Let me get back to that point like in a second, because like you could potentially think about that -actually like that might be an interesting way of thinking about it. But, but irrespective of that I can’t use depth first-search. So I’m -so far I’m just talking about depth first-search. Irrespective of how I’m looking at the problem, the costs are gonna be uh, non-zero. So because the costs are going to be non-zero, I can’t use depth-first search. So, so let’s talk about that first. So how about breadth first-search? Can I use breadth-first search? [inaudible] That’s exactly right. So we cannot use breadth-first search here because for breadth first-search. If you remember, you really wanted all the costs to be the same. They didn’t need to be 0, but they needed to be the same thing because then you could just go over the levels. And here I’m not- like I’m not saying I’m not putting any restrictions on cij being the same thing. Okay? So now let’s talk about dynamic programming. How about dynamic programming? Can we use dynamic programming? All right, so that looks right, right you like we could use dynamic programming here. Everything looks okay, cij’s are positive, looks fine. Um, how about, um, actually one question? So, so don’t they have cycles here? We kind of, briefly talked about this already. So, don’t I have like this cycle here? Uh, we can think about possibly going from one to n and then n to one. Yes, so this is a suggestion that, that we have already like heard twice. So we could actually use dynamic programming here even if it kinda looks like we have a cycle and the reasons we can kinda use this trick were we can basically draw this out again. And for going forward basically go all the way here, and then after that we’re going backwards, kind of include the directionality too. So all I’m doing is I’m extending the state, the state space to not just be the city but be the city in addition to that, it would be direction that we’re going. 
So if I’m in City 4 here, it’s City 4 going forward. And if at some point in the future I’m in City, I don’t know, 4 again, it’s City 4 going backwards. So I’ll keep track of both the city and the directionality. And when I do that then I’m kind of breaking the cycle. Like I’m not putting any cycles here and I can actually use dynamic programming, okay? Does that make sense? And then uniform cost search. That, that also sounds good too, right? Like Uniform cost search, you could actually use that. Doesn’t matter if you have cycles or not. And then we have positive, positive, non-negative costs. So we could use uniform cost search. Okay? All right, so this was just a quick review of some of the things we talked about last time. And, um, another thing we talked about last time was this notion of state. Okay, so, so we started talking about tree search algorithms and at some point, uh, we switched to dynamic programming and uniform cost search where we are, uh, like we don’t need to- like we don’t need to have this exponential blow up. And the reason behind that was we have memoization. And in addition to that we have this notion of state. Okay? And so, what is a state? A state is a summary of all past actions that are sufficient for us to choose the future optimally. So, so we need to be really careful about choosing our state. So in this previous question, uh, we looked at past actions. So if you look at like all cities that you go over it can be in City 1, then 3, then 4, 5, 6 and city 3 again. So in terms of state, the things that you wanna keep track of is what city you are in. But in addition to that, you wanna have the directionality because you, you need to know like where you are and how you’re getting back. Okay? So, and we did a couple of examples around that trying to figure out what is, what is like a specific notion of state for various problems. All right. So, so we started last time talking about search problems and, and we started formalizing it. 
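
As a rough sketch of the state trick described above, the round trip can be solved with memoized recursion once the state is the pair (city, direction). This is not code from the lecture: the function name `round_trip_cost`, the cost table `c`, and the free turnaround at City n are assumptions made for illustration.

```python
from functools import lru_cache


def round_trip_cost(c):
    """Minimum cost to go 1 -> n moving only forward, then n -> 1 moving only backward.

    c[i][j] is a hypothetical non-negative cost of travelling from city i to city j.
    Cities are numbered 1..n, so row and column 0 are unused padding.
    """
    n = len(c) - 1

    @lru_cache(maxsize=None)
    def future_cost(city, direction):
        # State = (city, direction). Keeping the direction is what removes the
        # apparent cycle: (4, "forward") and (4, "backward") are different states.
        if direction == "backward" and city == 1:
            return 0  # end state: back in City 1
        if direction == "forward" and city == n:
            # Assumption: turning around at City n costs nothing.
            return future_cost(n, "backward")
        if direction == "forward":
            next_cities = range(city + 1, n + 1)
        else:
            next_cities = range(1, city)
        return min(c[city][j] + future_cost(j, direction) for j in next_cities)

    return future_cost(1, "forward")


# Made-up costs for three cities (index 0 is padding).
example_costs = [
    [0, 0, 0, 0],
    [0, 0, 1, 5],
    [0, 2, 0, 1],
    [0, 3, 2, 0],
]
print(round_trip_cost(example_costs))
```

With these made-up costs the cheapest forward leg is 1 to 2 to 3 and the cheapest return leg is 3 straight to 1. Because every forward move increases the city number and every backward move decreases it, the augmented state graph has no cycles, which is the property dynamic programming needs.
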
So if you remember our paradigm of modeling, inference, and learning, we started modeling search problems using this formalism where we defined a starting state, that’s s_start. And then we talked about the actions of s, which is a function over our states that returns all possible actions. And then we talked about the cost function. So the cost function takes a state and an action and tells us the cost of that edge. And then we talked about the successor function, which takes a state and an action and tells us where we end up. And again, we had this end function that just checks whether you’re in an end state or not. So these were all the things that we needed to define a search problem, and we tried that on a couple of examples: the tram example, the city example, all of that. Okay? And then after talking about these different ways of thinking about search problems, we started talking about various types of inference algorithms. So we talked about tree search: depth-first search, breadth-first search, depth-first search with iterative deepening, and backtracking search. And then after that we talked about some of the graph search type algorithms, like uniform cost search and dynamic programming. So last time we did an example of uniform cost search, but we didn’t get to prove its correctness. So I want to switch to some of last time’s slides to go over this quick theorem, and then after that switch back to this lecture. Okay. So, uniform cost search. If you remember what we were doing in uniform cost search, we had three different sets. We had an explored set, which is the set of states that we have visited, where we are sure how to get to them, we know the optimal path, and we know everything about them.
We had this frontier set, which is the set of states that we have reached but where we’re not sure that the cost we have is the best cost. There might be a better way of getting to them and you don’t know it. You’re not sure yet. And then we have the unexplored set, which is the states that we haven’t seen yet. So we did this example where we started with all the states in the unexplored set, then we moved them into the frontier, and then from the frontier we moved them to the explored set. So this was the example that we did on the board. Okay? And we realized that even if we have cycles, we can actually run this algorithm, and we ended up finding the best path going from A to B to C to D, and that costs 3. So let’s actually implement uniform cost search, I think we didn’t do this last time. So we started writing up these algorithms for search problems. We have written dynamic programming already, and backtracking search. So now we can try to implement uniform cost search. And for doing so, we need this priority queue data structure. So this is in a util file. I’m just showing you what functions it has: it has an update function and it has a remove min function. So it’s just a data structure that I’m going to use for my frontier, because I’m popping things off my frontier. All right. So let’s go back to uniform cost search. So we’re going to define this frontier, where we add states from the unexplored set. Okay? And it’s going to be a priority queue, and we have that data structure because we’ve just imported util. And we’re going to add the start state with a cost of 0 to the frontier. So that’s the first thing we do.
And then after that, while the frontier is not empty. So, while true, what we’re going to do is remove the element with the minimum past cost from the frontier. So basically just pop the best thing that exists off the frontier and move it to the explored set. Okay. So when I pop the thing off the frontier, I get this past cost and I get the state. Okay? All right. So if you’re in an end state, then you’re just going to return that past cost with the history. I’m not putting the history here for now, I’m just returning the cost. Okay. So after popping this state off the frontier, the thing we were doing was adding its children. So the way we do that is we use this successor and cost function that we defined last time. So we can iterate over the action, new state, and cost from this successor and cost function, and update our frontier by adding these new states to it. Okay. And the cost that you add is cost plus past cost, if that is better than what is already there. So that’s what the update function of the frontier does. And that’s pretty much it. That is uniform cost search. You add stuff to the frontier, you pop stuff off the frontier, and that way you move things from the unexplored set to the explored set. So let’s just try that out. Looks like it is doing the right thing. So it got the same value as dynamic programming. So it looks like it works okay. [NOISE] So this code is also online, if you want to take a look at it later. Actually, it’s not what I wanted. Um, yeah. Okay. All right. And here’s also the pseudocode of uniform cost search. Okay? So, is there a question right there? What’s the runtime of uniform cost search [inaudible]. That’s a good point. So the question is, what’s the runtime of uniform cost search?
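
The walkthrough above can be sketched in Python roughly as follows. This is not the course’s code: it swaps the `util` priority queue and its update function for the standard-library `heapq`, skipping stale entries when they are popped, and the `TransportationProblem` class and its method names are hypothetical stand-ins for the search problem interface (walk from s to s + 1 costs 1, tram from s to 2s costs 2).

```python
import heapq


class TransportationProblem:
    """Hypothetical problem class: start at state 1, reach state n."""

    def __init__(self, n):
        self.n = n

    def start_state(self):
        return 1

    def is_end(self, state):
        return state == self.n

    def succ_and_cost(self, state):
        # Returns (action, new_state, cost) triples that stay within 1..n.
        result = []
        if state + 1 <= self.n:
            result.append(("walk", state + 1, 1))
        if state * 2 <= self.n:
            result.append(("tram", state * 2, 2))
        return result


def uniform_cost_search(problem):
    """Return the minimum cost from the start state to an end state, or None."""
    frontier = [(0, problem.start_state())]  # heap of (past_cost, state)
    explored = set()
    while frontier:
        # Pop the frontier entry with the smallest past cost.
        past_cost, state = heapq.heappop(frontier)
        if state in explored:
            continue  # stale entry: this state was already popped at a lower cost
        explored.add(state)
        if problem.is_end(state):
            return past_cost
        # Push the children; cheaper duplicates simply get popped first.
        for action, new_state, cost in problem.succ_and_cost(state):
            if new_state not in explored:
                heapq.heappush(frontier, (past_cost + cost, new_state))
    return None  # the frontier emptied without reaching an end state


print(uniform_cost_search(TransportationProblem(10)))
```

Pushing duplicate entries instead of updating a priority in place is a design choice of this sketch. It keeps the code short at the price of a somewhat larger heap, and it relies on costs being non-negative, as the algorithm itself does.
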
So the runtime of uniform cost search is order of n log n, where the log n is because of the bookkeeping of the priority queue, and you’re going over all the edges. So you can think of n here as the edges, and in the worst-case scenario, if you have a fully connected graph, it’s technically n squared log n. But in practice we have [inaudible] graph, so people usually refer to that as just n log n, where n is the number of states that you have explored. And it’s actually not all of the states. It’s the states that you have explored. Okay? And dynamic programming is order of n. So technically dynamic programming is slightly better, but it really depends. Yeah, certainly. Actually, you go first and then I’ll get back to you. Is the only difference between this and Dijkstra’s that you just don’t have all [inaudible] beginning? So the question is, what’s the difference between this and Dijkstra’s algorithm? They’re very similar. The only difference is that this is trying to solve a search problem, so you’re not exploring all the states. When you get to the solution, you just return it. With Dijkstra, you’re basically exploring all of the states in your graph. What’s your question? [inaudible]. All right. Sounds good. Okay. So I just want to quickly talk about this correctness theorem. So for uniform cost search we actually have a correctness theorem, which basically says uniform cost search does the right thing. So what this theorem says is, if you have a state s that you are popping off the frontier and moving to the explored set, then its priority, that value which is equal to the past cost of s, is actually the minimum cost of getting to the state s. So what this is saying is, let’s say that this is my explored set, and then right here is my frontier, and I have a start state, okay?
And then I have some state s, and right now I have decided that I am popping s off the frontier and moving it to explored, because it is the thing that has the best past cost. So what the theorem says is that this path that I have from s_start to s is the shortest path possible to get to the state s. Okay. So the way to prove that is to show that the cost of this path is no higher than the cost of any other path that goes from s_start to s. So let’s say there is some other path, this green one, that goes from s_start to s some other way. And the way it gets to s is that it has to leave the explored set at some state, call it t, go to some other state u, and then from u go to s. u and s can be the same thing. But the point is, if I have this other path that goes to s, it needs to leave the explored set from some state t. Okay. So I want to show that the cost of the green line is greater than or equal to the cost of the black line. Okay. All right. So what is the cost of the green line? It’s going to be the cost to get here, to t, and then the cost of t to u, and the cost of u to s. So I can say, well, this cost is actually greater than or equal to the priority of t, because that is the cost of getting to t, plus the cost of t to u. And I’m just dropping this last part, the u to s. I’m just dropping it. Okay. So the cost of green is at least the priority of t plus the cost of t to u. Okay. Well, what is that equal to? Priority is just a number, right? It’s just a number that you get off the priority queue. So that is actually equal to the past cost of t plus the cost of t to u. Okay. And this value is going to be greater than or equal to the priority of u. Well, why is that? Because if u is in my frontier, I’ve visited u. So I already have some priority value for u.
And the value that I’ve assigned for the priority of u is either equal to this past cost of t plus the cost of t to u, because I’ve seen that when I explored t and updated my frontier, so I’ve definitely seen this, or it is something better, I don’t know what it is. Right? So the priority of u is going to be less than or equal to this past cost of t plus the cost of t to u. Okay. And what do I know in terms of the priority of u and the priority of s? Well, I know the priority of u is going to be greater than or equal to the priority of s. Why is that? Because I already know I’m popping off s next, I’m not popping off u. I know I’m popping off the thing that has the least priority value, and that’s s, and that is equal to the cost of the black line. Okay. All right. So that was just a quick proof of why uniform cost search always returns the minimum cost path [NOISE]. All right. So let’s go to the slides again. So, just a quick comparison between dynamic programming and uniform cost search. So we talked about dynamic programming. We know it doesn’t allow cycles, but in terms of action costs, they can be anything: you can have negative costs, you can have positive costs. And in terms of complexity, it’s order of n. And then with uniform cost search, you can have cycles. So that is cool. But the problem is that the costs need to be non-negative, and it’s order of n log n. And if you end up in a situation where you have cycles and your costs are actually negative, there is this other algorithm called Bellman-Ford that we are not talking about in this class, but that addresses those sorts of things. Okay. All right, how am I doing on time? Okay. So that was this idea of inference.
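
The correctness argument for uniform cost search given above comes down to one chain of inequalities. Written out, with t the state where the alternative (green) path leaves the explored set, u the next state on that path, and s the state being popped; the names Cost, PastCost, and priority follow the spoken argument, and the typesetting is an editorial sketch:

$$
\begin{aligned}
\text{Cost}(\text{green}) &\ge \text{priority}(t) + \text{Cost}(t, u) && \text{drop the } u \to s \text{ part, since costs are non-negative} \\
&= \text{PastCost}(t) + \text{Cost}(t, u) && t \text{ is explored, so its priority is its past cost} \\
&\ge \text{priority}(u) && u \text{ was put on the frontier when } t \text{ was explored} \\
&\ge \text{priority}(s) && s \text{ is the minimum-priority state on the frontier} \\
&= \text{Cost}(\text{black})
\end{aligned}
$$

The first step is where non-negative costs are used, which is why uniform cost search carries that requirement while dynamic programming does not.
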
Right now we have like a good series of ways of going about doing inference, uh, for search problems, you have to formalize them. And now the plan for this lecture is, is to think about learning. So how are we going to go about learning when we have search problems? [NOISE] And when our search problem is not fully specified, and there are things in the search problems that are not specified and you want to learn what they are, like the costs, okay. So, uh, so that’s going to be the first part of the lecture, and then towards the end of the lecture, we’re going to talk about a few other algorithms that make things faster. So, so smarter ways of making things faster. We’re going to talk about A star and some sort of relaxation type strategies, okay. All right. So, um, so let’s go back to our transportation problem. So, so this was our transportation problem where, er, we had a start state and we can either walk, and by walking we can go from state s to state s plus 1, and that costs one, or we can take a tram, a magic tram that takes us from state s to state 2s, and that costs 2, okay, and we want to get to state n. So, uh, we can formalize that as a search problem. We can like we saw it- we saw this last time, we can actually try to find what is the best path to get from state 1 to any state n like we saw- like path- like walk walk, tram tram tram, walk tram tram. This is one potential like optimal path that one can get, okay? But the thing is, uh, the world is not perfect like, like modeling is actually really hard, like it’s not that we always have this nice model with everything. And we could end up in scenarios where we have a search problem, and, and we don’t actually know what the costs of our actions are. So we don’t actually know what the cost of walking is, or what the cost of tram is. But maybe we actually have access to, to this optimal path. 
Like, maybe I know the optimal path is walk walk tram tram tram, walk tram tram, but I don’t know what the costs are. So the point of learning is, is to go about learning what these cost values are based on this, this optimal path that we have. So, so I want to actually learn the costs of walking is 1, and the cost of tram is 2. And this is actually a common problem that we have like in machine learning in general. So like for example, um, you might have data from, uh, how a person does something or like how a person, let’s say, like grasps an object. And I, I have no idea what was the cost that the person who was optimizing to grasp an object, right, but I have like the trajectory I know like what, what the path they took when they picked up an object. So what I can do is, if I have access to that path of how they picked up an object, then from that I can actually learn what was the cost function that they were optimizing, because then I can put that cost function maybe on a, on a robot that does the same thing. Question? [inaudible] like five or something. That’s a good question. So the question is, is it possible to have multiple solutions here? Yes, so we are gonna actually see that like later, like what sort of the solutions that we gonna get, are there, ther- there could be cases where we have multiple solutions. The ratio of it is the thing that matters. So if you have like, walk is 1, tram is 4, if you get to an 8, you kind of get the same sort of behavior. Uh, and then it also depends on what sort of data you have. Like if your data allowed you to actually recover the, the, the true solution. So, so we’re gonna actually talk about all these cases, okay? All right. Okay. 
So if you think about it, when the way- the search problem we were trying to solve, this, this was the inference problem, was when you are, you are given kind of a search formulation and you are given a cost, and, and our goal was to find the sequence of actions, this optimal sequence of actions, that was the shortest path or the best path and, and some path or some way, and this is a forward problem. So search is this forward problem, where you’re given a cost and you want to find a sequence of actions, okay. So it’s interesting because learning in some sense is, is an inverse problem. It’s the inverse of, of search. So the inverse of search is, if you give me that sequence of actions, the, the best sequence of actions that you’ve got, then can you figure out what the cost is? So, so in some sense you can think of learning as this inverse problem of, of search and, and we are going to kind of address that. So I’m going to go over one example to, to talk about, er, learning. Um, and I’m actually going to use the notation of, uh, the machine lea- learning lectures that we had, um, at the beginning of like last week basically. So, um, let’s say that we have, ah, maybe I can draw this. [NOISE] Um, yeah, I will just draw the scheme. So let’s say we have a search problem without costs, and, and that’s our input. So if- so, so we are kind of framing this problem of learning as a prediction problem. And if you remember prediction problems, in prediction problems we had, ah, an input. So our input was x, okay. And in, in this case you are saying our input is a search problem, search problem without costs, okay? So that is my input. And then we have outputs. And in this case my, my output y is this optimal sequence of actions that one could get- gets, so it’s the solution path, so it’s a solution path, okay. 
And what I wanna do is, I wanna- like, like if you remember machine learning, the idea was, I would wanna find this predictor, this f function, f that we take an input, f of x, and then it would basically return the solution path in other settings and it would generalize. So, so that was kind of the idea that we explored in machine learning, and you kinda wanna do the same thing in here. So, uh, let’s start with- um, I’m going to draw that here. So let’s start with an example where we are in city 1, and then maybe we walk to city 2, so we can walk to city 2. And then from there, maybe I have two options. I can keep walking to get to city 4. So I can do walk walk walk. Or maybe I can take the tram and end up in city 4, okay? And, and the thing is I don’t actually know what the costs of these, these actions are, I don’t know what the cost of do- uh, walk is, what the cost of tram is. Okay? But one thing I know is that my, my solution path, my y is equal to walk, walk, and walk. So, um, so one way to go about this is to actually start with some initialization of, of these costs. So the way we’re defining these costs are going to be, uh, I’m going to use the word, um, I’m gonna write here maybe. I’ll just write up here. I’m going to use w like, because I want to use the same notation as as the learning lectures. So w is going to be the weights that o- of, of each one of my actions. I have two actions. In this case I can either walk or I can take the tram so I’m going to call them action 1. So w of action 1 is w of walking. And then w of action 2 is w of taking the tram. So action 2 is taking the tram. So I’m defining these w values, and the way I’m defining these weights is just as a function of actions. This could technically be a function of state and actions but right now I’m just simplifying this and I’m saying the w’s is this values, the costs of walking just depend- the cost of going from 1-2 just depends on my action. It doesn’t depend on what state I’m in. 
You could imagine settings where it actually depends on what city you are in too, okay? So under that scenario, what is the cost of y? It is going to be w of walk, plus w of walk, plus w of walk. Okay? So what I’m suggesting is, let’s just start with something. Let’s just start with some weights. So I’m going to say walking costs 3. And it’s always going to cost 3. Again, the reason it’s always going to cost 3 is that I’m saying my weights only depend on the action, they don’t depend on the state. And I’m going to say, well, why not, let’s just say the tram has a cost of 2. Okay? So this doesn’t look right, but let’s just say we start with this, okay? So now what I want to do is update these weights in a way that I get this optimal path that I have, this walk, walk, walk. Okay? So how can I do that? So I started with these random initializations of the weights. So now that I’ve done that, I can try to figure out what the optimal path is based on these weights. So what is my prediction? That is y prime. That is my prediction, based on these weights that I’ve just set up, of what the optimal path is. Well, what is that? That is walk, tram, because this one costs 5 and walk, walk, walk costs 9. So with these random weights that I’ve just come up with, I’m going to pick walk and tram. And that is my prediction. Okay? So now we want to update our w’s based on the fact that our true label is walk, walk, walk and our prediction is walk, tram. Okay? And the algorithm that does this does, like, the silliest thing possible. So what it does is, it’s going to first look at the true value, y. Okay?
So the weights are starting from, I decided that this guy is 3 and I decided that this guy is 2, and I’m going to update them. So I’m going to look at every action in this path. And for every action in this path I’m going to bring down its weight. Well, why am I going to do that? Because I don’t want to penalize that, right? This is the true thing. I want the weight of the true thing to be small. So I see walk. The weight of that was 3. I’m going to bring that down by 1. I’m going to make that 2. I see walk again. So I’m going to bring that down to 1. I see walk again, I’m going to subtract 1 again. I end up at 0. Okay? Now I’m going to go over my prediction, and for every action I see here I’m going to bring the weight up by 1. So I see walk again here, I’m going to bring it up by 1. So these were subtract, subtract, subtract, and then bring it up by 1, because it’s in my y prime. And then I see tram. And because I see tram, I’m going to bring this up by 1, and that ends up at 3. So my new weights are: the weight of walk just became 1 and the weight of tram just became 3. Okay? And now I can repeat this and see if that gets me this optimal solution or not. So I’m going to try running my search algorithm. If I run my search algorithm, this path, walk, walk, walk, costs 3, and this path, walk, tram, costs 4. So I’m actually going to get the walk, walk, walk path. So my new prediction is just going to be walk, walk, walk. They’re going to be the same thing. My weights are not going to change. I’ve converged. Yes. Is it always one? So I’m talking about a very simplified version of this, but yeah, it is always one. So the very simplified version of this is this version where I’m saying the w’s just depend on actions.
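
The update just walked through can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lecture’s code: `best_path` stands in for running a search algorithm by simply comparing the two candidate paths from the drawing, and the names `best_path`, `learn_costs`, and `max_iters` are hypothetical.

```python
def best_path(weights, candidate_paths):
    # Stand-in for search: return the candidate path with the lowest total weight.
    return min(candidate_paths, key=lambda path: sum(weights[a] for a in path))


def learn_costs(weights, candidate_paths, true_path, max_iters=100):
    """Simplified update where each weight depends only on the action."""
    weights = dict(weights)  # work on a copy so the caller's dict is untouched
    for _ in range(max_iters):
        predicted = best_path(weights, candidate_paths)
        if predicted == true_path:
            break  # prediction matches the true path: no mistakes, stop
        for action in true_path:
            weights[action] -= 1  # make actions on the true path cheaper
        for action in predicted:
            weights[action] += 1  # make actions on the predicted path costlier
    return weights


# The two ways of getting from City 1 to City 4 in the drawing.
paths = [("walk", "walk", "walk"), ("walk", "tram")]
learned = learn_costs({"walk": 3, "tram": 2}, paths, ("walk", "walk", "walk"))
print(learned)
```

Starting from walk = 3 and tram = 2, this reproduces the numbers on the board: one update gives walk = 1 and tram = 3, after which the prediction is walk, walk, walk and the weights stop changing.
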
If you make the weights depend on state and actions, there is a more general form of this. This is called the structured perceptron algorithm. We’ll briefly talk about the version where there is a state and action too, but for this case we are just depending on the action. You literally just bring it up by one, or by whatever amount you bring it up here, you’ve got to bring it down by the same amount. So it’s plus and minus a, whatever a is. There’s a question. [inaudible] why we do the plus 1 after we do all the minus 1s? So why am I doing the minus 1s? So I’ll get to that. So when I look at y here, right, this is the thing that I really wanted. So when I see walk, I realize that walking was a good thing, so I need to bring down the weight of that. But if the weights that I already had knew that walking is pretty good, I should cancel that out. So that’s why we are doing the plus 1, because at this stage I knew walking is pretty good up here, my prediction also said walk. So if I’m subtracting it, I should add it back so they cancel out. But right here, I didn’t know walking is good, so I’m going to bring down the weight of that and then bring up the weight of tram. [inaudible]. Yeah. So I mistakenly thought tram is the way to go. So to avoid that next time around, I’m going to make the cost of tram higher so I don’t take that route anymore. And there’s a question there. So here, the only reason why [inaudible] in the y prime is because we know the y prime is different from y. Yes. But then, what if we have a long sequence and y prime is only different in one small location? Would that change the weights sufficiently? Yeah. So you’re asking.
Okay, if my y and y prime, prime are kinda like the same thing walk, walk, walk or something and then at the very end this last one they’re going to be different. Yeah. So like we were just and for that last one we are just adding one, right? So, so it does like weighted, er, it does actually address that and it just run- you can run it until you get the sequences to be exactly the same thing so you don’t have any mistakes. Yeah. There’s a question back there. Does it matter if our new cost become negative? Uh, does it matter if our new costs become- it depends on what, sort of, search algorithm you are using. Uh, at the end of the day it’s fine if you’re using dynamic programming so I can have like a negative cost here and I’m just calling, uh, like dynamic programming at the end of the day with that and that is fine. Yeah, it’s fine if the cost becomes negative. There’s a question. In this problem we want to find the true cost for walk and tram, but we ended up converging to 1. So this becomes a problem. Sorry, did not supposed- Just like the end result for this algorithm we got is 1 for walk and 3 for tram. And the real result, like in the previous example was 1 and 2. 1 and 2. Right, yes. Yeah. So the, so the question is, er, we got here 1 and 3. Is this actually right? Like, like if you remember like when we define this tram problem, we said walking costs 1 and tram costs 2 but we never got that. Well, the reason we never got that is the solution we are going to get here is just based on our, our training data. So if my training data is just walk, walk, walk, this is like the best thing I can get and I can kind of like converge to this solution where, where the two end up being equal. I don’t have any mistakes on this. If I have more like data points then I’m going to do this longer and actually try it out on other training data and, and then I might converge to a different thing. Is there any rule for as far as initializing the weight? 
Is- I, I, I, I, I’m assuming when- the fu- uh, further when we are from the actual truth, the longer it’s going to take to, uh, actually converge. It’s- o- okay so the question is how do we initialize? So in na- in a natural algorithm you’re just initializing with 0. So we’re initializing everything by 0. It’s actually not that bad because you just, you just basically have this sequence and in the- for the more general case you’re computing a feature value that you just compute the full thing and you just do one single subtraction. So it is not that costly actually to do this. Yeah. [inaudible] know the path for a given cost. If you have that input can we incorporate that into the algorithm? So, you’re saying if we have some prior knowledge about the cost can we incorporate it? Yeah. Um, that is interesting. So, uh, in this current format. So if you have some prior algorithm maybe you’ll like then your prediction is going to be better, right? So if you have some knowledge about it maybe you’ll get a better prediction and then based on that you don’t update it as much. So maybe you can incorporate into the search problem. But again this is the most like general form of this algorithm. The simple- kind of, like the simplified version of it also like even like for the action. So not doing anything fancy. It’s not doing something that hard either, honestly. Are we worried about overfitting at all? [BACKGROUND] Yeah. So it is going to- it can too- you’re- yeah, so I’ll show some examples on this. Like we are going to code this up and then we’ll see overfitting, kind of, situations. So- so I’ll get back to that actually. All right. All right. So, um, all right, so let’s move on. Ah, okay. So- so this is just like the things that are on the slides are what I’ve already talked about. So, uh, yeah, so here’s an example. So we start with, 3 for walk and 2 for tram. 
And then the idea is like how are we going to change the costs so we get the- the solution that we’re hoping for. Um, and- and as I was saying, well, we can assume that the costs only depend on the action. So I’m assuming cost of s, a is just w of a, and in the most general form it- it can depend on- on the state too. Um, okay. So then if you take any candidate output path, then what would be the cost of the path? It would just be the sum of these W values over- over all the edges. So it would just be W of a_1 plus W of a_2 plus W of a_3. And as you’ve seen in this example, the cost of a path is just W of walk, plus W of walk, plus W of walk, or W of walk plus W of tram. So- so that’s all this slide is saying. So- so that’s how we compute the cost. All right, so- so now, uh, let’s actually look at this algorithm like running in practice. Um, okay, let me actually go over the pseudocode. So- so, you start initializing W has to be equal to 0. And then after that we’re going to iterate for some amount of T and then we have a training set of examples. It might not be just one here. I just showed this one example like- like, the only training example I had was- was that walk, walk, walk is a good thing, but you can imagine having multiple training examples for a search problem. And then what you can do is you can compute your prediction so that is y prime given that you have some W and the-then you can start with this W equal to zero and then-then just compute your prediction y prime, and then basically, you can do this plus and minus type of action. So for each action that is in your true y that is in your true label, you’re going to subtract 1. So to decrease the cost of true y. And then for each action that is in your prediction you’re going to add- add one to- to, kind of, increase the cost of the predicted y. Okay. All right. So let’s look at implementing this one. And let’s try to look at some examples here. All right. So let’s go back to the tram problem. 
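Written out, the action-only cost model and the update from the pseudocode look roughly like this. The bracket notation w[a] and the path length k are notational choices made here, not taken from the slides.

```latex
\begin{align*}
\text{Cost}(s, a) &= w[a] \\
\text{Cost}(y) &= \sum_{i=1}^{k} w[a_i], \qquad y = (a_1, \dots, a_k) \\
w[a] &\leftarrow w[a] - 1 \quad \text{for each action } a \text{ in the true path } y \\
w[a] &\leftarrow w[a] + 1 \quad \text{for each action } a \text{ in the predicted path } y'
\end{align*}
```

When y and y prime are identical, every subtraction is matched by an addition, so the weights do not move.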
So this is again the same tram problem. We just want to use the same, sort of, format. Uh, I actually went back and wrote up the history here. If you remember the last time I was saying I’m not returning the history. Now we have a way of returning history of each one of these algorithms cause we are going to call dynamic programming and we need the history. All right. So let’s go back to our transportation problem. So we had a cost of 1 and 2 for walking and tram, but what we wanna do is we wanna put parameters there. So you wanna actually put this weight and we can give that to our transportation problem. So in addition to the number of blocks, now I’m going to actually give like the weight of different actions. Okay. All right. So then walking has a weight and, um, [NOISE] tram has a weight. So now I have updated my transportation problem to generally take different weight values. So- so, now we wanna be able to generate some- some training examples. So that’s what I wanna do. I wanna generate different types of training examples that- that we can call so we can get these true labels. So let’s assume that the true weights for our training example is just 1 and 2. So- so that is what we really want. Okay. And- and we’re going to just wri- write this prediction function that we can call up later to- to- to get different values of y. So the prediction function is going to get the number of blocks. So- so it’s going to get, um, N, the number of blocks here. And it is going to act with this path that we want. So it’s going to output these- these y values, this different path. Okay. So, all right, so the whole point of prediction is- is basically, like running this f of x function. Um, and we can define our transportation problem with- with n, n weights. And the way we are going to get this is by calling dynamic programming. So someone asked you earlier could the costs be negative? 
Well, yes because now I’m calling dynamic programming and if like this problem has negative cost, that is fine too. Um, So and the history is going to get and the action new state and- and costs, right? So but the thing that I actually wanna return from my predict function is a sequence of actions. So I’ll just get the action out of this history that I get from dynamic programming. So I’m calling dynamic programming on my problem that is going to return a history or get the sequence of actions from that, and that is my predict function and I can just call that later. So let’s go back to generating examples. So, um- [NOISE] so, I’m just going to go for, uh, try out n to go from 1-10. So 1 block to 10 blocks and we are calling the predict function on these true weights to get the true y values. So these are my true labels, okay? And those are my examples. So my examples are just calling generate examples here. Okay. So let’s just print out our examples. See how it looks like. We haven’t done anything like in terms of like the algorithm or anything. We’re- we’re just creating these training examples, um, by calling this predict function on- on the true weights. I have a typo here, [LAUGHTER] generate examples and I need parentheses, oh, fix the typo. Okay, so that kinda looks right, right? So that’s my training example 1 through 9. And then what is- what is the path that you would wanna do if- if you have these two weights, the 1 and 2. Okay. So now I have my examples. So I’m- I’m ready to write this structured Perceptron algorithm. It gets my examples. It gets the training examples which are these paths. Um, and then we’re going to iterate for some range. And then, um, we can, um, basically go over all the examples that we have in our true- true y values. And then we can- we can basically go and update our weights based on- based on that and based on our predictions. So let’s initialize the weights to just be 0. So that’s for walking and tram, they’re just 0. 
And, uh, prediction actions, this is when we’re calling predict based on the- the current weights. So if my current weights are 0 then pred actions is just that y prime. So pred actions is y prime, true actions is y, like the things that we had on the slides. If- okay, and- and I wanna count the number of mistakes I’m making too. So if the two are not equal to each other then I’m going to just keep a counter for number of mistakes. If- if the two become equal then- then my number of mistakes is zero. I’m going to break then maybe I’m happy then. Okay. So I make a prediction. And then after that I’m going to update the weight values. Okay. So how do I update? Well, basically subtract. If you’re in true actions which is y, the labels that I’ve created from my training examples and then, uh, do plus 1 if you’re in prediction actions based on the current weight values. And- and that’s pretty much it. Like- like that is structured perceptron. Okay. So let’s just print things nicely so we can print the iteration and number of mistakes we have and what is actually the weight values that we have. And I’m just breaking this, um, whenever I have like no mistakes. So if number of mistakes is 0, I’ll- I’ll just break this. Okay. Okay. That sounds good. So if number of mistakes is 0, then I’ll break. [NOISE] Okay. So all good. Uh, I’m gonna run this, it’s not gonna do anything because I didn’t call it. So I’ll go back and actually call it. I have another typo here, I don’t know if you guys can guess, like where is my typo. This is gonna give an error [LAUGHTER]. Well, I called it weights, not weight. [LAUGHTER] So, I’ll go and fix that. Okay, this should run. Okay. So and then- then, this is what we get. So let’s actually look at this. So what we got is the first iteration number of mistakes was 6, and then, uh, we ended up actually, at the fir- first iteration, we ended up converging to 1, 2. 
So then the second iteration, the number of mistakes just became 0, and then we just got 1, 2, which is- which is the- the weights that we were hoping for. Okay? So that kind of, looks okay to me, that’s my training data. Everything looks fine. There’s a question actually. [inaudible] more like integers. Is that right? Yeah. So in this case, yeah, we are summing all the weights as integers, and you’re adding them. Given our update model as well, Well, we’re- we’re assuming that the number of walks and the number of trams were different. What if tram was in a different location but the number of walks to the tram can be correct? You would still- So- so I see what you’re asking. No. It should- it- like, it should figure- figure it that out. So, um, we- we- we can go over an example after- after the class and I’ll show you like how- how it actually does it. All right. So- okay. So let’s try 1 and 3. So with 1 and 3 takes a little bit longer, and, uh, but it does recover. So 1 and 4 is actually the interesting one, because it does recover something. It does recover 2, 8. It doesn’t recover 1 and 4. But like given my data, actually, 2, 8 is- is like- like, there is no reason for me to get 1- 1 and 4. Like the ratio of them is the thing that that I actually care about. So even if I get 2 and 8, like- like that is a reasonable set of weights that one could get. Um, I’m gonna try a couple of more things. So let’s try 1 and 5. So I’m gonna try 1 and 5, and this is what I get. So I get the weight of walk to be minus 1, and the weight of tram to be 1. Now, my mistake is 0. So why is this happening? Yeah. Your training data is all walking. So it’s learning to just walk. Yeah, that’s right. So- so what’s happening here is, if you look at my training data up here, my training data is just has like walk, like all walks. It hasn’t seen tram ever, so it has no idea like what the cost of tram is with respect to the cost of walk. So it’s not going to learn that. 
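The pieces coded up above can be put together in a short, self-contained Python sketch. This is not the code from the lecture: the function names `predict`, `generate_examples`, and `structured_perceptron` are chosen here, the dynamic program is a simplified stand-in that returns only the action sequence, and the problem definition (walk goes from block s to s + 1, tram goes from s to 2s) is assumed from the earlier lecture rather than restated in this part. Ties between equally cheap paths are broken in favor of walk, which is an arbitrary choice.

```python
from functools import lru_cache

def predict(n, weights):
    """Return a minimum-cost action sequence from block 1 to block n.

    Assumed problem: from block s, 'walk' goes to s + 1 and 'tram' goes to 2 * s,
    and each action costs weights[action]. Ties are broken in favor of 'walk'.
    """
    @lru_cache(maxsize=None)
    def best(s):
        # Dynamic programming over states; the graph is acyclic, so negative weights are fine.
        if s == n:
            return 0, ()
        best_cost, best_path = None, None
        for action, nxt in (("walk", s + 1), ("tram", 2 * s)):
            if nxt > n:
                continue
            future_cost, future_path = best(nxt)
            total = weights[action] + future_cost
            if best_cost is None or total < best_cost:
                best_cost, best_path = total, (action,) + future_path
        return best_cost, best_path

    return list(best(1)[1])

def generate_examples(true_weights, max_n=9):
    """Training pairs (n, true path) for n = 1..max_n, labeled using the true weights."""
    return [(n, predict(n, true_weights)) for n in range(1, max_n + 1)]

def structured_perceptron(examples, num_iters=100):
    weights = {"walk": 0, "tram": 0}
    for t in range(num_iters):
        num_mistakes = 0
        for n, true_actions in examples:
            pred_actions = predict(n, weights)   # y' under the current weights
            if pred_actions != true_actions:
                num_mistakes += 1
            for action in true_actions:          # decrease the cost of the true path
                weights[action] -= 1
            for action in pred_actions:          # increase the cost of the predicted path
                weights[action] += 1
        print(f"iteration {t}: mistakes = {num_mistakes}, weights = {weights}")
        if num_mistakes == 0:
            break
    return weights

examples = generate_examples({"walk": 1, "tram": 2})
learned = structured_perceptron(examples)
```

Because of the tie-breaking rule and the range of n used here, the mistake counts and the learned weights need not match the run shown in class. What the algorithm fits is the set of training paths, so any weights that reproduce those paths count as converged.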
So we’re gonna fix that. Like one way to fix that is to go and change the training data and actually like get more data. So, uh, we can kind of do that. Um, so like just one thing to remember is, this is just going to fit your training data, whatever it is. Um, so yeah. So when we fix that, then walk becomes two and tram becomes 9, which is not 1 and 5. But it- it is getting there, like it’s a better ratio. Uh, a number of mistakes is still 0. So it really depends on what you’re looking for. Like if you’re trying to like match your data and your number of mistakes is 0, and you’re happy with this, you can just go with this. Um, and even though like it hasn’t like actually recovered the exact value, the ratios, that’s fine. Or maybe you’re looking for the exact ratios and you should like run it longer. More iteration questions? Structured perceptron like suspect to getting stuck in local optima, like maybe, all we need is different initializations? Sorry. Like I was looking at the- can you repeat that? Oh, sorry. Um, does the, uh, structured perceptron, like, have a risk of getting stuck in local optimum, like k-means, so we need different initializations? Um, that is a good question. So in, um, actually, lemme think about that. Um, do you see this in NLP? Do you actually know if this gets into local optima? I haven’t experienced it personally, but I feel like there’s [inaudible] There is reasons for it to do this. It’s still in this kind of- I mean, let me think about this. I’ll think about this, because even in the more general form of it, uh, it’s commonly used in like- like the matching, like sentence- like words and sentences. So I haven’t experienced that either but, um, I can look into that and back to you. Question? I was gonna ask, are you just being at all of the optimal paths, currently? Yes. Yeah, yeah, yeah. But if we do figure all the optimal paths then technically, it should be complex, right? Because like you just match paths. 
Um, if you’re feeding it all the optimal paths, uh, it should- you- you’re just matching path, you’re saying is- [inaudible] Yeah. So- so in terms of- okay- so, yeah. So in terms of like bringing down the number of mistakes then- then it should always match it. But if you have some true like weights that you are looking for, and it’s not represented in your dataset, then it’s not necessarily like- like learning that. So- so in those settings, you could find the local optima. So kind of like a- another version of this is, uh, when you are doing like reward learning and- and you- you actually have this true reward you wanna find. Like in those settings, you can totally fall into like local optima because you want to find what your reward function is. But you’re right, like if you’re just matching, uh, the data. Just in the reward function, you are on the scaling two, you still get like the optimal policies. So the scaling would be a different problem, right? So the scaling is kinda- yeah, so you can have reward shaping, so you can have different versions of the rewards function, and if you get any of them, that is fine. Uh, but, uh, but you might still get into local optima that’s not explained by reward shaping. So okay. So that we- we can talk about these things offline. Maybe, I should just move on to the next topics because we have some more stuff going on. Okay, so I was actually going to skip these slides because we have stuff coming up, but this is a more general form of it. So remember I was saying, this w is a function of a. Ah, but, um, [NOISE] um, you could- you could have a more general form, ah, where your cost function is not just w as a function of a, it is actually w times the set of features. Ah, and then the cost of a path is w times the features of a path. Uh, and that’s just the sum of features over the edges. So- so you can have this more general form. Go over this slides later on, maybe, because we’ve gotta move to the next part. 
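In symbols, the more general cost model described here can be written as follows. The feature symbol phi and the path length k are notation assumed for this sketch.

```latex
\begin{align*}
\text{Cost}(s, a) &= \mathbf{w} \cdot \phi(s, a) \\
\text{Cost}(y) &= \mathbf{w} \cdot \phi(y), \qquad
\phi(y) = \sum_{i=1}^{k} \phi(s_{i-1}, a_i)
\end{align*}
```

The action-only version from earlier is the special case where the features of a path simply count how many times each action appears in it.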
But just real quick to update here is- is this more general form of updates which is update your w based on subtracting the features over your- your true- true path plus the features over your predicted path. So- so a more general form of this is called Collins’ algorithm. So Mike Collins was working on this in- in natural language processing. He was actually interested in it in the setting of part of speech tag- er, tagging. So- so you might have like a sentence, uh, and- and you wanna tag each one of the- each one of the labels here as- as a noun, or a verb, or a determiner, Or a noun again. So- so he was think- he was basically looking at this problem as a search problem. Uh, and he was using like similar type of algorithms to- to try to figure out like- like match what- what the value, like match noun, or like each one of these, um, part of speech tags to the sentence. So he has some scores and then based on the scores and his dataset, he goes like up and down. He moves the scores up and down which uses the same idea. You can use the same idea again in machine translation. So you can have, like if you have heard of like Beam Search. Um, and you can have multiple types like- like a bunch of translations of- of some phrase and then you can up-weight and down-weight them based on your training data. Okay? All right. Okay. So now let’s move to ai’s- ai’s- a star, not ai star. A star search. All right. So, um, okay. So we’ve talked about this idea of learning costs, right? So we have talked about, uh, search problems in general doing inference and then doing, uh, learning on top of them. And then now, I wanna talk a little bit about, um, kind of making things faster using smarter ideas and smarter heuristics. There’s a question. [inaudible] see what is the loss from [inaudible] in this structure? In this structure? So, so in, in- this is, this is a prediction problem, right? 
So, so in that prediction problem, we are trying to basically figure out what w- w’s are as closely as possible as we are matching these w, w- this y prime to y, right? So, so basically, like, like the way we are solving this is, is not necessarily as an optimization, the way that we have solved other types of learning problems. The way we are solving it what- is by just like tweaking these weights to try to match my y as closely as possible to, to y, okay? All right. Okay. So let’s get- talk, talk about a A-star. So I don’t have internet so I can’t show these. Um, but I think the link for this should work if- when you go to the, to the file. So the idea is, if you go back to uniform cost search, like in uniform cost search, what we wanted to do was, we want to get from a point to some solution, but we would uniformly, like increase, uh, explore the states around us until we get to some final state. The idea of A-star is to basically do uniform cost search, but do it a little bit smarter and move towards the direction of the goal state. So if I have a goal state, particularly like in that corner, maybe I can, I can move in that direction in a smarter way, okay? So here is like an example of that pictorially. So I can start from S-start, and, and if I’m using uniform cost search, again I’m uniformly kind of exploring all the states possible until I hit my S-end. And then I’m happy, I’m done, I’ve solved my search problem, everything is good. But the thing is, I’ve done all these, like wasted effort on this site which is, which is not that great, okay? So uniform cost search in, in that sense has this problem of just exploring a bunch of states for no good reason, and what we wanna do is we want to take into accounts that we’re just going from S-start to S-end, so we don’t really like need to do all of that. We can actually just try to get the- to get to the end state, okay? So, um, so going back to maybe, um, I’m going to go on this side. 
So, um, [NOISE] going back to how these search problems work, the idea is to start from S-start and then get to some state S, and then we have this S-end, okay? And what uniform cost search does is, it basically orders the states based on past cost of s, okay? And then explore everything around it based on past cost of F- S until it reaches S-end, okay? But when you are in state S, like there is also this thing called future cost of s, right? And ideally, when I’m in state S, I don’t wanna explore other things like this side. I actually want to- wanna move in the direction of kind of reducing my, my future cost and getting to my, to my end state, okay? So, so the cost of me getting from S-start to S-end is really just like past cost of s plus future cost of s. And if I knew what future cost of s was, I would just move in that direction. But if I knew what future cost of s is, well the problem was solved, right? Like I had the answer to my search problem. Like I’m, I’m solving a problem still. So in reality, I don’t have access to future cost, right? I have no idea what future cost is. But I do have access to some- like I can potentially have access to something else and I’m gonna call that h of s. And that is an estimate of future cost. So I’m going to add a function called h_s, and this is called a heuristic, and the- and this heuristic could estimate what future cost is. And if I have access to this heuristic, maybe I can update my cost to be something as what the past cost is. In addition to that, like I can add this heuristic and that helps me to be a little bit smarter when I’m running my algorithm, okay? So, so the idea is, ideally like what I would wanna do is, I wanna explore in the order of past cost plus future cost. I don’t have future cost or if I had future cost, I had the answer to my search problem. Instead, what A-star does is it’s- it explores in the order of past cost plus some h_s, okay? 
So remember uniform cost search, it, it explores just in the order of past cost. So in uniform cost search, um, like we don’t have that h_s, okay? And h_s is, is a heuristic, it’s an estimate of the future cost. All right. So what does A-star do? Actually that’s something really simple. So, so a A-star basically just does uniform cost search. So all it does is uniform cost search with a new cost. So before I had this blue costs costs of s and a, this was my cost before. Now I’m going to update my cost to be this cost prime of s and a, which is just cost plus the heuristic, over the successor of s and a minus the heuristic. So, so that is the new cost and I can just run uniform cost search on this new cost. So, so I’m gonna call it cost prime of s and a. Well, what does that equal to? That is equal to cost of s and a, which is what we had before when we were doing uniform cost search, plus heuristic over successor of s and a, minus heuristic over s. So why do I want this? Well, what this is saying is, if I’m at some state S, okay, and there is some other state, successor of s and a, so I can take an action a and end up in successor of s and a, and there is some S-end here that I’m really trying to get to. Remember h was my estimate of future cost. What this is saying is, my estimate of future cost for getting from successor to S-end, minus my estimate of, er, getting from, er, future costs of S to S-end should be the thing I’m adding to my cost function. I should penalize that. And, and what this is really enforcing is, it basically makes me move in the direction of S-end. Because, because if I end up in some other state that is not in the direction of S-end, then, then that thing that I’m adding here is basically going to penalize that, right? It’s going to be saying, “Well, it’s really bad that you’ve- you are going in that action. I’m going to put more costs on that so you never going that direction. 
You should go in the direction that goes towards your S-end.” And that all depends on what your h function is, how good an h function you have, and how you’re designing your heuristics. But that’s the idea behind it. So here is an example, actually. Let’s say we have A, B, C, D, and E, and we have a cost of 1 on all of these edges. And what we want to do is go from C to E. That’s our plan, okay? So if I’m running uniform cost search, what would I do? I’m at C, I’m going to explore B and D because they have a cost of 1, and then after that I’m going to explore A and E, and then finally I get to E. But why did I spend all of that time looking at A and B? I shouldn’t have done that, right? A and B are not in the direction of getting to S-end. So instead, what if someone comes in and tells me: I have this heuristic function, you can evaluate it on your states, and it’s going to give you 4, 3, 2, 1, and 0 for A, B, C, D, and E. Then you can update your costs, and maybe you’ll have a better way of getting to S-end. This heuristic, in this case, is actually perfect, because it’s exactly equal to the future cost. The point of the heuristic is to get as close as possible to the future cost, and this one is exactly equal to it. So with this heuristic, my new cost is going to change. How is it going to change? Well, it becomes whatever the cost of the edge was before, which was 1, plus h at the successor, minus h at the current state. Take, for example, the cost of going from C to B. It’s the old cost, which was 1, plus the heuristic at B, which is 3, minus the heuristic at C, which is 2. So that gives me 1 plus 3 minus 2, which is equal to 2.
And then similarly, you can compute like all these, like new cost values, the purple values and, and that has a cost of two for going in this direction and cost of zero for going towards E. And, and if I just run uniform cost search again here, then I can get to E like much easier, okay? Yes. Does an A-star like kinda result in greedy approaches, where you put these opportunities, like go back with [inaudible]. Does A-star result in- Like greedy approaches. Like where you sort of- greedy. Greedy? Yes. Um, yeah. So okay. So, so in all, ah, so, so the question is, is A-star like causing greedy approaches? So, no. Actually, we are going to talk about that a little bit. A-star, depend- depends on the heuristic you are choosing. So depending on the heuristic you are choosing, A-star is actually going to be like returned to optimal value. But yeah, it does depend on the heuristic. So it actually does the exact same thing as uniform cost search if you choose a good heuristic. Why is cost of CB 1 here? Uh, what- Why is cost of CB 1? Why is cost of C- CE 1? CB. CB. Hold on. [LAUGHTER]. I’m like, really bad, my ears are really bad, so speak up. So cost of CB. Oh because- oh, I see what you’re saying. That’s what we started with. So this is like the graph that I started with. So I started with the cost, like the blue costs being all 1, but now I’m saying those costs are not good, I’m going to update them based on this heuristic so I can get closer to the goal, like as fast as possible. [inaudible]. You return like the actual cost of not, like you wouldn’t count the heuristic in there, because it can be like wrong. That’s, that’s right. So, so the question is what costs are you going to return at the end? And you do want to return the actual cost. So you’re returning the actual cost, but you can run your algorithm with this heuristic thing added in because that allows you to explore less things and just be more efficient. Okay. Oh, I gotta move on. All right. So, um. Okay. 
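The A through E example can be replayed in a few lines of Python. This is a sketch under stated assumptions: the graph is taken to be a line A, B, C, D, E with edges in both directions between neighbors (which is what the uniform cost search narration implies), and the names `successors`, `modified_successors`, and `uniform_cost_search` are chosen here for illustration.

```python
import heapq

states = ["A", "B", "C", "D", "E"]
h = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}   # heuristic values from the example

def successors(s):
    """Neighbors on the line A-B-C-D-E, each edge with original cost 1."""
    i = states.index(s)
    return [(states[j], 1) for j in (i - 1, i + 1) if 0 <= j < len(states)]

def modified_successors(s):
    """Same edges with cost'(s, a) = cost(s, a) + h(succ(s, a)) - h(s)."""
    return [(nxt, cost + h[nxt] - h[s]) for nxt, cost in successors(s)]

def uniform_cost_search(start, end, succ):
    """Return (cost of the best path, states in the order they were explored)."""
    frontier = [(0, start)]
    explored = []
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.append(s)
        if s == end:
            return past_cost, explored
        for nxt, cost in succ(s):
            if nxt not in explored:
                heapq.heappush(frontier, (past_cost + cost, nxt))
    return None, explored

ucs_cost, ucs_explored = uniform_cost_search("C", "E", successors)
astar_cost, astar_explored = uniform_cost_search("C", "E", modified_successors)
actual_cost = astar_cost + h["C"]   # undo the shift by h(start); h(end) is 0

print(ucs_cost, ucs_explored)
print(astar_cost, astar_explored, actual_cost)
```

The search on the modified costs never expands A or B, and adding back the heuristic at the start state recovers the actual cost of the path, which is the number you would report.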
So a good question to ask is: what is this heuristic? What does this heuristic look like? Does any heuristic work well? It turns out that not every heuristic works. So here’s an example. Again, the blue things are the costs that are already given. These are the things I already have, and I can just run my search algorithm with them. The red things are the values of the heuristic. Someone gave them to me for now; in general we would want to design them. So someone comes in and gives me these heuristic values, and what I want to do is compute the new cost values. So the question is, is this heuristic good? I get my new cost values, and they look like this. Does this work? We don’t have time, so I’m going to answer that: it’s not going to work. [LAUGHTER] The reason it’s not going to work is that we just got a negative edge there, right? I’m running uniform cost search at the end of the day, A_star is just uniform cost search, and I can’t have negative edges. So that was just not a good heuristic to have here. So heuristics need to have specific properties, and you should think about what those properties are. One property you would want the heuristic to have is this idea of consistency; this is actually the most important property, really. So I’m going to talk about the properties of heuristics here. A heuristic h should be consistent. A consistent heuristic has two conditions. The first condition is that it satisfies the triangle inequality, and what that means is that the updated cost you have should be non-negative. So this cost prime of s and a should be greater than or equal to 0. That means the old cost of s and a, plus h of the successor, I’m going to use s prime for that, minus h of s, is greater than or equal to 0. Okay. So that is the first condition.
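The first condition is easy to check mechanically. The sketch below uses a hypothetical three-state chain, since the numbers on the slide are not in the transcript; `modified_costs`, `all_nonnegative`, `good_h`, and `bad_h` are illustrative names, not anything from the lecture.

```python
def modified_costs(edges, h):
    """edges maps (s, s_next) -> cost; return the same map with cost + h[s_next] - h[s]."""
    return {(s, t): c + h[t] - h[s] for (s, t), c in edges.items()}

def all_nonnegative(edges, h):
    """True if every modified edge cost is >= 0, so uniform cost search can be run on them."""
    return all(c >= 0 for c in modified_costs(edges, h).values())

# Hypothetical chain A -> B -> C with unit costs; not the graph from the slide.
edges = {("A", "B"): 1, ("B", "C"): 1}
good_h = {"A": 2, "B": 1, "C": 0}
bad_h = {"A": 3, "B": 0, "C": 0}   # drops by 3 across an edge that only costs 1

print(all_nonnegative(edges, good_h), modified_costs(edges, good_h))
print(all_nonnegative(edges, bad_h), modified_costs(edges, bad_h))
```

A heuristic that falls by more than the edge cost along some edge is exactly what produces a negative modified edge.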
And then the second condition is that the heuristic at s_end is equal to 0. The future cost of the end state should be 0, so the heuristic at the end state is also equal to 0. So these are the properties we would want to have if we want to talk about consistent heuristics. Okay. And they’re natural things to want, right? The first one is basically saying that the cost you end up with should be greater than or equal to 0, so you can run uniform cost search on it. But it’s really talking about this triangle inequality that you want to have, right? h of s is an estimate of the future cost. So if from s I take an action, the cost of s and a plus h of the successor of s and a should be greater than or equal to h of s, the estimate of the future cost from s. That’s all it is saying. And then the last one also makes sense, right? I do want my future cost at s_end to be 0, so the heuristic at s_end should also be equal to 0, because again the heuristic is just an estimate of the future cost. Okay. All right. So what do I know about A_star beyond that? One thing we know is that if h is consistent, so if I have this consistency property, then A_star is correct. There is a theorem that says A_star is going to be correct if h is consistent. And we can look at that through an example. So let’s say that I am at s_0, I take a_1 and end up at s_1, I take a_2 and end up at s_2, and I take a_3 and end up at s_3. So let’s say I have a path that looks like this. Okay. Then I’m looking at the cost of each one of these edges, right? I’m looking at cost prime of s_0 and a_1. Well, what is that equal to? That’s my updated cost.
The updated cost is the old cost, which is cost of s_0 and a_1, plus the heuristic value at s_1 minus the heuristic value at s_0. Okay. So that is the new cost of being in s_0 and taking a_1. I’m gonna write the new costs for the rest of the edges to figure out the cost of the path. The cost of the path is just the sum of these costs. So cost prime of s_1, a_2 is cost of s_1, a_2 plus the heuristic at s_2 minus the heuristic at s_1. That is the new cost of this edge. And the new cost of the last edge, which is cost prime of s_2, a_3, is equal to the old cost of s_2, a_3 plus the heuristic at s_3 minus the heuristic at s_2. Okay. So I just wrote out all these costs. If I’m talking about the cost of a path, then it’s just these costs added up, right? So if I add up these costs, what happens? A bunch of things get canceled out. This guy gets canceled out by this guy, this guy gets canceled out by this guy, right? And what I end up with is that the sum of these new costs, these cost primes of s_i minus 1, a_i, is just equal to the sum of my old costs of s_i minus 1, a_i, plus my heuristic at the last state, which is the end state, minus the heuristic at s_0. Okay. I’m saying my heuristic is a consistent heuristic. So what is a property of a consistent heuristic? The heuristic value at s_end should be equal to 0. So this guy is equal to 0. So what I end up with is that if I look at a path with the new costs, the sum of the new costs is just equal to the sum of the old costs minus some constant, and this constant is just the heuristic value at s_0. Okay. So why is this important? Remember, we just proved at the beginning of this lecture that uniform cost search is correct, so the cost that it returns is optimal. A* is just uniform cost search with a new cost. So A* is just running on this new cost.
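Written out, the cancellation argument above looks roughly like the following. This is a sketch of the board work, assuming a path s_0, a_1, s_1, ..., a_L, s_L with s_L = s_end; the symbol L for the path length is notation introduced here, not something from the lecture.

```latex
\begin{align*}
\text{Cost}'(s_{i-1}, a_i) &= \text{Cost}(s_{i-1}, a_i) + h(s_i) - h(s_{i-1}) \\
\sum_{i=1}^{L} \text{Cost}'(s_{i-1}, a_i)
  &= \sum_{i=1}^{L} \text{Cost}(s_{i-1}, a_i) + h(s_L) - h(s_0) \\
  &= \sum_{i=1}^{L} \text{Cost}(s_{i-1}, a_i) - h(s_0)
  \qquad \text{since } h(s_L) = h(s_{\text{end}}) = 0
\end{align*}
```

Every path from s_0 to s_end is shifted by the same amount h(s_0), so the ordering of paths by cost does not change.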
But this new cost is the same thing as the old cost minus a constant. So if I’m optimizing the new cost, it’s the same thing as optimizing the old cost. So it is going to return the optimal solution. Okay. All right. So that is basically the same thing that’s on the slide. So that’s one property, right? We talked about heuristics being consistent, and we have now talked about A* being correct, because it’s uniform cost search. It’s correct if the heuristic is consistent, right? That’s the property we need, because consistency gets us the fact that this guy is equal to 0, and gets us the fact that these new costs are going to be non-negative, so I can run uniform cost search on them. The next property we have for A* is that A* is actually more efficient than uniform cost search, and we have kind of seen this already, right? The whole point of A* is to not explore everything, and to explore in a directed manner. So if you remember uniform cost search, how does it explore? Well, it explores all the states that have a past cost less than or equal to the past cost of s_end. Again, remember, with uniform cost search you’re exploring states in order of their past cost, so you explore all those states whose past cost is no more than that of the end state. Okay. The thing that A* does is explore fewer states. It explores states s that have a past cost less than or equal to the past cost of the end state minus the heuristic at s. So if you look at the right side, the right side just becomes smaller, right? The right side for uniform cost search was just the past cost of s_end. Now it is the past cost of s_end minus h of s, so it just became smaller. And why did it become smaller? Because now I’m doing this more directed search. I’m not searching everything uniformly around me.
And that’s the whole point of the heuristic. Okay. And that actually makes it more efficient. The interpretation of this is that if h is larger, that’s better, right? If my heuristic is as large as possible while still being consistent, that is better, because then I am exploring a smaller area to get to the solution. The proof of this is like two lines, so I’m gonna skip that. So let me actually show what this looks like. If I’m trying to get from s_start to s_end and I’m doing uniform cost search, I’m uniformly exploring all the states around me, and that is equivalent to assuming that the heuristic is equal to 0. Basically, uniform cost search is A* when the heuristic is equal to 0. So what is the point of the heuristic? The point of the heuristic is to estimate what the future cost is. If I knew what the future cost was, then h of s would just be equal to the future cost. That would be awesome, and I would only need to explore that green space. The thing I’m exploring is just the nodes that are on the minimum cost path, and I’m not exploring anything extra, right? That’s the most efficient thing one can do. In practice, I don’t have access to future costs, right? If I had access to future costs, the problem would already be solved. Instead I have access to some heuristic that is some estimate of the future cost. It’s not as bad as uniform cost search, and it’s getting close to the value of the future cost, so you’re somewhere in between. So it is going to be more efficient than uniform cost search in some sense. Okay. All right. So basically the whole idea of A* is that it distorts edge costs to favor the end state. So I’m going to add here that A* is efficient too. That is the other thing we have about A*. Okay. All right.
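To make the "A* is uniform cost search on modified costs" idea concrete, here is a minimal Python sketch. It is not the course's reference code: the function names, the toy problem (integer locations from -10 to 10, unit-cost steps, end state at 5), and the heuristic are all assumptions made up for illustration. It runs plain uniform cost search and then the same search on the modified costs, and keeps track of which states each one explores.

```python
import heapq


def uniform_cost_search(start, is_end, successors):
    """Return (cheapest cost to an end state, set of explored states).

    successors(s) yields (next_state, cost) pairs with cost >= 0.
    """
    frontier = [(0, start)]          # priority queue ordered by past cost
    explored = set()
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue                 # stale queue entry, already handled
        explored.add(s)
        if is_end(s):
            return past_cost, explored
        for s2, cost in successors(s):
            if s2 not in explored:
                heapq.heappush(frontier, (past_cost + cost, s2))
    return None, explored


def a_star(start, is_end, successors, h):
    """Run uniform cost search on Cost'(s, a) = Cost(s, a) + h(s') - h(s)."""
    def modified_successors(s):
        for s2, cost in successors(s):
            new_cost = cost + h(s2) - h(s)
            assert new_cost >= 0, "heuristic is not consistent"
            yield s2, new_cost

    new_total, explored = uniform_cost_search(start, is_end, modified_successors)
    if new_total is None:
        return None, explored
    # Undo the constant shift: new path cost = old path cost - h(start).
    return new_total + h(start), explored


# Toy problem (hypothetical): locations -10..10, step left or right for cost 1.
START, END = 0, 5


def successors(s):
    for s2 in (s - 1, s + 1):
        if -10 <= s2 <= 10:
            yield s2, 1


def is_end(s):
    return s == END


def h(s):
    # Distance to END; consistent for unit steps, and h(END) == 0.
    return abs(END - s)


ucs_cost, ucs_explored = uniform_cost_search(START, is_end, successors)
astar_cost, astar_explored = a_star(START, is_end, successors, h)
print("UCS:", ucs_cost, "explored", len(ucs_explored))
print("A*: ", astar_cost, "explored", len(astar_explored))
```

In this toy case the heuristic happens to equal the true future cost, so the modified search only walks along the minimum cost path, while plain uniform cost search also expands states on the wrong side of the start.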
So these are all cool properties. One more property about heuristics, and then after that we can talk about relaxation. There’s also this other property called admissibility, which is something that we have kind of been talking about already, right? We’ve been talking about how this heuristic should get close to FutureCost and should be an estimate of FutureCost. So an admissible heuristic is a heuristic where h of s is less than or equal to FutureCost of s. And the cool thing is, if you already have consistency, then you have admissibility too. So I’ll write that here as another property: admissible, which means h of s is less than or equal to FutureCost of s, okay? All right. The proofs of these are again just one-liners. This one is more than one line, [LAUGHTER] but it’s actually quite easy, and it’s in the notes. You can use induction to prove that if you have consistency, then you’re going to have admissibility too. Okay, so we’ve just talked about how A* is an efficient thing. We have talked about consistent heuristics, which are going to be useful: they give us admissibility, they give us correctness, and they make A* this very efficient thing. But we actually have not talked about how to come up with heuristics. So let’s spend the next couple of minutes talking about how to come up with heuristics. And the main idea here is just to relax the problem. Just relaxation. The way we come up with heuristics is that we take the problem, make it easier, and solve that easier problem. So that is the whole idea of it.
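Going back for a moment to the claim that consistency gives admissibility: the lecture points to an induction proof in the notes. One other way to see it, sketched here with a telescoping sum rather than induction, is to add up the modified costs along a minimum cost path from s to the end state. The path notation and the length L are assumptions of this sketch.

```latex
% Assumption: s = s_0, a_1, s_1, \dots, a_L, s_L = s_{\text{end}} is a minimum cost path from s.
\begin{align*}
0 &\le \sum_{i=1}^{L} \text{Cost}'(s_{i-1}, a_i)
   && \text{(each modified cost is non-negative)} \\
  &= \sum_{i=1}^{L} \text{Cost}(s_{i-1}, a_i) + h(s_{\text{end}}) - h(s)
   && \text{(telescoping)} \\
  &= \text{FutureCost}(s) - h(s)
   && \text{(minimum cost path, } h(s_{\text{end}}) = 0\text{)}
\end{align*}
% Hence h(s) \le \text{FutureCost}(s).
```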
So remember, h of s is supposed to be close to FutureCost, and some of these problems can be really difficult, right? If you have a lot of constraints, it becomes harder to solve the problem. So if we relax it and just remove the constraints, we are solving a much easier problem, and its solution can be used as a heuristic, as a value that estimates what the FutureCost is. So we want to remove constraints, and when we remove constraints, the cool thing that happens is that sometimes we have closed form solutions, sometimes we just have easier search problems that we can solve, and sometimes we have independent subproblems and we can find the solutions to them, and that gives us a good heuristic. So that is my goal, right? I want to find these heuristics. Let me just go through a couple of examples. Let’s say I have a search problem where I want the triangle to get to the circle, and I have all these walls there, and that just seems really difficult. So what is a good heuristic here? I’m going to just relax the problem. I’m gonna remove all those walls, just knock down the walls, and look at that problem. That just seems much easier, okay? Now I actually have a closed form solution for getting the triangle to the circle. I can just compute the Manhattan distance and use that as a heuristic. Again, it’s not going to be exactly what FutureCost is, but it is an approximation of it. So usually you can think of heuristics as these optimistic views of what the FutureCost is. It’s an optimistic view of the problem. What if there were no walls? If there were no walls here, how would I get from one location to another location? The solution to that is going to give you this estimate of FutureCost, which is h of s.
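The knocked-down-walls relaxation can be sketched in a few lines of Python. The grid below is made up for illustration (the lecture's slide is not reproduced here), with `S` standing in for the triangle, `G` for the circle, `#` for walls, and every move costing 1. The sketch computes the true future cost of every reachable cell with a breadth-first search from the goal, and checks that the Manhattan distance never overestimates it and never produces a negative modified cost.

```python
from collections import deque

# Hypothetical grid: S = start (triangle), G = goal (circle), # = wall.
GRID = [
    "S..#....",
    ".#.#.##.",
    ".#...#..",
    ".####.#.",
    "......#G",
]


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(GRID):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(ch)


START, GOAL = find("S"), find("G")


def neighbors(cell):
    """Yield the non-wall cells one step up, down, left, or right of cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc)


def manhattan(cell):
    """Closed form future cost of the relaxed problem (no walls)."""
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


def true_future_costs():
    """BFS outward from the goal; moves are reversible and each costs 1."""
    dist = {GOAL: 0}
    queue = deque([GOAL])
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return dist


future = true_future_costs()
admissible = all(manhattan(cell) <= d for cell, d in future.items())
consistent = all(
    1 + manhattan(nxt) - manhattan(cell) >= 0
    for cell in future
    for nxt in neighbors(cell)
)
print("h(start) =", manhattan(START), "true future cost =", future[START])
print("admissible:", admissible, "consistent:", consistent)
```

The heuristic is optimistic in exactly the sense described above: the walls can only make the real path longer than the no-walls distance.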
Okay? Or take the tram problem. Let’s say we have a more difficult version of the tram problem where we have a constraint, and this constraint says, “You can’t have more tram actions than walk actions.” So now this is my search problem, and I need to solve it. This seems kind of difficult. We talked about how to come up with the states for this last time, and even that seemed difficult: I need to have the location, and I need to have the difference between the number of walk and tram actions. So I have on the order of N squared states now. So instead of doing that, let me just remove the constraint. I’m just gonna relax it. After relaxing it, I have a much easier search problem to deal with. The state is only the location, and I can just go with that location, and everything will be great. Okay? All right. So the idea here is this middle case: if I remove these constraints, I’m going to have these easier search problems, these relaxations. And I can compute the FutureCost of these relaxations using my favorite techniques, like dynamic programming or uniform cost search. But one thing to notice is that I need to compute that for every location from 1 through N, because the heuristic is a function of the state, right? So I actually need to compute FutureCost for this relaxed problem for all states from 1 through N, and that gives me the heuristic value at every state. There are some engineering things that you might need to do here. For example, here we are looking for FutureCost. So if you plan to use uniform cost search for whatever reason, maybe because dynamic programming doesn’t work in this setting, you need to do a few engineering things to make it work. Because if you remember, uniform cost search computes past costs; it doesn’t compute FutureCost.
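As a rough sketch of the relaxed tram problem, the following Python computes FutureCost for every location from 1 through N with dynamic programming. The action costs are assumptions carried over from the usual statement of the tram problem (walking from s to s + 1 costs 1, taking the tram from s to 2s costs 2); they are not stated in this part of the lecture, and the function name is hypothetical.

```python
def relaxed_future_costs(n, walk_cost=1, tram_cost=2):
    """FutureCost of the relaxed tram problem (constraint removed) for locations 1..n.

    Assumed actions: walk s -> s + 1 (walk_cost), tram s -> 2 * s (tram_cost).
    Returns a list where index s holds the future cost of location s; index 0 is unused.
    """
    future = [float("inf")] * (n + 1)
    future[n] = 0                      # already at the end location
    for s in range(n - 1, 0, -1):      # fill in from the end backwards
        best = walk_cost + future[s + 1]
        if 2 * s <= n:
            best = min(best, tram_cost + future[2 * s])
        future[s] = best
    return future


N = 10
relaxed = relaxed_future_costs(N)


def h(state):
    """Heuristic for the constrained problem: ignore everything but the location."""
    location, _walk_minus_tram = state
    return relaxed[location]


for location in range(1, N + 1):
    print(location, relaxed[location])
```

The table has one entry per location, which is what lets it serve as a heuristic at every state of the constrained problem, whatever the walk and tram counts are.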
So you need to create a reversed problem where you can actually compute FutureCost. So there are a few engineering things, but beyond that, it is basically just running the search algorithms that we know on these relaxed problems. That will give us a heuristic value, and we’ll put that in our original problem and go and solve it. Okay? Another cool thing that relaxation gives us is this idea of having independent subproblems. So here’s another example. I want to solve this eight puzzle, where I move tiles here and there to come up with this new configuration, and that seems hard again. A relaxation of that is to just assume that the tiles can overlap. The original problem says the tiles cannot overlap. I’m just gonna relax it and say, “Well, you can just go wherever, and you can overlap.” Okay? That is again much simpler, and now I have eight independent problems for getting each one of these tiles from one location to another location, and I have a closed form solution for each, because that’s again just Manhattan distance. So that gives me a heuristic. That’s an estimate. It’s not perfect, it’s an estimate. And then I can use that estimate in my original search problem to solve the search problem. So these were just some examples of this idea of removing constraints and coming up with better heuristics: knocking down walls, walking and taking the tram freely, overlapping pieces. That allows you to solve this new problem, and the idea is that you’re reducing these edge costs from infinity to some finite cost. Okay? All right. So I’m gonna wrap up here, and I guess we can always talk about these last few slides next time, since we’re running late, but I think you guys have got the main idea. So let’s talk next time.
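The eight puzzle relaxation can be sketched the same way. In the Python below, a board is a tuple of nine numbers read row by row with 0 as the blank; that encoding, the goal configuration, and the function name are assumptions for illustration, since the slide's actual boards are not reproduced here. Once tiles are allowed to overlap, each tile is its own independent subproblem, and the heuristic is the sum of the per-tile Manhattan distances.

```python
# Assumed goal configuration, read row by row; 0 marks the blank square.
GOAL = (1, 2, 3,
        4, 5, 6,
        7, 8, 0)


def manhattan_heuristic(board, goal=GOAL):
    """Sum of per-tile Manhattan distances on a 3x3 board (blank excluded)."""
    total = 0
    for index, tile in enumerate(board):
        if tile == 0:
            continue                      # the blank is not a tile to move
        target = goal.index(tile)
        total += abs(index // 3 - target // 3) + abs(index % 3 - target % 3)
    return total


# Example board: tiles 7 and 8 are each one square away from their goal squares.
example = (1, 2, 3,
           4, 5, 6,
           0, 7, 8)
print(manhattan_heuristic(example))
```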